how to publish a TOG paper

In celebration of finally getting my work into a decent journal, I decided to have a look back to see how the paper developed over time.
City Architecture Generation ( Master’s Dissertation 2006 | pdf )
Way back in the past I was doing a degree at Bristol, and wrote my dissertation on an interesting concept for architecture generation. The original document is very flawed, and the results quite rough; however, the code (just about) worked. I seem to remember it being a few months’ work. I got a decent enough grade for it at the time, so I was happy.
The important thing is that it gave me enough of an insight into a new & useful technique. I went off to work in the real world for a few years, and thought about the project again every now and then. Then I decided to head back to academia, and start over again with a PhD…
The fun results are on page 75.

The Extrusion Skeleton for Procedural Architectural Modeling ( Eurographics 2009 | pdf ) rejected

For the next nine months or so, I messed around with a few systems, and finally got the straight-skeleton system working to the point where I was reasonably happy with it.
We decided a couple of weeks before the deadline to really go for the Eurographics submission. We made great progress in two weeks, but the reviewers rightly rejected it as being sloppy.

The summary we got back from CGF:
“The paper presents a simple interactive procedural method to create a variety of architectural shapes. The core of the method generalizes the existing ‘straight skeleton’ algorithm for computing roof shapes, by (a) allowing different slopes on each edge of the input polygon, and (b) by allowing edits on the polygon ‘during’ the algorithm. An easy to use interface is proposed.

The reviewers consider that this paper proposes a number of incremental but very useful additions to existing methods in the area. These additions are carefully selected and blend well together, the method is nicely integrated in a functional interactive systems and produces impressive results.

Unfortunately, the paper has major problems: the presentation is confusing, the algorithmic description is sloppy and contains several errors, and the method has not been carefully evaluated. Cleaned up, the reviewers think this paper could be useful to the community. However the required revisions are major and the paper cannot be considered for the Eurographics conference.”
Other comments:
“I definitely miss references to interactive sketching systems, such as SESAME and Sketch-up, and to have a discussion on this issues. Since the main focus of the paper is on interactive design, the real comparison is with interactive systems, not only grammar-based systems.”

“The proposed extensions to straight skeletons seem to be of limited novelty. Some of the extensions (like multiple footprints and the avoidance of ‘angle restrictions’) appear to be done before. Other extensions (multiple wall profiles, plan edits and robust computation) might be novel. However, they are mostly straightforward and represent only a minor contribution for a Eurographics paper.”

“This is actually very good except where it isn’t.”

Interactive Architectural Modeling with Procedural Extrusions ( Siggraph Asia 2010 | pdf ) rejected
The next deadline we wanted to try for was Sigg-Asia. The big step here was getting it to work over a large input set to make the pretty first figure. We made an attempt to add some comparisons to existing modeling techniques. As ever I ran out of time trying to submit, and didn’t manage a video (the editor kept crashing at the last minute).
[image: 21 feet]

In the reviews I took a lot of heat for the lack of an evaluation and the picture of my co-author hidden in the paper. But the thing that really killed it was us pretending that we had a perfect geometric algorithm when there were (rare) situations where it would fail. I think these are the original reviews.

Reviewer summary:


“1. The paper doesnt provides an algorithm in the proper sense of the word. You present a set of heuristics that works on a large set of models. This can be very useful from the modeling perspective as the user is in the loop to fix the degenerate results. Please try to make this clear early on, and in the review cycle we want to agree on this change in exposition style.
2. In computational geometry the stress is to present provable algorithms. Hence, in many cases the algorithms end up complicated to implement to address various degenerate inputs. This is not the case for the modeling scenario where the user can update the inputs in case of failures. Again this distinction should be clear in the paper.
3. Please remove figure 25, it weakens the paper.
4. Please perform a proper evaluation. Take a large number 50-100 floor plans from some publicly available repository, and use your framework to come up with buildings there. It will be excellent if you can use google streetview or like to take existing building profiles and try to replicate them. In your rebuttal you already mentioned about cities where such complicated roof structures are common. We feel the work will have an important effect in this area, so we want to see this in a form that is easy to understand and judge.”

Other comments:

“You need to be very careful about using a single word or term for each concept: you can’t have a priority queue that becomes a priority list; you can’t have a wall-angle that becomes a weight, and so on.”

“I’ve suggested that the introduction should discuss an experiment that tests the generality of your system. Providing this support for your claim of a good working system — support with solid factual evidence — would be the best service you could possibly perform. The reviewers agreed that the way to do this is to take a collection of house-plans (for example, go to XXX.com”

“The algorithm is interesting, but the scope is rather restricted. While it is true slanted roof structures and similar extrusions (figure 22) make the models realistic, but it is unclear if such a specialized and general algorithm is required. It may be sufficient to have a template based model.”

“Overall this paper presents a powerful system able to generate complex building geometry that can be edited interactively. My concerns are with the amount of manual effort that is needed and with how well all (most?) special cases are handled. The former is empirically shown to be reasonable – ok. The latter is more unclear.”

“The main problem with this paper is in the GOAL — not of the work, but of the paper itself.”

“(I apologize for any typos or unclearness in these notes; I have wrist problems which make me not want to take the time to do any more editing than necessary.)” **

Interactive Architectural Modeling with Procedural Extrusions ( Transactions on Graphics 2011 | pdf ) accepted
We got an offer to publish our work in TOG after the Siggraph rejection, provided I made the suggested corrections. This meant that I spent a lot of time on evaluation (playing with our application); however, when I was done I had a much better idea of the weaknesses and strengths of the system. I followed the suggestion of one of the previous reviews to use an online library of floor plans. In hindsight this was probably a mistake because:
  1. Online libraries only seem to exist for plans without locations – and without interesting boundary constraints, many features of the project weren’t appreciated.
  2. The online library didn’t give us copyright permission to also show the library of plans/images we used for the modeling process. The reviewers were able to see them, but we weren’t able to publish them. They are available here (40Mb).
A lot of effort also went into tightening up the technical writing in the middle of the paper. One challenge was choosing an appropriate depth of explanation – too deep and you end up explaining too much; too shallow and you invite many additional questions.
The other challenge was keeping the terminology consistent throughout(!). This becomes non-trivial after so many paper drafts, especially since the implementation (the source code) uses an entirely different set of terminology.

Of course, there are always new problems that come up with the concepts when you keep digging; one is detailed in a previous blog post.

Just for reference, here are the edits made by the ACM copy editors before the final article was drawn up. Apparently paragraphs should start on the same line, and you should never use “Fig. X” in the body of the text, always “Figure X”:

Finally I would like to thank the many reviewers who helped us scrape this paper together. Especially the few guys that stood up for this work! It’s been an exercise in learning how these things are really done, and how they turn out under time pressure. Hopefully the next one will be a little easier on us all 🙂

** – never say this in anonymous reviews; it leads to whoever-you-are being known as “wrist guy” for months while we redraft the paper.

Recursive weighted skeleton

Given the weighted straight skeleton, and two features:

  • The ability to change the speed of an edge at a certain time (height/offset, depending on your model) – let’s call this an edge direction event
  • The ability to add edges on a particular edge at a certain time – let’s call this a plan edit

We can devise a language that allows plan edits after a sequence of edge direction events. The question is: can this language be recursive? It turns out the answer is yes:
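To make the recursion concrete, here’s a minimal sketch of what such an event language might look like – hypothetical Java, not the real implementation:

import java.util.ArrayList;
import java.util.List;

interface Event {}

// change the speed (slope) of an edge at a given height
class EdgeDirectionEvent implements Event {
    final double height, newSpeed;
    EdgeDirectionEvent(double height, double newSpeed) {
        this.height = height;
        this.newSpeed = newSpeed;
    }
}

// insert new edges on an existing edge at a given height; the new edges
// then run their own event program
class PlanEdit implements Event {
    final double height;
    final List<Event> childProgram;
    PlanEdit(double height, List<Event> childProgram) {
        this.height = height;
        this.childProgram = childProgram;
    }
}

class RecursiveSkeletonDemo {
    public static void main(String[] args) {
        List<Event> program = new ArrayList<>();
        program.add(new EdgeDirectionEvent(0.0, 1.0));
        // the recursive bit: the edges created by the plan edit run the very
        // same program again, so the profile repeats at every scale,
        // Koch-snowflake style
        program.add(new PlanEdit(10.0, program));
    }
}

The self-reference in the last line is all the recursion needed; an interpreter walking the program keeps spawning smaller copies of the same profile.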

Here’s another more discrete variant.
This is interesting because of its similarity to two different types of automata – fractals (such as the Koch snowflake) and cellular automata (such as the Game of Life). Because the speed of edges can be forward or backward, we can create edge direction events to model growth and decay – this is somewhat like the Game of Life. And because our language allows recursion, we can model fractal shapes…
I get the feeling the system has the persistence and transmission of information needed to create the higher classes of behaviour, in Wolfram’s sense. I’ll need a better implementation to go hunting for them…

ssd’s and ubuntu

Having got fed up with waiting for my new computer to boot, or for java to find the hundreds of files it needs, I decided to blow my spoils from a recent conference victory on an SSD for my desktop machine.

First surprise – it came in a flat envelope. I thought it was a mistake at first and they’d sent me an OEM CD, but sure enough it was an Intel 80Gb drive (SSDSA2MH080G2C1, to be precise…). These things are really small, packaged in not much, and must be quite robust.
Second surprise – well, not really a surprise – the speed boost is pretty noticeable. Overclocking takes second place to getting one of these on your development machine – the top video is with my old HDD (a 320Gb WD Caviar drive), the bottom with the Intel SSD.

Running ubuntu, there are two things I discovered people talk about – erase block alignment and TRIM support. Because I’ll forget what I did this time, I’ll note it in this blog; however, I suspect that by the time anyone uses this, there will be a better guide available.
Erase block alignment
As explained here, when an SSD deletes data, it nukes entire erase blocks. On my Intel drive the erase block size is 128K (128 × 2^10 bytes; different drives, such as the OCZ ones, use different erase block sizes), so we want the partitions of the drive to align to this boundary. I used the instructions here with the live Ubuntu CD and fdisk to ensure this was the case.
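Just to spell out the arithmetic – a wee sanity-check sketch; the 512-byte sector size and the start sector are assumptions (check what fdisk actually reports):

class AlignmentCheck {
    public static void main(String[] args) {
        long sectorSize = 512;        // bytes per logical sector (assumption)
        long eraseBlock = 128 << 10;  // the 128K erase block on this Intel drive
        long startSector = 256;       // example partition start, as fdisk would report it
        // aligned if the partition's byte offset is a whole number of erase blocks
        long offset = startSector * sectorSize; // 256 * 512 = 131072 = exactly one erase block
        System.out.println(offset % eraseBlock == 0 ? "aligned" : "misaligned");
    }
}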
Then I reinstalled ubuntu and booted her up.
Trim support

TRIM is an operating system function that tells a drive which pages are no longer in use and can be handed back to the SSD’s page-allocation system. But the default version of Ubuntu (currently lucid/10.04) ships kernel 2.6.32, so there’s no TRIM support. The instructions here give a painless way to upgrade the kernel to 2.6.34 once ubuntu is installed; after that, if I’ve understood it right, ext4 will issue TRIM commands when mounted with the discard option.

exploring frequency responses for focus

In the name of badly rushed science for conference deadlines, I present the accumulation of a couple of weeks’ evenings messing around learning about frequency space (pdf).

[tl;dr] This paper got accepted, and I “presented” it at the SICSA 2010 conference. Here’s the video:

While I’d always seen the little bars on music amplifiers, I’d never thought of images being represented in the same way. The bars represent the frequencies being played back at any one time. The low frequencies (slower moving bars, normally on the left) are the deep sounds, and the high frequencies (fast bars on the right) are the high sounds. It turns out they have a nice analogue in the image plane, but because we don’t look at every pixel in a photo in order from start to end over 3 minutes, we never see them.

If we identify the important areas of an image for each frequency (DoG pyramid/“monolith”), we can animate over the frequencies (high frequencies first, then the low ones):

We can then see that a single point in the image has different intensities at different frequencies, as the shade of grey at a point changes. So there’s one of the little bar-graphs for each pixel.
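For the curious, here’s roughly what building those per-pixel frequency bands looks like – a minimal difference-of-Gaussians sketch, not the code from the paper (the level count and blur radii are arbitrary):

class DoGPyramid {

    // split a greyscale image into frequency bands: each band is the detail
    // lost between two successively stronger Gaussian blurs
    static float[][] bands(float[] grey, int w, int h, int levels) {
        float[][] out = new float[levels][];
        float[] prev = grey;
        double sigma = 1;
        for (int i = 0; i < levels; i++) {
            float[] next = blur(prev, w, h, sigma);
            out[i] = new float[w * h];
            for (int p = 0; p < w * h; p++)
                out[i][p] = Math.abs(prev[p] - next[p]); // band i's response at pixel p
            prev = next;
            sigma *= 2; // each level captures frequencies roughly an octave lower
        }
        return out; // out[i][p] is the height of pixel p's bar for band i
    }

    // separable Gaussian blur, clamping at the image borders
    static float[] blur(float[] in, int w, int h, double sigma) {
        int r = (int) Math.ceil(3 * sigma);
        double[] k = new double[2 * r + 1];
        double sum = 0;
        for (int i = -r; i <= r; i++) {
            k[i + r] = Math.exp(-i * i / (2 * sigma * sigma));
            sum += k[i + r];
        }
        for (int i = 0; i < k.length; i++) k[i] /= sum;
        float[] tmp = new float[w * h], out = new float[w * h];
        for (int y = 0; y < h; y++)      // horizontal pass
            for (int x = 0; x < w; x++) {
                double v = 0;
                for (int i = -r; i <= r; i++)
                    v += k[i + r] * in[y * w + clamp(x + i, w)];
                tmp[y * w + x] = (float) v;
            }
        for (int y = 0; y < h; y++)      // vertical pass
            for (int x = 0; x < w; x++) {
                double v = 0;
                for (int i = -r; i <= r; i++)
                    v += k[i + r] * tmp[clamp(y + i, h) * w + x];
                out[y * w + x] = (float) v;
            }
        return out;
    }

    static int clamp(int i, int n) { return Math.min(n - 1, Math.max(0, i)); }
}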

I built a little application that lets you see these graphs and the spatial frequencies in an image. It’s quite fun to play with; you can start it by clicking here (java webstart, binary file, run at your own risk, source code below). Wait for it to load the preview images, select one, wait for it to build the map (lots of waiting here…), and then use right-drag to move around, the wheel to scroll, and the left button to show a frequency preview for a particular location. Move the mouse out of the frame to see the representative frequency map from the work.

As you drag the point around, the lil’ bars change to show you what the frequency content is like in that area of the image.

This was neat, and I had to do something with it, so I built a hoodicky that recovers a depth map from the focus of a single image. I assume that stuff in focus is near the camera, and stuff out of focus is a long way away – photography just like your Mum used to do. It turns out that not too much work has been done in this area; these guys even got a book article out about it last year.
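My reading of that assumption, as a method to drop into the DoGPyramid sketch above – a guess at the idea, not the paper’s actual method: pixels with plenty of high-frequency energy are sharp, so call them near.

    static float[] depthFromBands(float[][] bands) {
        int n = bands[0].length;
        float[] depth = new float[n];
        for (int p = 0; p < n; p++) {
            float total = 1e-6f, high = 0; // small epsilon avoids dividing by zero in flat regions
            for (int i = 0; i < bands.length; i++) {
                total += bands[i][p];
                if (i < bands.length / 2) high += bands[i][p]; // early levels hold the high frequencies
            }
            depth[p] = 1 - high / total; // sharp pixel -> in focus -> near the camera -> small depth
        }
        return depth;
    }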

So just too late to be hot news… but interesting nonetheless. I decided to twist the concept a bit and use it for blur-aware image resizing. The motivation is that you need big (expensive) lenses to take photos with a shallow depth of field, but when you resize these images, you lose that depth of field:

In the smaller images, more of the logo appears to be in focus, but it’s the same image, just scaled a bit.

So we want something that keeps the proportion of each frequency the same as you scale the image – basically, a thing that keeps foreground/background separation when scaling an image. We can use the focus of an image (closely connected to its frequencies) to determine the depth, as the video shows.

In the following, a, b and d are normal scaling, while c & e use the depth map we’ve calculated using the frequency map.

This uses the frequencies on the edges of the photo to classify the segmented image. It shows the same kind of thing at the top, the segment-frequency map bottom left, and the recovered depth map bottom right.

More results (SICSA are the people who pay half my rent…):

Results are better than I expected for the few days I spent putting this thing together, but the basic problem is that there are two or three parameters that need to be tuned for each image, to take into account the noise of the image and the bias towards foreground/background classification. It’s a good working prototype, but lots of effort would be required to do this for real.

The write-up looks like this (pdf, src below). I’m fairly certain some of the equations are wrong – but this is computer science; no one actually reads the equations, do they?

Source code is here. Java. I think it uses Apache-licensed stuff, so that’s how it should be treated, but I scrounged a fair bit of code, and who knows where it’s from… The main class should be FrequencyMap, and after setting up an image in that class you’ll need to play around with the four constants at the top:

  • fMapThreshhold: Increase if not enough red (high freq) in the freq map, decrease if noise is being counted as red
  • scaleFac: usually okay to leave alone. If you want a frequency map with more levels, decrease this, keeping it above one.
  • filterRadius: noise reduction on edge classification. Increase to classify more edges as higher frequency
  • Q: increase to increase the number of segmentations.

It will write files out to the disk. Use at your own risk.

{edit: 22/3/10} While out riding this weekend I figured out that it should be possible to analyse the defocused edges for evidence of higher frequencies, to determine if the edge is in front of or behind the in-focus point. More importantly, if we can’t see any high-frequency edges in the bokeh, then it doesn’t matter whether that part of the edge is in front of or behind the defocused edge…

{edit:} woo! It’s been accepted (probably because it’s one of the few graphics papers from Scotland).



mudslingin’

The other project I’d forgotten about was Mudslingin’, the second-year undergrad (5 years old now), 5-person “group” project. It’s a two-player, network-enabled game in which you try to drown your opponent by throwing balloons full of mud or water at them. It contained a very rough fluid simulator! Your opponent has bombs to blow up your dams – great fun for the whole family. It has quite a lot of features now I look back on it, even if it was rough around the edges.

There are lots of bugs, many to do with a 40-minute conversion into the webstart format. Sound, save, and networking probably won’t work in the webstart edition. The options menu seems to freeze everything. You also need another person to play against! Keys:

Menus: Up, Down, Enter
Player 1: Up, Down, Left, Right
Player 2: K,L,O,M
Quit: Escape

Go “start game” then pick one of the levels like ‘blood fountain’.


Files are here. Github. To run the jar:

java -jar Mudsling.jar

Spot the (bizarre) perl server! I think we came top of the year, but my memory is hazy… Now that this is published, I can pack my computer up so it can go into storage.

marvin the robot

I realised that some of my old projects never got put back onto the web, as they deserve. So, presenting my UCSC CMPS160 final project: marvin-the-robot

It was made in C++, with a meg/sec memory leak to show for its troubles (this was a work of graphics, not engineering). The windowing code came from the fltk system. The only bit of code that might be useful to someone is the downhill iterative IK solver that’s in there, somewhere.
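For reference, the downhill idea is roughly this – a minimal Java re-sketch with hypothetical names (the project itself was C++): nudge each joint angle in whichever direction reduces the end-effector’s distance to the target, and take finer steps when stuck.

class DownhillIK {
    static final double[] LEN = {1.0, 1.0, 0.5}; // bone lengths of a simple planar arm

    // forward kinematics: where the end-effector sits for the given joint angles
    static double[] endEffector(double[] angles) {
        double x = 0, y = 0, a = 0;
        for (int i = 0; i < angles.length; i++) {
            a += angles[i];
            x += LEN[i] * Math.cos(a);
            y += LEN[i] * Math.sin(a);
        }
        return new double[]{x, y};
    }

    // squared distance from the end-effector to the target
    static double dist2(double[] angles, double tx, double ty) {
        double[] e = endEffector(angles);
        double dx = e[0] - tx, dy = e[1] - ty;
        return dx * dx + dy * dy;
    }

    // downhill iteration: try a small step on each joint, keep it if it helps
    static void solve(double[] angles, double tx, double ty) {
        double step = 0.1;
        while (step > 1e-5) {
            boolean improved = false;
            for (int j = 0; j < angles.length; j++) {
                double before = dist2(angles, tx, ty);
                for (double d : new double[]{step, -step}) {
                    angles[j] += d;
                    if (dist2(angles, tx, ty) < before) { improved = true; break; }
                    angles[j] -= d; // no good: undo
                }
            }
            if (!improved) step /= 2; // stuck at this step size: go finer
        }
    }

    public static void main(String[] args) {
        double[] angles = new double[3];
        solve(angles, 1.5, 1.0);
        double[] e = endEffector(angles);
        System.out.printf("reached (%.3f, %.3f)%n", e[0], e[1]);
    }
}

It’s plain coordinate descent, so it can get stuck in local minima – but for a cartoon robot that mostly just looks like acting.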

Controls:

 

  • R – change camera mode
  • 1..6 – ask marvin to do some actions
  • < > – move camera forward / back
  • [ ] – move camera in / out

It’s an IK example of a robot riding around a small moon endlessly. It was written in C++; the source is here – use it under the normal WTFPL.

The binaries for OS X are here. If anyone builds it, it’d be great to have some binaries for other OSes; also, if you’ve got a mac powerful enough to run the app and do a screen capture at the same time, a youtube video would make me very happy 🙂

stormr 0.2

What is it?

It creates videos of a guy presenting your flickr image stream like this:

It reads out the name and the descriptions and points out any notes added to the images.

How do I use it?

  1. Install and run moviestorm (400Mb download) to check it works. You have to sign up on the website to get it started.
  2. Upload any pictures you want to flickr (you might have to sign up with flickr).
  3. Find the Stormr source code here (sorry, no binary any more).
  4. Enter your flickr username, select the set you want, then follow the instructions to start moviestorm, and load the newly created movie.
  5. At this point you can play with the presentation in moviestorm. To render it to video, click on this button

then this button
Leave moviestorm running full screen while it renders, then click on one of the options near the bottom of the screen to view the movie or upload it to the interwebs. (Click the blue underlined link to show the file location on disk.)

Credits:

  • flickrj for a great flickr bridge
  • piddles for the audio
  • Matthias Pfisterer for some audio concatenation stuff

Bugs

  • Deal with HTML links/markup looking funny
  • Fix the subtitles so they can’t be white on white.
  • Fix the compression artefacts on the images
  • Large movies (30+ image sets) may be unusable and can crash Stormr
  • Doesn’t work on Java 1.5

Development notes:

I’ve taken the philosophy that it should work with the base pack and be fairly simple. I might put some fancy camerawork in, but I leave using assets from other addon packs to the user.

It’s an example of a bridge application – pulling the printed image into the video age. There’s a lot of other content these things could be built for very easily.

Flickr notes are public – anyone can add one, so be careful what ends up in your video… 😉 You can turn this feature off in flickr’s settings.

Here are some future ideas:

  • Add crowds to read out comments, and point to areas of the image they’ve commented on.
  • Add some indicator of how many views each image has
  • Add a license (Copyright, Creative Commons, etc.) management system. Perhaps Stormr should only work on input images where CC derivative works are allowed (or laxer)?
  • Add a text-to-speech system (although these always sound pants?)
  • Port it to the cloud and allow it to download a zipped movie to your computer.
  • Find a way to add web links to the video file, even if someone’s got a patent on it.
  • Figure out a way to get out of that damn volcano that moviestorm land is in…

Things I can dream about:

  • Having more content in the base pack to work with – just some background noises that aren’t outside would make a difference.
  • Shortfuze providing a service to render videos for me

Same video on different sites fyi

More development videos:

lighting now in (will be available ~Friday):

update: Imitation is the greatest form of flattery! It was great to see the Moviestorm team take Stormr in-house:

strandbeest

Strandbeest II: redux. Now with nearly enough legs to stand upright. It doesn’t have enough power to move by itself – I’ll wait for the next SHDC & Simon’s Mindstorms NXT for that…