eTRIMS - E-Training for Interpreting Images of Man-Made Scenes
 
eTRIMS project results:

Towards Facade Structure Recognition

Jan Čech and Radim Šára
Center for Machine Perception
Czech Technical University in Prague
May 2008

Working with rectified images of classical facades, we develop a structural model that describes the images. We seek a computational procedure that segments an image into the elementary units of the structural model, the primitives. The primitives must conform to global constraints, which we call the structure. The constraints have some unknown common attributes, which must be determined as part of the segmentation procedure.

This result focuses on window detection. Windows tend to be aligned horizontally and vertically. The spacings between the columns or rows of a window array may be irregular and are unknown. The structure recognition task is then to find the maximum-probability aligned window array, whose attributes are unknown: a common window width and height, plus spacing attributes per row and per column.
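As an illustration only, the attributes of such an aligned array can be written down as a small data structure. The sketch below is an assumption about one convenient representation, not the representation used in the eTRIMS modules: the class name `WindowArray` and its fields are hypothetical, and the irregular spacings are stored implicitly as the position of each column and each row.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WindowArray:
    """Hypothetical container for the attributes of an aligned window array."""
    width: float          # common window width
    height: float         # common window height
    col_x: List[float]    # left edge of each column (spacing may be irregular)
    row_y: List[float]    # top edge of each row (spacing may be irregular)

    def rectangles(self) -> List[Tuple[float, float, float, float]]:
        """Return (x, y, width, height) for every window, row by row."""
        return [(x, y, self.width, self.height)
                for y in self.row_y
                for x in self.col_x]

    def is_valid(self) -> bool:
        """Check that windows in neighbouring columns and rows do not overlap."""
        cols_ok = all(b - a >= self.width
                      for a, b in zip(self.col_x, self.col_x[1:]))
        rows_ok = all(b - a >= self.height
                      for a, b in zip(self.row_y, self.row_y[1:]))
        return self.width > 0 and self.height > 0 and cols_ok and rows_ok
```

With this representation, a 2 x 3 array is fully described by two shared size attributes and five positions, which is what makes the alignment constraint explicit: every window in a column shares its x position and every window in a row shares its y position.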

To solve the task we have proposed an attributed 2D stochastic grammar. At this point, the production rules are designed manually.

The parsing procedure starts with an AdaBoost classifier trained on exemplar window images. This produces what we call seed detections (see the figures below). The classifier is applied at all image scales, since the window size is not known in advance. Next, each seed detection is updated (its location is slightly shifted) by a maximum-likelihood procedure that uses a parametric image likelihood model learned from a small set of exemplar window images. The likelihood model is currently independent of the AdaBoost classifier.
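The two stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `multiscale_seeds` and `refine_seed` are hypothetical, the trained AdaBoost classifier is replaced by an arbitrary `score` callable, the learned image likelihood model is replaced by an arbitrary `log_likelihood` callable, and the seed update is approximated by an exhaustive search over small integer shifts.

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def multiscale_seeds(img_w: int, img_h: int,
                     score: Callable[[Box], float],
                     sizes: Iterable[Tuple[int, int]],
                     stride: int = 4,
                     threshold: float = 0.0) -> List[Box]:
    """Scan the image with boxes of several sizes and keep high-scoring ones.

    `score` stands in for the classifier response on the image patch in a box.
    """
    seeds = []
    for w, h in sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                box = (x, y, w, h)
                if score(box) > threshold:
                    seeds.append(box)
    return seeds


def refine_seed(seed: Box,
                log_likelihood: Callable[[Box], float],
                max_shift: int = 3) -> Box:
    """Shift the seed by up to `max_shift` pixels to maximize the likelihood."""
    x, y, w, h = seed
    candidates = [(x + dx, y + dy, w, h)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return max(candidates, key=log_likelihood)
```

Keeping the detector and the likelihood model as separate callables mirrors the statement that the two are currently independent: the coarse, strided scan only has to land near a window, and the likelihood-based shift then corrects the location.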

Next, the parser grows a structural component from each updated seed detection. Part of that process is a maximum-likelihood image search for a sub-structure (a single window or a row of windows) within an attribute range predicted by the current structural component. The same image likelihood model as for the seed update is used. Each elementary growing step concludes by updating the attributes of the whole structural component. We call this update process, a local attribute re-optimization, component shaking (the seed update discussed above is in fact performed by a single-element component shake as well).
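To make the grow-then-shake loop concrete, here is a simplified one-dimensional sketch for a single row of windows. Everything in it is an assumption made for illustration: the names `shake` and `grow_right` are hypothetical, the component is reduced to a common width plus the left edge of each window, and the re-optimization is implemented as plain coordinate-wise hill climbing with unit steps, which is a stand-in and not necessarily the optimizer used by the authors. The hill climbing terminates only if the likelihood is bounded above.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, List[int]]  # (common window width, left edge of each window)


def shake(row: Row, log_lik: Callable[[Row], float]) -> Row:
    """Re-optimize all attributes of the component by unit-step hill climbing."""
    width, xs = row
    params = [width] + list(xs)

    def unpack(p: List[int]) -> Row:
        return p[0], list(p[1:])

    best = log_lik(unpack(params))
    improved = True
    while improved:
        improved = False
        for i in range(len(params)):
            for step in (-1, 1):
                trial = params.copy()
                trial[i] += step
                value = log_lik(unpack(trial))
                if value > best:  # accept only strict improvements
                    params, best, improved = trial, value, True
    return unpack(params)


def grow_right(row: Row,
               window_log_lik: Callable[[int, int], float],
               log_lik: Callable[[Row], float],
               gap_range: Tuple[int, int]) -> Row:
    """Add one window to the right of the row, then shake the whole row.

    The new window is searched for only within the gap range predicted
    from the current component (its last window and common width).
    """
    width, xs = row
    start = xs[-1] + width
    candidates = range(start + gap_range[0], start + gap_range[1] + 1)
    new_x = max(candidates, key=lambda x: window_log_lik(x, width))
    return shake((width, xs + [new_x]), log_lik)
```

Calling `shake` on a one-window row corresponds to the seed update, and each call to `grow_right` corresponds to one elementary growing step followed by re-optimization of the whole component.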

Each structural component, even a partially grown one, is always consistent with the grammatical rules.

Finally, the structural component of maximum probability is selected for output. The current implementation of the parsing procedure assumes there is just a single starting symbol per image. Various generalizations are under development, including the possibility of multiple starting symbols, relaxing the need for rectification, a more complex structural model, and introducing appearance attributes.

[Figures: seed detections and the resulting maximum structural component, shown for several example facades.]

References

[1] Jan Čech and Radim Šára. Modules for Structure Detection. eTRIMS deliverable D3.3, May 2008.

[2] Jan Čech and Radim Šára. Windowpane Detection based on Maximum Aposteriori Probability Labeling. In: Image Analysis - From Theory to Applications, Proceedings of the 12th International Workshop on Combinatorial Image Analysis (IWCIA '08), April 2008.