Automatic Scale Selection as a Pre-Processing Stage
for Interpreting the Visual World

Tony Lindeberg
Department of Numerical Analysis and Computing Science
KTH, S-100 44 Stockholm, Sweden
tony@nada.kth.se, http://www.nada.kth.se/~tony


Abstract:

This paper reviews a systematic methodology for formulating mechanisms for automatic scale selection when performing feature detection. An important property of the proposed approach is that the notion of scale is already included in the definition of image features.

Introduction

Computer vision algorithms for interpreting image data usually involve a feature detection step. The need for performing early feature detection is usually motivated by the desire to condense the rich intensity pattern into a more compact representation for further processing. If a proper abstraction of shape primitives can be computed, certain invariance properties can also be expected with respect to changes in view direction and illumination variations.

The earliest works in this direction were concerned with edge detection (Prewitt, 1970; Roberts, 1965). While edge detection may at first appear to be a rather simple task, it was empirically observed that it can be very hard to extract edge descriptors reliably. Usually, this was explained as a noise sensitivity that could be reduced by pre-smoothing the image data before applying the edge detector (Torre & Poggio, 1980). Later, a deeper understanding was developed that these difficulties originate from a more fundamental aspect of image structure, namely that real-world objects (in contrast to idealized mathematical entities such as points and lines) usually consist of different types of structures at different scales (Koenderink, 1984; Witkin, 1983). Motivated by the multi-scale nature of real-world images, multi-scale representations such as pyramids (Burt & Adelson, 1983) and scale-space representation (Koenderink, 1984; Lindeberg, 1994; Witkin, 1983) were constructed. Theories were also developed concerning what types of image features should be extracted from any scale level in a multi-scale representation (Florack et al., 1992; Florack, 1997; Koenderink & van Doorn, 1992; Lindeberg, 1994).

The most common way of applying multi-scale representations in practice has been to select one or a few scale levels in advance, and then to extract image features at each scale level more or less independently. This approach can be sufficient under simplified conditions, where only a few natural scale levels are involved and provided that the image features are stable over large ranges of scales. Typically, this is the case when extracting edges of man-made objects viewed under controlled imaging conditions. In other cases, however, there may be a need for adapting the scale levels individually to each image feature, or even for adapting the scale levels along an extended image feature, such as a connected edge. Typically, this occurs when detecting ridges (which turn out to be much more scale sensitive than edges) and when applying an edge detector to a diffuse edge for which the degree of diffuseness varies along the edge.

To handle these effects in general cases, we argue that it is natural to complement feature detection modules by explicit mechanisms for automatic scale selection, so as to automatically adapt the scale levels to the image features under study. The purpose of this article is to present such a framework for automatic scale selection, which is generally applicable to a rich variety of image features, and has been successfully tested by integration with other visual modules. For references to the original sources, see (Lindeberg, 1998a,b,1999) and the references therein.

An attractive property of the proposed scale selection mechanism is that in addition to automatic tuning of the scale parameter, it induces the computation of natural abstractions (groupings) of image shape. In this respect, the proposed methodology constitutes a natural pre-processing stage for subsequent interpretation of visual scenes.

The need for a scale-selection mechanism for feature detection

To demonstrate the need for an automatic scale selection mechanism, let us consider the problems of detecting edges and ridges, respectively, from image data. Figure 1 shows two images, from which scale-space representations have been computed by convolution with Gaussian kernels, i.e. given an image $ f \colon {\mathbb R}^D \rightarrow {\mathbb R}$, its scale-space representation $ L \colon {\mathbb R}^D \times {\mathbb R}_+ \rightarrow {\mathbb R}$ is

$ L(x;\; t) = \int_{\xi \in {\mathbb R}^D} f(x - \xi) \, g(\xi;\; t) \, d\xi,$ (1)

where $ g \colon {\mathbb R}^D \times {\mathbb R}_+ \rightarrow {\mathbb R}$ denotes the Gaussian kernel

$ g(x;\; t) = \frac{1}{(2 \pi t)^{D/2}} \, e^{-(x_1^2 + \dots + x_D^2)/2t}$ (2)

and the variance $ t$ of this kernel is referred to as the scale parameter.
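As a concrete illustration, the scale-space representation (1) can be computed by Gaussian smoothing. The following sketch (a minimal illustration, not code from the original work) uses `scipy`; note that `gaussian_filter` is parameterized by the standard deviation $ \sigma = \sqrt{t}$ rather than by the variance $ t$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(f, t):
    """Scale-space representation L(.; t) of eq. (1): convolution of the
    image f with a Gaussian kernel of variance t (sigma = sqrt(t))."""
    return gaussian_filter(np.asarray(f, dtype=float), sigma=np.sqrt(t))

# For an impulse image, L(.; t) is the sampled Gaussian kernel g(.; t)
# itself, so for D = 2 the central value approximates 1 / (2 pi t).
f = np.zeros((65, 65)); f[32, 32] = 1.0
L = scale_space(f, t=4.0)
print(L[32, 32], 1.0 / (2 * np.pi * 4.0))
```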

Edge detection.

At each scale level, edges are defined from points at which the gradient magnitude assumes a local maximum in the gradient direction (Canny, 1986; Korn, 1988). In terms of local directional derivatives, where $ \partial_v$ denotes a directional derivative in the gradient direction, this edge definition can be written

$ \left\{ \begin{array}{l} \tilde{L}_{vv} = L_v^2 \, L_{vv} = L_x^2 L_{xx} + 2 L_x L_y L_{xy} + L_y^2 L_{yy} = 0, \\ \tilde{L}_{vvv} = L_v^3 \, L_{vvv} = L_x^3 L_{xxx} + 3 L_x^2 L_y L_{xxy} + 3 L_x L_y^2 L_{xyy} + L_y^3 L_{yyy} < 0. \end{array} \right.$ (3)

Such edges at three scales are shown in the left column in figure 1. As can be seen, sharp edge structures corresponding to object boundaries give rise to edge curves at both fine and coarse scales. At fine scales, the localization of object edges is better, while the number of spurious edge responses is larger. Coarser scales are on the other hand necessary to capture the shadow edge, while the localization of e.g. the finger tip is poor at coarse scales.
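In sketch form, the directional-derivative entities of (3) can be computed from Gaussian derivatives along the coordinate axes; the code below is a minimal illustration (function and variable names are my own), tested on a synthetic vertical step edge:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_strength_terms(f, t):
    """The entities of eq. (3) at scale t:
    Lvv~ = Lx^2 Lxx + 2 Lx Ly Lxy + Ly^2 Lyy,
    Lvvv~ = Lx^3 Lxxx + 3 Lx^2 Ly Lxxy + 3 Lx Ly^2 Lxyy + Ly^3 Lyyy."""
    s = np.sqrt(t)
    d = lambda oy, ox: gaussian_filter(f, s, order=(oy, ox))
    Lx, Ly = d(0, 1), d(1, 0)
    Lxx, Lxy, Lyy = d(0, 2), d(1, 1), d(2, 0)
    Lxxx, Lxxy, Lxyy, Lyyy = d(0, 3), d(1, 2), d(2, 1), d(3, 0)
    Lvv = Lx ** 2 * Lxx + 2 * Lx * Ly * Lxy + Ly ** 2 * Lyy
    Lvvv = (Lx ** 3 * Lxxx + 3 * Lx ** 2 * Ly * Lxxy
            + 3 * Lx * Ly ** 2 * Lxyy + Ly ** 3 * Lyyy)
    return Lvv, Lvvv

# Vertical step edge between columns 31 and 32: Lvv changes sign across
# the edge, and Lvvv is negative at the edge.
f = np.zeros((64, 64)); f[:, 32:] = 1.0
Lvv, Lvvv = edge_strength_terms(f, t=4.0)
print(np.sign(Lvv[32, 30]), np.sign(Lvv[32, 33]), Lvvv[32, 31] < 0)
```

The zero-crossing of `Lvv` combined with `Lvvv < 0` then localizes the edge at this scale.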

Figure 1: Edges and bright ridges detected at scale levels $ t = 1.0$, $ 16.0$ and $ 256.0$, respectively.

Ridge detection.

The right column in figure 1 shows corresponding results of multi-scale ridge extraction. A (bright) ridge point is defined as a point where the intensity assumes a local maximum in the main eigendirection of the Hessian matrix (Haralick, 1983; Koenderink & van Doorn, 1994). In terms of local $ (p, q)$-coordinates, aligned such that the mixed directional derivative satisfies $ L_{pq} = 0$, this ridge definition can be written

$ \left\{ \begin{array}{rcl} L_{p} & = & 0, \\  L_{pp} & < & 0, \\  \vert L_{pp}\vert & \geq & \vert L_{qq}\vert, \end{array} \right.$        or        $ \left\{ \begin{array}{rcl} L_{q} & = & 0, \\  L_{qq} & < & 0, \\  \vert L_{qq}\vert & \geq & \vert L_{pp}\vert, \end{array} \right.$ (4)

while in terms of a local $ (u, v)$-system with the $ v$-direction parallel to the gradient direction and the $ u$-direction perpendicular, the ridge definition assumes the form

$ \left\{ \begin{array}{rcl} L_v^2 \, L_{uv} & = & L_x L_y \, (L_{xx} - L_{yy}) - (L_x^2 - L_y^2) \, L_{xy} = 0, \\ L_v^2 \, (L_{uu} - L_{vv}) & = & (L_y^2 - L_x^2) \, (L_{xx} - L_{yy}) - 4 \, L_x L_y L_{xy} > 0. \end{array} \right.$ (5)

As can be seen, the types of ridge curves that are obtained are strongly scale dependent. At fine scales, the ridge detector mainly responds to spurious noise structures. Then, it gives rise to ridge curves corresponding to the fingers at $ t = 16$, and a ridge curve corresponding to the arm as a whole at $ t = 256$. Notably, these ridge descriptors are much more sensitive to the choice of scale levels than the edge features in figure 1(a). In particular, no single scale level is appropriate for describing the dominant ridge structures in this image.

Proposed scale selection mechanism

The experimental results in figure 1 emphasize the need for adapting the scale levels for feature detection to the local image structures. How should such an adaptation be performed without a priori information about what image information is important? The subject of this section is to give an intuitive motivation of how size estimation can be performed, by studying the evolution properties over scales of scale-normalized derivatives. The basic idea is as follows: at any scale level, we define a normalized derivative operator by multiplying each spatial derivative operator $ \partial_x$ by the scale parameter $ t$ raised to the power $ \gamma/2$, where $ \gamma$ is a (so far free) parameter:

$ \partial_{\xi} = t^{\gamma/2} \, \partial_{x}$ (6)

Then, we propose that automatic scale selection can be performed by detecting the scales at which $ \gamma $-normalized differential entities assume local maxima with respect to scale. Intuitively, this approach corresponds to selecting the scales at which the operator response is strongest.

Local frequency estimation I:

For a sine wave

$ f(x) = \sin \omega_0 x,$ (7)

the scale-space representation is given by

$ L(x;\;t) = e^{- {\omega_0^2 t}/{2}} \sin \omega_0 x,$ (8)

and the amplitude of the $ m$th-order normalized derivative operator is

$ L_{\xi^m, max}(t) = t^{m \gamma/2} \, \omega_0^m \, e^{- {\omega_0^2 t}/{2}}.$ (9)

This function assumes a unique maximum over scales at

$ t_{max,L_{\xi^m}} = \frac{\gamma \, m}{\omega_0^2},$ (10)

implying that the corresponding $ \sigma$-value ( $ \sigma = \sqrt{t}$) is proportional to the wavelength $ \lambda = 2 \pi/\omega_0$ of the signal. In other words, the wavelength of the signal can be detected from the maximum over scales in the scale-space signature of the signal (see figure 2). In this respect, the scale selection approach has properties similar to those of a local Fourier analysis, with the difference that there is no need for explicitly determining a window size for computing the Fourier transform.
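This prediction is straightforward to check numerically. The sketch below (helper names are my own) selects the scale maximizing the first-order normalized derivative amplitude for a sampled sine wave, to be compared with $ t_{max} = \gamma m/\omega_0^2 = 16$ for $ \omega_0 = 0.25$, $ m = 1$ and $ \gamma = 1$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def normalized_derivative_amplitude(f, t, m=1, gamma=1.0):
    """Amplitude (max magnitude over the central part of the signal) of the
    m-th order gamma-normalized derivative t^(m gamma / 2) L_{x^m}."""
    L_m = gaussian_filter1d(f, sigma=np.sqrt(t), order=m)
    return t ** (m * gamma / 2) * np.abs(L_m[100:-100]).max()

omega0 = 0.25
f = np.sin(omega0 * np.arange(1024, dtype=float))

scales = np.geomspace(1.0, 256.0, 200)
amplitudes = [normalized_derivative_amplitude(f, t) for t in scales]
t_selected = scales[int(np.argmax(amplitudes))]
print(t_selected, 1.0 / omega0 ** 2)   # both close to 16
```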

Figure 2: The amplitude of first-order normalized derivatives as function of scale for sinusoidal input signals of different frequency ( $ \omega _1 = 0.5$, $ \omega _2 = 1.0$ and $ \omega _3 = 2.0$).

Scale invariance property of the scale selection mechanism.

If a local maximum over scales in the normalized differential expression $ {\cal D}_{\gamma-norm} L$ is detected at the position $ x_0$ and the scale $ t_0$ in the scale-space representation $ L$ of a signal $ f$, then for a signal $ f'$ rescaled by a scaling factor $ s$ such that

$ f(x) = f'(s x)$ (11)

the corresponding local maximum over scales is assumed at

$ (x_0', t_0') = (s x_0, s^2 t_0).$ (12)

This property shows that the selected scales follow any size variations in the image data, and this property holds for all homogeneous polynomial differential invariants (see (Lindeberg, 1998b)).
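A numerical illustration of this covariance property (a sketch with my own choice of test signal and differential entity): selecting scales for a Gaussian bump and for a copy spatially rescaled by $ s = 2$, using the square of the $ \gamma $-normalized second derivative, the selected scale should change by a factor $ s^2 = 4$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def selected_scale(f, x0, scales, gamma=1.0):
    """Scale maximizing the squared gamma-normalized second derivative
    (t^gamma L_xx)^2 at position x0."""
    resp = [(t ** gamma * gaussian_filter1d(f, np.sqrt(t), order=2)[x0]) ** 2
            for t in scales]
    return scales[int(np.argmax(resp))]

x = np.arange(4096, dtype=float)
t0, s = 16.0, 2.0
f = np.exp(-(x - 2048) ** 2 / (2 * t0))                    # bump of variance t0
f_rescaled = np.exp(-(x - 2048) ** 2 / (2 * s ** 2 * t0))  # same bump, s times wider

scales = np.geomspace(2.0, 512.0, 300)
t_sel = selected_scale(f, 2048, scales)
t_sel_rescaled = selected_scale(f_rescaled, 2048, scales)
print(t_sel_rescaled / t_sel)   # close to s^2 = 4
```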

Necessity of the $ \gamma $-normalization.

In view of the above-mentioned scale invariance result, one may ask the following. Imagine that we take the idea of performing local scale selection by maximizing some (not yet specified) notion of normalized derivatives over scales. Moreover, let us impose the requirement that the scale levels selected by this mechanism should commute with size variations in the image domain according to equations (11) and (12). What types of scale normalization are then possible? Interestingly, it can be shown that the $ \gamma $-normalized derivative (6) then arises by necessity, i.e., with the free parameter $ \gamma $ it spans all possible reasonable scale normalizations (see (Lindeberg, 1998b) for a proof).

Interpretation of normalized Gaussian derivative operators.

The idea is that the normalized scale-space derivatives will be used as a basis for expressing a large class of image operations, formulated in terms of normalized differential entities. Equivalently, such derivatives can be computed by applying $ \gamma $-normalized Gaussian derivative operators

$ g_{x_i^m, \gamma-norm}(x;\; t) = t^{m \, \gamma / 2} \, \partial_{x_i^m} \left( \frac{1}{(2 \pi t)^{D/2}} \, e^{-(x_1^2 + \dots + x_D^2)/2t} \right)$ (13)

to the original $ D$-dimensional image. It is straightforward to show that the $ L_p$-norm of such a $ \gamma $-normalized Gaussian derivative kernel is

$ \Vert h_{\xi^m}(\cdot;\; t) \Vert _p = \sqrt{t}^{\,m(\gamma - 1) + D(1/p - 1)} \, \Vert h_{\xi^m}(\cdot;\; 1) \Vert _p,$ (14)

which means that the $ \gamma $-normalized derivative concept can be interpreted as a normalization to constant $ L_p$-norm over scales, with $ p$ given by

$ p = \frac{1}{1 + \frac{m}{D} \, (1 - \gamma)}.$ (15)

The special case $ \gamma = 1$ corresponds to $ L_1$-normalization for all orders $ m$.
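The constant-$ L_1$-norm interpretation for $ \gamma = 1$ is easy to verify numerically; the sketch below (my own helper names, using the closed-form 1-D Gaussian derivatives) evaluates the $ L_1$-norms of first- and second-order normalized kernels at two scales:

```python
import numpy as np

def gaussian_derivative_kernel(t, m, x):
    """Sampled m-th derivative (m = 1 or 2) of a 1-D Gaussian of variance t."""
    g = np.exp(-x ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)
    return -x / t * g if m == 1 else (x ** 2 / t ** 2 - 1 / t) * g

def normalized_L1_norm(t, m, gamma=1.0, dx=0.01):
    """L1-norm of the gamma-normalized kernel t^(m gamma / 2) g_{x^m}(.; t)."""
    x = np.arange(-60.0, 60.0, dx)
    return t ** (m * gamma / 2) * np.abs(gaussian_derivative_kernel(t, m, x)).sum() * dx

# For gamma = 1 the L1-norm is the same at every scale (here t = 4 and t = 64).
norms = {(m, t): normalized_L1_norm(t, m) for m in (1, 2) for t in (4.0, 64.0)}
print(norms)
```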

Another interesting interpretation can be made with respect to image data $ f \colon {\mathbb R}^D \rightarrow {\mathbb R}$ having self-similar power spectra of the form

$ S_f(\omega) = S_f(\omega_1, \dots, \omega_D) = (\hat{f} \hat{f}^*)(\omega) = \vert\omega\vert^{-2\beta} = (\omega_1^2 + \dots + \omega_D^2)^{-\beta}.$ (16)

Let us consider the following class of energy measures, measuring the amount of information in the $ m$th order $ \gamma $-normalized Gaussian derivatives

$ E_m = \int_{x \in {\mathbb R}^D} \sum_{\vert\alpha\vert = m} t^{m \gamma} \, \vert L_{x^\alpha}\vert^2 \, dx.$ (17)

In the two-dimensional case, this class includes the following differential energy measures:

$\displaystyle E_0 = \int_{x \in {\mathbb R}^2} L(x;\; t)^2 \, dx,$ (18)
$\displaystyle E_1 = \int_{x \in {\mathbb R}^2} t^{\gamma} (L_{\xi}^2 + L_{\eta}^2) \, dx,$ (19)
$\displaystyle E_2 = \int_{x \in {\mathbb R}^2} t^{2 \gamma} (L_{\xi\xi}^2 + 2 L_{\xi\eta}^2 + L_{\eta\eta}^2) \, dx,$ (20)
$\displaystyle E_3 = \int_{x \in {\mathbb R}^2} t^{3 \gamma} (L_{\xi\xi\xi}^2 + 3 L_{\xi\xi\eta}^2 + 3 L_{\xi\eta\eta}^2 + L_{\eta\eta\eta}^2) \, dx.$ (21)

It can be shown that the variation over scales of these energy measures is given by

$ E_m(\cdot;\;t) \sim t^{\beta - D/2 - m(1 - \gamma)},$ (22)

and this expression is scale independent if and only if

$ \beta = \frac{D}{2} + m (1 - \gamma).$ (23)

Hence, the normalized derivative model is neutral with respect to power spectra of the form

$ S_f(\omega) = \vert\omega\vert^{- D - 2m (1 - \gamma)}.$ (24)

Empirical studies on natural images often show a qualitative behaviour similar to this (Field, 1987).

The automatic scale selection mechanism in operation

The results presented so far apply generally to a large class of image descriptors formulated in terms of differential entities derived from a multi-scale representation. The idea is that the differential entity $ {\cal D}$ used for automatic scale selection, together with its associated normalization parameter $ \gamma $ should be determined for the task at hand. In this section, we shall present several examples of how this scale selection mechanism can be expressed in practice for various types of feature detectors.

Edge detection.

Let us first turn to the problem of edge detection, using the differential definition of edges expressed in equation (3). A natural measure of edge strength that can be associated with this edge definition is the normalized gradient magnitude

$ {\cal E}_{\gamma-norm} = t^{\gamma} L_{v}^2 = t^{\gamma} (L_x^2 + L_y^2).$ (25)

If we apply the edge definition (3) at all scales, we will sweep out an edge surface in scale-space. On this edge surface, we can define a scale-space edge as a curve where the edge strength measure assumes a local maximum over scales

$ \left\{ \begin{array}{l} \partial_t ({\cal E}_{\gamma-norm} L(x, y;\; t)) = 0, \\ \partial_{tt} ({\cal E}_{\gamma-norm} L(x, y;\; t)) < 0, \\ L_{vv}(x, y;\; t) = 0, \\ L_{vvv}(x, y;\; t) < 0. \end{array} \right.$ (26)

To determine the normalization parameter $ \gamma $, we can consider an idealized edge model in the form of a diffuse step edge

$ f(x, y) = \int_{x' = -\infty}^{x} g(x';\; t_0) \, dx',$ (27)

where $ t_0$ denotes the degree of diffuseness of the edge. It is straightforward to show that the edge strength measure is maximized at

$ t_{{\cal E}_{\gamma-norm}} = \frac{\gamma}{1 - \gamma} \, t_0.$ (28)

If we require that this maximum is assumed at $ t_0$, such that the scale of the Gaussian derivative filter used for detecting the edge matches the degree of diffuseness of the edge, then we obtain $ \gamma = 1/2$.
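The analysis behind (28) can be illustrated numerically: with $ \gamma = 1/2$, the scale maximizing the normalized gradient magnitude of a diffuse step edge of diffuseness $ t_0$ should be $ t_0$ itself. A one-dimensional sketch (variable names are my own):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.special import erf

t0 = 25.0
x = np.arange(2048, dtype=float)
# Diffuse step edge of diffuseness t0: the integral of g(.; t0), eq. (27).
f = 0.5 * (1.0 + erf((x - 1024) / np.sqrt(2 * t0)))

gamma = 0.5
scales = np.geomspace(1.0, 400.0, 400)
strength = [t ** gamma * gaussian_filter1d(f, np.sqrt(t), order=1)[1024] ** 2
            for t in scales]
t_selected = scales[int(np.argmax(strength))]
print(t_selected)   # close to t0 = 25
```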

Figure 3: The result of edge detection with automatic scale selection based on local maxima over scales of the first order edge strength measure $ {\cal E} L$ with $ \gamma = \frac {1}{2}$. The middle column shows all the scale-space edges, whereas the right column shows the 100 edge curves having the highest significance values. Image size: $ 256 \times 256$ pixels.

Figure 4: Three-dimensional view of the 10 most significant scale-space edges extracted from the arm image. From the vertical dimension representing the selected scale measured in dimension length (in units of $ \sqrt {t}$), it can be seen how coarse scales are selected for the diffuse edge structures (due to illumination effects) and that finer scales are selected for the sharp edge structures (the object boundaries).

Figure 3 shows the results of detecting edges from two images in this way. The middle column shows all scale-space edges that satisfy the definition (26), while the right column shows the result of selecting the most significant edges by computing a significance measure as the integrated normalized edge strength measure along each connected edge curve

$ E(\Gamma) = \int_{(x;\; t) \in \Gamma} \sqrt{({\cal E} L)(x;\; t)} \, ds.$ (29)

Figure 4 shows a three-dimensional view of the 10 most significant scale-space edges from the hand image, with the selected scales illustrated by the height over the image plane. Observe that fine scales are selected for the edges corresponding to object boundaries. This result is consistent with the empirical finding that rather fine scales are usually appropriate for extracting object edges. For the shadow edges on the other hand, successively coarser scales are selected with increasing degree of diffuseness, in agreement with the analysis of the idealized edge model in (28).

Ridge detection.

Let us next turn to the problem of ridge detection, and sweep out a ridge surface in scale-space by applying the ridge definition (4) at all scales. Then, given the following ridge strength measure

$ {\cal R}_{norm} L = {\cal A}_{\gamma-norm} L = (L_{pp, \gamma-norm} - L_{qq, \gamma-norm})^2 = t^{2 \gamma} \, ((L_{xx} - L_{yy})^2 + 4 \, L_{xy}^2),$ (30)

which is the square difference between the eigenvalues $ L_{pp, \gamma-norm}$ and $ L_{qq, \gamma-norm}$ of the normalized Hessian matrix, let us define a scale-space ridge as a curve on the ridge surface where the normalized ridge strength measure assumes local maxima with respect to scale
  $\displaystyle \left\{ \begin{array}{l} \partial_t ({\cal R}_{norm} L(x, y;\; t)) = 0, \\ \partial_{tt} ({\cal R}_{norm} L(x, y;\; t)) < 0, \\ L_{p}(x, y;\; t) = 0, \\ L_{pp}(x, y;\; t) < 0. \end{array}\right.$   (31)

To determine the normalization parameter $ \gamma $, let us consider a Gaussian ridge

$ f(x, y) = g(x;\; t_0).$ (32)

The maximum over scales in $ {\cal R}_{\gamma-norm} L$ is assumed at

$ t_{{\cal R}_{\gamma-norm}} = \frac{2 \, \gamma}{3 - 2 \, \gamma} \, t_0,$ (33)

and by requiring this scale value to be equal to $ t_0$ (such that the width of the rotationally aligned second-order Gaussian derivative filter used for detecting the ridge matches the width of the ridge itself) we obtain $ \gamma = 3/4$.
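Again, this can be checked numerically. The sketch below (my own construction) evaluates the ridge strength measure (30) with $ \gamma = 3/4$ at the centre of a Gaussian ridge of width $ t_0 = 16$ and verifies that the maximum over scales is attained near $ t_0$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

t0 = 16.0
y, x = np.mgrid[0:256, 0:256].astype(float)
f = np.exp(-(x - 128.0) ** 2 / (2 * t0))   # Gaussian ridge along the y-axis

def ridge_strength(f, t, gamma=0.75):
    """A_{gamma-norm} L of eq. (30): t^(2 gamma) ((Lxx - Lyy)^2 + 4 Lxy^2)."""
    s = np.sqrt(t)
    Lxx = gaussian_filter(f, s, order=(0, 2))   # second derivative in x
    Lyy = gaussian_filter(f, s, order=(2, 0))   # second derivative in y
    Lxy = gaussian_filter(f, s, order=(1, 1))
    return t ** (2 * gamma) * ((Lxx - Lyy) ** 2 + 4 * Lxy ** 2)

scales = np.geomspace(1.0, 128.0, 150)
responses = [ridge_strength(f, t)[128, 128] for t in scales]
t_selected = scales[int(np.argmax(responses))]
print(t_selected)   # close to t0 = 16
```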

Figure 5: The 100 and 10 strongest bright ridges, respectively, extracted using scale selection based on local maxima over scales of $ {\cal A}_{\gamma -norm}$ (with $ \gamma = \frac {3}{4}$). Image size: $ 128 \times 128$ pixels in the top row, and $ 140 \times 140$ pixels in the bottom row.
Figure 6: Alternative illustration of the five strongest scale-space ridges extracted from the image of the arm in figure 5. Each ridge is backprojected onto a dark copy of the original image as the union of a set of circles centered on the ridge curve, with the radius proportional to the selected scale at that point.

Figure 5 shows the result of applying such a ridge detector to an image of an arm and an aerial image of a suburb, respectively. The ridges have been ranked on significance by integrating the normalized ridge strength measure along each connected ridge curve

$ R(\Gamma) = \int_{(x;\; t) \in \Gamma} \sqrt{({\cal R} L)(x;\; t)} \, ds.$ (34)

Observe that descriptors corresponding to the roads are selected from the aerial image. Moreover, for the arm image, a coarse-scale descriptor is extracted for the arm as a whole, whereas the individual fingers give rise to ridge curves at finer scales.

Blob detection.

The Laplacian operator $ \nabla^2 L = L_{xx} + L_{yy}$ is a commonly used entity for blob detection, since it gives a strong response at the center of blob-like image structures. To formulate a blob detector with automatic scale selection, we can consider the points in scale-space at which the square of the normalized Laplacian

$ \nabla_{norm}^2 L = t (L_{xx} + L_{yy})$ (35)

assumes maxima with respect to space and scale. Such points are referred to as scale-space maxima of $ (\nabla_{norm}^2 L)^2$.

Figure 7: Blob detection by detection of scale-space maxima of the normalized Laplacian operator: (a) Original image. (b) Circles representing the 250 scale-space maxima of $ (\nabla _{norm} L)^2$ having the strongest normalized response. (c) Circles overlaid on image.
Figure 8: Three-dimensional view of the 150 strongest scale-space maxima of the square of the normalized Laplacian of the Gaussian computed from the sunflower image.

For a Gaussian blob model defined by

$ f(x, y) = g(x, y;\; t_0) = \frac{1}{2 \pi t_0} e^{-(x^2 + y^2)/2 t_0}$ (36)

it can be shown that the selected scale at the center of the blob is given by

$ \partial_t (\nabla^2_{norm} L)(0, 0;\;t) = 0 \quad \Longleftrightarrow \quad t_{\nabla^2 L} = t_0.$ (37)

Hence, the selected scale directly reflects the width $ t_0$ of the Gaussian blob.
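A numerical sketch of this result (function names are my own): evaluating the square of the normalized Laplacian (35) at the centre of a Gaussian blob of variance $ t_0 = 16$ over a range of scales, the maximum should be attained near $ t = t_0$, in agreement with (37):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

t0 = 16.0
y, x = np.mgrid[0:256, 0:256].astype(float)
f = np.exp(-((x - 128.0) ** 2 + (y - 128.0) ** 2) / (2 * t0))   # Gaussian blob

def normalized_laplacian(f, t):
    """nabla^2_norm L = t (Lxx + Lyy) of eq. (35)."""
    s = np.sqrt(t)
    return t * (gaussian_filter(f, s, order=(0, 2))
                + gaussian_filter(f, s, order=(2, 0)))

scales = np.geomspace(2.0, 128.0, 150)
responses = [normalized_laplacian(f, t)[128, 128] ** 2 for t in scales]
t_selected = scales[int(np.argmax(responses))]
print(t_selected)   # close to t0 = 16
```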

Figures 7-8 show the result of applying this blob detector to an image of a sunflower field. In figure 7, each blob feature detected as a scale-space maximum is illustrated by a circle, with its radius proportional to the selected scale. Figure 8 shows a three-dimensional illustration of the same data set, by marking the scale-space extrema by spheres in scale-space. Observe how well the size variations in the image are captured by this structurally very simple operation.

Corner detection.

A commonly used technique for detecting junction candidates in grey-level images is to detect extrema in the curvature of level curves multiplied by the gradient magnitude raised to some power (Kitchen & Rosenfeld, 1982; Koenderink & Richards, 1988). A special choice is to multiply the level curve curvature by the gradient magnitude raised to the power of three. This leads to the differential invariant $ \tilde{\kappa} = L_{v}^2 L_{uu}$, with the corresponding normalized expression

$ \tilde{\kappa}_{norm} = t^{2 \gamma} L_{v}^2 L_{uu}.$ (38)

Figure 9 shows the result of detecting scale-space extrema from an image with corner structures at multiple scales. Observe that a coarse scale response is obtained for the large scale corner structure as a whole, whereas the superimposed corner structures of smaller size give rise to scale-space maxima at finer scales (see figure 10 for results on real-world data).
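For concreteness, the corner strength measure (38) can be sketched as follows (a minimal single-scale illustration on a synthetic right-angle corner; names are my own, and the spatial localization is only approximate):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def corner_strength(f, t, gamma=1.0):
    """kappa~_norm of eq. (38): t^(2 gamma) L_v^2 L_uu, where
    L_v^2 L_uu = Ly^2 Lxx - 2 Lx Ly Lxy + Lx^2 Lyy."""
    s = np.sqrt(t)
    d = lambda oy, ox: gaussian_filter(f, s, order=(oy, ox))
    Lx, Ly = d(0, 1), d(1, 0)
    Lxx, Lxy, Lyy = d(0, 2), d(1, 1), d(2, 0)
    return t ** (2 * gamma) * (Ly ** 2 * Lxx - 2 * Lx * Ly * Lxy + Lx ** 2 * Lyy)

# Synthetic right-angle corner with its tip near (64, 64).
f = np.zeros((128, 128)); f[64:, 64:] = 1.0
kappa = corner_strength(f, t=16.0)
iy, ix = np.unravel_index(int(np.argmax(kappa ** 2)), kappa.shape)
print(iy, ix)   # near the corner position (64, 64)
```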

Figure 9: Three-dimensional view of scale-space maxima of $ \tilde{\kappa}_{norm}^2$ computed for a large scale corner with superimposed corner structures at finer scales.

Application to feature tracking and human computer interaction

We argue that a scale selection mechanism is an essential tool whenever our aim is to automatically interpret the image data that arise from observations of a dynamic world. For example, if we are tracking features in the image domain, then it is essential that the scale levels are adapted to the size variation that may occur over time.

Figure 10 shows a comparison between a feature tracker with automatic scale selection (Bretzner & Lindeberg, 1998a) and a corresponding feature tracker operating at fixed scales. (Both feature trackers are based on corner detection from local maxima of the corner strength measure (38), followed by a localization stage (Lindeberg, 1998b) and a multi-cue verification (Bretzner & Lindeberg, 1998a).) As can be seen from the results in figure 10, three out of the ten features are lost by the fixed scale feature tracker compared to the adaptive scale tracker.

Figure 10: Comparison between feature tracker with automatic scale selection and a feature tracker operating at fixed scale. The left column shows a set of corner features in the initial frame, and the right column gives a snapshot after 65 frames.
Panels: initial frame with 14 detected corners; tracked features with automatic scale selection; tracked features using fixed scales.

A brief explanation of this phenomenon is that if we use a standard algorithm for feature detection at a fixed scale, followed by hypothesis evaluation using a fixed size window for correlation, then the feature tracker will after a few frames fail to detect some of the features. The reason why this occurs is simply that these corner features no longer exist at the predetermined scale. In practice, this usually occurs for blunt corners.

An attractive property of a feature detector with automatic scale selection is that it allows us to capture less distinct features than those that occur on man-made objects. Specifically, we have demonstrated how it makes it possible to capture features associated with human actions. Figure 11 illustrates one idea we have been working on in the area of visually guided human-computer interaction. The idea is to have a camera that monitors the motion of a human hand. At each frame, blob and ridge features are extracted, corresponding to the fingers and the finger tips. Assuming rigidity, the motion of the image features allows us to estimate the three-dimensional rotation of the hand (Bretzner & Lindeberg, 1998b). These motion estimates can in turn be used for controlling other computerized equipment, thus serving as a ``3-D hand mouse'' (Lindeberg & Bretzner, 1998).

Figure 11: Illustration of the concept of a ``3-D hand mouse''. The idea is to monitor the motion of a human hand (here, via a set of tracked image features) and to use estimates of the hand motion for controlling other computerized equipment (here, the visualization of a cube).
Panels: controlling hand motion; detected ridges and blobs; controlled object (the visualization of a cube).

Summary

We have presented a general framework for automatic scale selection as well as examples of how this scale selection mechanism can be integrated with other feature modules. The experiments demonstrate how abstractions of the image data can be computed in a conceptually very simple way, by analysing the behaviour of image features over scales (sometimes referred to as ``deep structure''). For applications in other areas as well as related works, see (Lindeberg, 1998a,b,1999) and (Almansa & Lindeberg, 1999; Wiltschi et al., 1998).

Bibliography

Almansa, A. & Lindeberg, T. (1999),
Fingerprint enhancement by shape adaptation of scale-space operators with automatic scale-selection, Technical Report ISRN KTH/NA/P-99/01-SE, Dept. of Numerical Analysis and Computing Science, KTH, Stockholm, Sweden. (Submitted).

Bretzner, L. & Lindeberg, T. (1998a),
`Feature tracking with automatic selection of spatial scales', Computer Vision and Image Understanding 71(3), 385-392.

Bretzner, L. & Lindeberg, T. (1998b),
Use your hand as a 3-D mouse, or, relative orientation from extended sequences of sparse point and line correspondences using the affine trifocal tensor, in H. Burkhardt & B. Neumann, eds, `Proc. 5th European Conference on Computer Vision', Vol. 1406 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Freiburg, Germany, pp. 141-157.

Burt, P. J. & Adelson, E. H. (1983),
`The Laplacian pyramid as a compact image code', IEEE Trans. Communications 31(4), 532-540.

Canny, J. (1986),
`A computational approach to edge detection', IEEE Trans. Pattern Analysis and Machine Intell. 8(6), 679-698.

Field, D. J. (1987),
`Relations between the statistics of natural images and the response properties of cortical cells', J. of the Optical Society of America 4, 2379-2394.

Florack, L. M. J. (1997),
Image Structure, Series in Mathematical Imaging and Vision, Kluwer Academic Publishers, Dordrecht, Netherlands.

Florack, L. M. J., ter Haar Romeny, B. M., Koenderink, J. J. & Viergever, M. A. (1992),
`Scale and the differential structure of images', Image and Vision Computing 10(6), 376-388.

Haralick, R. M. (1983),
`Ridges and valleys in digital images', Computer Vision, Graphics, and Image Processing 22, 28-38.

Kitchen, L. & Rosenfeld, A. (1982),
`Gray-level corner detection', Pattern Recognition Letters 1(2), 95-102.

Koenderink, J. J. (1984),
`The structure of images', Biological Cybernetics 50, 363-370.

Koenderink, J. J. & Richards, W. (1988),
`Two-dimensional curvature operators', J. of the Optical Society of America 5(7), 1136-1141.

Koenderink, J. J. & van Doorn, A. J. (1992),
`Generic neighborhood operators', IEEE Trans. Pattern Analysis and Machine Intell. 14(6), 597-605.

Koenderink, J. J. & van Doorn, A. J. (1994),
`Two-plus-one-dimensional differential geometry', Pattern Recognition Letters 15(5), 439-444.

Korn, A. F. (1988),
`Toward a symbolic representation of intensity changes in images', IEEE Trans. Pattern Analysis and Machine Intell. 10(5), 610-625.

Lindeberg, T. (1994),
Scale-Space Theory in Computer Vision, The Kluwer International Series in Engineering and Computer Science, Kluwer Academic Publishers, Dordrecht, Netherlands.

Lindeberg, T. (1998a),
`Edge detection and ridge detection with automatic scale selection', Int. J. of Computer Vision 30(2), 117-154.

Lindeberg, T. (1998b),
`Feature detection with automatic scale selection', Int. J. of Computer Vision 30(2), 77-116.

Lindeberg, T. (1999),
Principles for automatic scale selection, in B. Jähne et al., eds, `Handbook on Computer Vision and Applications', Academic Press, Boston, USA, pp. 239-274.

Lindeberg, T. & Bretzner, L. (1998),
Förfarande och anordning för överföring av information genom rörelsedetektering, samt användning av anordningen [Method and device for transferring information by motion detection, and use of the device].
Patent pending.

Prewitt, J. M. S. (1970),
Object enhancement and extraction, in A. Rosenfeld & B. S. Lipkin, eds, `Picture Processing and Psychophysics', Academic Press, New York, pp. 75-149.

Roberts, L. G. (1965),
Machine perception of three-dimensional solids, in J. T. Tippett et al., eds, `Optical and Electro-Optical Information Processing', MIT Press, Cambridge, Massachusetts, pp. 159-197.

Sporring, J., Nielsen, M., Florack, L. & Johansen, P., eds (1996),
Gaussian Scale-Space Theory: Proc. PhD School on Scale-Space Theory, Series in Mathematical Imaging and Vision, Kluwer Academic Publishers, Copenhagen, Denmark.

ter Haar Romeny, B., Florack, L., Koenderink, J. J. & Viergever, M., eds (1997),
Scale-Space Theory in Computer Vision: Proc. First Int. Conf. Scale-Space'97, Lecture Notes in Computer Science, Springer Verlag, New York, Utrecht, Netherlands.

Torre, V. & Poggio, T. A. (1980),
`On edge detection', IEEE Trans. Pattern Analysis and Machine Intell. 8(2), 147-163.

Wiltschi, K., Pinz, A. & Lindeberg, T. (1998),
An automatic assessment scheme for steel quality inspection, Technical Report ISRN KTH/NA/P-98/20-SE, Dept. of Numerical Analysis and Computing Science, KTH, Stockholm, Sweden.

Witkin, A. P. (1983),
Scale-space filtering, in `Proc. 8th Int. Joint Conf. Art. Intell.', Karlsruhe, West Germany, pp. 1019-1022.

Tony Lindeberg
1999-09-16