Automatic Scale Selection as a Pre-Processing Stage
for Interpreting the Visual World
Department of Numerical Analysis and Computing Science
KTH, S-100 44 Stockholm, Sweden
This paper reviews a systematic methodology for formulating
mechanisms for automatic scale selection when performing feature detection.
An important property of the proposed approach is that the notion of
scale is included already in the definition of image features.
Computer vision algorithms for interpreting image data usually
involve a feature detection step.
The need for performing early feature detection is usually motivated
by the desire of condensing the rich intensity pattern to a more
compact representation for further processing.
If a proper abstraction of shape primitives can be computed,
certain invariance properties can also be expected with respect to
changes in view direction and illumination variations.
The earliest works in this direction were concerned with
edge detection (Prewitt, 1970; Roberts, 1965).
While edge detection may at first appear to be a rather simple task,
it was empirically observed that it can be very hard to extract
edge descriptors reliably.
Usually, this was explained as a noise sensitivity that could be reduced
by pre-smoothing the image data before applying the edge detector
(Torre & Poggio, 1980).
Later, a deeper understanding was developed that these difficulties
originate from the
more fundamental aspect of image structure, namely that real-world objects
(in contrast to idealized mathematical entities such as points and lines)
usually consist of different types of structures at different
scales (Koenderink, 1984; Witkin, 1983).
Motivated by the multi-scale nature of real-world images,
multi-scale representations such as pyramids (Burt & Adelson, 1983)
and scale-space representation (Koenderink, 1984; Lindeberg, 1994; Witkin, 1983)
were developed.
Theories were also formed concerning what types of image
features should be extracted from any scale level in a multi-scale
representation (Florack et al., 1992; Florack, 1997; Koenderink & van Doorn, 1992; Lindeberg, 1994).
The most common way of applying multi-scale representations in practice
has been by selecting one or a few scale levels in advance,
and then extracting image features at each scale level more or less independently.
This approach can be sufficient under simplified conditions,
where only a few natural scale levels are involved and
provided that the image features are stable over large ranges of scales.
Typically, this is the case when extracting edges of man-made objects
viewed under controlled imaging conditions.
In other cases, however, there may be a need for adapting scale levels
individually to each image feature, or even to adapt the scale levels
along an extended image feature, such as a connected edge.
Typically, this occurs when detecting ridges (which turn out to
be much more scale sensitive than edges) and when applying an
edge detector to a diffuse edge for which the degree of diffuseness
varies along the edge.
To handle these effects in general cases, we argue that it is natural
to complement feature detection modules by explicit mechanisms for
automatic scale selection,
so as to automatically adapt the scale levels to the image features.
The purpose of this article is to present such a framework for automatic
scale selection, which is generally applicable to a rich variety of image features,
and has been successfully tested by integration with other visual modules.
For references to the original sources,
see (Lindeberg, 1998a,b,1999) and the references therein.
An attractive property of the proposed scale selection mechanism is that
in addition to automatic tuning of the scale parameter,
it induces the computation of natural abstractions (groupings) of image shape.
In this respect, the proposed methodology constitutes a natural pre-processing
stage for subsequent interpretation of visual scenes.
To demonstrate the need for an automatic scale selection mechanism,
let us consider the problems of detecting edges and ridges, respectively,
from image data.
Figure 1 shows two images,
from which scale-space representations
have been computed by convolution with Gaussian kernels,
i.e. given an image $f : \mathbb{R}^2 \to \mathbb{R}$,
its scale-space representation is defined by
$L(\cdot;\, t) = g(\cdot;\, t) * f$, where
$g(x, y;\, t) = \frac{1}{2 \pi t}\, e^{-(x^2 + y^2)/(2t)}$
denotes the Gaussian kernel
and the variance $t$
of this kernel is referred to as the scale parameter.
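As a minimal computational sketch (assuming NumPy and SciPy; the function name `scale_space` is our own, not part of the original formulation), the scale-space representation at a given scale $t$ can be obtained by Gaussian smoothing with standard deviation $\sqrt{t}$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(f, t):
    """Scale-space representation L(.; t) of an image f, computed by
    convolution with a sampled Gaussian kernel of variance t."""
    return gaussian_filter(f, sigma=np.sqrt(t), mode='nearest')
```

Increasing $t$ progressively suppresses fine-scale structure; for white noise, for instance, the variance of the smoothed image decreases monotonically with $t$.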
At each scale level, edges are defined from points at which the gradient
magnitude assumes a local maximum in the gradient direction
(Canny, 1986; Korn, 1988).
In terms of local directional derivatives, where $\partial_v$ denotes
a directional derivative in the gradient direction, this edge definition
can be written $L_{vv} = 0$ and $L_{vvv} < 0$.
Such edges at three scales are shown in the left column
in figure 1.
As can be seen, sharp edge structures corresponding to object boundaries
give rise to edge curves at both fine and coarse scales.
At fine scales, the localization of object edges is better,
while the number of spurious edge responses is larger.
Coarser scales are on the other hand necessary to capture the
shadow edge, while the localization of e.g. the finger
tip is poor at coarse scales.
Edges and bright ridges detected at three different scale levels.
The right column in figure 1 shows corresponding
results of multi-scale ridge extraction.
A (bright) ridge point is defined as a point where the intensity assumes
a local maximum in the main eigendirection of the Hessian matrix
(Haralick, 1983; Koenderink & van Doorn, 1994).
In terms of local $(p, q)$-coordinates, aligned with the eigendirections
of the Hessian matrix such that the mixed directional derivative
$L_{pq} = 0$, this ridge definition can be written
$L_p = 0$, $L_{pp} < 0$, $|L_{pp}| \geq |L_{qq}|$,
while in terms of a local $(u, v)$-system with the $v$-direction parallel
to the gradient direction and the $u$-direction perpendicular, the ridge
definition assumes the form
$L_{uv} = 0$, $L_{uu}^2 - L_{vv}^2 \geq 0$ (with $L_{uu} < 0$ for bright ridges).
As can be seen, the types of ridge curves that are obtained are
strongly scale dependent.
At fine scales, the ridge detector mainly responds to spurious
fine-scale structures.
Then, it gives rise to ridge curves corresponding to the fingers
at intermediate scales, and to a ridge curve corresponding to the arm
as a whole at coarse scales.
Notably, these ridge descriptors are much more sensitive to the
choice of scale levels than the edge features in the left column of figure 1.
In particular, no single scale level is appropriate for describing
the dominant ridge structures in this image.
The experimental results in figure 1
emphasize the need for adapting the scale levels for feature
detection to the local image structures.
How should such an adaptation be performed without a priori
information about what image information is important?
The subject of this section is to give an intuitive motivation
of how size estimation can be performed, by studying
the evolution properties over scales of scale-normalized derivatives.
The basic idea is as follows: At any scale level, we define a normalized
derivative operator by multiplying each spatial derivative operator
by the scale parameter $t$
raised to a
(so far free) parameter $\gamma/2$:
$\partial_\xi = t^{\gamma/2}\, \partial_x$.
Then, we propose that automatic scale selection can be performed by
detecting the scales at which $\gamma$-normalized differential entities
assume local maxima with respect to scale.
Intuitively, this approach corresponds to selecting
the scales at which the operator response is strongest.
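Operationally, a $\gamma$-normalized derivative is simply a Gaussian derivative response multiplied by $t^{m\gamma/2}$. A minimal sketch (assuming SciPy's Gaussian derivative filters; the name `normalized_derivative` is our own):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_derivative(f, t, m, gamma=1.0, axis=0):
    """m-th order gamma-normalized Gaussian derivative along one axis:
    t^(m*gamma/2) times the m-th order spatial derivative at scale t."""
    orders = [0] * f.ndim
    orders[axis] = m
    L_m = gaussian_filter(f, sigma=np.sqrt(t), order=orders, mode='nearest')
    return t ** (m * gamma / 2) * L_m
```

For a linear ramp the interior first-derivative response equals the slope, so the normalized response grows as $t^{\gamma/2}$.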
For a sine wave $f(x) = \sin(\omega x)$,
the scale-space representation is given by
$L(x;\, t) = e^{-\omega^2 t/2} \sin(\omega x)$,
and the amplitude of the $m$th-order normalized derivative operator is
$L_{\xi^m, \max}(t) = t^{m\gamma/2}\, \omega^m\, e^{-\omega^2 t/2}$.
This function assumes a unique maximum over scales at
$t_{\max} = \gamma m / \omega^2$,
implying that the corresponding $\sigma$-value ($\sigma = \sqrt{t_{\max}}$)
is proportional to the wavelength $\lambda = 2\pi/\omega$
of the signal.
In other words, the wavelength of the signal can be detected from
the maximum over scales in the scale-space signature of the signal
(see figure 2).
In this respect, the scale selection approach has similar properties
as a local Fourier analysis, with the difference that there is no
need for explicitly determining a window size for computing the
Fourier transform.
The amplitude of first-order normalized derivatives as function of
scale for sinusoidal input signals of different frequency.
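The sine-wave analysis can be verified numerically: sampling $f(x) = \sin(\omega x)$ and maximizing the first-order signature over a grid of scales recovers $t_{\max} \approx \gamma m / \omega^2$. A sketch (our illustration, assuming SciPy):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

omega, m, gamma = 0.5, 1, 1.0
x = np.arange(400.0)
f = np.sin(omega * x)

def signature(t):
    # amplitude of the m-th order gamma-normalized derivative at scale t,
    # estimated over the central samples to avoid boundary effects
    L_m = gaussian_filter1d(f, sigma=np.sqrt(t), order=m, mode='nearest')
    return t ** (m * gamma / 2) * np.max(np.abs(L_m[100:300]))

ts = np.linspace(0.5, 20.0, 400)
t_hat = ts[int(np.argmax([signature(t) for t in ts]))]
# theory: t_max = gamma * m / omega**2, i.e. 4.0 for these parameter values
```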
If a local maximum over scales in the normalized differential expression
is detected at the position
$(x_0;\, t_0)$
in the scale-space representation
of a signal $f$,
then for a signal $f'$
rescaled by a scaling factor $s$ (such that $f'(s x) = f(x)$),
the corresponding local maximum over scales is assumed at
$(s x_0;\, s^2 t_0)$.
This property shows that the selected scales follow any size variations
in the image data, and this property holds for all homogeneous
polynomial differential invariants
(see (Lindeberg, 1998b)).
In view of the abovementioned scale invariance result,
one may ask the following.
Imagine that we take the idea of performing local scale selection
by local maximization of some sort of normalized derivatives
(not specified yet).
Moreover, let us impose the requirement that the scale levels
selected by this scale selection mechanism should commute
with size variations in the image domain according to
equations (11) and (12).
Then, what types of scale normalizations are possible?
Interestingly, it can then be shown that the form of the
$\gamma$-normalized derivative normalization (6)
arises by necessity,
i.e., with the free parameter $\gamma$, it spans
all possible reasonable scale normalizations (see (Lindeberg, 1998b) for a proof).
The idea is that the normalized scale-space derivatives will be used as a
basis for expressing a large class of image operations,
formulated in terms of normalized differential entities.
Equivalently, such derivatives can be computed by applying $\gamma$-normalized
Gaussian derivative operators
$\partial_{\xi^m} = t^{m\gamma/2}\, \partial_{x^m}$
to the original $D$-dimensional image.
It is straightforward to show that the $L_p$-norm of such a
$\gamma$-normalized Gaussian derivative kernel is constant over scales,
which means that the $\gamma$-normalized derivative concept
can be interpreted as a normalization to constant $L_p$-norm
over scales, with $p = \frac{1}{1 + m(1 - \gamma)}$.
The special case $\gamma = 1$
corresponds to $L_1$-normalization
for all orders $m$.
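For the first-order kernel this is easy to check directly: with $\gamma = 1$, the $L_1$-norm of $t^{1/2}\, g_x(\cdot;\, t)$ equals $2/\sqrt{2\pi}$ at every scale. A numerical sketch (our illustration):

```python
import numpy as np

def l1_norm_normalized(t, gamma=1.0):
    """L1-norm of the gamma-normalized first-order Gaussian derivative
    kernel t^(gamma/2) * g_x(.; t), evaluated by direct summation."""
    x = np.linspace(-15 * np.sqrt(t), 15 * np.sqrt(t), 30001)
    g = np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)
    g_x = -x / t * g                    # analytic first derivative
    dx = x[1] - x[0]
    return t ** (gamma / 2) * np.sum(np.abs(g_x)) * dx
```

With $\gamma = 1$ the value is scale independent ($\approx 0.798$); for $\gamma \neq 1$ it instead varies as $t^{(\gamma - 1)/2}$.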
Another interesting interpretation can be made with respect to signals
having self-similar power spectra of the form
$S_f(\omega) = |\omega|^{-2\alpha}$.
Let us consider the following class of energy measures,
measuring the amount of information in the $m$th order
$\gamma$-normalized Gaussian derivatives:
$E_m(t) = \int_{x \in \mathbb{R}^2} \sum_{|\beta| = m} \left( \partial_{\xi^\beta} L(x;\, t) \right)^2 dx$.
In the two-dimensional case, this class
includes the following differential energy measures:
$E_1 = \int (L_{\xi}^2 + L_{\eta}^2)\, dx$ and
$E_2 = \int (L_{\xi\xi}^2 + 2 L_{\xi\eta}^2 + L_{\eta\eta}^2)\, dx$.
It can be shown that the variation over scales of these
energy measures is given by
$E_m(t) \sim t^{m(\gamma - 1) + \alpha - 1}$,
and this expression is scale independent if and only if
$\alpha = 1 + m(1 - \gamma)$.
Hence, with $\gamma = 1$ the normalized derivative model is neutral
with respect to power spectra of the form
$S_f(\omega) = |\omega|^{-2}$.
Empirical studies on natural images often
show a qualitative behaviour similar to this (Field, 1987).
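The scale-independence claim can be checked by evaluating the energy integral in the Fourier domain: in 2-D polar coordinates, $E_m(t) \propto t^{m\gamma} \int_0^\infty \omega^{2m - 2\alpha + 1} e^{-\omega^2 t}\, d\omega$, which is independent of $t$ precisely when $\alpha = 1 + m(1 - \gamma)$. A numerical sketch (our illustration, assuming SciPy's quadrature):

```python
import numpy as np
from scipy.integrate import quad

def energy(t, m=2, gamma=1.0, alpha=None):
    """Scale dependence (up to a constant factor) of the m-th order
    normalized-derivative energy for a 2-D signal with power
    spectrum |w|^(-2*alpha), evaluated in polar Fourier coordinates."""
    if alpha is None:
        alpha = 1.0 + m * (1.0 - gamma)   # the 'neutral' spectrum exponent
    integrand = lambda w: w ** (2 * m - 2 * alpha + 1) * np.exp(-w**2 * t)
    val, _ = quad(integrand, 0.0, np.inf)
    return t ** (m * gamma) * val
```

For the neutral exponent the energy is the same at all scales, while e.g. $\alpha = 2$ makes it grow with $t$.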
The results presented so far apply generally to a large
class of image descriptors formulated in terms of differential
entities derived from a multi-scale representation.
The idea is that the differential entity
used for automatic
scale selection, together with its associated normalization parameter $\gamma$,
should be determined for the task at hand.
In this section, we shall present several examples of how this scale
selection mechanism can be expressed in practice for various types
of feature detectors.
Let us first turn to the problem of edge detection,
using the differential definition of edges
expressed in equation (3).
A natural measure of edge strength that can be associated
with this edge definition is given by the normalized gradient magnitude
$\mathcal{G}_{\gamma\text{-norm}} = t^{\gamma} L_v^2$.
If we apply the edge definition (3) at all
scales, we will sweep out an edge surface in scale-space.
On this edge surface, we can define a scale-space edge
as a curve where the edge strength measure assumes a local maximum
with respect to scale.
To determine the normalization parameter $\gamma$, we can consider
an idealized edge model in the form of a diffuse step edge
$f(x, y) = \int_{x' = -\infty}^{x} g(x';\, t_0)\, dx'$,
where $t_0$ describes the degree of diffuseness.
It is straightforward to show that the edge strength measure is maximized at
$t = \frac{\gamma}{1 - \gamma}\, t_0$.
If we require that this maximum is assumed at $t = t_0$,
implying that we use a similar derivative filter for
detecting the edge as the shape of the differentiated
edge, then we obtain $\gamma_{\mathrm{edge}} = 1/2$.
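For this edge model the calculation has a closed form: $L_v(0;\, t) = g(0;\, t_0 + t)$, so the edge strength becomes $t^{\gamma} / (2\pi (t_0 + t))$ and its maximum can be located numerically. A sketch (our illustration):

```python
import numpy as np

def edge_strength(t, t0, gamma=0.5):
    """t^gamma * L_v^2 at the center of a diffuse step edge of diffuseness
    t0, using the closed form L_v(0; t) = g(0; t0 + t)."""
    return t ** gamma / (2.0 * np.pi * (t0 + t))

t0 = 9.0
ts = np.linspace(0.05, 50.0, 10000)
t_hat = ts[int(np.argmax(edge_strength(ts, t0)))]
# with gamma = 1/2 the maximum over scales lies at t = t0
```

More generally the maximum lies at $t = \gamma t_0 / (1 - \gamma)$, which equals $t_0$ only for $\gamma = 1/2$.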
The result of edge detection with automatic scale selection based on
local maxima over scales of the first-order edge strength measure
$\mathcal{G}_{\gamma\text{-norm}} = t^{\gamma} L_v^2$.
The middle column shows all the scale-space edges, whereas the right
column shows the 100 edge curves having the highest significance values.
Three-dimensional view of the 10 most significant
scale-space edges extracted from the arm image.
From the vertical dimension representing the selected scale
measured in dimension length
(in units of $\sigma = \sqrt{t}$),
it can be seen how coarse scales are selected for the diffuse
edge structures (due to illumination effects)
and that finer scales are selected for the sharp edge structures
(the object boundaries).
Figure 3 shows the results of detecting
edges from two images in this way.
The middle column shows all scale-space edges that satisfy the definition
(26), while the right column shows the result of
selecting the most significant edges by computing a significance measure
as the integrated normalized edge strength measure along each connected edge curve.
Figure 4 shows a three-dimensional view
of the 10 most significant scale-space edges from the hand image,
with the selected scales illustrated by the height over the image plane.
Observe that fine scales are selected for the edges corresponding to the object boundaries.
This result is consistent with the empirical finding that rather
fine scales are usually appropriate for extracting object edges.
For the shadow edges on the other hand,
successively coarser scales are selected with increasing
degree of diffuseness, in agreement with the analysis of
the idealized edge model in (28).
Let us next turn to the problem of ridge detection,
and sweep out a ridge surface in scale-space by applying the
ridge definition (4) at all scales.
Then, given the following ridge strength measure,
$\mathcal{A}_{\gamma\text{-norm}} = t^{2\gamma} \left( (L_{xx} - L_{yy})^2 + 4 L_{xy}^2 \right) = t^{2\gamma} (L_{pp} - L_{qq})^2,$
which is the square difference between the eigenvalues of
the normalized Hessian matrix, let us define a
scale-space ridge as a curve on the ridge surface
where the normalized ridge strength measure assumes
local maxima with respect to scale.
To determine the normalization parameter $\gamma$, let us consider
a Gaussian ridge $f(x, y) = g(x;\, t_0)$ of width $t_0$.
The maximum over scales in $\mathcal{A}_{\gamma\text{-norm}}$
is assumed at $t = \frac{2\gamma}{3 - 2\gamma}\, t_0$,
and by requiring this scale value to be equal to $t_0$
(implying that a similar rotationally aligned Gaussian derivative filter
is used for detecting the ridge as the shape of the second derivative
of the Gaussian ridge) we obtain $\gamma_{\mathrm{ridge}} = 3/4$.
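Again the computation is closed form: for $f(x, y) = g(x;\, t_0)$ one has $L_{pp}(0;\, t) \propto (t_0 + t)^{-3/2}$ and $L_{qq} = 0$, so the ridge strength is proportional to $t^{2\gamma}/(t_0 + t)^3$. A numerical check (our illustration):

```python
import numpy as np

def ridge_strength(t, t0, gamma=0.75):
    """t^(2*gamma) * (L_pp - L_qq)^2 at the center of a Gaussian ridge
    f(x, y) = g(x; t0), where L_pp(0; t) is proportional to
    (t0 + t)**-1.5 and L_qq = 0."""
    return t ** (2.0 * gamma) / (2.0 * np.pi * (t0 + t) ** 3)

t0 = 4.0
ts = np.linspace(0.05, 30.0, 12000)
t_hat = ts[int(np.argmax(ridge_strength(ts, t0)))]
# with gamma = 3/4 the maximum over scales lies at t = t0
```

In general the maximum lies at $t = 2\gamma t_0 / (3 - 2\gamma)$, equal to $t_0$ exactly when $\gamma = 3/4$.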
Alternative illustration of
the five strongest scale-space ridges extracted from
the image of the arm.
Each ridge is backprojected onto
a dark copy of the original image as the
union of a set of circles centered on the ridge curve
with the radius proportional to the
selected scale at that point.
The 100 and 10 strongest bright ridges, respectively, extracted using
scale selection based on local maxima over scales of
$\mathcal{A}_{\gamma\text{-norm}}$.
Figures 5 and 6 show the result of applying such a ridge detector
to an image of an arm and an aerial image of a suburb, respectively.
The ridges have been ranked on significance,
by integrating the normalized ridge strength
measure along each connected ridge curve.
Observe that descriptors corresponding to the roads are selected
from the aerial image.
Moreover, for the arm image, a coarse-scale descriptor is extracted for
the arm as a whole, whereas the individual fingers
give rise to ridge curves at finer scales.
The Laplacian operator
$\nabla^2 L = L_{xx} + L_{yy}$
is a commonly used entity for blob detection,
since it gives a strong response at the
center of blob-like image structures.
To formulate a blob detector with automatic scale selection,
we can consider the points in scale-space at which
the square of the normalized Laplacian
$\nabla^2_{\mathrm{norm}} L = t\, (L_{xx} + L_{yy})$
assumes maxima with respect to space and scale.
Such points are referred to as scale-space maxima.
Three-dimensional view of the 150 strongest scale-space
maxima of the square of the normalized Laplacian of the Gaussian
computed from the sunflower image.
Blob detection by detection of scale-space maxima of the
normalized Laplacian operator:
(a) Original image.
(b) Circles representing the 250
scale-space maxima having the
strongest normalized response.
(c) Circles overlayed on image.
For a Gaussian blob model defined by
$f(x, y) = g(x, y;\, t_0)$,
it can be shown that the selected scale at the
center of the blob is given by $\hat{t} = t_0$.
Hence, the selected scale directly reflects the width $t_0$
of the Gaussian blob.
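This prediction can be reproduced numerically: build a Gaussian blob of variance $t_0$, evaluate the squared normalized Laplacian at its center over a range of scales, and pick the maximizing scale. A sketch (our illustration, assuming SciPy; $\gamma = 1$):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_laplacian(f, t):
    """t * (L_xx + L_yy) at scale t (gamma = 1)."""
    s = np.sqrt(t)
    Lxx = gaussian_filter(f, sigma=s, order=(2, 0), mode='nearest')
    Lyy = gaussian_filter(f, sigma=s, order=(0, 2), mode='nearest')
    return t * (Lxx + Lyy)

# Gaussian blob with variance t0; the scale-space signature of the squared
# normalized Laplacian at the blob center should peak near t = t0.
n, t0 = 129, 16.0
y, x = np.mgrid[:n, :n] - n // 2
f = np.exp(-(x**2 + y**2) / (2.0 * t0))

ts = np.linspace(2.0, 60.0, 120)
responses = [normalized_laplacian(f, t)[n // 2, n // 2] ** 2 for t in ts]
t_hat = ts[int(np.argmax(responses))]
```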
Figures 7 and 8 show the result of applying this blob detector to an image of a sunflower field.
In figure 7,
each blob feature detected as a scale-space maximum
is illustrated by a circle,
with its radius proportional to the selected scale.
Figure 8 shows a three-dimensional illustration
of the same data set, by marking the scale-space extrema by
spheres in scale-space.
Observe how well the size variations in the image are
captured by this structurally very simple operation.
A commonly used technique for detecting junction candidates in
grey-level images is to detect extrema in the curvature of level curves
multiplied by the gradient magnitude raised to some power
(Kitchen & Rosenfeld, 1982; Koenderink & Richards, 1988).
A special choice is to multiply the level-curve
curvature by the gradient magnitude raised to
the power of three.
This leads to the differential invariant
$\tilde{\kappa} = L_x^2 L_{yy} + L_y^2 L_{xx} - 2\, L_x L_y L_{xy},$
with the corresponding normalized expression
$\tilde{\kappa}_{\mathrm{norm}} = t^{2\gamma}\, \tilde{\kappa}.$
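A sketch of this corner measure on a synthetic right-angle corner (our illustration, assuming SciPy; the response magnitude should peak near the corner point):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def corner_strength(f, t, gamma=1.0):
    """Normalized rescaled level-curve curvature
    t^(2*gamma) * (Lx^2 * Lyy + Ly^2 * Lxx - 2 * Lx * Ly * Lxy)."""
    s = np.sqrt(t)
    Lx = gaussian_filter(f, sigma=s, order=(0, 1), mode='nearest')
    Ly = gaussian_filter(f, sigma=s, order=(1, 0), mode='nearest')
    Lxx = gaussian_filter(f, sigma=s, order=(0, 2), mode='nearest')
    Lyy = gaussian_filter(f, sigma=s, order=(2, 0), mode='nearest')
    Lxy = gaussian_filter(f, sigma=s, order=(1, 1), mode='nearest')
    return t ** (2.0 * gamma) * (Lx**2 * Lyy + Ly**2 * Lxx
                                 - 2.0 * Lx * Ly * Lxy)

# A 90-degree step corner at pixel (32, 32); away from the corner the
# edges are straight, so the curvature-based response concentrates there.
n = 64
f = np.zeros((n, n))
f[n // 2:, n // 2:] = 1.0
R = np.abs(corner_strength(f, 4.0))
peak = np.unravel_index(int(np.argmax(R)), R.shape)
```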
Figure 9 shows the result of detecting
scale-space extrema from an image with corner structures at different scales.
Observe that a coarse scale response is obtained
for the large scale corner structure as a whole,
whereas the superimposed corner structures
of smaller size give rise to scale-space maxima at finer scales
(see figure 10
for results on real-world data).
Three-dimensional view of scale-space maxima of
computed for a large scale
corner with superimposed corner structures at finer scales.
We argue that a scale selection mechanism is an essential tool
whenever our aim is to automatically interpret the image data that arise from
observations of a dynamic world.
For example, if we are tracking features in the image domain,
then it is essential that the scale levels are adapted to
the size variation that may occur over time.
Figure 10 shows a
comparison between a feature tracker with automatic scale selection
(Bretzner & Lindeberg, 1998a) and a corresponding feature tracker operating at
a fixed scale.
(Both feature trackers are based on corner detection from local maxima
of the corner strength measure (38),
followed by a localization stage (Lindeberg, 1998b) and a multi-cue
verification (Bretzner & Lindeberg, 1998a).)
As can be seen from the results in figure 10,
three out of the ten features
are lost by the fixed scale feature tracker compared to
the adaptive scale tracker.
Comparison between feature tracker with automatic scale
selection and a feature tracker operating at fixed scale.
The left column shows a set of corner features in the
initial frame, and the right column gives a snapshot
after 65 frames.
(a) Initial frame with 14 detected corners.
(b) Tracked features with automatic scale selection.
(c) Tracked features using fixed scales.
A brief explanation of this phenomenon is that if we use a
standard algorithm for feature detection at a fixed scale
followed by hypothesis evaluation using a fixed size window for correlation,
then the feature tracker will after a few frames fail to
detect some of the features.
The reason why this occurs is simply the fact that the corner
feature no longer exists at the predetermined scale.
In practice, this usually occurs for blunt corners.
An attractive property of a feature detector with automatic scale
selection is that it allows us to capture less distinct features
than those that occur on man-made objects.
Specifically, we have demonstrated how it makes it possible to
capture features associated with human actions.
Figure 11 illustrates one idea we have been working on in the
area of visually guided human-computer-interaction.
The idea is to have a camera that monitors the motion of a
human hand. At each frame, blob and ridge features are extracted,
corresponding to the fingers and the finger tips.
Assuming rigidity, the motion of the image features allows
us to estimate the three-dimensional rotation of the
hand (Bretzner & Lindeberg, 1998b).
These motion estimates can in turn be used for controlling other
computerized equipment, thus serving as a ``3-D hand mouse'' (Lindeberg & Bretzner, 1998).
Illustration of the concept of a ``3-D hand mouse''.
The idea is to monitor the motion of a human hand
(here, via a set of tracked image features) and
to use estimates of the hand motion for controlling
other computerized equipment (here, the visualization
of a cube).
(a) Controlling hand motion.
(b) Detected ridges and blobs.
(c) Controlled object.
We have presented a general framework for automatic scale selection
as well as examples of how this scale selection mechanism can be
integrated with other feature modules.
The experiments demonstrate how abstractions of the
image data can be computed in a conceptually very simple way,
by analysing the behaviour of image features over scales
(sometimes referred to as ``deep structure'').
For applications in other areas as well as related works, see
(Lindeberg, 1998a,b,1999) and (Almansa & Lindeberg, 1999; Wiltschi et al., 1998).
Almansa, A. & Lindeberg, T. (1999), Fingerprint enhancement by shape adaptation of scale-space operators with automatic scale-selection, Technical Report ISRN KTH/NA/P-99/01-SE, Dept. of Numerical Analysis and Computing Science, KTH, Stockholm, Sweden.
Bretzner, L. & Lindeberg, T. (1998a), `Feature tracking with automatic selection of spatial scales', Computer Vision and Image Understanding 71(3), 385-392.
Bretzner, L. & Lindeberg, T. (1998b), Use your hand as a 3-D mouse or relative orientation from extended sequences of sparse point and line correspondences using the affine trifocal tensor, in H. Burkhardt & B. Neumann, eds, `Proc. 5th European Conference on Computer Vision', Vol. 1406 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Freiburg, Germany, pp. 141-157.
Burt, P. J. & Adelson, E. H. (1983), `The Laplacian pyramid as a compact image code', IEEE Trans. Communications 31(4), 532-540.
Canny, J. (1986), `A computational approach to edge detection', IEEE Trans. Pattern Analysis and Machine Intell. 8(6), 679-698.
Field, D. J. (1987), `Relations between the statistics of natural images and the response properties of cortical cells', J. of the Optical Society of America 4(12), 2379-2394.
Florack, L. M. J. (1997), Image Structure, Series in Mathematical Imaging and Vision, Kluwer Academic Publishers, Dordrecht, Netherlands.
Florack, L. M. J., ter Haar Romeny, B. M., Koenderink, J. J. & Viergever, M. A. (1992), `Scale and the differential structure of images', Image and Vision Computing 10(6), 376-388.
Haralick, R. M. (1983), `Ridges and valleys in digital images', Computer Vision, Graphics, and Image Processing 22, 28-38.
Kitchen, L. & Rosenfeld, A. (1982), `Gray-level corner detection', Pattern Recognition Letters 1(2), 95-102.
Koenderink, J. J. (1984), `The structure of images', Biological Cybernetics 50, 363-370.
Koenderink, J. J. & Richards, W. (1988), `Two-dimensional curvature operators', J. of the Optical Society of America 5(7), 1136-1141.
Koenderink, J. J. & van Doorn, A. J. (1992), `Generic neighborhood operators', IEEE Trans. Pattern Analysis and Machine Intell. 14(6), 597-605.
Koenderink, J. J. & van Doorn, A. J. (1994), `Two-plus-one-dimensional differential geometry', Pattern Recognition Letters 15(5), 439-444.
Korn, A. F. (1988), `Toward a symbolic representation of intensity changes in images', IEEE Trans. Pattern Analysis and Machine Intell. 10(5), 610-625.
Lindeberg, T. (1994), Scale-Space Theory in Computer Vision, The Kluwer International Series in Engineering and Computer Science, Kluwer Academic Publishers, Dordrecht, Netherlands.
Lindeberg, T. (1998a), `Edge detection and ridge detection with automatic scale selection', Int. J. of Computer Vision 30(2), 117-154.
Lindeberg, T. (1998b), `Feature detection with automatic scale selection', Int. J. of Computer Vision 30(2), 79-116.
Lindeberg, T. (1999), Principles for automatic scale selection, in B. Jähne et al., eds, `Handbook on Computer Vision and Applications', Academic Press, Boston, USA, pp. 239-274.
Lindeberg, T. & Bretzner, L. (1998), Method and arrangement for transferring information through motion detection, and use of the arrangement (in Swedish: Förfarande och anordning för överföring av information genom rörelsedetektering, samt användning av anordningen).
Prewitt, J. M. S. (1970), Object enhancement and extraction, in A. Rosenfeld & B. S. Lipkin, eds, `Picture Processing and Psychophysics', Academic Press, New York.
Roberts, L. G. (1965), Machine perception of three-dimensional solids, in J. T. Tippett et al., eds, `Optical and Electro-Optical Information Processing', MIT Press, Cambridge, Massachusetts.
Sporring, J., Nielsen, M., Florack, L. & Johansen, P., eds (1997), Gaussian Scale-Space Theory: Proc. PhD School on Scale-Space Theory, Series in Mathematical Imaging and Vision, Kluwer Academic Publishers, Copenhagen, Denmark.
ter Haar Romeny, B., Florack, L., Koenderink, J. J. & Viergever, M., eds (1997), Scale-Space Theory in Computer Vision: Proc. First Int. Conf. Scale-Space'97, Vol. 1252 of Lecture Notes in Computer Science, Springer Verlag, New York, Utrecht, Netherlands.
Torre, V. & Poggio, T. A. (1980), `On edge detection', IEEE Trans. Pattern Analysis and Machine Intell.
Wiltschi, K., Pinz, A. & Lindeberg, T. (1998), An automatic assessment scheme for steel quality inspection, Technical Report ISRN KTH/NA/P-98/20-SE, Dept. of Numerical Analysis and Computing Science, KTH, Stockholm, Sweden.
Witkin, A. P. (1983), Scale-space filtering, in `Proc. 8th Int. Joint Conf. Art. Intell.', Karlsruhe, West Germany, pp. 1019-1022.