Elsevier

Information Sciences

Volume 177, Issue 8, 15 April 2007, Pages 1878-1891
Information Sciences

Shape-based image retrieval using support vector machines, Fourier descriptors and self-organizing maps

https://doi.org/10.1016/j.ins.2006.10.008Get rights and content

Abstract

Image retrieval based on image content has become an important topic in the fields of image processing and computer vision. In this paper, we present a new method of shape-based image retrieval using support vector machines (SVM), Fourier descriptors and self-organizing maps. A list of predicted classes for an input shape is obtained using the SVM, ranked according to their estimated likelihood. The best match of the image to the top-ranked class is then chosen by the minimum mean square error. The nearest neighbors can be retrieved from the self-organizing map of the class. We employ three databases of 99, 216, and 1045 shapes for our experiment, and obtain prediction accuracy of 90%, 96.7%, and 84.2%, respectively. Our method outperforms some existing shape-based methods in terms of speed and accuracy.

Introduction

Shape-based image retrieval of similar objects from image databases has been extensively studied [13], [24], [26], [30]. Indexing is often used to search for a specific shape in a large database [4]. A simple approach uses the nearest neighbor in the database. This requires O(n) distance computations, where n is the number of items in the database. The approach becomes computationally costly if the database is large. Therefore, several improved nearest neighbor techniques have been proposed. For instance, in the spatial access method [16] each shape is represented by a finite set of labeled features. The distance between a pair of shapes is defined as the Euclidean distance between the moment invariant features [14] or between the Fourier descriptors [12]. The Euclidean distance is used to divide the input elements into spatial clusters, so the inefficient search of distinct clusters in query can be avoided [6], [7]. However, this method suffers from the drawback that its computational complexity increases exponentially with the dimensionality of feature space.

Two dimensional moment invariants for planar geometric figures have been investigated. Hu [9] introduced a set of moment invariants using a nonlinear combination based on normalized central moments. It is necessary to compute these invariants to engage all the pixels in the shape. Chen [2] proposed an improved moment invariant technique only based on boundary pixels to speed up the computation. In addition, Chen’s normalized moment invariants were designed to be invariant to scaling, translation and rotation [19]. The definitions of Hu’s and Chen’s moment invariants are briefly introduced in Appendix A.

Shape similarity measures are essential in pattern matching, where the strategy is to transform patterns and to measure their resemblance. The similarity of two patterns is measured as the distance between their feature vectors [15]. An extensive study of similarity measures using matching can be found in Ref. [23]. Methods have been developed to retrieve images using these measures [11], [27]. Unfortunately, they are inefficient for a large database. The view based similarity approach developed by Tangelder and Veltkamp [20] is robust and possesses high discriminative power compared to other shape matching methods, but requires a medium speed due to its lack of efficient indexing. For example, the shock graph matching method proposed by Sebastian et al. [18] is a view based similarity approach. Their results, derived from two databases of 99 and 216 shapes (Fig. 1), demonstrate a 100% recognition rate for the top three matches. However, this method is impractical for querying large databases due to the high computational cost of the deformation function between two shock graphs [17].

In this paper, we propose an efficient image retrieval algorithm for large scale shape databases. We use the Fourier descriptor (FD) as the similarity measure, the support vector machines (SVM) as the pattern classifier, and the self-organizing map (SOM) as the nearest neighbor searching method. The overall shape based image retrieval algorithm is illustrated in Fig. 2. First, we characterize a set of 2D shapes by their FDs. Second, we build an image database containing the shape number, the class label, and the FDs. Third, we use the FDs with class labels to train the SVM. Fourth, we feed the input patterns into the SVM and generate the predicted classes. The SVM with multi-class probability can provide a list of predicted, ranked classes. Then we select the best matched image based on the least mean square error distance (LMSED) among the images of the top rank class. We provide the user an option to retrieve more matched images. If the predicted class is incorrect, we iterate on the next predicted class in the ranked list. Using this algorithm, we achieve nearly the same accuracy but much faster than the existing method in Ref. [18].

The rest of this paper is organized as follows. In Section 2, we introduce Fourier descriptor and moment invariants. We describe the support vector machine and the self-organizing map, respectively, in Sections 3 Support vector machines (SVM), 4 Self-organizing map (SOM). We provide experimental results in Section 5. Finally, we draw conclusions in Section 6.

Section snippets

Fourier descriptor and moment invariants

The shape signature is defined by a 1D continuous function f(t). The Fourier transform of f(t) is given by Eq. (1). When dealing with discrete images, we use the discrete Fourier transform (DFT). Assuming that the object contour is digitized into N points, the DFT of f(t) is given by Eq. (2), where the value of u{0, 1,  , N  1}. From Euler’s equation, i.e., e−j2πut/N = cos (2πut/N)  j sin (2πut/N), we obtain Eq. (3). The coefficient F(u) is called a Fourier descriptor (FD).F(u)=-f(t)e-j2πutdtF(u)=1Nt=

Support vector machines (SVM)

The SVM, derived from the Vapnik’s structural risk minimization (SRM) principle [22], can reduce the empirical risk and quantities based on the bounds of the generalization error. The basic idea is to transform the training data into a higher dimensional space and then to find the hyperplane that maximizes the margin between classes. The simplest model of the SVM classifier is called the maximal margin classifier. As shown in Eq. (7), the SVM classifier attempts to place a linear plane between

Self-organizing map (SOM)

The Kohonen SOM [10] is an algorithm to visualize and interpret data by projecting it from a high dimensional space to a low dimensional space. Basically, it consists of a 2D lattice of nodes. Each node has an N dimensional feature vector that represents a point in the feature space. The map attempts to represent all the available observations. It loops through training data to find the closest node from the map. The choice is based on the nearest neighbor principle for feature vectors

Experimental results

To facilitate comparisons, we adopt the same dataset as in [18]. It contains 42 classes, each including 5–60 different shapes, except two classes, calf and donkey, with less than three shapes, which we exclude. Only two sets of 99 and 216 shapes are used in [18]; however, our experiment includes 1045 shapes, as shown in Fig. 3 (disregard the shapes in black background).

We choose the probability estimate option of the LIBSVM package using the Gaussian kernel function k(x, y) = exp (−(x  y)2/δ2) to

Conclusions

We have presented a novel approach to shape-based image retrieval using Fourier descriptors, support vector machines, and self-organizing maps. Unlike existing methods, we use the SVM with probability estimates to first obtain the most likely classes and then retrieve similar images based on the best match from the SOM of the selected class. Unlike the shock graph matching approach, our method does not need to perform a linear search on all the images in the database. Experimental results show

Acknowledgement

This work is partially supported by the National Science Council, Taiwan, ROC, under the Grant NSC 94-2213-E-216-024.

References (30)

  • J. Dongarra, Performance of Various Computers in Computational Chemistry, 2005,...
  • C. Faloutsos et al.

    Efficient and effective querying by image content

    Intell. Info. Syst.

    (1994)
  • W. Groski et al.

    Index-based object recognition in pictorial data management

    Comput. Vision Graph. Image Process.

    (1990)
  • C.-W. Hsu et al.

    A comparison of methods for multi-class support vector machines

    IEEE Trans. Neural Networks

    (2002)
  • M.-K. Hu

    Visual pattern recognition by moment invariants

    IRE Trans. Inform. Theor.

    (1962)
  • Cited by (0)

    View full text