In order to match color regions, we need a measure of the similarity of colors; e.g., pink is more similar to red than to blue. We base the measurement of color similarity on closeness in the HSV color space as follows: the similarity between any two colors, indexed by $m_i = (h_i, s_i, v_i)$ and $m_j = (h_j, s_j, v_j)$, is given by
$$a_{i,j} = 1 - \frac{1}{\sqrt{5}}\left[(v_i - v_j)^2 + (s_i \cos h_i - s_j \cos h_j)^2 + (s_i \sin h_i - s_j \sin h_j)^2\right]^{1/2} \tag{3}$$
which corresponds to the proximity in the cylindrical HSV color space depicted in Figure 5. The measure of color similarity, $a_{i,j}$, is used within the computation of the distance between color distributions as described next.
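As an illustration, the color similarity can be sketched in a few lines of Python. This is a minimal sketch, assuming that Eq. 3 is one minus the Euclidean distance between two points of the HSV cylinder, normalized by $\sqrt{5}$ (the largest distance in a cylinder of radius 1 and height 1), with hue in radians and saturation and value in [0, 1]. The function name and the sample colors are hypothetical.

```python
import math

def hsv_similarity(c_i, c_j):
    """Similarity a_ij of two HSV colors (assumed: h in radians, s and v in [0, 1])."""
    h_i, s_i, v_i = c_i
    h_j, s_j, v_j = c_j
    # Euclidean distance between the two points in the HSV cylinder.
    dist = math.sqrt(
        (v_i - v_j) ** 2
        + (s_i * math.cos(h_i) - s_j * math.cos(h_j)) ** 2
        + (s_i * math.sin(h_i) - s_j * math.sin(h_j)) ** 2
    )
    # sqrt(5) is the largest possible distance, so the result lies in [0, 1].
    return 1.0 - dist / math.sqrt(5.0)

# Hypothetical sample colors.
red = (0.0, 1.0, 1.0)
pink = (0.0, 0.4, 1.0)                  # same hue as red, lower saturation
blue = (math.radians(240.0), 1.0, 1.0)

print("pink vs. red:", hsv_similarity(pink, red))
print("blue vs. red:", hsv_similarity(blue, red))
```

Under these assumptions pink scores higher against red than blue does, which matches the motivating example.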
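A distribution of colors is defined by a color histogram. By transforming the three color channels of image $I[x,y]$ using transformation $T_c$ and quantization $Q_c^M$ as defined in Section 2, where $m \in \{0, \ldots, M-1\}$, the single variable color histogram is given by the following, where $X$ and $Y$ are the width and height of the image, respectively, which are used for normalization,
$$h_c[m] = \frac{1}{XY} \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} \begin{cases} 1 & \text{if } Q_c^M(T_c\, I[x,y]) = m \\ 0 & \text{otherwise} \end{cases} \tag{4}$$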
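The histogram computation can be sketched as follows. The transformation and quantization of Section 2 are not reproduced here; the sketch assumes they are combined into a single caller-supplied function `quantize` that maps a pixel to a color index in 0..M-1. The function names and the gray-level toy image are hypothetical.

```python
def color_histogram(image, quantize, num_colors):
    """Normalized single-variable histogram h[m] of an image.

    image: list of rows of pixels.
    quantize: assumed stand-in for transformation plus quantization;
              maps one pixel to an index in 0..num_colors-1.
    """
    height = len(image)
    width = len(image[0])
    counts = [0] * num_colors
    for row in image:
        for pixel in row:
            counts[quantize(pixel)] += 1
    # Normalize by the number of pixels (width times height).
    total = float(width * height)
    return [c / total for c in counts]

# Toy example: gray levels in 0..255 quantized uniformly to 4 bins.
toy_image = [[0, 10, 200], [255, 130, 64]]
toy_hist = color_histogram(toy_image, lambda g: g // 64, 4)
print(toy_hist)
```

Because of the normalization, the bins sum to one regardless of image size, so histograms of differently sized images remain comparable.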
The most common dissimilarity measures for feature vectors are based upon the Minkowski metric, which has the following form, where $h_q$ and $h_t$ are the query and target feature vectors, respectively,
$$d^{r}_{q,t} = \sum_{m=0}^{M-1} \left| h_q[m] - h_t[m] \right|^{r} \tag{5}$$
For example, both the $L_1$ ($r = 1$) [1] and $L_2$ ($r = 2$) metrics have been used for measuring dissimilarity of histograms. However, histogram dissimilarity measures based upon the Minkowski metric compare only corresponding bins and therefore neglect similar colors in the computation of dissimilarity. For example, using a Minkowski metric, a dark red image is equally dissimilar to a red image as to a blue image. By using color similarity measures within the distance computation, a quadratic metric improves histogram matching.
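The limitation can be seen on a toy example. The sketch below assumes a hypothetical three-bin color space (dark red, red, blue) and three images that each fall entirely into one bin; the Minkowski form is taken as the sum of absolute bin differences raised to the power r.

```python
def minkowski_distance(h_q, h_t, r):
    """Minkowski-form dissimilarity: sum over bins of |h_q[m] - h_t[m]|^r."""
    return sum(abs(a - b) ** r for a, b in zip(h_q, h_t))

# Hypothetical bins: 0 = dark red, 1 = red, 2 = blue.
dark_red_image = [1.0, 0.0, 0.0]
red_image = [0.0, 1.0, 0.0]
blue_image = [0.0, 0.0, 1.0]

d_red = minkowski_distance(dark_red_image, red_image, 1)
d_blue = minkowski_distance(dark_red_image, blue_image, 1)
print(d_red, d_blue)  # both are 2.0
```

Since only like bins are compared, the two distances coincide for any choice of r, even though dark red is perceptually much closer to red than to blue.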
The QBIC project uses the histogram quadratic distance metric for matching images [3]. It measures the weighted similarity between histograms, which provides more desirable results than ``like-bin'' only comparisons. The quadratic distance between histograms $h_q$ and $h_t$ is given by
$$d^{hist}_{q,t} = (h_q - h_t)^T A \, (h_q - h_t) \tag{6}$$
where $A = [a_{i,j}]$ and $a_{i,j}$ denotes the similarity between colors with indices $i$ and $j$. By defining color similarity in HSV color space, $a_{i,j}$ is given by Eq. 3. Since the histogram quadratic distance computes the cross similarity between colors, it is computationally expensive. Therefore, in large database applications, histogram indexing strategies, such as pre-filtering [5], are required to avoid exhaustive search.
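A minimal sketch of the quadratic distance follows, assuming it is the quadratic form of the histogram difference with a symmetric similarity matrix A. The three-bin color space (dark red, red, blue) and the entries of A are hypothetical values chosen for illustration, not values computed from Eq. 3.

```python
def quadratic_distance(h_q, h_t, A):
    """Quadratic form (h_q - h_t)^T A (h_q - h_t) for a similarity matrix A."""
    z = [a - b for a, b in zip(h_q, h_t)]
    n = len(z)
    # The double loop over all bin pairs is what makes this measure costly.
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

# Hypothetical bins: 0 = dark red, 1 = red, 2 = blue.
# Hypothetical similarities: dark red and red are close, blue is far from both.
A = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
dark_red_image = [1.0, 0.0, 0.0]
red_image = [0.0, 1.0, 0.0]
blue_image = [0.0, 0.0, 1.0]

d_red = quadratic_distance(dark_red_image, red_image, A)    # about 0.4
d_blue = quadratic_distance(dark_red_image, blue_image, A)  # about 1.8
print(d_red, d_blue)
```

With the cross-bin terms included, the dark red image is now closer to the red image than to the blue one. The cost grows with the square of the number of bins, which is the expense noted above.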
Alternatively, we utilize color sets to represent color information. The distinction is that color sets give only a selection of colors, whereas color histograms denote the relative amounts of colors. Although we use the above system for color set selection in order to extract regions, we note here that color sets can also be obtained by thresholding color histograms. For example, given threshold $\tau_m$ for color $m$, color sets are related to color histograms by
$$s_c[m] = \begin{cases} 1 & \text{if } h_c[m] \ge \tau_m \\ 0 & \text{otherwise} \end{cases} \tag{7}$$
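The thresholding relation can be sketched as below, assuming a color is selected when its histogram value is greater than or equal to its threshold. The function name, the histogram values, and the uniform threshold of 0.2 are hypothetical.

```python
def color_set_from_histogram(hist, thresholds):
    """Binary color set: 1 where the histogram bin reaches its threshold, else 0."""
    return [1 if h >= tau else 0 for h, tau in zip(hist, thresholds)]

# Hypothetical 4-color histogram with one threshold per color.
hist = [0.45, 0.05, 0.40, 0.10]
thresholds = [0.2, 0.2, 0.2, 0.2]
color_set = color_set_from_histogram(hist, thresholds)
print(color_set)  # [1, 0, 1, 0]
```

The resulting set keeps which colors are present in significant amounts but discards how much of each there is.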
Color sets work well to represent regional color since (1) $T_c$ and $Q_c^M$ have been derived to give a complete set of distinct colors and (2) salient regions possess only a few, equally dominant colors [13].
We use a modification of the color histogram quadratic distance equation (Eq. 6) to measure the distance between color sets. The quadratic distance between two color sets $s_q$ and $s_t$ is given by
$$d^{set}_{q,t} = (s_q - s_t)^T A \, (s_q - s_t) \tag{8}$$
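Considering the binary nature of the color sets, the computational complexity of the quadratic distance function can be reduced. We decompose the color set quadratic formula to provide for a more efficient computation and indexing. By defining $\mu_q = s_q^T A s_q$, $\mu_t = s_t^T A s_t$, and $r_t = A s_t$, the color set quadratic distance is given as
$$d^{set}_{q,t} = \mu_q + \mu_t - 2 s_q^T r_t \tag{9}$$
Since $s_q$ is a binary vector,
$$d^{set}_{q,t} - \mu_q = \mu_t - 2 \sum_{\forall m \,:\, s_q[m] = 1} r_t[m] \tag{10}$$
That is, any query for the most similar color set to $s_q$ may be easily processed by accessing individually $\mu_t$ and $r_t[m]$'s, where $m \in \{0, \ldots, M-1\}$, see Table 2. As such, $\mu_t$ and $r_t[m]$'s are precomputed, stored and indexed individually. Notice also that $\mu_q$ is a constant of the query. The closest color set, $s_t$, to $s_q$ is the one that minimizes $\mu_t - 2 \sum_{\forall m \,:\, s_q[m] = 1} r_t[m]$.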
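The decomposition can be checked numerically. The sketch below assumes a symmetric similarity matrix A (needed so that the two cross terms of the quadratic form are equal) and stores, per target color set, the scalar $\mu_t$ and the vector $r_t = A s_t$. All function names, the 4-color matrix, and the color sets are hypothetical, and a plain dictionary stands in for the indexed storage of Table 2.

```python
def mat_vec(A, s):
    """Matrix-vector product A s."""
    return [sum(a * x for a, x in zip(row, s)) for row in A]

def precompute_target(A, s_t):
    """Per-target terms computed once at insertion time: mu_t and r_t = A s_t."""
    r_t = mat_vec(A, s_t)
    mu_t = sum(x * r for x, r in zip(s_t, r_t))
    return mu_t, r_t

def query_score(s_q, mu_t, r_t):
    """Distance minus the query constant mu_q; reads r_t only where s_q[m] = 1."""
    return mu_t - 2.0 * sum(r_t[m] for m, bit in enumerate(s_q) if bit)

def full_distance(s_q, s_t, A):
    """Direct evaluation of (s_q - s_t)^T A (s_q - s_t), used only as a check."""
    z = [a - b for a, b in zip(s_q, s_t)]
    return sum(zi * ri for zi, ri in zip(z, mat_vec(A, z)))

# Hypothetical symmetric similarity matrix for 4 colors.
A = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.1, 0.3],
    [0.1, 0.1, 1.0, 0.5],
    [0.2, 0.3, 0.5, 1.0],
]
targets = {"a": [0, 1, 0, 0], "b": [0, 0, 1, 0], "c": [1, 0, 0, 1]}
index = {name: precompute_target(A, s_t) for name, s_t in targets.items()}

s_q = [1, 0, 0, 0]
mu_q, _ = precompute_target(A, s_q)
scores = {name: query_score(s_q, mu_t, r_t) for name, (mu_t, r_t) in index.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Adding `mu_q` back to each score recovers the full quadratic distance, so ranking by the score alone gives the same order while touching only the entries of `r_t` selected by the query.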