ELEN E 6880 Statistical Pattern Recognition

(V. Castelli, M. Brodie, I. Rish, D. Oblinger)

Lecture 8: Nearest-Neighbor Classifiers, by Vittorio Castelli

Relevant Book Sections

Chapters 4.4, 4.5, and 4.6 of Duda, Hart, and Stork, "Pattern Classification".
Chapters 5, 6, 7, 11, and 26 of Devroye, Gyorfi, and Lugosi, "A Probabilistic Theory of Pattern Recognition" (on the class reading list) are devoted to the problem at hand and to its variations. Some of this material is very advanced.
Chapters 2.3 and 13 of Hastie, Tibshirani, and Friedman, "The Elements of Statistical Learning" (on the class reading list) deal with nearest-neighbor and related methods. This book is oriented more toward the practitioner than the previous book.

Material For The Lecture

Material covered in class

The lecture was prepared using a wide variety of material from the textbooks and from the additional material listed below.
The writeup, in pdf format, of the material covered in the lecture can be found here.

Variable-metric Nearest-Neighbor Classifiers

You might be asking yourself what a good metric for nearest-neighbor classifiers is. Although it is known that asymptotically the metric does not matter, it is also known that an appropriate choice of metric can improve the classifier's error rate for finite training sample sizes.
This has been an active area of research in the past. More recently, however, researchers have started questioning the principle that a single distance metric should be used for the entire feature space, and are working on adaptive metrics (namely, on "distance" functions that vary depending on the query point).
Early work on this topic was done by Jerome Friedman, at Stanford. His seminal paper "Flexible Metric Nearest Neighbor Classification" is available in compressed postscript form by following the link.
A researcher who has worked on the topic more recently is Carlotta Domeniconi. The following citations might be of interest to you:

C. Domeniconi, D. Gunopulos, "Efficient Local Flexible Nearest Neighbor Classification", to appear in the Proceedings of the Second SIAM Intl. Conference on Data Mining, 2002.
C. Domeniconi, D. Gunopulos, "Adaptive Nearest Neighbor Classification using Support Vector Machines", Advances in Neural Information Processing Systems 14, MIT Press (NIPS-2001).
C. Domeniconi, J. Peng, D. Gunopulos, "Adaptive Metric Nearest Neighbor Classification", in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, June 13-15, 2000, Hilton Head Island, South Carolina.
C. Domeniconi, J. Peng, D. Gunopulos, "Locally Adaptive Metric Nearest Neighbor Classification", Technical Report UCR-CSE-00-02, August 10, 2000.
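
To make the role of the metric concrete, here is a minimal sketch of a 1-nearest-neighbor classifier whose distance is a feature-weighted Euclidean metric. It is only an illustration and is not taken from any of the papers cited above: the function names, the toy data, and the weight values are all assumptions chosen for the example.

```python
import math

def weighted_euclidean(x, y, weights):
    """Euclidean distance with one non-negative weight per feature (hypothetical helper)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def nn_classify(query, samples, labels, weights):
    """1-NN rule: return the label of the training sample closest to the query."""
    best = min(range(len(samples)),
               key=lambda i: weighted_euclidean(query, samples[i], weights))
    return labels[best]

# Toy training set (assumed): two samples, two features.
samples = [(0.0, 0.0), (1.0, 5.0)]
labels = ["a", "b"]
query = (0.2, 4.0)

# Same data, same query, two different metrics.
label_plain = nn_classify(query, samples, labels, weights=(1.0, 1.0))
label_scaled = nn_classify(query, samples, labels, weights=(1.0, 0.0))
print(label_plain, label_scaled)
```

With equal weights the second feature dominates the distance, while setting its weight to zero makes the classifier rely on the first feature alone, so the two calls return different labels for the same query. An adaptive metric in the sense described above would, roughly speaking, choose the weights as a function of the query point rather than fixing them once for the whole feature space.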

Where to Find Additional Material

There is an enormous literature on Nearest-Neighbor Methods.

A collection of seminal papers was published by the IEEE: B. Dasarathy, "Nearest Neighbor Pattern Classification Techniques", IEEE Computer Society Press, 1990.
The IEEE and ACM digital libraries will return a large number of hits in response to queries on nearest-neighbor methods.
As we mentioned in class, nearest-neighbor methods are computationally intensive. The computational cost can be reduced using indexing structures. A recent survey of multidimensional indexing methods supporting nearest-neighbor queries can be found in this IBM technical report. Since the material in the report has been published in a book, please refrain from distributing it.
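
As a rough illustration of how an indexing structure can reduce the search cost, the sketch below compares a linear scan with a k-d tree, one well-known multidimensional index. It is a minimal sketch under stated assumptions and does not reproduce any method from the report mentioned above: the function names, the dictionary-based tree layout, and the random test data are all hypothetical.

```python
import math
import random

def brute_force_nn(query, points):
    """Linear scan: one distance computation per stored point."""
    return min(points, key=lambda p: math.dist(query, p))

def build_kdtree(points, depth=0):
    """Recursively split on the median, cycling through the axes level by level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def kdtree_nn(node, query, best=None):
    """Return the stored point nearest to query, skipping subtrees that cannot win."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = kdtree_nn(near, query, best)
    # Search the far side only if the splitting plane is closer than the best match so far.
    if abs(diff) < math.dist(query, best):
        best = kdtree_nn(far, query, best)
    return best

random.seed(0)  # fixed seed so the example is repeatable
points = [(random.random(), random.random()) for _ in range(200)]
tree = build_kdtree(points)
query = (0.5, 0.5)
nearest = kdtree_nn(tree, query)
```

Both functions return a nearest stored point, but the tree search can discard whole subtrees whose region lies farther from the query than the best match found so far. The saving tends to shrink as the number of dimensions grows, which is one reason a variety of indexing methods exist.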