KNN for classification and regression
- The k-nearest neighbors algorithm (k-NN, kNN, or KNN) is an instance-based, non-parametric method, developed by Evelyn Fix and Joseph Hodges and later expanded by Thomas Cover, used for classification (when the output is a class membership) and regression (when the output is a value).
- k-NN is an instance-based algorithm: it compares new problem instances with instances seen in training and makes predictions based on the existing instances that are "close" to the new problem instance.
- In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is typically a small positive integer). If k = 1, then the object is simply assigned to the class of its nearest neighbor.
- In k-NN regression, the output is a value: the average of the values of its k nearest neighbors.
- For both classification and regression, a useful technique is to weight the contributions of the neighbors, so that nearer neighbors contribute more to the prediction than more distant ones. A common weighting scheme is 1/d, where d is the distance between the new instance and the neighbor (both variants, with optional 1/d weighting, are sketched after this list).
- Nearest neighbor search algorithms: brute force vs. those utilizing a space-partitioning data structure (e.g., a ball tree or k-d tree); see the second sketch below.
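A minimal from-scratch sketch of both variants with NumPy and Euclidean distance (function and variable names are illustrative, not from any library):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3, mode="classify", weighted=False):
    """Predict a class (mode='classify') or a value (otherwise) for x_new."""
    # Euclidean distances from the new instance to every training instance.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # 1/d weights for the "nearer neighbors count more" scheme; uniform otherwise.
    w = 1.0 / (dists[nearest] + 1e-12) if weighted else np.ones(k)

    if mode == "classify":
        # Plurality vote (weighted, if requested) over the neighbors' classes.
        votes = {}
        for label, weight in zip(y_train[nearest], w):
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)
    else:
        # Regression: (weighted) average of the neighbors' values.
        return np.average(y_train[nearest], weights=w)

# Toy usage: four points in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_cls = np.array([0, 0, 1, 1])
y_reg = np.array([1.0, 1.2, 3.0, 3.1])
print(knn_predict(X, y_cls, np.array([0.8, 0.9]), k=3))                  # -> 1
print(knn_predict(X, y_reg, np.array([0.8, 0.9]), k=3, mode="regress"))  # -> ~2.43
```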
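For the search itself, scikit-learn exposes the choice of algorithm directly; a brief sketch (the dataset and parameter values are arbitrary examples):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.RandomState(0).rand(1000, 3)  # 1000 random points in 3-D

# Brute force computes all pairwise distances; the tree-based structures
# partition the space so that queries can skip most of the data.
for algorithm in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=5, algorithm=algorithm).fit(X)
    dist, idx = nn.kneighbors(X[:1])  # 5 nearest neighbors of the first point
    print(algorithm, idx[0])          # all three return the same neighbors
```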
Some notes
- How to set the parameters? What k to use? What distance metric?
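One standard answer is cross-validation; a sketch with scikit-learn (the grid values and the iris dataset are arbitrary examples, not a recommendation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Search over k, the distance metric, and uniform vs. 1/d weighting.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={
        "n_neighbors": [1, 3, 5, 7, 11],
        "metric": ["euclidean", "manhattan"],
        "weights": ["uniform", "distance"],  # "distance" = inverse-distance weights
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```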
(image from Wikipedia)