K-Nearest Neighbor Algorithm

K-Nearest Neighbor Algorithm (KNN) for Machine Learning: An Intuitive Guide with Python Implementation

In the world of Machine Learning, there are some algorithms that are so intuitive and straightforward, yet so powerful, that they become essential tools for beginners and experts alike. The K-Nearest Neighbor (K-NN) algorithm is one such example.

KNN is a supervised learning algorithm that is used for both classification and regression, though it is more widely used for classification tasks. Despite its simplicity, KNN delivers impressive results and is a fundamental stepping stone in understanding how machine learning models work based on similarity.

🧠 What is the KNN Algorithm?

At its core, KNN assumes that similar things exist in close proximity. In other words, similar data points are near each other. It doesn’t learn explicitly during the training phase, which is why it’s also called a lazy learner — it stores the data and only makes decisions at runtime when a prediction is requested.

How It Works:

If a new data point needs to be classified, KNN:

  1. Looks at the ‘K’ closest data points from the training set (based on a distance metric).
  2. Counts how many of those neighbors belong to each category.
  3. Assigns the class that is most common among those neighbors to the new data point.

For example, suppose we are trying to decide whether an image is of a cat or a dog. If the new image is more similar to the cat images in the existing dataset than to the dog images, KNN will classify it as a cat.

🤔 Why Do We Need KNN?

Imagine a situation where we have two categories: A and B, and a new data point x1 appears. Without a pre-defined rule or trained model, it’s hard to determine which group it belongs to. KNN solves this by evaluating similarity — a concept we intuitively understand and trust.

🔧 How KNN Algorithm Works – Step-by-Step

  1. Select the number K of neighbors.
  2. Calculate the distance (usually Euclidean) of the new data point from the training data.
  3. Sort the distances and determine the K nearest neighbors.
  4. Count the number of data points in each category among these neighbors.
  5. Assign the new data point to the category with the highest count.
  6. The model is now ready to classify new data points.

📐 Euclidean Distance Formula:

d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
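
To make the steps above concrete, here is a minimal from-scratch sketch (a toy illustration, not the scikit-learn implementation used later in this post). It computes Euclidean distances with NumPy, finds the K nearest points, and takes a majority vote; the tiny dataset and the helper name knn_predict are made up for demonstration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny made-up dataset: two features, two classes (0 and 1)
X_train = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([5, 5]), k=3))  # prints 1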

🔢 Choosing the Value of K

Choosing the right value of K is critical:

  • K = 1 or 2: Might be too sensitive to noise or outliers.
  • Larger K: Smoother boundaries, but may blur the distinction between classes.
  • Rule of Thumb: Start with K = 5, prefer odd values to avoid ties in binary classification, and tune with cross-validation (see the sketch below).
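
A practical way to pick K is cross-validation. The short sketch below assumes scikit-learn is available and that a scaled feature matrix x_train and labels y_train already exist (they are prepared in the worked example later in this post):

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Try odd values of K and keep the one with the best cross-validated accuracy
best_k, best_score = 1, 0.0
for k in range(1, 30, 2):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), x_train, y_train, cv=5)
    if scores.mean() > best_score:
        best_k, best_score = k, scores.mean()
print(best_k, best_score)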

✅ Advantages of KNN:

  • Very simple and intuitive.
  • No training phase – new data can be added at any time without retraining the model.
  • Works well with multi-class problems.
  • Robust to noisy data, especially with larger K values.

❌ Disadvantages of KNN:

  • Computationally expensive – calculates distance from all training samples.
  • Sensitive to irrelevant features and the scale of the data.
  • Need to manually choose optimal K.

🚗 Real-Life Example: SUV Purchase Prediction

Problem Statement:

A car manufacturer wants to target ads at users who are most likely to buy a new SUV. The company uses data gathered from a social network, such as each user's age and estimated salary, to identify potential buyers.

Let’s solve this using KNN in Python.

💻 Python Implementation of KNN

📁 Step 1: Data Preprocessing

import numpy as np  
import matplotlib.pyplot as plt  
import pandas as pd  

# Importing the dataset  
data_set = pd.read_csv('user_data.csv')  
x = data_set.iloc[:, [2,3]].values  # Age and Estimated Salary  
y = data_set.iloc[:, 4].values      # Purchased (0 or 1)

# Splitting dataset  
from sklearn.model_selection import train_test_split  
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)  

# Feature Scaling  
from sklearn.preprocessing import StandardScaler    
sc = StandardScaler()    
x_train = sc.fit_transform(x_train)    
x_test = sc.transform(x_test)

🔍 Step 2: Fitting the K-NN Model

from sklearn.neighbors import KNeighborsClassifier  
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  
classifier.fit(x_train, y_train)

🧪 Step 3: Predicting the Test Set

y_pred = classifier.predict(x_test)
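
You can also score a single, hypothetical new user (say, age 30 with an estimated salary of 87,000; these values are made up for illustration). Remember to pass the raw values through the same scaler before predicting:

# Hypothetical new user: age 30, estimated salary 87,000
new_user = sc.transform([[30, 87000]])
print(classifier.predict(new_user))  # 0 = won't buy, 1 = will buy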

📊 Step 4: Confusion Matrix

from sklearn.metrics import confusion_matrix  
cm = confusion_matrix(y_test, y_pred)  
print(cm)

Let’s say the output is:

[[64  4]
 [ 3 29]]

This shows 93 correct predictions (64 + 29 on the diagonal) and only 7 incorrect ones out of 100 test samples – better than a linear classifier such as logistic regression typically manages on this dataset.
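
You can also compute the accuracy directly instead of reading it off the matrix:

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))  # 0.93 for the matrix shown above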

🖼️ Step 5: Visualizing the Results (Training Set)

from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train

# Build a fine grid over the (scaled) feature space
x1, x2 = np.meshgrid(np.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))

# Colour each grid point by the class the classifier predicts for it
plt.contourf(x1, x2, classifier.predict(np.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(x1.min(), x1.max())
plt.ylim(x2.min(), x2.max())

# Overlay the training points, coloured by their true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=('red', 'green')[i], label=j)
plt.title('K-NN (Training Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

The result? A beautiful, non-linear decision boundary that separates buyers from non-buyers based on age and salary.

📌 Final Thoughts

The K-Nearest Neighbor algorithm is a fantastic example of simplicity in action. It’s a non-parametric, lazy learning algorithm that doesn’t need to build a model but instead relies on the power of proximity to make predictions.

Whether you’re solving a classification problem like identifying spam emails, predicting customer churn, or even recognizing handwritten digits — KNN has got your back.

If you’re new to Machine Learning, KNN is a great place to start — easy to understand, simple to implement, and highly effective when tuned correctly.

🔗 Stay tuned for more beginner-friendly machine learning guides!
Let me know if you’d like to explore other algorithms like SVM, Decision Trees, or Naive Bayes next!

