Hierarchical Clustering in Machine Learning
In the realm of unsupervised machine learning, one powerful technique often used to discover patterns in unlabeled datasets is Hierarchical Clustering, also known as Hierarchical Cluster Analysis (HCA). Unlike other clustering techniques, hierarchical clustering organizes data into a tree-like structure called a dendrogram, allowing us to visualize the nested grouping of data points and their similarities.
Let’s dive deeper into how it works, why it’s useful, and how you can implement it in Python.
🔍 What is Hierarchical Clustering?
Hierarchical clustering groups data based on a hierarchy—essentially a nested series of clusters. These clusters are structured as a tree, where each merge or split is recorded, creating a comprehensive picture of the clustering process.
This method differs significantly from K-Means Clustering:
- The number of clusters does not need to be predetermined.
- It produces a dendrogram; cutting the tree at a chosen height determines how many clusters you keep.
There are two main approaches:
- Agglomerative (Bottom-Up): Begins with every data point as a separate cluster and gradually combines them.
- Divisive (Top-Down): Begins with all of the data in a single cluster and divides it recursively.
We’ll focus on Agglomerative Hierarchical Clustering in this article.
🤔 Why Use Hierarchical Clustering?
K-Means is excellent for speed and ease of use, but it has certain drawbacks:
- Requires prior knowledge about the number of clusters.
- Assumes all clusters are similar in size.
Hierarchical clustering overcomes these by:
- Not requiring predefined cluster counts.
- Allowing flexible cluster shapes and sizes.
- Providing a visual (dendrogram) to decide how many clusters are meaningful.
🧩 How Does Agglomerative Hierarchical Clustering Work?
Here’s a detailed breakdown of how it creates clusters:
Step 1: Consider every data point as a separate cluster.
If you have N data points, you’ll start with N clusters.
Step 2: Combine the two nearest groups.
This reduces the total cluster count to N – 1.
Step 3: Repeat the process
Continue combining the two nearest clusters until there is just one left.
Step 4: Build a dendrogram
This shows the entire process as a tree. You can cut the tree at the desired level to form clusters.
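To make these steps concrete, here is a minimal sketch (using SciPy on a small hypothetical array of points, not the mall dataset used later) that runs the agglomerative procedure and prints the merge history. Each row of the linkage matrix records one merge of two clusters.
import numpy as np
import scipy.cluster.hierarchy as sch
# Hypothetical toy dataset: 5 points in 2D
points = np.array([[1.0, 1.0],
                   [1.2, 1.1],
                   [5.0, 5.0],
                   [5.1, 4.9],
                   [9.0, 1.0]])
# Agglomerative clustering: start with 5 singleton clusters and
# merge the two closest clusters at each step until one remains.
Z = sch.linkage(points, method='ward')
# Each row: [cluster_i, cluster_j, merge distance, size of new cluster]
# With 5 points there are exactly 4 merges (N - 1).
print(Z)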
📐 Measuring Distance Between Clusters (Linkage Criteria)
The decision of “which clusters to merge” depends on the distance between them, and there are several ways to measure this:
- Single Linkage: The distance between the two closest points of the two clusters.
- Complete Linkage: The distance between the two farthest points of the two clusters.
- Average Linkage: The mean of all pairwise distances between points in the two clusters.
- Centroid Linkage: The distance between the centroids of the clusters.
The choice of linkage depends on your dataset and use case.
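As a rough illustration of how this choice matters, the sketch below (on hypothetical toy data) clusters the same points with the linkage criteria supported by scikit-learn (single, complete, average, and ward) and prints the resulting labels. Centroid linkage is available in SciPy’s linkage function rather than in scikit-learn.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
# Hypothetical toy data: two loose groups of points
X_toy = np.array([[1, 2], [1, 4], [2, 3],
                  [8, 8], [9, 9], [8, 9]])
# Cluster the same data with different linkage criteria and compare labels
for link in ['single', 'complete', 'average', 'ward']:
    model = AgglomerativeClustering(n_clusters=2, linkage=link)
    labels = model.fit_predict(X_toy)
    print(link, labels)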
🌲 Dendrogram: Visualizing the Cluster Hierarchy
A dendrogram captures the merging steps of clustering in a tree format. Here’s how to read it:
- X-axis: Data points.
- Y-axis: Distance (usually Euclidean).
Each horizontal cut across the tree yields a different number of clusters. A common heuristic is to cut through the tallest vertical line that no horizontal merge line crosses; the number of branches the cut intersects is the suggested number of clusters.
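Here is a minimal sketch of that idea, using synthetic data generated purely for illustration: SciPy’s fcluster turns the merge tree into flat cluster labels, either by cutting at a distance threshold (the height of the horizontal line) or by asking for a fixed number of clusters.
import numpy as np
import scipy.cluster.hierarchy as sch
# Synthetic data for illustration: two well-separated blobs
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, size=(20, 2)),
                  rng.normal(6, 1, size=(20, 2))])
Z = sch.linkage(data, method='ward')
# Cut the tree at height t: same as drawing a horizontal line on the dendrogram
labels_by_height = sch.fcluster(Z, t=10, criterion='distance')
# Or request a fixed number of flat clusters directly
labels_by_count = sch.fcluster(Z, t=2, criterion='maxclust')
print(labels_by_count)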
🧪 Python Implementation of Hierarchical Clustering
Let’s walk through a real-world example: grouping mall customers based on Annual Income and Spending Score.
🔧 Step 1: Data Preprocessing
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Load dataset
dataset = pd.read_csv('Mall_Customers_data.csv')
X = dataset.iloc[:, [3, 4]].values # Annual Income and Spending Score
🌲 Step 2: Creating the Dendrogram
import scipy.cluster.hierarchy as sch
# Create dendrogram
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title("Dendrogram")
plt.xlabel("Customers")
plt.ylabel("Euclidean Distance")
plt.show()
The ward method minimizes the variance within clusters. The dendrogram will help you decide how many clusters to choose.
🧠 Step 3: Training the Hierarchical Clustering Model
from sklearn.cluster import AgglomerativeClustering
# Note: older scikit-learn versions used affinity='euclidean'; recent versions rename this parameter to metric
hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')
y_hc = hc.fit_predict(X)
- n_clusters=5: Number of clusters, chosen based on the dendrogram.
- metric='euclidean': Distance metric (called affinity in older scikit-learn versions).
- linkage='ward': Linkage method used.
🎨 Step 4: Visualizing the Clusters
# Plot each of the five clusters in a different color
plt.scatter(X[y_hc == 0, 0], X[y_hc == 0, 1], s=100, c='red', label='Cluster 1')
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s=100, c='blue', label='Cluster 2')
plt.scatter(X[y_hc == 2, 0], X[y_hc == 2, 1], s=100, c='green', label='Cluster 3')
plt.scatter(X[y_hc == 3, 0], X[y_hc == 3, 1], s=100, c='cyan', label='Cluster 4')
plt.scatter(X[y_hc == 4, 0], X[y_hc == 4, 1], s=100, c='magenta', label='Cluster 5')
plt.title('Customer Segments')
plt.xlabel('Annual Income')
plt.ylabel('Spending Score')
plt.legend()
plt.show()
🎯 Conclusion
Hierarchical clustering is a powerful and intuitive technique when you’re dealing with unlabeled data and need flexibility in choosing the number of clusters. The visual aid of the dendrogram makes this method especially appealing for exploratory data analysis.
Compared to K-Means, hierarchical clustering gives you a full clustering hierarchy, from which you can derive insights at various levels of granularity.
✅ Key Takeaways
- No need to predefine the number of clusters.
- Use linkage methods like Ward, Single, or Complete to control cluster merging.
- The dendrogram is your best friend for visual analysis and interpretation.