What is Silhouette Score?
Last Updated: 23 Jun, 2025
When we use clustering algorithms like K-Means to group data, we need a way to check how good those groups are. The Silhouette Score is one of the most popular ways to do this. It helps us measure how well each data point fits into its assigned cluster and how far it is from other clusters.
How the Silhouette Score Works
The Silhouette Score is built from two quantities that are calculated for each data point:
- Intra-cluster distance (a_i ): This measures how close the data point is to other points within the same cluster. It is computed by taking the average distance between the point and all other points in its own cluster. A smaller value of a_i indicates that the data point is well-matched to its cluster.
- Nearest-cluster distance (b_i ): This measures how far the data point is from points in the nearest neighboring cluster (i.e., the next best alternative cluster it could belong to). For each cluster other than the point's own, the average distance between the point and all points in that cluster is computed; b_i is the smallest of these averages. A larger b_i means the data point is well-separated from neighboring clusters.
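The two quantities above can be computed by hand on a small example. The following is a minimal sketch, assuming Euclidean distance and a made-up toy dataset of six points in two clusters; the helper name `intra_and_nearest` is hypothetical and not part of any library. It also assumes every cluster has at least two points, since a_i is undefined for a cluster of one.

```python
import numpy as np

# Hypothetical toy data: two clusters of three 2-D points each.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

def intra_and_nearest(points, labels, i):
    """Return (a_i, b_i) for the point at index i, using Euclidean distance."""
    dists = np.linalg.norm(points - points[i], axis=1)

    # a_i: mean distance to the other points in the same cluster.
    same = labels == labels[i]
    same[i] = False  # leave the point itself out of the average
    a_i = dists[same].mean()

    # b_i: smallest mean distance to the points of any other cluster.
    other_clusters = [c for c in np.unique(labels) if c != labels[i]]
    b_i = min(dists[labels == c].mean() for c in other_clusters)
    return a_i, b_i

a_0, b_0 = intra_and_nearest(points, labels, 0)
print("a_0 =", a_0)
print("b_0 =", b_0)
```

For the first point, a_0 is much smaller than b_0, which is what we expect for a point that sits firmly inside its own cluster.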
Silhouette Distance
To understand how the Silhouette Score is calculated, we first compare a_i, how close a point is to its own cluster, with b_i, how far it is from the next closest cluster. This article calls the difference b_i - a_i the silhouette distance.
- If the point is much closer to its own cluster than to other clusters, it means the clustering is good.
- If the point is close to both its own and a neighboring cluster, the clustering is less certain.
- If the point is closer to a different cluster than its own, it’s likely misclassified.
This difference is normalized into a value between -1 and 1 for each data point using the silhouette formula shown below. The overall Silhouette Score of a clustering is the mean of these per-point values over all data points:
s_i = \frac{b_i - a_i}{\max(a_i, b_i)}
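To make the formula concrete, here is a small sketch that applies it to three made-up pairs of a_i and b_i, one for each of the situations described above. The function name `silhouette_value` and the input numbers are illustrative assumptions, and returning 0 when both distances are zero is a choice made here to avoid dividing by zero.

```python
def silhouette_value(a_i, b_i):
    """Per-point silhouette value (b_i - a_i) / max(a_i, b_i)."""
    denom = max(a_i, b_i)
    if denom == 0:
        return 0.0  # assumption: treat the degenerate case as neutral
    return (b_i - a_i) / denom

well_placed = silhouette_value(1.0, 4.0)   # much closer to its own cluster
borderline = silhouette_value(2.0, 2.0)    # equally close to both clusters
misplaced = silhouette_value(4.0, 1.0)     # closer to the neighboring cluster

print(well_placed)  # 0.75
print(borderline)   # 0.0
print(misplaced)    # -0.75
```

Dividing by the larger of the two distances is what keeps every value inside the range from -1 to 1.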
What the Silhouette Score Tells Us
- The score can range from -1 to +1.
- A score close to +1 means the data point fits very well in its own cluster and is far from others.
- A score close to 0 means the data point is between clusters or the clusters are overlapping.
- A score close to -1 means the data point is probably assigned to the wrong cluster, because on average it lies closer to a neighboring cluster than to its own.
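These interpretations apply to individual points, and scikit-learn exposes the per-point values through `silhouette_samples`. The sketch below uses a made-up one-dimensional dataset in which the last point is deliberately given the wrong label, so that a negative value shows up. The data and labels are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Hypothetical data: one group near 0-2, one group near 10-12.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [11.5]])
# The last point (11.5) sits in the second group but is labeled 0 on purpose.
labels = np.array([0, 0, 0, 1, 1, 1, 0])

per_point = silhouette_samples(X, labels)   # one value per data point
overall = silhouette_score(X, labels)       # mean of the per-point values

for value, label in zip(per_point, labels):
    print(f"label={label}  silhouette={value:.2f}")
print("Overall score:", round(overall, 2))
```

The mislabeled point gets a strongly negative value, while the overall score is simply the average of all per-point values, so a few badly placed points can hide behind a decent average.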
The image below compares K-Means clustering using 6 centroids vs. 4 centroids. The clustering with 4 centroids has a higher Silhouette Score (0.84), indicating better-defined clusters.
Visual Comparison of Clustering with Different Centroids and Their Silhouette Score
Calculating Silhouette Score with Python
In this example, we will create a synthetic dataset using random numbers and apply K-Means clustering. Then, we will calculate the Silhouette Score.
Step 1: Import necessary libraries
We need NumPy for generating random data, and scikit-learn for clustering and calculating the Silhouette Score.
Python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
Step 2: Generate random data
We create three separate groups of data points, where each group represents one cluster. The data points are spread around different centers using the normal distribution.
Python
np.random.seed(7)
x1 = np.random.normal(3, 1, (50, 2)) # Cluster 1 centered at (3, 3)
x2 = np.random.normal(7, 1, (50, 2)) # Cluster 2 centered at (7, 7)
x3 = np.random.normal(11, 1, (50, 2)) # Cluster 3 centered at (11, 11)
Step 3: Combine all clusters into one dataset
We merge all three groups into a single dataset to prepare it for clustering.
Python
data = np.vstack((x1, x2, x3))
Step 4: Apply K-Means clustering
We create the K-Means model to form 3 clusters and assign each data point to one of the clusters.
Python
model = KMeans(n_clusters=3, random_state=7)
predicted_labels = model.fit_predict(data)
Step 5: Calculate Silhouette Score
We calculate the Silhouette Score to evaluate how well the clustering worked.
Python
silhouette_val = silhouette_score(data, predicted_labels)
print("Silhouette Score:", silhouette_val)
Output:
Silhouette Score: 0.6808642416167786
The Silhouette Score of about 0.68 suggests that the clustering worked well, with most points fitting into their own clusters and lying clearly apart from the others. As a rule of thumb, a score above 0.5 is often read as reasonably good clustering, and values close to 1.0 indicate strong separation, although what counts as a good score depends on the data. Since the data was generated around three distinct centers, this result is expected.
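The same score can be used to compare different numbers of clusters, as in the 6 vs. 4 centroid comparison earlier. The sketch below is one possible way to do that: it fits K-Means for several values of k and records the score for each. It generates its own data with `np.random.default_rng`, so the numbers will not match the output above, and the range of k values and the explicit `n_init=10` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three synthetic groups around (3, 3), (7, 7) and (11, 11), as in the example.
rng = np.random.default_rng(7)
data = np.vstack([rng.normal(center, 1, (50, 2)) for center in (3, 7, 11)])

scores = {}
for k in range(2, 7):  # the score needs at least 2 clusters
    model = KMeans(n_clusters=k, n_init=10, random_state=7)
    labels = model.fit_predict(data)
    scores[k] = silhouette_score(data, labels)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")

best_k = max(scores, key=scores.get)
print("Highest score at k =", best_k)
```

The k with the highest score is a reasonable candidate, but it is worth checking the clusters visually or with domain knowledge before settling on it.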