What is Silhouette Score?

Last Updated : 23 Jun, 2025

When we use clustering algorithms like K-Means to group data, we need a way to check how good those groups are. The Silhouette Score is one of the most popular ways to do this. It helps us measure how well each data point fits into its assigned cluster and how far it is from other clusters.

How the Silhouette Score Works

The Silhouette Score combines two ideas: how tightly a data point sits within its assigned cluster, and how distinctly it is separated from the other clusters. For each data point, two main quantities are calculated:

Intra-cluster distance (a_i): This measures how close the data point is to other points within the same cluster. It is computed by taking the average distance between the point and all other points in its own cluster. A smaller value of a_i indicates that the data point is well-matched to its cluster.
Nearest-cluster distance (b_i): This measures how far the data point is from points in the nearest neighboring cluster (i.e., the next best alternative cluster it could belong to). For every other cluster, take the average distance between the point and all points in that cluster; b_i is the smallest of these averages. A larger b_i means the data point is well-separated from neighboring clusters.
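The two quantities above can be computed by hand for a single point. The following is a minimal sketch on a made-up six-point dataset using Euclidean distance; the names `points`, `labels` and the helper `a_and_b` are illustrative and not part of any library.

```python
import numpy as np

# Toy data: two tight groups (values are made up for illustration)
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

def a_and_b(i, points, labels):
    """Return (a_i, b_i) for point i using Euclidean distance."""
    dists = np.linalg.norm(points - points[i], axis=1)
    own = labels == labels[i]
    # a_i: mean distance to the OTHER points in the same cluster.
    # The point's distance to itself is 0, so it adds nothing to the sum;
    # this assumes the cluster has at least two points.
    a_i = dists[own].sum() / (own.sum() - 1)
    # b_i: smallest mean distance to the points of any other cluster
    b_i = min(dists[labels == c].mean()
              for c in np.unique(labels) if c != labels[i])
    return a_i, b_i

a0, b0 = a_and_b(0, points, labels)
print("a_0 =", a0)
print("b_0 =", b0)
```

For the point at the origin, a_0 is small because its two cluster mates are one unit away, while b_0 is large because the second group is several units away.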

Silhouette Distance

To understand how the Silhouette Score is calculated, we first look at the difference between how close a point is to its own cluster versus how far it is from the next closest cluster. This difference is called the silhouette distance.

If the point is much closer to its own cluster than to other clusters, it is well placed.
If the point is close to both its own and a neighboring cluster, the clustering is less certain.
If the point is closer to a different cluster than its own, it’s likely misclassified.

This difference between distances is turned into a score between -1 and 1 for each data point using the silhouette formula shown below:

s_i = \frac{b_i - a_i}{\max(a_i, b_i)}

The Silhouette Score of a whole clustering is the mean of s_i over all data points.
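To make the formula concrete, here is a small sketch that plugs in three made-up pairs of a_i and b_i, one for each of the cases listed earlier. The function name `silhouette_value` is hypothetical, and the sketch assumes at least one of the two distances is positive so that the division is defined.

```python
def silhouette_value(a_i, b_i):
    """Silhouette value of one point from its a_i and b_i.

    Assumes max(a_i, b_i) > 0; otherwise the ratio is undefined.
    """
    return (b_i - a_i) / max(a_i, b_i)

well_placed = silhouette_value(1.0, 4.0)   # much closer to own cluster
borderline = silhouette_value(2.0, 2.0)    # equally close to both clusters
misplaced = silhouette_value(4.0, 1.0)     # closer to the neighboring cluster

print(well_placed)  # 0.75
print(borderline)   # 0.0
print(misplaced)    # -0.75
```

Dividing by the larger of the two distances is what keeps the result between -1 and 1.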

What the Silhouette Score Tells Us

The score can range from -1 to +1.
A score close to +1 means the data point fits very well in its own cluster and is far from others.
A score close to 0 means the data point is between clusters or the clusters are overlapping.
A score close to -1 means the data point is probably in the wrong cluster, because it is closer on average to a neighboring cluster than to its own.
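Per-point values can be inspected with scikit-learn's `silhouette_samples`, which returns one value per data point. The sketch below uses a made-up six-point dataset in which the last point is deliberately given the wrong label, so its value should come out negative; the data and labels are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
# The last point lies in the second group but is deliberately labelled 0
labels = np.array([0, 0, 0, 1, 1, 0])

per_point = silhouette_samples(points, labels)  # one value per point
overall = silhouette_score(points, labels)      # mean of the per-point values

for p, lab, s in zip(points, labels, per_point):
    print(p, "label", lab, "silhouette", round(float(s), 3))
print("Overall score:", overall)
```

Looking at individual values like this can reveal misassigned points that an overall average would hide.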

The image below compares K-Means clustering using 6 centroids vs. 4 centroids. The clustering with 4 centroids has a higher Silhouette Score (0.84), indicating better-defined clusters.
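A comparison of this kind can be sketched by fitting K-Means for several values of k and keeping the one with the highest score. The example below does not reproduce the figure or its 0.84 value; it uses an assumed synthetic dataset of four well-separated blobs from `make_blobs`, and the choices of `cluster_std`, `n_init` and the range of k are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated blobs at the corners of a square (assumed toy data)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X, _ = make_blobs(n_samples=200, centers=centers,
                  cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the k whose clustering has the highest mean silhouette
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print("k =", k, "silhouette =", round(float(s), 3))
print("Best k:", best_k)
```

On data like this, splitting a natural group into two (too many centroids) or merging two groups into one (too few) both lower the score.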

Calculating Silhouette Score with Python

In this example, we will create a synthetic dataset using random numbers and apply K-Means clustering. Then, we will calculate the Silhouette Score.

Step 1: Import necessary libraries

We need NumPy for generating random data, and scikit-learn for clustering and calculating the Silhouette Score.

Python

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

Step 2: Generate random data

We create three separate groups of data points, where each group represents one cluster. The data points are spread around different centers using the normal distribution.

Python

np.random.seed(7)  # fix the seed so the data is reproducible
x1 = np.random.normal(3, 1, (50, 2))  # Cluster 1: 50 points centered at (3, 3)
x2 = np.random.normal(7, 1, (50, 2))  # Cluster 2: 50 points centered at (7, 7)
x3 = np.random.normal(11, 1, (50, 2)) # Cluster 3: 50 points centered at (11, 11)

Step 3: Combine all clusters into one dataset

We merge all three groups into a single dataset to prepare it for clustering.

Python

data = np.vstack((x1, x2, x3))

Step 4: Apply K-Means clustering

We create the K-Means model to form 3 clusters and assign each data point to one of the clusters.

Python

model = KMeans(n_clusters=3, random_state=7)
predicted_labels = model.fit_predict(data)

Step 5: Calculate Silhouette Score

We calculate the Silhouette Score to evaluate how well the clustering worked.

Python

silhouette_val = silhouette_score(data, predicted_labels)
print("Silhouette Score:", silhouette_val)

Output:

Silhouette Score: 0.6808642416167786

The Silhouette Score of 0.68 shows that the clustering worked well: points fit well into their own clusters and are clearly separated from the others. As a rough rule of thumb, a score above 0.5 suggests reasonably good clustering, and values close to 1.0 indicate strong separation. Since the data was generated around three distinct centers, this result is expected.
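A single overall number can hide differences between clusters. As an optional follow-up, the sketch below rebuilds the same kind of dataset and reports the mean silhouette value of each cluster, along with a count of points whose value is negative. The dictionary `cluster_means` and the compact data generation are illustrative choices, not part of the steps above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

# Same recipe as the walk-through: three groups centered at 3, 7 and 11
np.random.seed(7)
data = np.vstack([np.random.normal(c, 1, (50, 2)) for c in (3, 7, 11)])

labels = KMeans(n_clusters=3, random_state=7).fit_predict(data)
per_point = silhouette_samples(data, labels)

# Mean silhouette value inside each cluster
cluster_means = {int(c): float(per_point[labels == c].mean())
                 for c in np.unique(labels)}

for c, m in cluster_means.items():
    size = int(np.sum(labels == c))
    print("cluster", c, "size", size, "mean silhouette", round(m, 3))
print("Points with negative silhouette:", int((per_point < 0).sum()))
```

A cluster whose mean is clearly lower than the others is usually the one that overlaps most with a neighbor.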
