Evaluating the grain clustering
In the previous exercise, you observed from the inertia plot that 3 is a good number of clusters for the grain data. In fact, the grain samples come from a mix of 3 different grain varieties: "Kama", "Rosa" and "Canadian". In this exercise, cluster the grain samples into three clusters, and compare the clusters to the grain varieties using a cross-tabulation.
You are given the array samples, which holds the grain samples, and a list varieties giving the grain variety of each sample. pandas (as pd) and KMeans have already been imported for you.
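To see what a cross-tabulation counts, here is a minimal sketch on six made-up samples. The labels and varieties below are hypothetical and are not taken from the grain data; they only show how pd.crosstab() tallies each (cluster label, variety) pair.

```python
import pandas as pd

# Hypothetical cluster labels and varieties for six samples (illustrative only)
labels = [0, 0, 1, 1, 2, 2]
varieties = ["Kama", "Kama", "Rosa", "Kama", "Canadian", "Canadian"]

df = pd.DataFrame({"labels": labels, "varieties": varieties})

# Rows are cluster labels, columns are varieties (sorted alphabetically);
# each cell counts how many samples have that label and that variety
ct = pd.crosstab(df["labels"], df["varieties"])
print(ct)
```

Here cluster 1 mixes one Kama and one Rosa sample, so its row has two non-zero cells, while clusters 0 and 2 each contain a single variety.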
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Create a KMeans model called model with 3 clusters.
- Use the .fit_predict() method of model to fit it to samples and derive the cluster labels. Using .fit_predict() is the same as using .fit() followed by .predict().
- Create a DataFrame df with two columns named 'labels' and 'varieties', using labels and varieties, respectively, for the column values. This has been done for you.
- Use the pd.crosstab() function on df['labels'] and df['varieties'] to count the number of times each grain variety coincides with each cluster label. Assign the result to ct.
- Hit submit to see the cross-tabulation!
Have a go at this exercise by completing this sample code.
# Create a KMeans model with 3 clusters: model
model = ____
# Use fit_predict to fit model and obtain cluster labels: labels
labels = ____
# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})
# Create crosstab: ct
ct = ____
# Display ct
print(ct)
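One way the completed exercise might look, as a sketch that runs outside the exercise environment. Because the real samples array and varieties list are not available here, it assumes synthetic data from scikit-learn's make_blobs (7 features, 70 samples per group) and attaches the three variety names to the generated groups purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic stand-in for the grain data, 3 groups x 70 samples, 7 features
samples, group = make_blobs(n_samples=210, n_features=7, centers=3,
                            cluster_std=0.5, random_state=1)

# Hypothetical variety name for each synthetic sample
names = np.array(["Kama", "Rosa", "Canadian"])
varieties = list(names[group])

# Create a KMeans model with 3 clusters and get a cluster label per sample
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(samples)

# Build the DataFrame and cross-tabulate cluster labels against varieties
df = pd.DataFrame({"labels": labels, "varieties": varieties})
ct = pd.crosstab(df["labels"], df["varieties"])
print(ct)
```

If the clusters line up with the varieties, each row of ct is dominated by a single column. The column totals do not depend on the clustering: each one is simply the number of samples of that variety.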