Brief Tutorial On Unsupervised Learning
Here’s a brief tutorial on unsupervised learning using Python and the scikit-learn library.
First, we need to import the necessary libraries and load the dataset:
```python
import pandas as pd
from sklearn.cluster import KMeans

# Load the dataset
df = pd.read_csv('my_dataset.csv')
```
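The tutorial does not say what `my_dataset.csv` contains, apart from the `id` and `label` columns that are dropped later. If you want to follow along without a real file, the sketch below builds a synthetic stand-in with scikit-learn's `make_blobs`. The number of rows, the `feature_0` to `feature_3` column names, and the three underlying groups are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs

# Hypothetical stand-in for 'my_dataset.csv': 300 rows, 4 numeric features,
# plus an 'id' column and a 'label' column like the ones the tutorial drops.
features, true_labels = make_blobs(n_samples=300, n_features=4, centers=3, random_state=42)

df = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(4)])
df.insert(0, "id", np.arange(len(df)))  # identifier column, not a real feature
df["label"] = true_labels               # known groups; clustering must not use them

# Uncomment to save the data so that pd.read_csv('my_dataset.csv') works unchanged:
# df.to_csv("my_dataset.csv", index=False)

print(df.shape)  # (300, 6)
```

Keeping a `label` column in the synthetic data mirrors the original setup. It is available for checking the result afterwards, but it is removed before clustering.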
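Next, we need to prepare the data for clustering. This involves two steps. First, we remove columns that should not influence the clusters: here, an `id` column and an existing `label` column, which an unsupervised method should not see. Second, we standardize the features so that each has zero mean and unit variance. Without this scaling, features with large numeric ranges would dominate the Euclidean distances that KMeans relies on: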
```python
from sklearn.preprocessing import StandardScaler

# Remove unnecessary columns
X = df.drop(['id', 'label'], axis=1)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
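To make the scaling step concrete, the sketch below applies `StandardScaler` to a small, made-up matrix whose two columns have very different ranges. It then reproduces the transform by hand. The numbers are hypothetical and are not taken from the tutorial's dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: the second column is on a much larger scale.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

X_scaled = StandardScaler().fit_transform(X)

# The same transform by hand: subtract each column's mean and divide by its
# population standard deviation (ddof=0), which is what StandardScaler uses.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # True
# After scaling, every column has (numerically) zero mean and unit variance,
# so both features contribute comparably to Euclidean distances.
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

In the tutorial's pipeline the scaler is fitted on the whole feature table at once. If you later cluster new rows, you would reuse the fitted `scaler.transform` rather than refitting.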
Once the data is prepared, we can apply the KMeans clustering algorithm. In this example, we ask KMeans to partition the data into three clusters (`n_clusters=3`) and fix `random_state` so that the result is reproducible from run to run:
```python
# Apply KMeans clustering algorithm
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)

# Get the cluster labels
cluster_labels = kmeans.labels_
```
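For intuition about what `kmeans.fit` computes, here is a minimal NumPy sketch of Lloyd's algorithm, the classic k-means procedure. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is a simplified illustration, not scikit-learn's implementation, which by default uses k-means++ initialization and further optimizations. The function name `lloyd_kmeans` and the toy data are hypothetical.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal k-means (Lloyd's algorithm) for illustration."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random
    # (scikit-learn's KMeans defaults to k-means++ initialization instead).
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(centroids):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: move each centroid to the mean of its assigned points;
        # keep the old centroid if a cluster happens to be empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids stopped moving
        centroids = new_centroids

    labels = assign(centroids)
    # Inertia: within-cluster sum of squared distances, the quantity k-means minimizes.
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia


# Two obvious groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, inertia = lloyd_kmeans(X, k=2)
print(labels, inertia)
```

scikit-learn exposes the same within-cluster quantity as `kmeans.inertia_` after fitting. Comparing it across different `n_clusters` values is one common, if rough, way to judge how many clusters to use.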
Finally, we can visualize the clusters with a scatter plot. Because the data usually has more than two features, we first use PCA to project it onto its first two principal components, and then color each point by its cluster label:
```python
# Visualize the clusters using a scatter plot
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=cluster_labels)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()
```
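A two-dimensional PCA plot shows only part of the data's variance, so clusters that overlap in the plot may still be separated in the full feature space. The sketch below uses synthetic `make_blobs` data as an assumed stand-in for `X_scaled`. It checks how much variance PC1 and PC2 capture via `explained_variance_ratio_`, and it reproduces the projection with a plain SVD to show what `fit_transform` computes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for X_scaled: 300 points with 5 features.
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Fraction of the total variance captured by PC1 and PC2.
ratios = pca.explained_variance_ratio_
print(ratios, ratios.sum())

# PCA as an SVD: project the centered data onto the top two right-singular
# vectors. The result matches scikit-learn's up to the sign of each component.
X_centered = X_scaled - X_scaled.mean(axis=0)
_, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_manual = X_centered @ Vt[:2].T
print(np.allclose(np.abs(X_manual), np.abs(X_pca)))
```

If the two ratios sum to only a small fraction of the total, treat the scatter plot as a rough view rather than a faithful picture of the cluster separation.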
Here’s the complete code:
```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('my_dataset.csv')

# Remove unnecessary columns
X = df.drop(['id', 'label'], axis=1)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply KMeans clustering algorithm
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)

# Get the cluster labels
cluster_labels = kmeans.labels_

# Visualize the clusters using a scatter plot
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=cluster_labels)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()
```
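A natural follow-up to the script is to attach the cluster labels to the DataFrame and compare the clusters feature by feature, in the original units. The sketch below assumes a synthetic dataset with hypothetical columns `f1`, `f2`, and `f3` instead of `my_dataset.csv`. It passes `n_init=10` explicitly so that the behavior does not depend on the scikit-learn version's default.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for my_dataset.csv: an 'id' column plus three numeric features.
features, _ = make_blobs(n_samples=300, n_features=3, centers=3, random_state=42)
df = pd.DataFrame(features, columns=["f1", "f2", "f3"])
df.insert(0, "id", np.arange(len(df)))

X = df.drop(columns=["id"])
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)

# Attach each row's cluster label, then summarize every cluster by its
# mean feature values (unscaled) and its number of members.
df["cluster"] = kmeans.labels_
profile = df.groupby("cluster")[["f1", "f2", "f3"]].mean()
profile["size"] = df["cluster"].value_counts().sort_index()
print(profile)
```

Reading the per-cluster means in the original units is often the quickest way to give each cluster a meaningful description.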
This script demonstrates a simple unsupervised learning workflow: it groups the data into three clusters with KMeans and visualizes them in a scatter plot of the first two principal components. Of course, the right techniques for unsupervised learning, including how many clusters to look for, depend on the specific dataset and problem at hand.
