Explearn

Brief Tutorial On Unsupervised Learning

May 1, 2023

Here’s a brief tutorial on unsupervised learning using Python and the scikit-learn library:

First, we need to import the necessary libraries and load the dataset:

import pandas as pd
from sklearn.cluster import KMeans
 
# Load the dataset
df = pd.read_csv('my_dataset.csv')
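
The file name my_dataset.csv is a placeholder, so the snippet above will not run without your own data. As a minimal sketch for following along, a stand-in DataFrame with the same kind of columns can be generated with scikit-learn's make_blobs. The feature names f1 to f4, the sample count, and the number of groups are assumptions made for illustration, not properties of any real dataset:

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Hypothetical stand-in for my_dataset.csv: 300 rows, 4 numeric features,
# drawn from 3 groups. All names and sizes here are illustrative assumptions.
features, true_groups = make_blobs(
    n_samples=300, n_features=4, centers=3, random_state=42
)

df = pd.DataFrame(features, columns=["f1", "f2", "f3", "f4"])
df.insert(0, "id", range(len(df)))  # identifier column, dropped before clustering
df["label"] = true_groups           # known group, also dropped before clustering

# Quick sanity checks before clustering
print(df.shape)                # (rows, columns)
print(df.dtypes)               # KMeans needs numeric features
print(df.isna().sum().sum())   # total number of missing values
```

If you want the rest of the tutorial to read from disk unchanged, the frame can be saved with df.to_csv('my_dataset.csv', index=False).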
 

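Next, we need to prepare the data for clustering. This involves removing columns that should not influence the clusters (here, the id and label columns) and standardizing the remaining features so that each has zero mean and unit variance. Without this step, features with large numeric ranges would dominate the distance calculations that KMeans relies on: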

# Remove unnecessary columns
X = df.drop(['id', 'label'], axis=1)
 
# Scale the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
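
To see what the scaling step does, here is a small sketch on a hypothetical toy array (X_toy is made up for illustration) whose two features differ in scale by a factor of 1000. After StandardScaler, each column should have a mean of about 0 and a standard deviation of about 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_toy = np.array([
    [1.0, 1000.0],
    [2.0, 2000.0],
    [3.0, 3000.0],
    [4.0, 4000.0],
])

scaler = StandardScaler()
X_toy_scaled = scaler.fit_transform(X_toy)

print(scaler.mean_)                # per-feature means learned from the data
print(X_toy_scaled.mean(axis=0))   # approximately 0 for each feature
print(X_toy_scaled.std(axis=0))    # approximately 1 for each feature
```

Because the second column is an exact multiple of the first, the two scaled columns come out identical, which shows that the original units no longer matter.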
 

Once the data is prepared, we can apply the KMeans clustering algorithm. In this example, we cluster the data into three groups; setting random_state=42 fixes the random initialization so that the result is reproducible:

# Apply KMeans clustering algorithm
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)
 
# Get the cluster labels
cluster_labels = kmeans.labels_
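
The tutorial fixes the number of clusters at three. When the right number is not known in advance, one common heuristic (not part of the original script) is to fit KMeans for several values of k and compare the inertia and the silhouette score. The sketch below uses synthetic data from make_blobs as an assumption so that it runs on its own; with real data you would pass X_scaled instead:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration)
X_demo, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=42)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_demo_scaled)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X_demo_scaled, labels)
    # Inertia is the within-cluster sum of squared distances (lower is tighter)
    print(k, round(model.inertia_, 1), round(scores[k], 3))

best_k = max(scores, key=scores.get)
print("k with the highest silhouette score:", best_k)
```

Inertia always decreases as k grows, so it is usually read by looking for an "elbow", while the silhouette score can be compared directly across values of k. Neither is definitive; treat them as guides alongside domain knowledge.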
 

Finally, we can visualize the clusters using a scatter plot. Because the data may have more than two features, we first use PCA to project it onto its first two principal components, then color each point by its cluster label:

# Visualize the clusters using a scatter plot
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
 
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
 
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=cluster_labels)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()
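
A two-component projection can hide structure, so it helps to check how much of the variance the plotted components actually retain. The following sketch, again on synthetic make_blobs data as an assumption, reads explained_variance_ratio_ from the fitted PCA and also projects the KMeans centroids into the same two-dimensional space so they could be drawn on the plot:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration)
X_demo, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=42)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_demo_scaled)

pca = PCA(n_components=2)
X_demo_pca = pca.fit_transform(X_demo_scaled)

# Fraction of total variance captured by each plotted component
ratios = pca.explained_variance_ratio_
print(ratios)
print(ratios.sum())  # share of variance visible in the 2-D plot

# Centroids live in the scaled feature space; project them with the same PCA
centers_pca = pca.transform(kmeans.cluster_centers_)
print(centers_pca.shape)
```

If the summed ratio is low, clusters that overlap in the plot may still be well separated in the full feature space. The projected centroids can be overlaid with a second call such as plt.scatter(centers_pca[:, 0], centers_pca[:, 1], marker='x').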
 

Here’s the complete code:

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
 
# Load the dataset
df = pd.read_csv('my_dataset.csv')
 
# Remove unnecessary columns
X = df.drop(['id', 'label'], axis=1)
 
# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
 
# Apply KMeans clustering algorithm
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)
 
# Get the cluster labels
cluster_labels = kmeans.labels_
 
# Visualize the clusters using a scatter plot
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
 
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=cluster_labels)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()
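
As an optional variation on the script above, the scaling and clustering steps can be chained with scikit-learn's make_pipeline, so the same scaling is applied automatically when assigning new points to clusters. This is a sketch of one possible arrangement rather than the tutorial's own method, and it uses synthetic make_blobs data as an assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration)
X_demo, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=42)

# Scaling and clustering bundled into a single estimator
clusterer = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
demo_labels = clusterer.fit_predict(X_demo)

# New, unscaled points are scaled with the fitted scaler before assignment
new_points = X_demo[:5]
new_labels = clusterer.predict(new_points)
print(new_labels)
```

The benefit is mainly bookkeeping: there is no separate X_scaled variable to keep in sync, and the fitted scaler travels with the model.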
 

This script demonstrates a simple example of unsupervised learning: KMeans groups the data into three clusters, and a scatter plot of the first two principal components visualizes them. Of course, the exact techniques and methods used for unsupervised learning will depend on the specific dataset and problem at hand.