
Enhance K-Means Clustering Performance with PCA! / Using PCA with the K-Means Algorithm!

by 재르미온느 2024. 5. 29.

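K-Means clustering is a popular choice, but it can struggle with high-dimensional datasets: with many features, distances between points become less informative, and noisy or redundant features can drown out the structure that matters. Principal Component Analysis (PCA) can help in such cases. It projects the data onto a small number of directions (principal components) that capture most of the variance, which lets K-Means focus on the most important factors and also makes it cheaper to run.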
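To make "the most important factors" concrete, scikit-learn's PCA exposes `explained_variance_ratio_`, the share of the total variance captured by each principal component. The sketch below is only an illustration: it assumes scikit-learn's bundled wine dataset (13 numeric features) as a stand-in for your own data.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (an assumption): the 13-feature wine data bundled with scikit-learn
X = load_wine().data
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components to see how the variance is spread across them
pca_full = PCA().fit(X_scaled)
ratios = pca_full.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.3f} of variance (cumulative {c:.3f})")

# Smallest number of components whose cumulative variance reaches 80%
n_80 = int(np.searchsorted(cumulative, 0.80) + 1)
print("components needed for 80% of the variance:", n_80)
```

If only the first few components carry most of the variance, keeping just those discards little information. scikit-learn's `PCA` also accepts a float such as `n_components=0.80` to choose this count automatically.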

 

Figure: K-Means clustering before PCA preprocessing

As the figure above shows, K-Means clustering did not separate the data points in my dataset into well-defined groups. To address this, I added PCA as a preprocessing step: standardize the features, reduce them to two principal components, and then run K-Means on the reduced data.
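When the data has a known label column (like `Classes` here), "wasn't able to effectively group" can be put into a number instead of judged by eye: compare the K-Means labels against the known classes with `adjusted_rand_score`, with and without the PCA step. This is a sketch under assumptions: the wine dataset stands in for the real data, and `n_clusters` is set to the number of known classes.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data with known classes (y plays the role of the 'Classes' column)
X, y = load_wine(return_X_y=True)
k = len(set(y))  # assumption: one cluster per known class

# Baseline: K-Means on all standardized features
baseline = make_pipeline(StandardScaler(),
                         KMeans(n_clusters=k, n_init=10, random_state=0))
# PCA as a preprocessing step: keep only the first 2 principal components
with_pca = make_pipeline(StandardScaler(), PCA(n_components=2),
                         KMeans(n_clusters=k, n_init=10, random_state=0))

scores = {}
for name, model in [("without PCA", baseline), ("with PCA", with_pca)]:
    labels = model.fit_predict(X)
    # 1.0 = perfect agreement with the known classes; close to 0.0 = random labeling
    scores[name] = adjusted_rand_score(y, labels)
    print(f"{name}: ARI = {scores[name]:.3f}")
```

Which variant scores higher depends on the dataset; the point of the pipeline is that both variants are scaled and clustered the same way, so the only difference is the PCA step.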

 

PCA

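import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Scaling: standardize each feature to zero mean and unit variance,
# since PCA is sensitive to the scale of the features
scaler = StandardScaler()
dataPCA = data.drop('Classes  ', axis=1)  # `data` is your DataFrame; drop the class-label column
data_scaled = scaler.fit_transform(dataPCA)

# PCA: project the scaled features onto the first 2 principal components
pca = PCA(n_components=2)
pca_result = pca.fit_transform(data_scaled)  # new name, so the PCA class is not shadowed
df_pca = pd.DataFrame(pca_result, columns=['PC1', 'PC2'])
x_pca = df_pca.values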
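The post does not show the clustering and plotting step that follows, so here is a minimal sketch of one way to do it: run K-Means on `x_pca` and color each point by its cluster. To keep it self-contained, it rebuilds an `x_pca` from scikit-learn's wine dataset (an assumption standing in for your DataFrame), and `n_clusters=3` is likewise an assumption you should choose for your own data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for x_pca from the code above (wine data instead of your DataFrame)
X = load_wine().data
x_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Cluster in the 2-D PCA space; n_clusters=3 is an assumption for this data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(x_pca)

# Color each point by its cluster and mark the centroids
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(x_pca[:, 0], x_pca[:, 1], c=labels, cmap="viridis", s=20)
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           c="red", marker="x", s=100, label="centroids")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("K-Means on the first two principal components")
ax.legend()
fig.savefig("kmeans_pca.png", dpi=100)
```

Because the data has only two dimensions after PCA, the clusters can be plotted directly on the principal-component axes without any further projection.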

 

 

Figure: K-Means clustering after PCA preprocessing

Finally, it works! With PCA as a preprocessing step, K-Means was able to group the data points.