K-Means Clustering is a popular choice, but it can struggle with high-dimensional datasets, where distances between points become less informative. PCA can help in such cases: it projects the data onto the few directions that carry the most variance, so K-Means has fewer, more informative features to work with.
As the figure above shows, K-Means Clustering couldn't group the data points in my dataset effectively. To address this, I added Principal Component Analysis (PCA) as a preprocessing step.
PCA
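import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Scale the features first: PCA is sensitive to feature scale
scaler = StandardScaler()
dataPCA = data.drop('Classes ', axis=1)  # your dataframe; column name kept exactly as written, trailing space included
data_scaled = scaler.fit_transform(dataPCA)

# Project the scaled data onto the first two principal components
pca = PCA(n_components=2)
components = pca.fit_transform(data_scaled)  # renamed so the result no longer overwrites the PCA class
df_pca = pd.DataFrame(components)
x_pca = df_pca.values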
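Before clustering, it can be worth checking how much of the original variance the two components actually keep. Here is a minimal sketch of that check. My dataset isn't included here, so `features` is a made-up stand-in (random data driven by two hidden factors), and the names `features`, `features_scaled`, `pca_check`, and `retained` are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for a feature table: 200 rows, 6 columns,
# generated from 2 hidden factors plus a little noise (not my real data)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
features = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# Same preprocessing as above: scale, then fit a 2-component PCA
features_scaled = StandardScaler().fit_transform(features)
pca_check = PCA(n_components=2).fit(features_scaled)

# Fraction of the total variance captured by each component, largest first
retained = pca_check.explained_variance_ratio_
print("per component:", retained)
print("total retained:", retained.sum())
```

If the total is low on your own data, two components may be throwing away too much, and a larger `n_components` might be worth trying.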
After the PCA preprocessing
Finally, it works! PCA did the trick!
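If you want to try the whole pipeline end to end, here is a rough sketch of scaling, PCA, and then K-Means on the reduced data. It is not my exact clustering code: the data comes from scikit-learn's `make_blobs` as a stand-in, and the choice of 3 clusters, as well as the names `X_demo`, `X_demo_pca`, and `labels`, are assumptions made just for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical stand-in data: 300 points, 10 features, 3 groups
X_demo, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Scale, then reduce to 2 principal components
X_demo_scaled = StandardScaler().fit_transform(X_demo)
X_demo_pca = PCA(n_components=2).fit_transform(X_demo_scaled)

# Cluster in the reduced 2-D space (3 clusters is an assumption for this demo)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_demo_pca)

# Number of points assigned to each cluster
print(np.bincount(labels))
```

With your own data, you would pass `x_pca` to `fit_predict` instead of `X_demo_pca` and pick `n_clusters` to match your problem.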