python - K-Means taking a long time

I'm using k-means for my project for the first time. My dataset has more than 400,000 rows and 11 columns, and I run k-means for k = 3, 5, 7, 9, and 10. It has been running for more than 65 minutes and there is still no output. Is that normal? It's my first time, so I'm not sure what to expect.

I'm using Python in Visual Studio.

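import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# x_pca is the asker's feature matrix (presumably PCA output); its definition is not shown
sse = []
silhouette_scores = []
k_values = [3, 5, 7, 9]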

for k in k_values:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=1, init='k-means++')
    kmeans.fit(x_pca)
    sse.append(kmeans.inertia_)
    silhouette_scores.append(silhouette_score(x_pca, kmeans.labels_))

# the elbow method
plt.figure(figsize=(10, 6))
plt.plot(k_values, sse, marker='o')
plt.title('Elbow Method for Optimal k')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Sum of Squared Errors (SSE)')
plt.show()

# silhouette scores
plt.figure(figsize=(10, 6))
plt.plot(k_values, silhouette_scores, marker='o')
plt.title('Silhouette Score for Optimal k')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Silhouette Score')
plt.show()

Asked Nov 20, 2024 at 20:46 by Joud; edited Nov 20, 2024 at 21:47

Hello and welcome to Stack Overflow! Please take your time to read through How To Ask and edit your question to include a Minimal Reproducible Example of your code. Otherwise it's hard to say anything about the performance problems. – Teemu Risikko Commented Nov 20, 2024 at 21:05
Apart from that, when using pandas + sklearn, a clustering like that takes a few seconds at most for fake data. Even with more complex datatypes than some fake data generated with np.random, it should definitely not take that long. – Teemu Risikko Commented Nov 20, 2024 at 21:15

1 Answer

Analysis

It's not your K-means that is slow, it's silhouette_score.
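
One way to check this is to time the two steps separately. The silhouette coefficient needs pairwise distances between points, so its cost grows roughly quadratically with the number of rows, which is why it dominates at 400,000 rows. The sketch below is only an illustration under stated assumptions: `x_demo` is synthetic data standing in for `x_pca`, `evaluate_k` is a hypothetical helper name, and the subsample size of 5,000 is an arbitrary choice. It uses the `sample_size` parameter of `silhouette_score` to score a random subsample instead of every row.

```python
import time

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for x_pca (assumption: same 11 columns, far fewer rows).
rng = np.random.default_rng(42)
x_demo = rng.normal(size=(20_000, 11))


def evaluate_k(data, k, sample_size=5_000, seed=42):
    """Fit K-means on all rows, but score the silhouette on a random subsample."""
    kmeans = KMeans(n_clusters=k, random_state=seed, n_init=1, init="k-means++")

    start = time.perf_counter()
    kmeans.fit(data)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    # sample_size limits the pairwise-distance work to the sampled rows.
    score = silhouette_score(
        data, kmeans.labels_, sample_size=sample_size, random_state=seed
    )
    score_seconds = time.perf_counter() - start

    return kmeans.inertia_, score, fit_seconds, score_seconds


results = {}
for k in [3, 5, 7, 9]:
    results[k] = evaluate_k(x_demo, k)
    inertia, score, fit_s, score_s = results[k]
    print(
        f"k={k}: SSE={inertia:.1f}, silhouette~{score:.3f}, "
        f"fit {fit_s:.2f}s, silhouette {score_s:.2f}s"
    )
```

The subsampled score is an estimate, so it can vary slightly with `random_state`; increasing `sample_size` trades run time for a more stable value.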

The time complexity of K-means is
