Regularized K-Means Clustering via Fully Corrective Frank-Wolfe Optimization

Clustering high-dimensional data remains challenging because traditional k-means is sensitive to noise, outliers, and high dimensionality, which often leads to unstable performance. This study presents a robust clustering method that combines the Fully Corrective Frank-Wolfe (FCFW) algorithm with a k-means objective regularized by the Frobenius norm. The Frobenius norm regularization produces more stable clusters, prevents overfitting, and promotes cluster compactness. The proposed method uses probabilistic cluster assignments, so each data point can belong to several clusters with different membership levels, which supports clusters with overlapping boundaries. The Kruskal-Wallis test serves as a feature selection method to identify the most informative genes, which then guide the clustering toward important features in high-dimensional datasets. The FCFW-regularized k-means outperforms traditional k-means in all experiments performed on synthetic and real gene expression datasets. On a breast cancer gene expression dataset (GSE10797), it achieved an accuracy of 89.39%, compared to 58% for traditional k-means. Moreover, it surpassed a recent deep subspace clustering method (scPEDSSC) in Adjusted Rand Index (ARI) by 8.3 percentage points on the Goolam single-cell dataset (0.968 vs. 0.885) and by 7.2 percentage points on the Deng dataset (0.801 vs. 0.729). Overall, the proposed approach attained the highest ARI and Normalized Mutual Information (NMI) scores across five benchmark datasets. These results confirm that the FCFW-regularized k-means yields more accurate and stable clustering results and performs robustly on high-dimensional data.
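To illustrate how a Frank-Wolfe iteration can produce probabilistic cluster memberships, the following is a minimal sketch and not the authors' implementation. It makes several assumptions the abstract does not state: the centroids are held fixed, the per-point objective is sum_j w_j d_j + (lam / 2) sum_j w_j^2 with d_j the squared distance to centroid j, and the memberships w lie on the probability simplex. It also uses the plain Frank-Wolfe update with exact line search rather than the fully corrective variant. The names `frank_wolfe_memberships`, `objective`, `sq_dists`, and `lam` are hypothetical.

```python
def objective(w, sq_dists, lam):
    """Assumed per-point objective: linear distance cost plus a squared-norm penalty."""
    linear = sum(wj * d for wj, d in zip(w, sq_dists))
    penalty = 0.5 * lam * sum(wj * wj for wj in w)
    return linear + penalty


def frank_wolfe_memberships(sq_dists, lam=1.0, n_iters=200, tol=1e-12):
    """Plain Frank-Wolfe (not fully corrective) over the probability simplex.

    sq_dists: squared distances from one data point to each fixed centroid.
    lam: regularization strength (assumed > 0); larger values spread membership.
    """
    k = len(sq_dists)
    w = [1.0 / k] * k  # start from uniform memberships
    for _ in range(n_iters):
        # Gradient of the assumed objective with respect to w.
        grad = [d + lam * wj for d, wj in zip(sq_dists, w)]
        # Linear minimization oracle: the simplex vertex with the smallest gradient.
        j_star = min(range(k), key=grad.__getitem__)
        direction = [(1.0 if j == j_star else 0.0) - wj for j, wj in enumerate(w)]
        slope = sum(g * s for g, s in zip(grad, direction))  # negative Frank-Wolfe gap
        curvature = lam * sum(s * s for s in direction)
        if slope >= -tol or curvature == 0.0:
            break  # gap is numerically zero, so w is (near) optimal
        # Exact line search for a quadratic, clipped to stay inside the simplex.
        step = min(1.0, -slope / curvature)
        w = [wj + step * s for wj, s in zip(w, direction)]
    return w


if __name__ == "__main__":
    memberships = frank_wolfe_memberships([0.2, 0.5, 3.0], lam=1.0)
    print([round(m, 4) for m in memberships])
```

For the squared distances (0.2, 0.5, 3.0) and lam = 1, the memberships come out close to (0.65, 0.35, 0): the point is shared between the two nearby centroids and excluded from the distant one. Under this assumed objective, a smaller lam pushes the solution toward a hard assignment, while a larger lam spreads membership more evenly.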
