Comparative Analysis on Dimension Reduction Algorithm of Principal Component Analysis and Singular Value Decomposition for Clustering

Research, 05 Apr 2023

Clustering divides a dataset into groups whose members share similar characteristics. High-dimensional datasets can reduce the effectiveness of the grouping process. This study compares two dimension reduction algorithms, Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), each combined with the K-Means clustering method, to determine which one yields the smallest Davies-Bouldin Index. The dataset is public data from the UCI Machine Learning Repository containing the weekly sales counts of a product. The experiments vary the number of clusters from 3 to 10 and the number of reduced dimensions from 2 to 10. Processing the data in RapidMiner shows that clustering with dimension reduction gives better results than clustering without it. In particular, PCA outperforms SVD: the best configuration uses 5 clusters and 3 reduced dimensions, giving a Davies-Bouldin Index of 0.376.
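To make the evaluation criterion concrete, the following is a minimal sketch of the Davies-Bouldin Index in plain Python. It assumes the standard definition (Euclidean distance, scatter measured as the mean distance of members to their cluster centroid) and is not the RapidMiner implementation used in the study. The function names `centroid` and `davies_bouldin` and the toy data are illustrative only.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def davies_bouldin(points, labels):
    """Davies-Bouldin Index of a hard clustering; lower values are better.

    Assumes Euclidean distance and distinct cluster centroids
    (coinciding centroids would cause a division by zero).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    keys = sorted(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Scatter S_i: mean distance from each member to its cluster centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }

    total = 0.0
    for i in keys:
        # For cluster i, keep the worst (largest) similarity ratio to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


# Toy data (hypothetical): two tight, well-separated clusters.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
print(davies_bouldin(toy_points, toy_labels))
```

Compact, well-separated clusters give a small index, which is why the study selects the configuration with the lowest value.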
