Optimal clustering approach using silhouette criterion and combined auto encoder k-medoids models

Research

Title	Optimal clustering approach using silhouette criterion and combined auto encoder k-medoids models
Type	Thesis
Keywords	Cluster Analysis, Method, Unsupervised Learning, Silhouette Coefficient, Autoencoder, k-Medoids, Optimal Number of Clusters, Large Datasets, Data Classification.
Year	2024
Researchers	Maryam Amiri (Primary Advisor), Ali Rabeea Qasim Alhussein (Student)

Abstract

Clustering is a core task in unsupervised learning that helps identify patterns in a given dataset. Existing methods for estimating the number of clusters often fail to recover the actual number of clusters and are inefficient in the initial placement of cluster centers, especially when the data has many features. This thesis presents a novel clustering method that combines the silhouette criterion with an autoencoder and k-medoids. The dataset is first preprocessed with Min-Max normalization so that all features contribute on a balanced scale. The silhouette criterion then evaluates different clustering scenarios to select the best number of clusters. To improve medoid initialization, an autoencoder reconstructs the transposed dataset and yields positions that characterize the data space. These positions serve as the starting medoids for clustering, which improves its accuracy. The method was validated on four benchmark datasets: in internal cluster validation, it achieved silhouette coefficients of 0.8384, 0.7340, 0.7871, and 0.7423 on Iris, Wine, Breast Cancer Wisconsin, and MNIST, respectively. The results indicate that the proposed method determines the number of clusters with high accuracy and substantially improves clustering quality, particularly on high-dimensional data. The approach is applicable to data analysis and pattern recognition in a number of fields.
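
The pipeline described in the abstract (Min-Max normalization, then silhouette-based selection of the number of clusters over k-medoids runs) can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation: the function names, the toy data, and the candidate range of k are assumptions made for illustration, and the autoencoder-based medoid initialization is not reproduced. A simple farthest-point seeding stands in for it.

```python
import math


def min_max_normalize(rows):
    """Rescale every feature (column) to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def distance_matrix(rows):
    """Pairwise Euclidean distances between all points."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    n = len(dist)
    # ASSUMPTION: farthest-point seeding stands in for the thesis's
    # autoencoder-based medoid initialization, which is not reproduced here.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            updated.append(min(members,
                               key=lambda i: sum(dist[i][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def silhouette(dist, labels):
    """Mean of s(i) = (b - a) / max(a, b); singleton clusters contribute 0."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for key, other in clusters.items() if key != c)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


def choose_k(rows, k_values=range(2, 7)):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    dist = distance_matrix(min_max_normalize(rows))
    scores = {}
    for k in k_values:
        _, labels = k_medoids(dist, k)
        scores[k] = silhouette(dist, labels)
    return max(scores, key=scores.get), scores


# Hypothetical toy data: three tight groups with very different feature scales.
centers = [(0, 0), (10, 100), (20, 0)]
offsets = [(-0.5, -5), (0.5, 5), (0, 0), (-0.5, 5), (0.5, -5)]
demo_rows = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

best_k, scores = choose_k(demo_rows)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On data with well-separated groups like this toy set, the highest mean silhouette should coincide with the number of groups. Normalizing first matters here because the second feature spans a range roughly five times larger than the first and would otherwise dominate the Euclidean distances.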
