2025 : 4 : 12
Maryam Amiri

Academic rank: Assistant Professor
ORCID: https://orcid.org/0000-0002-7411-9552
Education: PhD.
ScopusId: 57146848900
HIndex:
Faculty: Engineering
Address:
Phone: 32625522

Research

Title
Optimal clustering approach using silhouette criterion and combined auto encoder k-medoids models
Type
Thesis
Keywords
Cluster Analysis, Method, Unsupervised Learning, Silhouette Coefficient, Autoencoder, k-Medoids, Optimal Number of Clusters, Large Datasets, Data Classification.
Year
2024
Researchers
Maryam Amiri (Primary Advisor), Ali Rabeea Qasim Alhussein (Student)

Abstract

Clustering is a central task in unsupervised learning because it helps identify patterns in a given dataset. Existing methods for choosing the number of clusters often fail to recover the actual number of clusters and are also inefficient at placing the initial cluster centers, especially when the data has many features. This thesis presents a novel clustering method that combines the silhouette criterion with an autoencoder and k-medoids. The method begins by preprocessing the dataset with Min-Max normalization so that all features contribute in a balanced way. The silhouette criterion then assesses different clustering scenarios and selects the one with the best cluster structure. To improve medoid initialization, an autoencoder reconstructs the transposed dataset and identifies positions that appropriately characterize the data space. These positions are used as the starting medoids for clustering, which improves its accuracy. The method was validated on four benchmark datasets. In internal cluster validation, the silhouette coefficient reached 0.8384, 0.7340, 0.7871, and 0.7423 on Iris, Wine, Breast Cancer Wisconsin, and MNIST, respectively. The results also indicate that the proposed method determines the number of clusters with high accuracy and substantially improves clustering quality, particularly for high-dimensional data. The approach is promising for data analysis and pattern recognition in a range of fields.
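
The pipeline described in the abstract (Min-Max normalization, k-medoids, and silhouette-based selection of the number of clusters) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the autoencoder-based medoid initialization is replaced here by a simple farthest-point heuristic, and the function names (`min_max_normalize`, `k_medoids`, `mean_silhouette`, `choose_k`) and the toy data are assumptions made for illustration only.

```python
import math

def min_max_normalize(rows):
    """Scale every feature (column) to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
            for row in rows]

def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids on Euclidean distances (illustrative variant)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    def assign(medoids):
        # Label each point with the index of its nearest medoid.
        return [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]

    # ASSUMPTION: farthest-point initialization stands in for the
    # autoencoder-based initialization described in the abstract.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda i: sum(d[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids, d

def mean_silhouette(d, labels):
    """Mean silhouette coefficient; assumes at least two non-empty clusters."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # singleton clusters score 0 by convention
        a = sum(d[i][j] for j in own) / (len(own) - 1)
        b = min(sum(d[i][j] for j in other) / len(other)
                for oc, other in clusters.items() if oc != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def choose_k(points, candidates):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {}
    for k in candidates:
        labels, _, d = k_medoids(points, k)
        scores[k] = mean_silhouette(d, labels)
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): three well-separated groups of four 2-D points.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (0, 20), (0, 21), (1, 20), (1, 21)]
normalized = min_max_normalize(data)
best_k, scores = choose_k(normalized, range(2, 5))
print(best_k, scores)
```

On this toy data the silhouette score peaks at three clusters, which matches how the points were constructed. Real datasets such as those named in the abstract would need a more careful initialization and a larger candidate range for k.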