Maintainability Evaluation of Object-Oriented Software System Using Clustering Techniques
DOI:
https://doi.org/10.24297/ijct.v5i2.3535Keywords:
Maintainability, Clustering, Software Metrics, Quality Models, K-Means, X-Means, Software Engineering, Data MiningAbstract
In today’s world data is produced every day at a phenomenal rate and we are required to store this ever growing data on almost daily basis. Even though our ability to store this huge data has grown but the problem lies when users expect sophisticated information from this data. This can be achieved by uncovering the hidden information from the raw data, which is the purpose of data mining. Data mining or knowledge discovery is the computer-assisted process of digging through and analyzing enormous set of data and then extracting the meaning out of it. The raw and unlabeled data present in large databases can be classified initially in an unsupervised manner by making use of cluster analysis. Clustering analysis is the process of finding the groups of objects such that the objects in a group will be similar to one another and dissimilar from the objects in other groups. These groups are known as clusters. In other words, clustering is the process of organizing the data objects in groups whose members have some similarity among them. Some of the applications of clustering are in marketing -finding group of customers with similar behavior, biology- classification of plants and animals given their features, data analysis, and earthquake study -observe earthquake epicenter to identify dangerous zones, WWW -document classification, etc. The results or outcome and efficiency of clustering process is generally identified though various clustering algorithms. The aim of this research paper is to compare two important clustering algorithms namely centroid based K-means and X-means. The performance of the algorithms is evaluated in different program execution on the same input dataset. The performance of these algorithms is analyzed and compared on the basis of quality of clustering outputs, number of iterations and cut-off factors.