Clustering algorithms have
emerged as a powerful alternative meta-learning tool to accurately analyze the
massive volume of data generated by modern applications. In particular, their
main goal is to categorize data into clusters such that objects are grouped in
the same cluster when they are similar according to specific metrics. There is
a vast body of knowledge in the area of clustering, and there have been attempts
to analyze and categorize these algorithms for a larger number of applications. However,
one of the major issues in using clustering algorithms for big data that causes
confusion amongst practitioners is the lack of consensus in the definition of
their properties as well as a lack of formal categorization. With the intention
of alleviating these problems, this paper introduces concepts and algorithms
related to clustering, presents a concise survey of existing clustering
algorithms, and provides a comparison from both a theoretical and an empirical
perspective. From a theoretical perspective, we developed a categorizing
framework based on the main properties pointed out in previous studies.
Empirically, we conducted extensive experiments where we compared the most
representative algorithm from each of the categories using a large number of
real (big) data sets. The effectiveness of the candidate clustering algorithms
is measured through a number of internal and external validity metrics,
stability, runtime, and scalability tests. In addition, we highlighted the set
of clustering algorithms that perform best for big data.
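
To make the distinction between external and internal validity metrics concrete, here is a minimal sketch in plain Python. The abstract does not name the specific metrics used in the paper, so the two shown here (the Rand index as an external metric, and the within-cluster sum of squared distances as a simple internal compactness measure) are assumptions chosen for illustration, as are the function names and the toy data.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """External validity: fraction of point pairs on which two labelings agree.

    A pair "agrees" when both labelings put the two points together,
    or both put them apart. Requires at least two points.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def within_cluster_ss(points, labels):
    """Internal validity: sum of squared distances to each cluster's centroid.

    Needs no ground truth; lower values mean more compact clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total


# Toy data (hypothetical): two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
truth = [0, 0, 1, 1]   # reference labels, only needed for the external metric
good = [1, 1, 0, 0]    # same grouping as truth, with different label names
bad = [0, 1, 0, 1]     # mixes the two groups

print(rand_index(truth, good), within_cluster_ss(points, good))
print(rand_index(truth, bad), within_cluster_ss(points, bad))
```

The external metric needs reference labels and ignores how the labels are named, while the internal one looks only at the geometry of the clustering itself. That is why the two kinds of metric are usually reported side by side.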