Recent Approaches of Partitioning a Set into Overlapping Clusters, Distance Metrics and Evaluation Measures
This paper reviews recently proposed overlapping co-clustering approaches and related evaluation measures. Overlap captures multiple views of the partitions in a data set and is therefore more expressive than traditional flat partitioning. We present a graph-theoretic formulation of co-clustering that allows nodes to hold multiple memberships, which makes it useful in diverse applications such as text mining, web mining, collaborative filtering, and community detection. We also study quality measures specifically adapted to overlapping scenarios.
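As an informal illustration of the ideas named in the abstract, the sketch below stores overlapping co-clusters as sets of nodes drawn from both sides of a bipartite graph (documents and words), so a node may appear in more than one cluster. It then scores a predicted clustering against a reference with a pair-counting F-measure. The node labels, the function names, and the choice of this particular measure are assumptions made for the example; they are not taken from the paper, which may survey different measures.

```python
from itertools import combinations


def memberships(clusters):
    """Map each node to the set of cluster indices it belongs to."""
    result = {}
    for index, members in enumerate(clusters):
        for node in members:
            result.setdefault(node, set()).add(index)
    return result


def co_clustered_pairs(clusters):
    """Return all unordered node pairs that share at least one cluster."""
    pairs = set()
    for members in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs


def pairwise_f1(predicted, reference):
    """Pair-counting F-measure between two (possibly overlapping) clusterings."""
    pred_pairs = co_clustered_pairs(predicted)
    ref_pairs = co_clustered_pairs(reference)
    true_positives = len(pred_pairs & ref_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(ref_pairs)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: "d*" are documents, "w_*" are words.
# In the reference, d2 and w_node belong to both co-clusters.
reference = [
    {"d1", "d2", "w_graph", "w_node"},
    {"d2", "d3", "w_node", "w_hash"},
]
# A flat (non-overlapping) prediction misses the shared memberships.
predicted = [
    {"d1", "d2", "w_graph", "w_node"},
    {"d3", "w_hash"},
]

print(memberships(reference)["d2"])
print(round(pairwise_f1(predicted, reference), 4))
```

In this toy case the flat prediction keeps full precision but loses recall, because every pair that exists only through the shared nodes d2 and w_node is missed. That gap is the kind of effect an overlap-aware measure is meant to expose.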
Copyright (c) 2019 Gursimran Pal, Sahil Kakkar
This work is licensed under a Creative Commons Attribution 4.0 International License.