MapReduce based Classification for Microarray data using Parallel Genetic Algorithm
DOI: https://doi.org/10.24297/jac.v12i15.2413
Keywords: MapReduce, Hadoop, Microarray, genes, mutual information, parallel attribute clustering, classification
Abstract
Microarrays provide high-throughput measurement of the expression of thousands of genes. Only a few of these thousands of genes are relevant to disease prediction and disease classification in a medical setting. Supervised attribute clustering is used to find coexpressed groups (clusters) of genes whose joint expression is strongly related to the class label. Mutual information, which quantifies the information shared between attributes, is combined with the information carried by the sample categories to measure the similarity among attributes, and on this basis redundant and irrelevant attributes are removed. After the clusters are formed, a parallel genetic algorithm (PGA), implemented as the mapper function, is used to find the optimal features so as to improve class separability. Because the method runs in parallel, diagnosis can be made easier and more effective. Predictive accuracy is estimated with three classifiers: k-nearest neighbours, naive Bayes, and support vector machine. The overall approach, which also employs a reducer function, provides excellent predictive capability for accurate medical diagnosis.
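The abstract uses mutual information to measure similarity among genes, groups coexpressed genes that relate to the class label, and discards redundant and irrelevant genes. The sketch below illustrates that idea under stated assumptions: expression values are already discretized (for example into low/high levels), and a simple greedy "seed and group" procedure stands in for the clustering step. The function names, the two thresholds, and the greedy procedure are hypothetical illustrations, not the authors' exact algorithm.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """I(X;Y) in bits between two equal-length sequences of discrete values."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    # Sum over observed (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def supervised_mi_clusters(genes, labels, n_clusters,
                           relevance_threshold, redundancy_threshold):
    """Greedy MI-based grouping of discretized genes (hypothetical illustration).

    genes: dict mapping a gene name to its discretized expression per sample.
    Returns (clusters, relevance); each cluster lists its seed gene first.
    """
    # Relevance of each gene = mutual information with the class label.
    relevance = {g: mutual_information(v, labels) for g, v in genes.items()}
    # Drop irrelevant genes, then rank the rest by relevance (stable on ties).
    remaining = sorted(
        (g for g in genes if relevance[g] >= relevance_threshold),
        key=relevance.get, reverse=True,
    )
    clusters = []
    while remaining and len(clusters) < n_clusters:
        seed = remaining.pop(0)  # most relevant gene still unassigned
        members, rest = [seed], []
        for g in remaining:
            # Genes sharing much information with the seed join its cluster;
            # they are redundant with the seed as classification features.
            if mutual_information(genes[g], genes[seed]) >= redundancy_threshold:
                members.append(g)
            else:
                rest.append(g)
        clusters.append(members)
        remaining = rest
    return clusters, relevance
```

For example, with labels `[0, 0, 1, 1]`, a gene `[0, 0, 1, 1]` carries 1 bit of information about the class, while a gene `[0, 1, 0, 1]` carries none and would be filtered out as irrelevant. Keeping only the seed of each cluster is one simple way to remove redundancy; the paper's own clustering criterion may differ.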
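The abstract states that the PGA is given as the mapper function, that a reducer function is used, and that k-nearest neighbours is one of the classifiers. A minimal sketch of how a genetic algorithm can be split into map and reduce steps follows, assuming each chromosome is a bitmask over candidate genes (for example cluster representatives) and its fitness is leave-one-out 1-NN accuracy on the selected genes. The names `mapper`, `reducer`, `run_pga`, the single key `"population"`, the fitness function, and all parameters are hypothetical; the map, shuffle, and reduce phases are simulated in-process rather than through Hadoop's API.

```python
import random
from collections import Counter


def loo_knn_accuracy(samples, labels, mask, k=1):
    """Leave-one-out k-NN accuracy using only the features where mask[j] == 1."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i, xi in enumerate(samples):
        # Squared Euclidean distance to every other sample, nearest first.
        neighbours = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in feats), labels[o])
            for o, xo in enumerate(samples) if o != i
        )
        votes = Counter(lab for _, lab in neighbours[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(samples)


def mapper(chromosome, samples, labels):
    """Map step: score one chromosome; one key sends every score to one reducer."""
    return "population", (loo_knn_accuracy(samples, labels, chromosome), chromosome)


def reducer(key, scored, rng, mutation_rate=0.1):
    """Reduce step: rank chromosomes, keep the best, breed the next generation."""
    ranked = sorted(scored, key=lambda fc: fc[0], reverse=True)
    elite = [c for _, c in ranked[:max(2, len(ranked) // 2)]]
    children = [elite[0]]  # elitism: the best chromosome survives unchanged
    while len(children) < len(ranked):
        p1, p2 = rng.sample(elite, 2)
        cut = rng.randrange(1, len(p1))  # one-point crossover
        child = p1[:cut] + p2[cut:]
        # Bit-flip mutation.
        child = tuple(1 - b if rng.random() < mutation_rate else b for b in child)
        children.append(child)
    return ranked[0], children


def run_pga(samples, labels, pop_size=8, generations=10, seed=0):
    """Simulated map/shuffle/reduce loop; needs >= 2 features and pop_size >= 2."""
    rng = random.Random(seed)
    n_feat = len(samples[0])
    population = [tuple(rng.randint(0, 1) for _ in range(n_feat))
                  for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        # Map phase: chromosomes are scored independently (one mapper task each).
        emitted = [mapper(c, samples, labels) for c in population]
        # Shuffle phase: group emitted values by key.
        groups = {}
        for key, value in emitted:
            groups.setdefault(key, []).append(value)
        # Reduce phase: a single key, so one reducer builds the next population.
        for key, scored in groups.items():
            gen_best, population = reducer(key, scored, rng)
            if best is None or gen_best[0] > best[0]:
                best = gen_best
    return best
```

Calling `best_fitness, best_mask = run_pga(samples, labels)` returns the highest-scoring gene subset seen. The parallel speed-up the abstract refers to comes from the map phase: fitness evaluations do not depend on each other, so on a Hadoop cluster they could run as separate mapper tasks, while the reducer performs the comparatively cheap selection, crossover, and mutation.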
License
All articles published in Journal of Advances in Linguistics are licensed under a Creative Commons Attribution 4.0 International License.