Figuring out Extinct Values of Yeast Gene Microarray Expression (YGME) and Influencing Successive Time for Hierarchical Clustering Technique – An Improvement

(Vol-5, Issue-12, December 2018) Open Access

Akey Sungheetha, Rajesh Sharma R


Cluster, Yeast data, Hierarchical clustering, k-means clustering, filtering data.


Numerous approaches for computing missing values in yeast data have been suggested in the literature. Over the past few years, investigators have put considerable research effort into systematic assessments of the different computation procedures. The problem of handling missing values is framed here with samples of tough microorganisms, such as yeast. Expensive strategies exist that aim to handle a varied collection of samples. They are generally effective for disrupting many small samples concurrently, but are far less effective for larger samples. Manufactured devices report disruption rates for these small samples, with 5% of cells disrupted in the range of 2 to 38 seconds, frequently without indicating the organism disrupted or the sample size. Initially, most procedures were evaluated with an emphasis on the accuracy of the computation, using metrics such as uncentered correlation, centered correlation, absolute uncentered correlation, absolute centered correlation, Spearman rank correlation, Kendall's tau, Euclidean distance, and city block distance. These metrics are used to identify the best clustering range. In the proposed approach, running time is also computed for the various methods using the same metrics. It has also become clear that the accuracy and running time obtained on the whole yeast gene data are better assessed in downstream applications by means of the hierarchical clustering approach. Accuracy and running time are determined for both large and small samples after the missing values have been computed. Running times of the different clustering methods on a yeast dataset are presented in this work for a missing value rate of 4%. Hierarchical clustering was the fastest among the specified clustering methods (K-Means (gene) clustering, Self-Organizing Map (SOM), and Principal Component Analysis). However, the SOM was still about 10 times faster than k-means. The running time of the original hierarchical method was about one third of that of its proposed version.
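The eight metrics named in the abstract can be illustrated with a small, standard-library Python sketch. This is not the authors' implementation: the conversion of a correlation r into a distance as 1 - r (or 1 - |r| for the absolute variants), the use of Kendall's tau-a, and the toy expression profiles are all assumptions made for illustration. The loop at the end times a pairwise distance matrix per metric, in the spirit of the running-time comparison described above.

```python
import math
import time

def uncentered_correlation(x, y):
    """Pearson formula with both means assumed to be zero (cosine similarity)."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def centered_correlation(x, y):
    """Ordinary Pearson correlation (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return centered_correlation(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

# Assumed convention: similarity r becomes distance 1 - r (or 1 - |r|).
DISTANCES = {
    "correlation (uncentered)": lambda x, y: 1 - uncentered_correlation(x, y),
    "correlation (centered)": lambda x, y: 1 - centered_correlation(x, y),
    "absolute correlation (uncentered)": lambda x, y: 1 - abs(uncentered_correlation(x, y)),
    "absolute correlation (centered)": lambda x, y: 1 - abs(centered_correlation(x, y)),
    "Spearman rank correlation": lambda x, y: 1 - spearman(x, y),
    "Kendall's tau": lambda x, y: 1 - kendall_tau(x, y),
    "Euclidean distance": euclidean,
    "city block distance": city_block,
}

def pairwise(profiles, dist):
    """Full distance matrix between all pairs of expression profiles."""
    n = len(profiles)
    return [[dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]

# Hypothetical toy expression profiles (three genes, four time points).
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]

for name, dist in DISTANCES.items():
    start = time.perf_counter()
    matrix = pairwise(profiles, dist)
    elapsed = time.perf_counter() - start
    print(f"{name}: d(g0, g2) = {matrix[0][2]:.3f} ({elapsed * 1e6:.0f} us)")
```

On real data the distance matrix would be passed to a hierarchical clustering routine after missing values have been imputed. The timings printed here only show how such a comparison could be set up and say nothing about the results reported in the paper.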



Page No: 300-308
