Effects of Classification Techniques on Medical Reports Classification
DOI:
https://doi.org/10.24297/ijct.v13i2.2906
Keywords:
Document classification, positive-class based learning, partially supervised classification, labelled and unlabeled data, medical text mining, and features reduction.
Abstract
Text classification is the process of assigning predefined category labels to documents based on what a classifier has learned from training examples. This paper investigates the partially supervised classification approach in the medical field. The methods evaluated include Rocchio, Naïve Bayesian (NB), Spy, Support Vector Machine (SVM), and Expectation Maximization (EM), and combinations of these methods were also tested. The experimental results showed that the combinations that use EM in step 2 always produce better results than those that use SVM when only a small set of training samples is available. We also found that reducing the features based on tf-idf values decreases classification performance dramatically. By contrast, reducing the features based on their frequencies improves classification performance significantly while also increasing efficiency, although it may require some experimentation.
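The abstract contrasts two ways of reducing the feature set before classification. The following is a minimal Python sketch of both criteria, not the paper's implementation. It assumes a simple lowercase whitespace tokenizer, uses document frequency as the "frequency" criterion, and scores each term for the tf-idf criterion by its highest tf-idf weight in any single training document. The function names select_by_frequency and select_by_tfidf are hypothetical, and the paper's exact scoring formula and cut-off values are not given in the abstract.

```python
import math
from collections import Counter


def tokenize(doc):
    # Assumed preprocessing: lowercase and split on whitespace.
    return doc.lower().split()


def select_by_frequency(docs, k):
    """Keep the k terms that occur in the most documents (document frequency)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    # Descending frequency; ties broken alphabetically so the result is deterministic.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def select_by_tfidf(docs, k):
    """Keep the k terms with the highest tf-idf weight in any single document."""
    n = len(docs)
    token_lists = [tokenize(doc) for doc in docs]
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    best = {}
    for tokens in token_lists:
        for term, count in Counter(tokens).items():
            # Raw term frequency times log(N / df); a term present in every document scores 0.
            score = count * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), score)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy "reports", for illustration only.
    docs = [
        "patient chest pain fever",
        "patient fever cough",
        "patient chest xray normal",
        "patient rash itching",
    ]
    print(select_by_frequency(docs, 3))  # ['patient', 'chest', 'fever']
    print(select_by_tfidf(docs, 3))      # ['cough', 'itching', 'normal']
```

On this toy corpus, frequency-based selection keeps terms shared across many reports, while tf-idf-based selection favours terms that appear in only one report. This is one plausible way the two criteria can diverge on small training sets; the abstract itself does not state the cause of the difference it reports.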