A Model for Improving Classifier Accuracy using Outlier Analysis

Authors

  • Lakshmi Sreenivasa Reddy.D Rise Gandhi Group of institutions, Ongole
  • Dr B. Raveendrababu VNR VJIET, Hyderabad
  • Dr A. Govardhan JNTUH, Hyderabad

DOI:

https://doi.org/10.24297/ijct.v7i1.3480

Keywords:

Data Mining, Outlier detection, BAD Score, NAVF, Fuzzy AVF

Abstract

Anomalies are records that behave differently from, and do not conform to, the remaining records in a dataset. Outlier analysis is the task of finding such anomalies. Detecting outliers efficiently is an important problem in many fields of science, medicine and technology. Many methods are available for detecting anomalies in numerical datasets, but only a limited number are available for categorical datasets. In this work, a novel entropy-based method for detecting outliers in categorical data is proposed. The algorithm finds anomalies from a score computed for each record, called the BAD score, and has great intuitive appeal. It uses the frequency of each value in the dataset. The Greedy method needs k scans of the dataset to find ‘k’ outliers, whereas the proposed method needs only one scan and calculates the BAD score of each record directly. It avoids the need to supply ‘k’ as an input and can find any number of outliers directly from the dataset. The AVF method has lower time complexity than other methods such as Greedy, FPOF and FDOD. Greedy has good accuracy compared with AVF and with FPOF and FDOD (which are based on frequency patterns of all combinations of values in each record). Our algorithm gives better accuracy than both AVF and Greedy, and its time complexity comes close to that of AVF. The algorithm has been applied to the Nursery and Bank datasets taken from the “UCI Machine Learning Repository”. This work also proposes extending the Normal distribution [11] and the Fuzzy concept [12] to the BAD score [13]; that is, NAVF combined with Fuzzy AVF is applied to the BAD score. Numerical attributes are excluded from the datasets for our analysis. The experimental results show that the method is efficient for outlier detection in categorical datasets.
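
To make the frequency-based idea concrete, the following is a minimal Python sketch of the AVF (Attribute Value Frequency) baseline that the abstract compares against. It is not the BAD score, whose formula the abstract does not give. It assumes the usual description of AVF, in which a record's score is the mean frequency of its attribute values and low scores indicate likely outliers. The function names and the toy records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def avf_scores(records):
    """Return one AVF score per record: the mean frequency of its attribute values.

    Assumption: every record is a tuple of categorical values of equal length.
    """
    if not records:
        return []
    n_attrs = len(records[0])
    # Count how often each value occurs in each attribute (column).
    counts = [Counter(rec[j] for rec in records) for j in range(n_attrs)]
    # Score each record by averaging the frequencies of its own values.
    return [
        sum(counts[j][rec[j]] for j in range(n_attrs)) / n_attrs
        for rec in records
    ]


def lowest_scoring(records, k):
    """Return the indices of the k records with the lowest AVF scores."""
    scores = avf_scores(records)
    order = sorted(range(len(records)), key=lambda i: scores[i])
    return order[:k]


# Hypothetical toy records (not taken from the Nursery or Bank datasets).
data = [
    ("usual", "proper", "complete"),
    ("usual", "proper", "complete"),
    ("usual", "proper", "incomplete"),
    ("great_pret", "critical", "foster"),
]

print(avf_scores(data))
print(lowest_scoring(data, 1))
```

In this sketch the last record shares no value with any other record, so each of its values has frequency 1 and it receives the lowest score. The sketch still takes k as an input when selecting outliers, which is the requirement the proposed method is said to avoid.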

Author Biographies

Lakshmi Sreenivasa Reddy.D, Rise Gandhi Group of institutions, Ongole

Department of CSE

Dr A. Govardhan, JNTUH, Hyderabad

Director of Evaluation

Published

2013-05-21

How to Cite

Reddy.D, L. S., Raveendrababu, B., & Govardhan, A. (2013). A Model for Improving Classifier Accuracy using Outlier Analysis. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 7(1), 500–509. https://doi.org/10.24297/ijct.v7i1.3480

Section

Research Articles