Automatic Threshold Selections by exploration and exploitation of optimization algorithm in Record Deduplication

Authors

  • K. Deepa, Assistant Professor, Department of Electronics and Communication Engineering, M. Kumarasamy College of Engineering, Karur
  • C. Vivek, Associate Professor, Department of Electronics and Communication Engineering, M. Kumarasamy College of Engineering, Karur
  • S. Palanivel Rajan, Professor, Sri Ramakrishna Engineering College, Coimbatore, Tamilnadu, India

DOI:

https://doi.org/10.24297/jac.v12i11.820

Keywords:

GA, Modified ABC, Similarity metrics, Cosine Similarity, Levenshtein Distance

Abstract

A deduplication process uses a similarity function to decide whether two entries are duplicates by comparing their similarity score against a threshold. Setting this threshold is critical for accuracy and usually depends heavily on human intervention. Swarm intelligence algorithms such as Particle Swarm Optimization (PSO) and Artificial Bee Colony (ABC) have been used to select the threshold automatically. Although these algorithms perform well, their solution search equation, which generates new candidate solutions from the information in previous solutions, remains a weakness. The proposed work addresses two problems: first, it finds the optimal search equation using a Genetic Algorithm (GA); second, it adopts a modified ABC to obtain the optimal threshold, which detects duplicate records more accurately and reduces human intervention. The CORA dataset is used to evaluate the proposed algorithm.
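The threshold-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it scores record pairs with a normalised Levenshtein similarity (one of the listed metrics), then runs a basic ABC loop that searches for the threshold maximising F1 on labelled pairs. It uses the standard ABC search equation, not the GA-derived equation the paper proposes. The F1 objective, the helper names (`similarity`, `f1_at`, `abc_threshold`), the parameters `n_food`, `limit`, `cycles`, and the toy record pairs are all hypothetical choices made for illustration.

```python
import random
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance normalised to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def f1_at(t: float, scores: List[float], labels: List[bool]) -> float:
    """F1 when every pair with similarity >= t is flagged as a duplicate."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def abc_threshold(scores, labels, n_food=10, limit=5, cycles=50, seed=0):
    """Hypothetical sketch: basic ABC search for a threshold in [0, 1]."""
    rng = random.Random(seed)
    fit = lambda t: f1_at(t, scores, labels)
    foods = [rng.random() for _ in range(n_food)]  # one candidate threshold per source
    fits = [fit(t) for t in foods]
    trials = [0] * n_food
    best = max(zip(fits, foods))                    # (fitness, threshold) memory

    def explore(i):
        nonlocal best
        # Standard ABC move (not the paper's GA-derived one):
        # v = x_i + phi * (x_i - x_k), phi ~ U(-1, 1), clipped to [0, 1].
        k = rng.choice([j for j in range(n_food) if j != i])
        cand = min(1.0, max(0.0, foods[i] + rng.uniform(-1, 1) * (foods[i] - foods[k])))
        f = fit(cand)
        if f > fits[i]:                             # greedy selection
            foods[i], fits[i], trials[i] = cand, f, 0
            best = max(best, (f, cand))
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_food):                     # employed bees
            explore(i)
        for _ in range(n_food):                     # onlookers favour fitter sources
            if sum(fits) > 0:
                i = rng.choices(range(n_food), weights=fits)[0]
            else:
                i = rng.randrange(n_food)
            explore(i)
        for i in range(n_food):                     # scouts replace stale sources
            if trials[i] > limit:
                foods[i], trials[i] = rng.random(), 0
                fits[i] = fit(foods[i])
                best = max(best, (fits[i], foods[i]))
    return best[1]


# Toy labelled pairs (hypothetical, not from the CORA dataset).
pairs = [
    ("John Smith", "Jon Smith", True),
    ("Jane Doe", "Jane Doe.", True),
    ("Data Mining", "Data Minning", True),
    ("Alice Brown", "Bob Green", False),
    ("Record Linkage", "Graph Theory", False),
]
scores = [similarity(a, b) for a, b, _ in pairs]
labels = [y for _, _, y in pairs]
t = abc_threshold(scores, labels)
print(round(t, 3), f1_at(t, scores, labels))
```

Because F1 is piecewise constant in the threshold, a food source only improves when a candidate crosses one of the pair scores. In the paper's approach, the line that computes `cand` is where a GA-optimised search equation would replace the standard ABC move.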


Published

2016-06-16

How to Cite

Deepa, K., Vivek, C., & Rajan, S. (2016). Automatic Threshold Selections by exploration and exploitation of optimization algorithm in Record Deduplication. JOURNAL OF ADVANCES IN CHEMISTRY, 12(11), 4515–4522. https://doi.org/10.24297/jac.v12i11.820

Section

Articles