Please use this identifier to cite or link to this item: http://idr.iitp.ac.in:8080/jspui/handle/123456789/315
Title: Data Refinement : An Abstraction Based Approach
Authors: Koshley, D. K.
Keywords: Computer Science & Engineering
Issue Date: 2015
Abstract: In large information processing systems, dirty data may cause various problems during computation, as users have limited knowledge of the underlying database structures and the data they hold. Bad data may arise from poor database design, typographical errors, entity resolution problems, ambiguity, etc. Data cleaning is a promising field of research that aims to clean bad data and generate clean database instances. In this thesis, we apply an abstraction-based approach to remedy such bottlenecks. Abstraction gives us a single clean instance that satisfies the soundness criteria of query systems. To this aim, we combine (i) a similarity-based classification approach and (ii) an abstraction-based approach that represents each cluster by the property of interest. This proposal improves the efficiency and performance of query systems w.r.t. existing systems. Bertossi et al. proposed a data-cleaning technique based on matching dependencies and matching functions, which in practice is intractable in some cases when the matching dependencies are applied in random order. Moreover, applying a single matching dependency to a dirty database instance yields a set of clean instances whose size depends on the number of dirty tuples, which results in high computational overhead as well as a large space requirement. The aim of this thesis is to propose an improvement of Bertossi's approach based on the Abstract Interpretation framework. This yields a single clean abstract database instance that is a sound approximation of all possible concrete clean instances. Convergence of the cleaning process can also be guaranteed by using widening operators in the abstract domain. The proposal significantly improves the efficiency and performance of query systems w.r.t. Bertossi's approach.
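The idea of replacing many candidate clean instances with one abstract instance can be illustrated with a minimal sketch. This is not the algorithm of the thesis: the similarity measure (Python's difflib ratio), the threshold of 0.8, the greedy clustering, the set-of-values abstract domain, and the names similar, clean_abstract and covers are all assumptions made for illustration. Widening is not covered.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    # Assumed similarity test: character-level ratio from difflib.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def clean_abstract(rows, threshold=0.8):
    # Greedy clustering on the first attribute; the first row of each
    # cluster acts as its representative (an illustrative choice).
    clusters = []
    for row in rows:
        for c in clusters:
            if similar(row[0], c[0][0], threshold):
                c.append(row)
                break
        else:
            clusters.append([row])
    # One abstract tuple per cluster: each attribute is abstracted to the
    # set of concrete values observed for it in that cluster.
    return [tuple(frozenset(col) for col in zip(*c)) for c in clusters]

def covers(abstract_db, row):
    # Soundness check: the row lies in the concretization of some
    # abstract tuple, attribute by attribute.
    return any(all(v in s for v, s in zip(row, t)) for t in abstract_db)

# Hypothetical dirty instance with two spellings of the same person.
dirty = [
    ("John Smith", "Patna"),
    ("Jon Smith", "Patna"),
    ("Alice Kumar", "Delhi"),
]
abstract_db = clean_abstract(dirty)
```

Here the dirty instance collapses into a single abstract instance with one tuple per cluster, and covers confirms that every concrete tuple is still represented, which is the sense of "sound approximation" used in the abstract.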
URI: http://hdl.handle.net/123456789/315
Appears in Collections: 01. CSE

Files in This Item:
File: Data Refinement - An Abstraction Based Approach.pdf
Size: 1.04 MB
Format: Adobe PDF

