|Title:||Plagiarism Detection in Text using Vector Space Model|
Vector Space Model
|Abstract:||Plagiarism is the act of copying someone else's ideas or work and claiming them as one's own. Plagiarism detection is the task of identifying the passages of a given document that are plagiarized, i.e. copied from other documents. The main challenges arise because plagiarists often obfuscate copied text: they may shuffle, remove, insert, or replace words or short phrases; restructure sentences; replace words with synonyms; or change the order in which words appear in a sentence. In this paper we propose a technique based on textual similarity for external plagiarism detection. For a given suspicious document, the task is to identify the set of source documents from which it was copied. The proposed method comprises four phases. In the first phase, we preprocess all documents to generate tokens and lemmas and to identify Part-of-Speech (PoS) classes, character offsets, sentence numbers, and named-entity (NE) classes. In the second phase, we select a subset of documents that may be the sources of plagiarism, using an approach based on the traditional Vector Space Model (VSM) for this candidate selection. In the third phase, we use a graph-based approach to find similar passages in the suspicious document and the selected source documents. Finally, we filter out false detections.|
|Appears in Collections:||2012|
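
The second phase described in the abstract (VSM-based candidate selection) can be illustrated with a minimal sketch. The abstract does not state the term weighting, similarity measure, or cut-off used in the paper, so the smoothed TF-IDF weights, cosine similarity, and the names `tokenize`, `select_candidates`, `top_k`, and `min_score` below are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase alphabetic runs only.
    # The paper's first phase also lemmatizes; that step is omitted here.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vector(tokens, df, n_docs):
    # Term frequency times a smoothed inverse document frequency.
    # The smoothing keeps terms unseen in the source collection finite.
    tf = Counter(tokens)
    return {
        term: count * (math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1.0)
        for term, count in tf.items()
    }


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_candidates(suspicious_text, sources, top_k=5, min_score=0.05):
    # sources maps a document id to its raw text.
    tokenized = {doc_id: tokenize(text) for doc_id, text in sources.items()}

    # Document frequency of each term over the source collection.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    n_docs = len(tokenized)

    query = tfidf_vector(tokenize(suspicious_text), df, n_docs)

    # Score every source and keep those above the (hypothetical) threshold.
    scored = []
    for doc_id, tokens in tokenized.items():
        score = cosine(query, tfidf_vector(tokens, df, n_docs))
        if score >= min_score:
            scored.append((doc_id, score))

    # Highest similarity first; only the top_k go on to the next phase.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Calling `select_candidates(suspicious_text, sources)` returns `(doc_id, score)` pairs ordered by decreasing similarity. In a pipeline like the one the abstract outlines, only these candidates would be passed to the more expensive passage-level comparison.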
Files in This Item:
|Plagiarism Detection in Text using Vector Space Model.pdf||267.35 kB||Adobe PDF|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.