Deprecated

These pages are deprecated and will be deleted soon.

Please use west.uni-koblenz.de instead!

WeST is a member of the WSTNet

 

A Probabilistic Framework for Plagiarism Detection

Organizational details

Description

Plagiarism is the "use or close imitation of the language and thoughts of another author and the representation of them as one's own original work" (Random House Compact Unabridged Dictionary, 1995). Systems for plagiarism detection aim to recognize plagiarised texts automatically. The most common setting is extrinsic analysis: a reference corpus is given, and a suspicious document may contain text fragments plagiarised from it to varying extents. In this setting a system must first select candidate documents from the reference corpus, then analyse in detail which parts have actually been plagiarised, and finally clean up the detected fragments in a post-processing step.
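The three steps of the extrinsic setting can be illustrated with a minimal sketch. It assumes a simple word n-gram overlap heuristic, which is not the probabilistic framework this thesis aims for; all function names and parameters (`select_candidates`, `detailed_analysis`, `post_process`, `min_shared`, `max_gap`) are hypothetical and chosen only for illustration.

```python
def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def select_candidates(suspicious, corpus, n=3, min_shared=1):
    """Step 1: keep reference documents sharing at least min_shared n-grams."""
    susp = word_ngrams(suspicious, n)
    return [doc_id for doc_id, text in corpus.items()
            if len(susp & word_ngrams(text, n)) >= min_shared]


def detailed_analysis(suspicious, reference, n=3):
    """Step 2: word positions in suspicious where a shared n-gram starts."""
    ref = word_ngrams(reference, n)
    words = suspicious.lower().split()
    return [i for i in range(len(words) - n + 1)
            if tuple(words[i:i + n]) in ref]


def post_process(positions, n=3, max_gap=2):
    """Step 3: merge nearby matches into (start, end) word spans (end exclusive)."""
    spans = []
    for pos in positions:  # positions are assumed to be in ascending order
        if spans and pos - spans[-1][1] <= max_gap:
            spans[-1] = (spans[-1][0], pos + n)
        else:
            spans.append((pos, pos + n))
    return spans


# Hypothetical toy data, for illustration only.
corpus = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "completely unrelated text about something else entirely",
}
suspicious = "we saw the quick brown fox jumps yesterday"

candidates = select_candidates(suspicious, corpus)
positions = detailed_analysis(suspicious, corpus["a"])
spans = post_process(positions)
```

In this toy run only document "a" survives candidate selection, and the overlapping trigram matches are merged into a single span of words in the suspicious document. A probabilistic framework would replace the set-overlap tests in each step with model-based scores.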

Most systems use heuristic approaches and a mixture of methods to solve the individual steps of the plagiarism detection task. The objective of this thesis is to develop and evaluate a sound probabilistic framework for plagiarism detection. Its core component would be an existing method that models the likelihood that one text fragment is a plagiarism of another.
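To give a rough idea of what a likelihood-based score for a pair of fragments could look like, the sketch below uses a stand-in: an additively smoothed unigram language model and a log-likelihood ratio against a background text. This is an assumption made purely for illustration and is not the existing method the announcement refers to; the names `unigram_log_likelihood`, `plagiarism_score` and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def unigram_log_likelihood(fragment, model_counts, vocab_size, alpha=1.0):
    """Log-probability of fragment under an additively smoothed unigram model."""
    total = sum(model_counts.values())
    return sum(
        math.log((model_counts[w] + alpha) / (total + alpha * vocab_size))
        for w in fragment.lower().split()
    )


def plagiarism_score(suspicious, source, background):
    """Log-likelihood ratio: source-fragment model versus background model."""
    src = Counter(source.lower().split())
    bg = Counter(background.lower().split())
    # Shared vocabulary so both models are smoothed over the same word set.
    vocab = set(src) | set(bg) | set(suspicious.lower().split())
    v = len(vocab)
    return (unigram_log_likelihood(suspicious, src, v)
            - unigram_log_likelihood(suspicious, bg, v))


# Hypothetical toy texts, for illustration only.
background = "the cat sat on the mat while the dog slept in the sun"
source = "probabilistic models estimate fragment likelihood"

copied_score = plagiarism_score(
    "probabilistic models estimate likelihood", source, background)
unrelated_score = plagiarism_score(
    "the cat sat on the mat", source, background)
```

A positive score means the suspicious fragment is better explained by the source fragment than by the background text, and a negative score means the opposite. A unigram model ignores word order, so a real framework would likely need a richer model.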

In more detail, the work should cover:

  • Complete probabilistic framework for plagiarism detection
  • Adaptation of the framework to the steps of candidate selection, detailed analysis and post-processing
  • Implementation of a reference system
  • Evaluation on corpora for plagiarism detection.

Requirements:

  • Good programming skills
  • Good knowledge of basic probabilistic maths
  • Knowledge of Information Retrieval techniques is an advantage
  • Management of large data sets will be necessary

 

Last modified: 07.02.2012 11:58
