Optimizing Caching and Crawling Strategies for Stream-based SchemEX Computation

Organizational Details

Description

SchemEX is a stream-based approach for computing a schema index over Linked Open Data (LOD) [1]. The data stream is generated by an RDF crawler that harvests triples from the Semantic Web. So far, SchemEX uses a FIFO queue as a cache over the stream of RDF triples to extract schema information from the crawled resources. The crawler's strategy has not yet been taken into account at all.
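To make the role of the FIFO cache concrete, the following is a minimal sketch, not the actual SchemEX implementation. It assumes that triples arrive as `(subject, predicate, object)` tuples, that the schema information per resource is simply its set of types and its set of properties, and that a resource's description is emitted once it leaves the cache. The function name `fifo_schema_stream`, the `capacity` parameter, and the sample triples are hypothetical.

```python
from collections import OrderedDict

RDF_TYPE = "rdf:type"  # abbreviated predicate name, used here for readability

# Hypothetical sample stream: ex:a is described by two triples that are far apart.
SAMPLE_TRIPLES = [
    ("ex:a", "rdf:type", "foaf:Person"),
    ("ex:b", "foaf:name", "B"),
    ("ex:c", "rdf:type", "foaf:Document"),
    ("ex:a", "foaf:knows", "ex:b"),
]

def fifo_schema_stream(triples, capacity):
    """Yield (subject, types, properties) whenever a subject leaves the cache."""
    # OrderedDict keeps subjects in insertion order, which is the FIFO order.
    cache = OrderedDict()
    for s, p, o in triples:
        if s not in cache:
            if len(cache) >= capacity:
                # FIFO eviction: drop the subject that entered the cache first.
                old, (types, props) = cache.popitem(last=False)
                yield old, frozenset(types), frozenset(props)
            cache[s] = (set(), set())
        types, props = cache[s]
        if p == RDF_TYPE:
            types.add(o)   # the object of rdf:type is a class
        else:
            props.add(p)   # any other predicate counts as a property
    # End of stream: emit everything that is still cached.
    for s, (types, props) in cache.items():
        yield s, frozenset(types), frozenset(props)

if __name__ == "__main__":
    for entry in fifo_schema_stream(SAMPLE_TRIPLES, capacity=2):
        print(entry)
```

With `capacity=2`, the resource `ex:a` is evicted before its second triple arrives, so its description is emitted in two incomplete fragments. With `capacity=3`, it is emitted once with both its type and its property. This is the kind of effect that the choice of cache size and caching strategy has on the extracted schema information.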

Different caching strategies applied to a given data stream influence the quality of the resulting schema index. Likewise, guiding the crawler or providing a more suitable crawling strategy might improve index quality. The task is to develop, implement, and evaluate different caching and crawling strategies in the SchemEX scenario.
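As an illustration of how the caching strategy alone can change the outcome on the same stream, the sketch below compares FIFO with LRU eviction. It uses a deliberately simple quality proxy that is an assumption of this example, not a metric taken from SchemEX: the number of descriptions emitted from the cache, where the ideal value equals the number of distinct subjects (each resource emitted exactly once). The function `count_emitted` and the sample stream are hypothetical.

```python
from collections import OrderedDict

def count_emitted(subjects, capacity, strategy="fifo"):
    """Count how many (possibly partial) descriptions leave the cache.

    `subjects` is the sequence of triple subjects in stream order.
    `strategy` is "fifo" or "lru"; both are illustrative choices.
    """
    if strategy not in ("fifo", "lru"):
        raise ValueError("unknown strategy: " + strategy)
    cache = OrderedDict()
    emitted = 0
    for s in subjects:
        if s in cache:
            if strategy == "lru":
                # LRU: a hit makes the subject the most recently used entry.
                cache.move_to_end(s)
            continue
        if len(cache) >= capacity:
            # Evict the first entry: oldest insertion (FIFO) or least recent use (LRU).
            cache.popitem(last=False)
            emitted += 1
        cache[s] = True
    # Everything still cached is emitted at the end of the stream.
    return emitted + len(cache)

if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "d", "a"]  # "a" recurs throughout the stream
    print("distinct subjects:", len(set(stream)))
    print("FIFO emitted:", count_emitted(stream, 2, "fifo"))
    print("LRU emitted:", count_emitted(stream, 2, "lru"))
```

On this stream with a cache of size 2, FIFO emits five descriptions for four distinct subjects, because the frequently recurring subject `a` is evicted once and re-enters the cache. LRU emits four, since every hit on `a` keeps it in the cache. Real crawl streams behave differently, so an actual comparison would have to be made on a suitable corpus.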

In more detail, the work should cover:

  • Development of caching strategies
  • Development of crawling strategies/guidance
  • Integration of the strategies into the existing system used for computing SchemEX
  • Evaluation on a suitable corpus

Requirements

  • Good programming skills
  • Knowledge of Semantic Web techniques is an advantage
  • Management of large data sets will be necessary


[1] Mathias Konrath, Thomas Gottron, and Ansgar Scherp. SchemEX -- Web-Scale Indexed Schema Extraction of Linked Open Data, http://www.cs.vu.nl/~pmika/swc/submissions2011/swc2011_submission_5.pdf

Last modified: 12.01.2012 16:28

Deprecated

These pages are deprecated and will be deleted soon.

Please use

west.uni-koblenz.de

instead!

WeST is a member of the WSTNet.
