SchemEX is a stream-based approach for computing a schema index over Linked Open Data (LOD). The data stream is generated by an RDF crawler that harvests triples from the Semantic Web. So far, SchemEX uses a FIFO queue as a cache on the stream of RDF triples to extract schema information from the crawled resources. The strategy of the RDF crawler has not been considered at all so far.
Different caching strategies on a given data stream influence the quality of the resulting schema index. Likewise, guiding the crawler or providing a more suitable crawling strategy might improve index quality. The task is to develop, implement and evaluate different strategies for caching and crawling in the SchemEX scenario.
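To illustrate why the caching strategy can matter, the following is a minimal Python sketch and not the SchemEX implementation. It assumes a bounded cache that maps each subject to the rdf:type values seen for it and emits the collected description when the subject is evicted. The class TripleCache, the helper count_fragmented, the toy stream and the LRU alternative are all hypothetical and chosen only for illustration.

```python
from collections import Counter, OrderedDict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TripleCache:
    """Bounded cache mapping subject -> set of rdf:type objects (hypothetical).

    policy "fifo": evict the subject that entered the cache first.
    policy "lru":  evict the subject that was touched least recently.
    """

    def __init__(self, capacity, policy="fifo"):
        if policy not in ("fifo", "lru"):
            raise ValueError("policy must be 'fifo' or 'lru'")
        self.capacity = capacity
        self.policy = policy
        self.entries = OrderedDict()
        self.emitted = []  # (subject, frozenset of types), in eviction order

    def add(self, subject, predicate, obj):
        if subject in self.entries:
            if self.policy == "lru":
                # A new triple refreshes the subject's position in the cache.
                self.entries.move_to_end(subject)
        else:
            if len(self.entries) >= self.capacity:
                self._evict()
            self.entries[subject] = set()
        if predicate == RDF_TYPE:
            self.entries[subject].add(obj)

    def _evict(self):
        # Remove the oldest entry and emit its (possibly partial) description.
        subject, types = self.entries.popitem(last=False)
        self.emitted.append((subject, frozenset(types)))

    def flush(self):
        while self.entries:
            self._evict()
        return self.emitted


def count_fragmented(emitted):
    """Number of subjects whose description was emitted in more than one piece."""
    counts = Counter(subject for subject, _ in emitted)
    return sum(1 for n in counts.values() if n > 1)


# Toy stream: triples about subject "a" are interleaved with other subjects.
stream = [
    ("a", RDF_TYPE, "A1"),
    ("b", RDF_TYPE, "B"),
    ("a", "p", "x"),
    ("c", RDF_TYPE, "C"),
    ("a", RDF_TYPE, "A2"),
]

results = {}
for policy in ("fifo", "lru"):
    cache = TripleCache(capacity=2, policy=policy)
    for triple in stream:
        cache.add(*triple)
    results[policy] = cache.flush()
    print(policy, "fragmented subjects:", count_fragmented(results[policy]))
```

On this toy stream with a capacity of two, the FIFO variant evicts subject "a" before its second type arrives, so its description is split into two pieces, whereas the LRU variant keeps "a" cached and emits both types together. Whether such an effect carries over to real crawls is exactly what an evaluation would have to show.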
In more detail, the work should cover:
 Mathias Konrath, Thomas Gottron, and Ansgar Scherp. SchemEX -- Web-Scale Indexed Schema Extraction of Linked Open Data, http://www.cs.vu.nl/~pmika/swc/submissions2011/swc2011_submission_5.pdf