WebScience 2010 Data


About the Data Sets

In recent years, collaborative tagging systems such as Flickr, Del.icio.us and Bibsonomy have become increasingly popular because they let users easily upload resources such as photos, bookmarked URLs and BibTeX entries and share them with other users. Users can also organize their resources by assigning tags or keywords to them. Over time, a loose categorization system emerges that can be used for retrieving specific resources and for navigating the large set of resources. This emergent categorization system is frequently called a folksonomy.

In recent literature, several models have been proposed for reproducing and understanding the tagging behavior of regular users in such folksonomies. Until now, all of them have been evaluated by visually comparing how well they reproduce characteristic properties that can be observed on the macro-level of a folksonomy. In our paper, published at the Web Science Conference 2010, we are the first to apply statistical methods for comparing the different tagging models and for measuring the statistical significance of the results.

On this page, we provide the supplemental material for our paper. It contains detailed graphs of the characteristic properties found in the data sets used throughout the paper. The raw data sets and the simulation software are also provided so that the results reported in the paper can be reproduced.

Download

In our paper, we use a total of 15 pairs of co-occurrence streams. We analyze their characteristic properties and compare them to simulation results obtained with the Epistemic Model and the Yule-Simon Model with Memory. The following supplemental material for the paper can be downloaded:

  • websci2010-supplemental.pdf: Detailed graphs of the characteristic properties of the 15 co-occurrence stream pairs. It also lists the parameters for which the tag frequency distribution simulated by the Epistemic Model and the Yule-Simon Model with Memory best fit the real tag frequency distribution found in the co-occurrence streams.
  • README.pdf: Information about the software and the raw data sets provided below. It explains how to use the software and the raw data sets.
  • websci2010-taggingmodels.jar: This jar file contains the software, its source code, and the raw data sets required for reproducing the results from the Web Science paper. Source code and data sets can be extracted with any zip utility.
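
Because a jar file is an ordinary zip archive, the extraction step can also be scripted. The following is a minimal sketch in Python using only the standard library. The function name extract_jar and the target directory are hypothetical choices made for illustration, and the layout of the files inside the archive is not described on this page, so consult README.pdf for the actual structure.

```python
import zipfile
from pathlib import Path


def extract_jar(jar_path, dest_dir):
    """Extract all entries of a jar (zip) archive into dest_dir.

    Returns the list of entry names found in the archive.
    extract_jar is a hypothetical helper, not part of the provided software.
    """
    dest = Path(dest_dir)
    # Create the target directory if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(jar_path) as archive:
        # A jar is a zip file, so the standard zip reader can unpack it.
        archive.extractall(dest)
        return archive.namelist()
```

For example, calling extract_jar("websci2010-taggingmodels.jar", "websci2010-taggingmodels") after downloading the file would unpack the source code and data sets into a directory next to it and return the names of the extracted entries.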

Contact

Klaas Dellschaft

Prof. Dr. Steffen Staab

Literature

2010

Dellschaft2010ODI
Dellschaft, Klaas; Staab, Steffen (2010): On Differences in the Tagging Behavior of Spammers and Regular Users. In: Proceedings of the Web Science Conference 2010.

last modified Apr 01, 2010 12:56 PM
