These pages are deprecated and will be deleted soon.

Please use





An Epistemic Dynamic Model for Tagging Systems


About the Tagging Model

During the last few years, collaborative tagging systems like Flickr, Del.icio.us or Bibsonomy have become increasingly popular because they allow users to easily upload resources like photos, bookmarked URLs and BibTeX entries and to share them with other users. Additionally, users can organize their resources by assigning tags or keywords to them. Over time, one can observe the emergence of a loose categorization system, frequently called a folksonomy, which can be used for retrieving specific resources and navigating through the large set of resources.

Thus, folksonomies constitute intriguing dynamic systems constructed by the collaboration and interaction of their users. They offer new possibilities for finding resources. But at the same time they constitute a challenge for existing models of categorization and retrieval of resources: the usage of tags at the micro-level of the individual user and at the macro-level of groups of users and of the complete user community has not yet been understood, nor have the two levels been put in relation to each other.

Recent research has brought forward an interesting temporal perspective on folksonomies by viewing them as dynamic stochastic systems with memory. But this perspective abstracts away the background knowledge common to folksonomy users and puts too much emphasis on the imitation of other users and the random generation of vocabulary. We advocate the hypothesis that both components, i.e. the background knowledge and the imitation, are needed to explain and understand the tagging behavior of users. We describe our proposal in the technical report below. It better approximates the behavior found in actual tagging systems and thus gives us more meaningful insights into the tagging process. For example, it helps us to distinguish between effects in the tagging system caused by the natural language behavior of users and effects that are specific to the user interface of tagging systems.
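To make the interplay of the two components more tangible, the following toy sketch generates a tag stream in which each tag assignment either imitates an earlier assignment or draws a word from a background vocabulary. It is only an illustration and not the model from the technical report: the function name, the parameter `p_imitate`, the uniform copying from the whole history and the sample vocabulary are all assumptions made for this example.

```python
import random
from collections import Counter


def simulate_tag_stream(background, n_assignments, p_imitate, seed=0):
    """Generate a toy tag stream (hypothetical helper, not the report's model).

    background: dict mapping word -> occurrence probability (assumed input).
    p_imitate:  probability of copying an earlier tag instead of drawing
                a word from the background knowledge.
    """
    rng = random.Random(seed)
    words = list(background)
    weights = [background[w] for w in words]
    stream = []
    for _ in range(n_assignments):
        if stream and rng.random() < p_imitate:
            # Imitation: copy the tag of a uniformly chosen earlier assignment,
            # so tags that are already frequent are more likely to be reused.
            tag = rng.choice(stream)
        else:
            # Background knowledge: draw a word by its occurrence probability.
            tag = rng.choices(words, weights=weights, k=1)[0]
        stream.append(tag)
    return stream


# Made-up vocabulary with made-up probabilities, for illustration only.
background = {"ajax": 0.5, "javascript": 0.3, "web": 0.15, "xml": 0.05}
stream = simulate_tag_stream(background, n_assignments=1000, p_imitate=0.7)
print(Counter(stream).most_common(3))
```

Setting `p_imitate` to 0 leaves only the background knowledge, while values close to 1 leave almost only imitation, which makes it easy to compare the two extremes against a mixed setting.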

Data Set and Software Simulator

In the following, we provide three files for each of the co-occurrence streams from the technical report:

  • Stream: In the files with the streams, each row corresponds to a single tag assignment. The first column contains an artificial tag ID and the second column an artificial resource ID. The order of the rows corresponds to the order in which the tag assignments were made by the users.
  • URLs: Each row contains a single URL that was crawled for the web corpus of the corresponding stream. The URLs are alphabetically ordered.
  • Web Corpus: These files contain the word occurrence probabilities in the web corpora. Each row contains three columns: The first column contains an artificial tag ID that is the same as in the tag streams. If the word doesn't exist in the stream, a negative integer ID is used. The second column contains the word and the third column its relative occurrence probability. The rows are ordered by descending occurrence probability and the sum of all occurrence probabilities is 1.

Tag    Tag Assignments    Users      Tags       Resources    Stream    URLs     Web Corpus
ajax   2,949,614          88,526     41,898     71,525       25MB      3.8MB    11MB
blog   6,098,471          158,578    186,043    557,017      60MB      25MB     92MB
xml    974,866            44,326     31,998     61,843       9.3MB     3.3MB    6.3MB
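The file formats described above could be read along the following lines. This is a minimal sketch that assumes the columns are separated by whitespace and that words contain no spaces; the files themselves may use a different delimiter. The function names and the sample rows are made up for illustration and are not taken from the real data.

```python
import io
import math


def read_stream(lines):
    """Yield (tag_id, resource_id) pairs in the order the rows appear."""
    for line in lines:
        parts = line.split()  # assumption: whitespace-separated columns
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        yield int(parts[0]), int(parts[1])


def read_web_corpus(lines):
    """Return a list of (tag_id, word, probability) rows."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip rows that do not have exactly three columns
        rows.append((int(parts[0]), parts[1], float(parts[2])))
    return rows


def check_web_corpus(rows, tolerance=1e-6):
    """Check the two stated properties: descending order and total of 1."""
    probs = [p for _, _, p in rows]
    is_descending = all(a >= b for a, b in zip(probs, probs[1:]))
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tolerance)
    return is_descending and sums_to_one


# Made-up sample rows; a negative ID marks a word that is not in the stream.
sample_stream = io.StringIO("1 10\n2 10\n1 11\n")
sample_corpus = io.StringIO("1 ajax 0.6\n-1 the 0.3\n2 xml 0.1\n")

assignments = list(read_stream(sample_stream))
corpus = read_web_corpus(sample_corpus)
print(assignments)
print(check_web_corpus(corpus))
```

For the real files, an open file object can be passed in place of the `io.StringIO` samples, since both functions only iterate over lines.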

Finally, we provide the generated artificial tag streams and the Java software that was used to run the simulations described in the technical report:

  • TaggingModels.jar: Software simulator of the TopN-Model and the Yule-Simon Process with Memory. The source code of the software is also contained in the jar file and can be extracted, e.g., with any zip utility. See the README for more details about how to start the simulator.
  • TaggingSimulation.tgz: Archive containing all generated tag streams, the software simulator and the technical report.
  • README: File describing how to start the software simulator and which files are contained in the archive with the artificial tag streams.
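Because a jar file is a zip archive, the source code mentioned above can also be pulled out programmatically. The following sketch uses Python's standard `zipfile` module; the helper names are hypothetical, and it assumes that the source files inside the jar end in `.java`, which the page does not state explicitly.

```python
import zipfile


def list_java_sources(jar_path):
    """Return the names of all .java entries in a jar (zip) archive."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.endswith(".java")]


def extract_java_sources(jar_path, target_dir):
    """Extract only the .java entries into target_dir and return their names."""
    with zipfile.ZipFile(jar_path) as jar:
        members = [n for n in jar.namelist() if n.endswith(".java")]
        jar.extractall(path=target_dir, members=members)
    return members
```

With the downloaded archive in the current directory, a call such as `extract_java_sources("TaggingModels.jar", "src")` would place the sources under a `src` directory; the target directory name is only an example.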


Contact

Dr. Klaas Dellschaft

Prof. Dr. Steffen Staab



Literature

Dellschaft, Klaas; Staab, Steffen (2008): An Epistemic Dynamic Model for Tagging Systems. In: HYPERTEXT 2008, Proceedings of the 19th ACM Conference on Hypertext and Hypermedia.


Last modified: 29.03.2010 12:10