PINTS - Experiments Data Sets
The self-organizing structure and almost unlimited resource capacity of peer-to-peer architectures make them very attractive for large-scale sharing of annotated data in Web 2.0 scenarios. We addressed the problem of aggregating and using information in a decentralized tagging environment, introduced a vector space model for characterizing users, resources, and tags, and analyzed how to construct a reliable approximation of their feature vectors in a fully decentralized setting.
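To illustrate the vector space idea, the sketch below represents each user (or resource) as a sparse vector of tag frequencies built from (user, resource, tag) triples and compares vectors with cosine similarity. Raw tag counts as weights are an assumption made for this example; the weighting used in PINTS is not described in this section. The hypothetical helper `merge_partial` shows a simple property of such additive counts: summing the partial vectors held by different peers gives the same vector a central server would compute. It is not the PINTS approximation scheme.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Iterable


def tag_vectors(assignments: Iterable[tuple[str, str, str]], key_index: int) -> dict[str, Counter]:
    """Build sparse tag-frequency vectors keyed by user (index 0) or resource (index 1).

    Each assignment is a (user, resource, tag) triple; raw counts are an assumed weighting.
    """
    vectors: dict[str, Counter] = defaultdict(Counter)
    for a in assignments:
        vectors[a[key_index]][a[2]] += 1
    return dict(vectors)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def merge_partial(partials: Iterable[Counter]) -> Counter:
    """Sum partial tag-count vectors held by different peers (hypothetical helper)."""
    total: Counter = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return total


# Made-up example triples.
data = [("alice", "photo1", "sunset"), ("alice", "photo2", "beach"),
        ("bob", "photo1", "sunset"), ("bob", "photo3", "sunset")]
users = tag_vectors(data, 0)
print(round(cosine(users["alice"], users["bob"]), 4))  # 0.7071
```

Because counts are additive, splitting `data` across two peers and merging their partial vectors for a user reproduces that user's global vector exactly. More elaborate weightings (for example, ones that depend on global tag statistics) do not necessarily share this property.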
A large-scale, systematic evaluation on realistic data sets was carried out to demonstrate the viability of our approach.
Two large-scale folksonomy data sets were used to simulate PINTS. They were obtained by systematically crawling the Flickr and Del.icio.us portals during 2006 and 2007 as part of the Tagora project. The crawls targeted the core elements of a folksonomy: users, tags, resources, and tag assignments.
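Since a folksonomy is built from these four core elements, basic statistics of a crawl can be obtained by counting distinct users, resources, and tags together with the total number of tag assignments. The sketch below does this for an in-memory collection of assignments; the `TagAssignment` type and the `folksonomy_stats` function are hypothetical names chosen for illustration, not part of the published data sets.

```python
from typing import Iterable, NamedTuple


class TagAssignment(NamedTuple):
    """One tag assignment: a user attaching a tag label to a resource."""
    user: str
    resource: str
    tag: str


def folksonomy_stats(assignments: Iterable[TagAssignment]) -> dict[str, int]:
    """Count distinct users, resources and tags, plus the total number of assignments.

    Every row counts as one tag assignment, so repeated triples are counted each time.
    """
    users, resources, tags = set(), set(), set()
    total = 0
    for a in assignments:
        users.add(a.user)
        resources.add(a.resource)
        tags.add(a.tag)
        total += 1
    return {"users": len(users), "resources": len(resources),
            "tags": len(tags), "tag_assignments": total}


sample = [TagAssignment("u1", "r1", "sky"),
          TagAssignment("u1", "r2", "sky"),
          TagAssignment("u2", "r1", "blue")]
print(folksonomy_stats(sample))
# {'users': 2, 'resources': 2, 'tags': 2, 'tag_assignments': 3}
```

Sets keep the distinct elements in memory. For crawls of this size, that is usually acceptable for users and tags, but the resource set can become large.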
The statistics of the crawled data sets are summarized below:
(518 MB) packed with 7-Zip
(848 MB) packed with 7-Zip
Each archive is compressed with 7-Zip and contains a single text file of time-ordered tag assignments in four tab-separated columns. The columns are, in order: posting date, user ID, resource ID, and tag label.
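Given this layout, the extracted file can be read line by line without loading it into memory. The sketch below makes three assumptions: the archive has already been unpacked with 7-Zip, the text is UTF-8, and every line has exactly four tab-separated fields. The posting date is kept as a raw string because its exact format is not specified here. The names `Posting` and `read_tag_assignments` are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Posting(NamedTuple):
    """One row of the data set: posting date, user ID, resource ID, tag label."""
    date: str  # kept as raw text; the date format is not assumed here
    user_id: str
    resource_id: str
    tag: str


def read_tag_assignments(stream: TextIO) -> Iterator[Posting]:
    """Lazily yield tag assignments from 4-column, tab-separated lines."""
    for line_no, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        yield Posting(*fields)


# Made-up sample in the documented column order (dates are placeholders).
sample = io.StringIO("t1\tu1\tr1\tsunset\nt2\tu2\tr1\tbeach\n")
postings = list(read_tag_assignments(sample))
print(postings[1].tag)  # beach
```

For the real data, pass a file object opened with `open(path, encoding="utf-8")`. Because the function is a generator, memory use stays roughly constant regardless of file size, and the time ordering of the file is preserved in the order the rows are yielded.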