Programm und Teilnehmer

Proseminar Peer-to-Peer Informationssysteme
Wintersemester 2006/2007

23.10.2006 - INTRODUCTION

Introduction
Speaker: Sergej Sizov - Slides

28.11.2006 - PRESENTATION SKILLS

Präsentation Skills and Scientific Writing
Speaker: Sergej Sizov - Slides

05.12.2006 - ARCHITECTURES AND ROUTING

Opponents: Heiko Günther, Sven Kühner

1. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

A fundamental problem that confronts peer-to-peer applications is to efficiently locate the node that stores a particular data item. This paper presents Chord, a distributed lookup protocol that addresses this problem. Chord provides support for just one operation: given a key, it maps the key onto a node. Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data item pair at the node to which the key maps. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.

Speaker: Marc Knaup - Slides
Tutor: Dr. Dr. Sergej Sizov

2: HyperCuP: Hypercubes, Ontologies and Efficient Search on P2P Networks

Peer-to-peer networks are envisioned to be deployed for a wide range of applications. However, P2P networks evolving in an unorganized manner suffer from serious scalability problems, limiting the number of nodes in the network, creating network overload and pushing search times to unacceptable limits. We address these problems by imposing a deterministic shape on P2P networks: We propose a graph topology which allows for very efficient broadcast and search, and we describe a broadcast algorithm that exploits the topology to reach all nodes in the network with the minimum number of messages possible. We provide an efficient topology construction and maintenance algorithm which, crucial to symmetric peer-to-peer networks, does neither require a central server nor super nodes in the network. Nodes can join and leave the self-organizing network at any time, and the network is resilient against failure. Moreover, we show how our scheme can be made even more efficient by using a global ontology to determine the organization of peers in the graph topology, allowing for efficient concept-based search.

Speaker: Jens Vieweg - Slides
Tutor: Simon Schenk

12.12.2006 - DISTRIBUTED AUTHORITY RANKING

Opponents: Marc Knaup, Mark Schneider

1: Distributed Page Ranking in Structured P2P Networks

This paper discusses the techniques of performing distributed page ranking on top of structured peer-to-peer networks. Distributed page ranking are needed because the size of the web grows at a remarkable speed and centralized page ranking is not scalable. Open System PageRank is presented in this paper based on the traditional PageRank used by Google. We then propose some distributed page ranking algorithms, partially prove their convergence, and discuss some interesting properties of them. Indirect transmission is introduced in this paper to reduce communication overhead between page rankers and to achieve scalable communication. The relationship between convergence time and bandwidth consumed is also discussed. Finally, we verify some of the discussions by experiments based on real datasets.

Speaker: Henning Selt - Slides
Tutor: Dr. Dr. Sergej Sizov

2: JXP: Global Authority Scores in a P2P Network

This document presents the JXP algorithm for dynamically and collaboratively computing PageRank-style authority scores of Web pages distributed in a P2P network. In the architecture that we pursue, every peer crawls and indexes Web fragments at its discretion, driven by the thematic profile or overlay neighborhood of the peer. The JXP algorithm runs at every peer, and is initialized by a local authority computation on the basis of the locally available Web fragment. Peers collaborate by periodically 'meeting' with other peers in the network. Whenever two peers meet they exchange their local information and use this new information to improve their local authority scores. Even though only local computations are performed, the JXP scores approximate the global importance of pages in the entire network. The storage demand of each peer is linear in the number of Web pages and the locally stored Web fragment. Experiments show the quality and practical viability of the JXP algorithm.

Speaker: Masoud Roschani - Slides
Tutor: Dr. Dr. Sergej Sizov

19.12.2006 - DATA SYNOPSES

Opponents: Rainer Weißenfels, Hichem Zarrai

1: Capturing Collection Size for Distributed Non-Cooperative Retrieval

Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not available, the size of the underlying collections must be estimated. While several approaches for the estimation of collection size have been proposed, their accuracy has not been thoroughly evaluated. An empirical analysis of past estimation approaches across a variety of collections demonstrates that their prediction accuracy is low. Motivated by ecological techniques for the estimation of animal populations, we propose two new approaches for the estimation of collection size. We show that our approaches are significantly more accurate that previous methods, and are more efficient in use of resources required to perform the estimation.

Speaker: Heiko Richter - Slides
Tutor: Dr. Dr. Sergej Sizov

2: Spectral Bloom Filters

A Bloom Filter is a space-efficient randomized data structure allowing membership queries over sets with certain allowable errors. It is widely used in many applications which take advantage of its ability to compactly represent a set, and filter out effectively any element that does not belong to the set, with small error probability. This paper introduces the Spectral Bloom Filter (SBF), an extension of the original Bloom Filter to multi-sets, allowing the filtering of elements whose multiplicities are below a threshold given at query time. Using memory only slightly larger than that of the original Bloom Filter, the SBF supports queries on the multiplicities of individual keys with a guaranteed, small error probability. The SBF also supports insertions and deletions over the data set. We present novel methods for reducing the probability and magnitude of errors. We also present an efficient data structure and algorithms to build it incrementally and maintain it over streaming data, as well as over materialized data with arbitrary insertions and deletions. The SBF does not assume any a priori filtering threshold and effectively and efficiently maintains information over the entire data-set, allowing for ad-hoc queries with arbitrary parameters and enabling a range of new applications.

Speaker: Heiko Günther - Slides
Tutor: Dr. Dr. Sergej Sizov

09.01.2006 - SEARCH AND RETRIEVAL

Opponents: Masoud Roschani, Henning Selt

1: Replication, Load Balancing and Efficient Range Query Processing in DHTs

We consider the conflicting problems of ensuring data-access load balancing and efficiently processing range queries on peer-to-peer data networks maintained over Distributed Hash Tables (DHTs). Placing consecutive data values in neighboring peers is frequently used in DHTs since it accelerates range query processing. However, such a placement is highly susceptible to load imbalances, which are preferably handled by replicating data (since replication also introduces fault tolerance benefits). In this paper, we present Hot-RoD, a DHT-based architecture that deals effectively with this combined problem through the use of a novel locality-preserving hash function, and a tunable data replication mechanism which allows trading off replication costs for fair load distribution. Our detailed experimentation study shows strong gains in both range query processing efficiency and data-access load balancing, with low replication overhead. To our knowledge, this is the first work that concurrently addresses the two conflicting problems using data replication.

Speaker: Mark Schneider - Slides
Tutor: Olaf Görlitz

2: Improving Collection Selection with Overlap Awareness in P2P Search Engines

Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections typically overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for the extension of existing quality measures using estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.

Speaker: Sven Kühner
Tutor: Olaf Görlitz

16.01.2006 - SEMANTIC OVERLAY NETWORKS

Opponents: Heiko Richter, Jens Vieweg

1: Semantic Overlay Networks for P2P Systems

In a peer-to-peer (P2P) system, nodes typically connect to a small set of random nodes (their neighbors), and queries are propagated along these connections. Such query flooding tends to be very expensive. We propose that node connections be influenced by content, so that for example, nodes having many 'Jazz' files will connect to other similar nodes. Thus, semantically related nodes form a Semantic Overlay Network (SON). Queries are routed to the appropriate SONs, increasing the chances that matching files will be found quickly, and reducing the search load on nodes that have unrelated content. We have evaluated SONs by using an actual snapshot of music-sharing clients. Our results show that SONs can significantly improve query performance while at the same time allowing users to decide what content to put in their computers and to whom to connect.

Speaker: Rainer Weißenfels - Slides
Tutor: Alexander Kubias

last modified Feb 14, 2009 07:00 PM

Kontakt