Prof. Jeffrey D. Ullman

(Stanford University, USA),

"Cluster-Based Computation of Relational Joins"

The prevalence of large racks of interconnected processor nodes forces us to take another look at how to exploit parallelism when taking the join of large relations. Sometimes, there is a gain in total cost to be had by distributing pieces of each relation to several different nodes and computing the join of several large relations at once. The optimization problem is to pick the degree of replication of each relation, under the constraint that the total number of compute-nodes is fixed. We set up this problem as a nonlinear optimization and show that there is always a solution (which must be approximated by rounding to the nearest integers). For some of the most common types of join -- star joins and chain joins -- we give closed-form solutions to the optimization problem. Finally, we point out that the join algorithm we propose can be implemented using features already present in Hadoop, the open-source implementation of map-reduce.

Bio: Jeffrey D. Ullman is S. W. Ascherman Prof. of Engineering at Stanford Univ. (emeritus since 2003). He received his B. S. in Engineering Mathematics from Columbia Univ. and his Ph. D. in Electrical Engineering from Princeton University. After some years at Bell Laboratories, he was professor at Princeton Univ. In 1979, he joined the faculty of Stanford University. Among other awards, Prof. Ullman received several honorary doctorates, is Guggenheim Fellow and Fellow of the ACM, and won the Knuth Prize in 2000 for his outstanding contributions to the foundations of computer science. He is an author or coauthor of 16 books and 170 technical publications and has been for a long time among the top most cited authors in computer science.

Zeit: Freitag, 12. Juni 2009, 15:30 Uhr
Ort: Kaiserslautern, Gebäude 42, Raum 110