Shimin Chen

Thesis Title: Redesigning Database Systems in Light of CPU Cache Prefetching
Degree Type: Ph.D. in Computer Science
Advisor(s): Anastassia Ailamaki, Todd Mowry
Graduated: December 2005

Abstract:

Computer systems have enjoyed exponential growth in processor speed for the past 20 years, while main memory speed has improved only moderately. Today a cache miss to main memory takes hundreds of processor cycles. Recent studies have demonstrated that on commercial databases, 50% or more of in-memory execution time is often wasted on cache misses. In light of this problem, a number of recent studies have focused on reducing the number of cache misses incurred by database algorithms. In this thesis, we investigate a different approach: reducing the impact of cache misses through a technique called cache prefetching. Since prefetching for sequential array accesses has been well studied, we focus on the non-contiguous access patterns found in two classes of database algorithms: the B⁺-Tree index algorithm and the hash join algorithm. We re-examine their designs with cache prefetching in mind, and combine prefetching and data locality optimizations to achieve good cache performance.

For B⁺-Trees, we first propose and evaluate a novel main memory index structure, Prefetching B⁺-Trees, which uses prefetching to accelerate two major access patterns of B⁺-Tree indices: searches and range scans. We then apply our findings in the development of a novel index structure, Fractal Prefetching B⁺-Trees, that optimizes index operations both for CPU cache performance and for disk performance in commercial database systems by intelligently embedding cache-optimized trees into disk pages.
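As a rough illustration of how prefetching can help an index search, the sketch below issues prefetches for every cache line of a multi-line tree node before searching it, so that the misses on one node overlap instead of being paid one after another. This is a minimal sketch, not the thesis implementation: the node layout, the constants `CACHE_LINE` and `KEYS_PER_NODE`, and the names `prefetch_node` and `tree_search` are all assumptions made for the example, and `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64      /* assumed cache line size in bytes */
#define KEYS_PER_NODE 31   /* hypothetical fan-out; makes a node span several lines */

/* Hypothetical node layout shared by internal nodes and leaves. */
typedef struct Node {
    int nkeys;
    int is_leaf;
    int keys[KEYS_PER_NODE];
    struct Node *child[KEYS_PER_NODE + 1];  /* used by internal nodes */
    int values[KEYS_PER_NODE];              /* used by leaves */
} Node;

/* Request every cache line of the node up front so the misses overlap. */
static void prefetch_node(const Node *n) {
    const char *p = (const char *)n;
    for (size_t off = 0; off < sizeof(Node); off += CACHE_LINE)
        __builtin_prefetch(p + off, 0, 3);  /* read access, keep in cache */
}

/* Returns 1 and stores the value in *value if key is present, else 0. */
int tree_search(const Node *root, int key, int *value) {
    const Node *n = root;
    while (n != NULL) {
        prefetch_node(n);
        assert(n->nkeys <= KEYS_PER_NODE);
        if (n->is_leaf) {
            for (int i = 0; i < n->nkeys; i++) {
                if (n->keys[i] == key) {
                    *value = n->values[i];
                    return 1;
                }
            }
            return 0;
        }
        /* Internal node: follow the first child whose separator exceeds key. */
        int i = 0;
        while (i < n->nkeys && key >= n->keys[i])
            i++;
        n = n->child[i];
    }
    return 0;
}
```

The sketch uses a linear scan within each node for brevity; the point is only that all lines of a node are requested before any of them is read.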

For hash joins, we first exploit cache prefetching separately for the I/O partition phase and the join phase of the algorithm. We propose and evaluate two techniques, Group Prefetching and Software-Pipelined Prefetching, that exploit inter-tuple parallelism to overlap cache misses across the processing of multiple tuples. Then we present a novel algorithm, Inspector Joins, that exploits the free information obtained from one pass of the hash join algorithm to improve the performance of a later pass. This new algorithm addresses the memory bandwidth sharing problem in shared-bus multiprocessor systems.
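The idea of overlapping cache misses across tuples can be sketched for the probe side of a hash join as follows. Instead of finishing one tuple before starting the next, the loop handles a group of tuples stage by stage and issues a prefetch for the memory each tuple will need in the following stage. This is a simplified sketch under stated assumptions, not the thesis algorithm: the chained hash table layout, `GROUP_SIZE`, and the names `ht_insert` and `probe_grouped` are hypothetical, only the first chain entry is prefetched, and `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_SIZE 16  /* hypothetical tuning knob: tuples handled per group */

typedef struct Entry {
    uint32_t key;
    uint32_t payload;
    struct Entry *next;    /* chained hashing */
} Entry;

typedef struct {
    Entry **buckets;
    size_t nbuckets;       /* assumed to be a power of two */
} HashTable;

static size_t hash_key(uint32_t key, size_t nbuckets) {
    return (size_t)(key * 2654435761u) & (nbuckets - 1);
}

void ht_insert(HashTable *ht, Entry *e) {
    size_t b = hash_key(e->key, ht->nbuckets);
    e->next = ht->buckets[b];
    ht->buckets[b] = e;
}

/* Probes n keys. found[i] is set to 1 and out[i] to the payload of the
 * first matching entry, or found[i] to 0. Returns the number of matches. */
size_t probe_grouped(const HashTable *ht, const uint32_t *keys, size_t n,
                     uint32_t *out, int *found) {
    assert(ht->nbuckets != 0 && (ht->nbuckets & (ht->nbuckets - 1)) == 0);
    size_t matches = 0;
    for (size_t base = 0; base < n; base += GROUP_SIZE) {
        size_t g = (n - base < GROUP_SIZE) ? n - base : GROUP_SIZE;
        size_t slot[GROUP_SIZE];
        const Entry *cur[GROUP_SIZE];

        /* Stage 1: hash every key in the group, prefetch its bucket header. */
        for (size_t i = 0; i < g; i++) {
            slot[i] = hash_key(keys[base + i], ht->nbuckets);
            __builtin_prefetch(&ht->buckets[slot[i]], 0, 3);
        }
        /* Stage 2: read the headers, prefetch the first chain entry. */
        for (size_t i = 0; i < g; i++) {
            cur[i] = ht->buckets[slot[i]];
            found[base + i] = 0;
            if (cur[i] != NULL)
                __builtin_prefetch(cur[i], 0, 3);
        }
        /* Stage 3: walk each chain and record the first match. */
        for (size_t i = 0; i < g; i++) {
            for (const Entry *e = cur[i]; e != NULL; e = e->next) {
                if (e->key == keys[base + i]) {
                    out[base + i] = e->payload;
                    found[base + i] = 1;
                    matches++;
                    break;
                }
            }
        }
    }
    return matches;
}
```

By the time a stage revisits the first tuple of the group, the prefetches issued for it in the previous stage have had the rest of the group's work in which to complete. A software-pipelined variant would interleave the stages across consecutive tuples rather than across fixed groups.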

We compare our techniques against state-of-the-art cache-friendly algorithms for B⁺-Trees and hash joins through both simulation studies and real machine experiments. Our experimental results demonstrate dramatic performance benefits for our prefetching-enabled techniques.

Thesis Committee:
Anastassia Ailamaki (Co-Chair)
Todd C. Mowry (Co-Chair)
Christos Faloutsos
Phillip B. Gibbons
David J. DeWitt (University of Wisconsin at Madison)

Jeannette Wing, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science

Keywords:
Cache Prefetching, Database Systems, CPU Cache Performance, Data Locality Optimizations, B⁺-Trees, Hash Joins