CS 378: Information Systems

Spring 1999

Assignment 4 (10 points)

Clustered and Unclustered B+ Tree Indexes

Due Wednesday March 24 by NOON

Using the code provided in the cs378/hw4 directory and discussed in class, you will run a series of experiments that measure the I/O costs (ie. number of read operations) of clustered and unclustered B+ tree indexes.

The driver program in cs378/hw4 creates an index into the movies database with the number of votes being the key. It provides an example of using the index to perform a query where the key equals 116.

An unclustered index can be created by reading in the "movies" file, which has the records sorted by title. The records having the same number of votes are distributed throughout this file. A clustered index can be created by reading in the "movies2" file, which has the records sorted by the number of votes. For this file, the records having the same number of votes are located next to each other.

For your experiments, you should run at least 5 trials with the clustered index and 5 trials with the unclustered index. The number of matches for each query should range from 1 to 100. To display your results, graph your data so that the number of matches is indicated by the x-axis and the number of read operations is indicated by the y-axis. You may draw the graph by hand if it is neatly done. Be sure your caption describes the circumstances of your results!

Finally, prepare a report that describes your results, explains them, and draws conclusions from them. Your explanation of the performance difference between a clustered index and an unclustered index should include detailed examples of what happens during a retrieval under both cases. The benefit of B+ trees should also be placed in the context of other retrieval methods including the sequential search and the binary search. When forming your conclusions, make sure you carefully qualify when your conclusions hold. Keep in mind that empirical results do not prove anything, but, when combined with a solid explanation, they provide powerful support to your conclusions.

A good report will be approximately 1 single-spaced page. The grade for this assignment will strongly depend on the lucidity, detail, and conciseness of your report.

Submission

Turn in your report accompanied with the graph of your results.