For this assignment, you will be implementing the LZ78 compression
algorithm, which uses a dictionary (ie. a hash table) for storing and
indexing common sequences of characters. A rudimentary hash table (the
CodeTable
class) has already been implemented and will
simplify the completion of part 1. Part 2 has you extending the
CodeTable
class in order to perform experiments on hash table
performance. Finally, you will extend CodeTable
so that its
table automatically resizes when necessary.
Implement the LZ78 encoding algorithm as
described online. While we will go through an example in class, this
reference provides a comprehensive description of the algorithm. The files
in ~miller/cs231/hw2
contain source files that will also help
you get started. You should copy them to your own hw2
directory.
Among these files is the Press.cpp
file, which has a good
demonstrations of the CodeTable
and CodeStream
classes. The CodeTable
implements the dictionary for the LZ78
algorithm. The CodeStream
class allows you to easily write out
compressed integer--code pairs to a file as required by the LZ78 algorithm.
Running the make
command compiles the CodeTable
and CodeStream
classes and the source Press.cpp
to create the executable program called press
.
Also supplied is the decoding implementation Depress.cpp
(the
executable is called depress
). Currently the demonstration
press
program produces a file that can be successfully
interpreted by the depress
program. However, you will note
that no file compression is obtained in its current form.
Once you have implemented the encoding algorithm, run it on at least three similar text files of different sizes. Data collection (and only data collection) may be done in groups of up to three people. Graph your results, where the x-ordinate is the file size and the y-ordinate is the compression ratio. Describe the results and draw an appropriate conclusion in a short, written summary (about one to two paragraphs). If you collected data in a group, be sure to list your collaborators. The code and written summary must be your own work.
From our discussion in class, you should recall that the linear probing method may produce large clusters around certain hash indexes and thus cause a large number of collisions. To investigate their frequency, write a display member function that writes out the following for each hash entry:
Graph your results, where the x-ordinate is the hash table size and the y-ordinate is the average number of probes needed to find the key. Describe the results and draw an appropriate conclusion in a short, written summary (about one to two paragraphs). Again, data collection may be done in a group of up to three people, but the code and written summary must be your own work.
Finally, extend the CodeTable
class so that the hash table is automatically
resized when it becomes more than half full.