CSC 313: Data Structures in Java
Autumn 2002
Inidividual Assignment 1
Using a class specification and Algorithm Analysis
Due Wednesday September 25 before 11:30pm
Part I
For Part I, you will read the specification of two predefined classes
and use them to analyze some literary texts. This involves having
your program read in a text, count word frequencies from that text,
and create a report summarizing the frequencies. In particular, your
program should report the following statistics:
- The total number of words in the text
- The number of different words in the text
- The ratio of the number of different words to the total number
of words
- The list of words (in any order) that occur more than 2% of the
time and the percentage that they occur.
To help you with the programming (and to practice using collections
and iterators), you must make use of the following classes:
- The StreamTokenizer class allows you to create a stream
object that reads from a file one word at a time. Here is a simple demonstration program. For more details, see
the documentation for the java.io package.
- The SetOfCountables class allows you to easily count the
number of occurences of any collection of objects (such as words).
Here is a simple demonstration program that
uses it. And here is the online documentation that I created for it.
You will also need its iterator class (called
CountablesIterator) to sequentially step through the counted
objects. Here is its online
documentation. Finally, you will need to place its jar file (csc313hw1.jar) where the java compiler and
interpreter can find it.
If you have not used a jar file before, it may involve some effort
to set up your programming environment so that java can find the jar
file and the classes contained in it. To use the SetCountables class
and the CountablesIterator class, you will need to do the following:
- Download the jar file, which contain the two classes.
- Do one of the following:
- Place the jar file in the jdk1.3.x\jre\lib\ext folder
AND in the Program_Files\JavaSoft\JRE\1.3.xxx\lib\ext
folder. Note: this worked for me using several methods.
- Place the jar file in any folder and set the CLASSPATH
variable to include the location of the jar file and the
current directory (e.g. ".;C:\mylib\csc313hw1.jar"). Note: this
is probably the right way to do it, but it involves knowing
how to set the classpath variable on your computer.
- Unzip the jar file (e.g. with WinZip) and place the
resulting csc313 folder in the same directory as your
program. Note: if you are having trouble unzipping the jar
file. Here is the zip file
holding the same folders and files.
- Insert the statement
import csc313.hw1.*;
at the top your program's file.
Here are some additional specifications for your program:
- The user supplies the name of the analyzed text. This may be done
with a command line prompt or with a graphical dialog box.
- The StreamTokenizer occasionally has problems separating words.
You do not need to worry about these rare occurrences. Simply treat
them as one long word.
- You do not need to define any new classes to complete this
assignment. While it can be written all in one file, you should at
least break the design of the program into a few static methods. Name
your program's file TextAnalysis.java.
- The commented program should include your name and a short
description of what it does. Good identifier names (e.g. methods, variables,
constants) should explain what your program is doing. When it
is not clear, supplement your program when simple, easy-to-understand
comments.
- You should test your program on the following short stories: The Cask of Amontillado and The Open Window. Paste your results in
a file called output.txt.
Part II
For each of the code segments below, answer the following questions:
- Which operation (e.g. comparison or statement) will typically be executed more than any operation?
- For input size n, how many times will this statement be executed?
- What is the time complexity class (Big-Oh analysis) of this code segment?
Code segment A
for (int i = 0; i < n; i += 2)
sum++;
Code segment B
for (int i = 0; i < n; i++)
for (int j = 0; j < n * n; j++)
sum++;
Code segment C
for (int i = 0; i < n; i++)
for (int j = 0; j < i; j++)
sum++;
Example analysis
Here is an example analysis of a similar code fragment:
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
for (int k = 0; k < 3; k++)
sum++;
- "sum++" is repeated most frequently
- We work the analysis starting from the innermost loop
- "sum++" is executed 3 times for the k-loop
- which is then executed 3*n for the j-loop
- which is then executed 3*n*n for the i-loop
- 3n^2 overall
- the segment has a quadratic time complexity,
also written as O(n^2)
Submission
Before the due date and time, you should submit the following three
files through the submission
Web page:
- Your individual version of TextAnalysis.java
- The file output.txt containing your program's output to the
short stories.
- Any accompanying classes needed to run your program.
- The file answers.txt, which can the answers to the
questions in part II.