CS 232: Data Structures and Problem Solving
Spring 1999
Assignments 7 and 8 (Each one week)
Decision Trees
Tree Construction (Assignment 7) is due Monday April 12
Feature Selection (Assignment 8) is due Monday April 19
In these assignments you will write and experiment with code that
performs supervised learning using a decision tree. Much of the work
has already been done for you in the provided classes.
Assignment 7 has you building a DecisionTree class using these classes.
In particular, your class should have the following:
- A constructor that builds the decision tree given an ExampleBag
  (a set of training examples), which is passed as a parameter
- A method for writing out the decision tree
- A method that uses the tree to predict a category when given an example.
Here are the specifications for the provided classes:
- Example -- represents one example, which may or may not have its
  category specified
- ExampleBag -- holds a set of training examples
- DecisionNode -- represents one node in the decision tree
- StringBag -- useful for storing a list of categories
To get you started, here is the pseudocode for the constructor
and the build method:
DecisionTree(ExampleBag trainingSet) {
    create root with categories from the training set
    build(root, trainingSet)
}

build(DecisionNode node, ExampleBag trainingSet) {
    if (trainingSet has more than one unique category) {
        choose a decision feature
        assign node the decision feature
        assign node the position of the decision feature
        let yesSet be the examples of trainingSet that are
            consistent with the decision feature
        let noSet be the examples of trainingSet that are
            not consistent with the decision feature
        if (both yesSet and noSet are non-empty) {
            create yesNode with categories from yesSet
            build(yesNode, yesSet)
            create noNode with categories from noSet
            build(noNode, noSet)
        }
    }
}
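The pseudocode above glosses over the exact interfaces of the provided classes. As a rough illustration of the same recursive build-and-predict structure, here is a self-contained Java sketch; the Example and Node types below are minimal hypothetical stand-ins (not the provided Example/DecisionNode classes), and it simply picks the first feature that splits the set where your code would call chooseFeature:

```java
import java.util.*;

public class MiniTree {
    // One training example: boolean feature values plus a category label.
    static class Example {
        final boolean[] features;
        final String category;
        Example(boolean[] f, String c) { features = f; category = c; }
    }

    // A node is either a leaf (category != null) or a test on featureIndex.
    static class Node {
        int featureIndex = -1;
        String category;   // set only on leaves
        Node yes, no;
    }

    // Recursive build: stop when all examples share one category.
    static Node build(List<Example> examples) {
        Node node = new Node();
        Set<String> cats = new HashSet<>();
        for (Example e : examples) cats.add(e.category);
        if (cats.size() <= 1) {
            node.category = examples.get(0).category;
            return node;
        }
        // Stand-in for chooseFeature: first feature that actually splits the set.
        int numFeatures = examples.get(0).features.length;
        for (int f = 0; f < numFeatures; f++) {
            List<Example> yesSet = new ArrayList<>(), noSet = new ArrayList<>();
            for (Example e : examples)
                (e.features[f] ? yesSet : noSet).add(e);
            if (!yesSet.isEmpty() && !noSet.isEmpty()) {
                node.featureIndex = f;
                node.yes = build(yesSet);
                node.no = build(noSet);
                return node;
            }
        }
        // No feature separates the examples: fall back to a leaf.
        node.category = examples.get(0).category;
        return node;
    }

    // Walk the tree until a leaf is reached.
    static String predict(Node node, boolean[] features) {
        while (node.category == null)
            node = features[node.featureIndex] ? node.yes : node.no;
        return node.category;
    }

    public static void main(String[] args) {
        List<Example> train = Arrays.asList(
            new Example(new boolean[]{true,  false}, "yes"),
            new Example(new boolean[]{false, true},  "no"),
            new Example(new boolean[]{true,  true},  "yes"));
        Node root = build(train);
        System.out.println(predict(root, new boolean[]{true, false})); // prints "yes"
    }
}
```

Your actual DecisionTree class should instead use the provided ExampleBag and DecisionNode types, but the control flow is the same.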
Submission for Assignment 7
Turn in a hardcopy of your DecisionTree class, which should contain
test code in the static main method. Also turn in a script
showing the output of your test code.
Assignment 8
Assignment 8 has you creating a better means of selecting features.
Afterward, you will run some experiments to test your feature-selection
algorithm.
In particular, you should do the following:
- Write a new chooseFeature method for the ExampleBag class. This
  method should do a significantly better job than random feature
  selection.
- Compare the performance of your selection method with the random
method using the "party" dataset.
Feature selection
The feature-selection criterion is up to you, but it should involve
choosing a feature that will deliver a good split of categories. To
perform this task, you will need to find the features in the
examples, rate each one's utility, and choose the feature with the
best measure.
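One common way to rate a split (an assumption here, not a required criterion) is weighted entropy: a feature whose yes/no subsets each have nearly pure category counts scores low, and lower is better. A small self-contained sketch, working directly on category-count arrays rather than the provided ExampleBag class:

```java
public class SplitScore {
    // Entropy (in bits) of a set summarized by its category counts.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));  // log base 2
        }
        return h;
    }

    // Weighted average entropy of the yes/no subsets a feature produces;
    // a cleaner split gives a lower score.
    static double splitScore(int[] yesCounts, int[] noCounts) {
        int yesTotal = 0, noTotal = 0;
        for (int c : yesCounts) yesTotal += c;
        for (int c : noCounts) noTotal += c;
        return (yesTotal * entropy(yesCounts) + noTotal * entropy(noCounts))
                / (yesTotal + noTotal);
    }

    public static void main(String[] args) {
        // A 5/5 mixture has entropy 1 bit; a perfect split scores 0.
        System.out.println(entropy(new int[]{5, 5}));
        System.out.println(splitScore(new int[]{5, 0}, new int[]{0, 5}));
    }
}
```

Your chooseFeature method would then compute this score for each candidate feature and keep the one with the lowest score.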
Comparison
The "party" dataset is a set of congressional voting records. The first
item is the representative's party (Democrat or Republican), which will
serve as the example's category; the remaining items are how the
representative voted on certain issues. These votes serve as the
features. Performance should be measured in the following way:
numCorrect = 0
For each example E:
    remove E from the training set
    build the tree with the training set
    predict E's category with the tree
    if the prediction is correct, then numCorrect++
    return E to the training set
Performance = numCorrect / size of dataset
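This procedure is leave-one-out evaluation: every example is held out once, the model is rebuilt on the rest, and accuracy is the fraction of held-out examples predicted correctly. The loop can be sketched in Java; the majorityCategory classifier below is a hypothetical stand-in (your experiment would build a DecisionTree instead), used only so the sketch runs on its own:

```java
import java.util.*;

public class LeaveOneOut {
    // Stand-in classifier: predicts the most common category in the
    // training set. Replace with building and querying your DecisionTree.
    static String majorityCategory(List<String> categories) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (String c : categories) {
            int n = counts.merge(c, 1, Integer::sum);
            if (best == null || n > counts.get(best)) best = c;
        }
        return best;
    }

    // Leave-one-out accuracy: hold out each example, train on the rest,
    // test on the held-out example, then put it back.
    static double accuracy(List<String> data) {
        int numCorrect = 0;
        for (int i = 0; i < data.size(); i++) {
            List<String> train = new ArrayList<>(data);   // copy = "remove E"
            String heldOut = train.remove(i);
            if (majorityCategory(train).equals(heldOut)) numCorrect++;
        }
        return (double) numCorrect / data.size();
    }

    public static void main(String[] args) {
        List<String> labels =
            Arrays.asList("Democrat", "Democrat", "Democrat", "Republican");
        System.out.println(accuracy(labels)); // prints 0.75
    }
}
```

Note that the tree is rebuilt once per example, so the experiment makes size-of-dataset builds in total.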
Submission for Assignment 8
Turn in a hardcopy of your chooseFeature code and a display of the
tree built from the "party" dataset using your feature-selection
method. Finally, include the results of your experiment.