CS 232: Data Structures and Problem Solving

Spring 1999

Assignments 7 and 8 (Each one week)

Decision Trees

Tree Construction (Assignment 7) is due Monday April 12

Feature Selection (Assignment 8) is due Monday April 19

In these assignments you will write and experiment with code that performs supervised learning using a decision tree. Much of the work has already been done for you in the provided classes.

Assignment 7 has you building a DecisionTree class using the provided classes. In particular, your class should include the following:

Here are the specifications for the provided classes:

To get you started, here is the pseudocode for the constructor and the build method:

  DecisionTree(ExampleBag trainingSet) {
     create root with categories from the training set
     build(root, trainingSet)
  }

  build(DecisionNode node, ExampleBag trainingSet) {
     if (trainingSet has more than one unique category)
         choose decision feature
         assign node decision feature
         assign node position of decision feature
         
         get yesSet which consists of examples of trainingSet that are
              consistent with the decision feature
         get noSet which consists of examples of trainingSet that are
              not consistent with the decision feature

         if neither yesSet nor noSet is empty
             create yesNode with categories from yesSet
             build(yesNode, yesSet)
             create noNode with categories from noSet
             build(noNode, noSet)
  }
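
For concreteness, here is one way the pseudocode might look in Java. This is a minimal, self-contained sketch only: it does not use the provided ExampleBag and DecisionNode classes, and the names Example, Node, and DecisionTreeSketch, along with the naive feature choice, are hypothetical stand-ins. Your DecisionTree should be written against the provided classes.

  import java.util.*;

  // Sketch only: Example and Node are hypothetical stand-ins, not the
  // provided ExampleBag and DecisionNode classes.
  public class DecisionTreeSketch {

      // One training example: its category plus a feature value at each position.
      static class Example {
          String category;
          String[] values;
          Example(String category, String... values) {
              this.category = category;
              this.values = values;
          }
      }

      // One tree node: the examples that reached it and, unless it is a leaf,
      // the decision feature (a position and the value tested at that position).
      static class Node {
          List<Example> examples;
          int position = -1;
          String value;
          Node yes, no;
          Node(List<Example> examples) { this.examples = examples; }
      }

      Node root;

      DecisionTreeSketch(List<Example> trainingSet) {
          root = new Node(trainingSet);      // root is created from the full training set
          build(root, trainingSet);
      }

      void build(Node node, List<Example> trainingSet) {
          if (uniqueCategories(trainingSet).size() <= 1) return;   // one category left: stop

          // Naive choice for assignment 7: the first (position, value) test that
          // actually separates the examples; assignment 8 replaces this choice.
          int numPositions = trainingSet.get(0).values.length;
          for (int pos = 0; pos < numPositions; pos++) {
              String value = trainingSet.get(0).values[pos];
              List<Example> yesSet = new ArrayList<>(), noSet = new ArrayList<>();
              for (Example e : trainingSet)
                  (e.values[pos].equals(value) ? yesSet : noSet).add(e);

              // Only recurse when the split actually separates the examples.
              if (!yesSet.isEmpty() && !noSet.isEmpty()) {
                  node.position = pos;
                  node.value = value;
                  node.yes = new Node(yesSet);
                  build(node.yes, yesSet);
                  node.no = new Node(noSet);
                  build(node.no, noSet);
                  return;
              }
          }
      }

      static Set<String> uniqueCategories(List<Example> examples) {
          Set<String> categories = new HashSet<>();
          for (Example e : examples) categories.add(e.category);
          return categories;
      }
  }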
Submission for Assignment 7
Turn in a hardcopy of your DecisionTree class, which should contain test code in a static main method. Also turn in a script showing the output of your test code.

Assignment 8

Assignment 8 has you creating a better means of selecting features. Afterward, you will run experiments to test your feature-selection algorithm.

In particular, you should do the following:

Feature selection
The feature-selection criterion is up to you, but it should favor a feature that delivers a good split of the categories. To perform this task, you will need to enumerate the features in the examples, rate each one's utility, and choose the feature with the best score.
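
One common way to rate a candidate feature is by how well its yes/no split unmixes the categories, which can be measured with entropy. The sketch below is only an illustration of that idea, assuming categories are represented as strings; the names FeatureScoring, entropy, and splitScore are not part of the provided classes, and any reasonable measure is acceptable.

  import java.util.*;

  public class FeatureScoring {

      // Entropy of a list of category labels: 0 when every label agrees,
      // largest when the categories are evenly mixed.
      static double entropy(List<String> categories) {
          Map<String, Integer> counts = new HashMap<>();
          for (String c : categories) counts.merge(c, 1, Integer::sum);
          double h = 0.0, n = categories.size();
          for (int count : counts.values()) {
              double p = count / n;
              h -= p * (Math.log(p) / Math.log(2));
          }
          return h;
      }

      // Rating for one candidate split: the size-weighted entropy of the
      // categories in its yes and no halves.  Lower is better, so chooseFeature
      // would keep the feature with the smallest score (equivalently, the
      // largest information gain).
      static double splitScore(List<String> yesCategories, List<String> noCategories) {
          double n = yesCategories.size() + noCategories.size();
          return (yesCategories.size() / n) * entropy(yesCategories)
               + (noCategories.size() / n) * entropy(noCategories);
      }

      public static void main(String[] args) {
          // A perfect split scores 0.0; a split that leaves both halves mixed scores 1.0.
          System.out.println(splitScore(
                  Arrays.asList("democrat", "democrat"),
                  Arrays.asList("republican", "republican")));
          System.out.println(splitScore(
                  Arrays.asList("democrat", "republican"),
                  Arrays.asList("democrat", "republican")));
      }
  }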
Comparison
The "party" dataset is a set of congressional voting records. The first item is the representative's party (Democrat or Republican), which will serve as the example's category; the remaining items are how the representative voted on certain issues. These votes serve as features. Your performance should be measured in the following way:
  numCorrect = 0 
  For each example E:
    remove E from the training set
    build the tree with training set
    predict E's category with the tree
    if prediction is correct, then numCorrect++
  
  Performance = numCorrect / size of dataset
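
The same loop might look like the following Java sketch. Because the exact constructor and prediction methods of your DecisionTree are up to you, the sketch takes them as a single function argument; the names leaveOneOut, predict, and categoryOf are hypothetical.

  import java.util.*;
  import java.util.function.*;

  public class LeaveOneOut {

      // Leave-one-out estimate: train on everything except one example, test on
      // the held-out example, and repeat for every example in the dataset.
      static <E> double leaveOneOut(List<E> dataset,
                                    BiFunction<List<E>, E, String> predict,
                                    Function<E, String> categoryOf) {
          int numCorrect = 0;
          for (int i = 0; i < dataset.size(); i++) {
              List<E> trainingSet = new ArrayList<>(dataset);
              E held = trainingSet.remove(i);                   // remove E from the training set
              String guess = predict.apply(trainingSet, held);  // build the tree, then predict E's category
              if (guess.equals(categoryOf.apply(held))) numCorrect++;
          }
          // Divide as doubles so the performance comes out as a fraction.
          return (double) numCorrect / dataset.size();
      }
  }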
  
Submission for Assignment 8
Turn in a hardcopy of your chooseFeature code and a display of the tree built from the "party" dataset using your feature-selection method. Finally, include the results of your performance experiment.