The program "extract.pl" does the following:

1. performs rudimentary parsing of an HTML file to extract the text (note that the file extension of the input file must be ".html" or ".htm");
2. removes stop words; the list of stop words is built in as a hash (a Perl associative array);
3. stems the extracted words using Porter's stemming algorithm;
4. outputs the stems along with their text frequency in the input document, one per line, sorted in descending order of text frequency.

The program is run with the input file as its argument, and its output is redirected to a file. For example:

    perl extract.pl test.html > test.out

parses the file test.html and writes the results into the file test.out.
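The processing steps can be illustrated with a minimal sketch. This is not the source of extract.pl, only an approximation under stated assumptions: the stop list is a hypothetical, abbreviated one, the subroutine names (stem, stem_frequencies, format_report) are invented for illustration, the space-separated output format is assumed, and stem is a placeholder that returns each word unchanged instead of applying Porter's algorithm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, abbreviated stop list stored in a hash for fast lookup.
# The list built into extract.pl itself is not reproduced here.
my %stop_words = map { $_ => 1 } qw(a an and are in is of the to);

# Placeholder for Porter's stemming algorithm: returns the word unchanged.
# A real implementation would strip suffixes here.
sub stem {
    my ($word) = @_;
    return $word;
}

# Count how often each stem occurs in a string of HTML.
sub stem_frequencies {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>/ /g;      # rudimentary tag removal
    my %freq;
    for my $word (lc($text) =~ /[a-z]+/g) {   # lowercase alphabetic words only
        next if $stop_words{$word};           # skip stop words
        $freq{ stem($word) }++;
    }
    return \%freq;
}

# Build report lines sorted by descending frequency (ties alphabetical).
sub format_report {
    my ($freq) = @_;
    return map { join(' ', $_, $freq->{$_}) }
           sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b }
           keys %$freq;
}

my $html = '<html><body><p>The cat and the hat.</p>'
         . '<p>A cat is in the hat shop.</p></body></html>';
print "$_\n" for format_report(stem_frequencies($html));
```

Run on the sample string above, the sketch prints "cat 2", "hat 2" and "shop 1" on separate lines. With a real stemmer in place of the placeholder, words such as "cats" and "cat" would be counted under the same stem.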