CSC 433 -- Apr 3, 2017
Review Questions
- What are some useful windows in the SAS 9.4 IDE?
Ans: Enhanced Editor, Results Viewer (HTML output), Output Window (typewriter output),
Log (shows what SAS did and any error messages), Explorer (Excel-like
spreadsheet that shows data libraries and current datasets), Results Window (table
of contents of SAS output).
- What are some items that you can specify in an options statement?
Ans: Here is an example options statement:
options linesize=70 nodate pageno=1;
- Give four ways to specify the column for reading the next item in an input statement.
Ans: m-n (read columns m to n)
@n (move column pointer to column n)
+n (move column pointer n columns to the right)
@"GET" (move the column pointer past the string
"GET")
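A minimal sketch that uses all four column specifications in one input statement. The dataset name, variable names, and data line are made up for illustration.

```sas
data pointers;
  /* a: columns 1-3; @5 jumps to column 5; +1 skips one column;   */
  /* @"GET" moves the pointer to just past the string GET.         */
  input a $ 1-3 @5 b $3. +1 c $2. @"GET" d;
  datalines;
abc def gh GET 42
;
proc print data=pointers;
run;
```

With this data line, a should be abc, b should be def, c should be gh, and d should be 42.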
- Write a SAS script to verify that Jan 1, 1960 is the SAS zero date.
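One possible check, offered as a sketch rather than the official answer: print the number 0 with a date format, and print the date constant for Jan 1, 1960 as a plain number. The variable names zero and jan1 are made up.

```sas
data _null_;
  zero = 0;
  jan1 = '01JAN1960'd;   /* SAS date constant */
  put zero= date9.;      /* the value 0 displayed as a date */
  put jan1=;             /* the date displayed as a number */
run;
```

The log should show zero=01JAN1960 and jan1=0.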
- What is the difference between these informats?
$10. :$10.
Write a SAS test script to verify your answer.
Ans: The second informat uses the colon modifier, which stops reading when a space is encountered.
This test script shows the difference:
data test;
input a $10.;
output;
input a :$10.;
output;
datalines;
abcde fghijklm
abcde fghijklm
;
proc print;
run;
Output:
Obs a
1 abcde fghi
2 abcde
With the colon modifier, :$10. stops reading when it
reaches a space.
- What is the output?
data numbers;
input x 3. +2 y 8.3 z 9.2;
datalines;
12345678.901234567890123456789
;
proc print;
run;
Output:
Obs x y z
1 123 678.901 3456789.01
When x is read with the informat 3., three digits are read.
When y is read with 8.3, 8 characters are read starting with 6: 678.9012.
Because these characters contain an embedded decimal point, the .3 in the informat
is ignored; proc print displays the value rounded to 3 digits after the decimal.
When z is read, there is no embedded decimal point, so 9 digits are read:
345678901. Then a decimal point is positioned so that there are two digits
to the right of the decimal point.
- Explain the difference between @ and
@@. Write a SAS script to verify your answer.
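Ans: @ means hold the input line for later input statements in the same iteration; the line is released at the end of the current iteration of the data step.
@@ means hold the input line across iterations of the data step until the data in the line are all used.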
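A short test script, as a sketch, that reads the same data lines with @ and with @@. The dataset names single and double are made up.

```sas
data single;
  input x @;     /* line is released at the end of each iteration */
  datalines;
1 2 3
4 5 6
;
data double;
  input x @@;    /* line is held until every value has been read */
  datalines;
1 2 3
4 5 6
;
proc print data=single;
run;
proc print data=double;
run;
```

The single dataset should contain two observations (1 and 4), while double should contain six (1 through 6).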
- How do you display labels as column headers in proc print output?
Ans: Use the label option:
proc print label;
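A small sketch of the label option in use. The dataset, variables, and label text are made up for illustration.

```sas
data kids_labeled;
  input name $ age;
  label name = 'First Name'
        age  = 'Age in Years';
  datalines;
Connie 8
Bill 11
;
/* Without the label option, the headers are the variable names. */
proc print data=kids_labeled label;
run;
```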
D2L Quiz 1
- Work on the D2L Quiz1 in groups of 2, 3, or 4. Remember that you can take
each quiz twice.
Computing New Variables
User Defined Formats
- Instead of using an if..else statement, you can use proc format to define your own format for displaying data.
- See the Bulls Example, which displays the position with a user defined format.
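A minimal sketch of a user defined format, assuming one-letter position codes. The format name $posfmt, the codes, and the player names are made up and are not taken from the Bulls Example.

```sas
proc format;
  value $posfmt 'G' = 'Guard'
                'F' = 'Forward'
                'C' = 'Center';
run;
data players;
  input name $ pos $;
  datalines;
PlayerA G
PlayerB F
PlayerC C
;
/* The stored value stays G, F, or C; only the display changes. */
proc print data=players;
  format pos $posfmt.;
run;
```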
Comma Delimited Files
Reading Binary Data
- Look at the Sound Example.
- Use this hex dump program to view the binary data file in the sound.bin file
in sound.zip:
- The depth of a sound file is the number of bits used to represent the sampled audio wave.
- The sound.bin file contains simulated sound intensity values of depth 8 (one byte each).
- Commercial sound systems use sound values of depth 16 or 24 sampled 44,100 to
192,000 times per second.
- Human ears cannot tell the difference between sound file depths of 24 and 32 bits.
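A sketch of how one-byte values could be read from a binary file. The path c:/datasets/sound.bin and the variable name intensity are assumptions, and pib1. treats each byte as unsigned (0 to 255); if the values are signed, ib1. would be the informat to try instead.

```sas
data sound;
  /* recfm=n treats the file as a byte stream with no line breaks */
  infile "c:/datasets/sound.bin" recfm=n;
  input intensity pib1.;   /* one unsigned byte per observation */
run;
proc print data=sound;
run;
```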
The set and Subsetting if Statements
- Create a dataset named kids from the input file
kids.txt. From this dataset create a second dataset
named older_girls that only contains the names and ages of the girls that are older than 11.
- In a SAS data step
use the keep statement to keep only the specified variables in the dataset.
use the drop statement to drop the specified variables from the dataset.
use the subsetting if statement to keep only certain rows in the dataset. For example:
if gender = 'F';
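A sketch of the kids and older_girls exercise, assuming that each line of kids.txt holds a name, a gender code, and an age separated by spaces, and that the file is in c:/datasets.

```sas
data kids;
  infile "c:/datasets/kids.txt";   /* assumed location */
  input name $ gender $ age;
run;
data older_girls;
  set kids;
  if gender = 'F' and age > 11;    /* subsetting if */
  keep name age;                   /* gender is used above but not stored */
run;
proc print data=older_girls;
run;
```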
Conditions that Cause Data Step Execution to Terminate
- Situations where a data step stops execution and passes to the next data step
or proc.
- The data in the datalines, a raw input file, or an input dataset is exhausted when an
input or set statement is executed.
- There is no data in datalines or in an input file when an input statement is
executed.
- There are no observations in an input dataset when a set statement is executed.
- If there is no input or set statement, when the last line of the data step is executed.
- When a stop statement is executed.
- A situation where a data step stops execution, an error condition is set, and the number of observations in the dataset
is set to 0:
- When an abort statement is executed.
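Two of these situations can be sketched as follows. The dataset names squares and first_two are made up.

```sas
/* No input or set statement: the step runs once and ends after its last line. */
data squares;
  do i = 1 to 5;
    sq = i * i;
    output;
  end;
run;

/* The set statement could read all five rows, but stop ends the step early. */
data first_two;
  set squares;
  if _n_ > 2 then stop;
run;
proc print data=first_two;
run;
```

The first_two dataset should contain only the first two observations of squares.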
Causing Variables to Persist across Observations
The statement
retain x 0;
means retain x, initializing it to zero for the first iteration.
A sum variable is automatically retained:
data compute_sum;
input x;
sum + x;
This is similar to the sum += x of many procedural programming languages.
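A complete version of this idea, as a sketch: the sum statement and an explicit retain with an assignment should give the same running total. The variable name total and the data values are made up.

```sas
data compute_sum;
  input x;
  sum + x;              /* sum statement: retained, starts at 0 */
  retain total 0;       /* explicit retain, initialized to 0 */
  total = total + x;
  datalines;
23
42
68
;
proc print data=compute_sum;
run;
```

Both sum and total should show 23, 65, and 133.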
Writing to a File
- Use the file and put statements to write to a file on disk.
- A data step with the name _null_ is traditionally used when
using the file and put statements so that no actual dataset is created.
- Practice Problem: Compute the average of a list of values and print the result to an external file. Do this in two
different ways:
- Use sum variables sum and count in a _null_ data step.
data values;
input x @@;
datalines;
23 42 68 123 21
;
data _null_;
file "c:/datasets/ave1.txt";
set values end=eof;
retain count sum;
count + 1;
sum + x;
if eof then do;
ave = sum / count;
put "Average: " ave 10.3;
end;
run;
- Use proc means to compute the average and save the result to an output dataset.
Then use a _null_ data step that reads in this average and prints the result.
data values;
input x @@;
datalines;
23 42 68 123 21
;
proc means data=values noprint;
output out=summary mean=ave;
data _null_;
file "c:/datasets/ave2.txt";
set summary;
put "Average: " ave 8.3;
run;
Compare with the Sales Example.
A data step put statement cannot contain numeric constants or expressions of any kind. (String constants are okay.)
For example, this statement is incorrect:
put "Value: " 1.234;
Instead, assign the numeric constant to a variable
and then put the variable name in the put statement:
val = 1.234;
put "Value: " val;
Look at the Project 1 Description.
Question: For Project 1 how can you find and throw away lines not to be used for data, such as those that begin with
--------------------------
or
Pace: 6:00 | 7:00 | 8:00 |
Ans: If one of these lines is read from the input file, the id_num variable
will be missing, so use one of these statements:
if id_num = . then delete;
or
if id_num ~= .; * Subsetting if;
Alternatively, use the _infile_ variable with the substr function
and delete the observation if the line from the input file begins with
-----
or
Pace:
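A sketch of the _infile_ approach, using datalines in place of the project's input file. The variable names and data values are made up and do not reflect the actual Project 1 layout.

```sas
data runners;
  input @;                 /* load the line into the buffer and hold it */
  if substr(_infile_, 1, 5) = '-----'
     or substr(_infile_, 1, 5) = 'Pace:' then delete;
  input id_num name $;     /* read the held line as data */
  datalines;
--------------------------
Pace: 6:00 | 7:00 | 8:00 |
101 Alice
102 Bob
;
proc print data=runners;
run;
```

Only the two data lines should remain in the runners dataset.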
Project 1
Executable vs. Declarative Statements
Declarative statements supply information to SAS and take effect
when SAS compiles the program statements.
Examples of declarative statements:
datalines drop length retain
The input, output, and put statements, by contrast, are executable statements.
Declarative statements are also called nonexecutable statements.
Encodings
Try using this SAS code to write out the
first two datalines of the kids.txt file to a UTF-8 encoded file.
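A minimal sketch of such a step, not necessarily the code used in class. It assumes kids.txt is in c:/datasets and relies on the encoding= option of the file statement.

```sas
data _null_;
  /* add obs=n to the infile statement to copy only the first n lines */
  infile "c:/datasets/kids.txt";
  file "c:/datasets/kids-utf8.txt" encoding="utf-8";
  input;            /* read a line into the input buffer */
  put _infile_;     /* write the line unchanged */
run;
```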
Now open a Windows Command Prompt window and display the file with the DOS
command type
kids-utf8.txt. This is the output:
C:\datasets>type kids-utf8.txt
∩╗┐Connie F 8
Jacqueli F 14
Johnny M 8
Bill M 11
The characters at the beginning of the file are the
characters that correspond to the UTF-8 byte order mark (BOM).
These are magic numbers indicating that the encoding used in the
file is UTF-8. These characters are shown using the Original Equipment
Manufacturer's Code Page, which is DOS code page 437. This was the encoding first used for IBM PCs in the 1980s.
Let's check the default code page number for a Command Prompt Window:
C:\datasets>chcp
Active code page: 437
Now let's change the code page to 1252 and retype the document:
C:\datasets>chcp 1252
Active code page: 1252
C:\datasets>type kids-utf8.txt
ï»¿Connie F 8
Jacqueli F 14
Johnny M 8
Bill M 11
The three characters at the beginning of the file are now ï»¿. Use the hex dump website
that we used for the Sound Example to display the hex dump of the file:
0000-0010: ef bb bf 43-6f 6e 6e 69-65 20 46 20-38 0d 0a 4a ...Conni e.F.8..J
0000-0020: 61 63 71 75-65 6c 69 20-46 20 31 34-0d 0a 4a 6f acqueli. F.14..Jo
0000-0030: 68 6e 6e 79-20 4d 20 38-0d 0a 42 69-6c 6c 20 4d hnny.M.8 ..Bill.M
0000-0035: 20 31 31 0d-0a .11..
In particular, here are the BOM (first
three bytes) for the file:
Hex | Decimal | 437 CP | 1252 CP |
ef | 239 | ∩ | ï |
bb | 187 | ╗ | » |
bf | 191 | ┐ | ¿ |
We can obtain the decimal values of the bytes in the file, including the BOM, with this PowerShell command, run from a PowerShell prompt:
Get-Content kids-utf8.txt -Encoding Byte > codes.txt
A table of SAS encodings.
This Wikipedia article shows some historically important code pages:
The Windows 1252 code page is roughly equivalent to the SAS encoding wlatin1.
SAS Functions
Project 2
Control Statements
Select Statements
A select statement can represent an if ... else statement in a more concise manner:
select(expression);
when(constanta, ... , constantb)
action1;
when(constantc, ... , constantd)
action2;
...
when(constanty, ... , constantz)
actionn;
otherwise
default-action;
end;
See the StateRegion Examples, Example 2.
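A small sketch of a select statement that maps state codes to regions. The state groupings and region names are made up and are not taken from the StateRegion Examples.

```sas
data regions;
  input state $;
  length region $ 9;       /* long enough for the longest region name */
  select(state);
    when('IL', 'WI', 'IN') region = 'Midwest';
    when('NY', 'NJ')       region = 'Northeast';
    otherwise              region = 'Other';
  end;
  datalines;
IL
NY
TX
;
proc print data=regions;
run;
```

The otherwise clause catches TX, which matches none of the when lists.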
Do Loops
Repeat the loop while the condition is true:
do while(condition);
action-statements;
end;
Repeat the loop until the condition is true (while the condition is false):
do until(condition);
action-statements;
end;
The do while and do until statements are called indefinite loops.
Repeat the loop with the index i taking on the values from m to n.
This is called a definite loop.
do i = m to n;
action-statements;
end;
See the Quiz4 Example.
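The three loop forms can be compared in one short sketch. The variable names and starting values are made up.

```sas
data _null_;
  /* definite loop: i takes the values 1, 2, 3 */
  do i = 1 to 3;
    put i=;
  end;

  /* do while: condition is tested before each pass */
  n = 1;
  do while(n < 100);
    n = n * 2;
  end;
  put n=;        /* n=128 */

  /* do until: condition is tested after each pass, so the body runs at least once */
  k = 10;
  do until(k >= 5);
    k = k + 1;
  end;
  put k=;        /* k=11 */
run;
```

The do until loop runs once even though its condition is already true on entry, because the test happens at the bottom of the loop.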