These are statements that do not belong to a particular DATA or PROC step. They have a global effect.
SAS carries out all statements in the DATA step, in order, for each input observation.
DATA <dataset_name>;
INFILE <file_name>;
DATALINES;
INPUT <variable_name [type] [position]>;
Example:
i) INPUT MFG @@;
ii) INPUT MFG $ TYPE $ SEEK TRANSFER;
iii) INPUT MFG $ 1-8 TYPE $ 11-12 SEEK 13-16 TRANSFER 17-19;
LABEL <variable_name='label'>...;
Example:
i) LABEL MFG='Manufacturer';
ii) LABEL MFG='Manufacturer' SEEK='Seek Time';
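As a minimal sketch of the trailing @@ in example i), the step below reads several observations from a single data line and attaches a label. The dataset name SEEKS is a hypothetical choice, and the RUN statement that closes the step is an addition for illustration.

```sas
DATA SEEKS;                     /* hypothetical dataset name */
   INPUT SEEK @@;               /* @@ holds the line so the next observation is read from it too */
   LABEL SEEK='Seek Time';
   DATALINES;
7.3 23.1 40.1 13.5 5.5
;
RUN;
```

Without the @@, SAS would move to a new data line for each observation, so only the first value on the line would be read.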
Symbol  Operation       Example
**      Exponentiation  Z=X**2;
*       Multiplication  Z=X*Y;
/       Division        Z=X/Y;
+       Addition        Z=X+Y;
-       Subtraction     Z=X-Y;
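The operators above can be sketched inside a DATA step as follows. The dataset name CDROM2 and the derived variable names SEEKSQ, RATIO and TOTAL are assumptions made up for this illustration.

```sas
DATA CDROM2;                    /* hypothetical dataset name */
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   SEEKSQ = SEEK**2;            /* exponentiation */
   RATIO  = TRANSFER/SEEK;      /* division */
   TOTAL  = SEEK + TRANSFER;    /* addition */
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;
```

Each assignment statement is executed once per observation, so every row of CDROM2 gets its own SEEKSQ, RATIO and TOTAL values.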
DROP <variable_name>...; removes the named variables from the dataset and keeps all other variables.
KEEP <variable_name>...; keeps the named variables and drops all other variables from the dataset.
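A short sketch of how DROP and KEEP mirror each other: both steps below should produce a dataset containing only SEEK and TRANSFER. The dataset names SPEEDS1 and SPEEDS2 are hypothetical.

```sas
/* Version 1: name the variables to remove */
DATA SPEEDS1;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   DROP MFG TYPE;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;

/* Version 2: name the variables to retain */
DATA SPEEDS2;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   KEEP SEEK TRANSFER;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;
```

Which form to use is mostly a matter of which list is shorter.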
IF <expression> THEN <statement>; ELSE <statement>;
Note: The ELSE statement is optional. The IF and THEN parts together form a single statement. The two examples below give the same result:
i) IF SEEK < 15 THEN CLASS = 'FAST';
   ELSE CLASS = 'SLOW';
ii) CLASS = 'SLOW';
    IF SEEK < 15 THEN CLASS = 'FAST';
SAS comparison operators are shown below. You can use either the symbol or the two-letter abbreviation.
Symbol  Abbrev
<       LT
<=      LE
>       GT
>=      GE
=       EQ
^=      NE
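To illustrate that the symbol and the abbreviation are interchangeable, the sketch below classifies each observation twice, once with < and once with LT. The variable name CLASS2 is a hypothetical addition, and CLASS and CLASS2 should always agree.

```sas
DATA CDROM;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   /* Symbol form */
   IF SEEK < 15 THEN CLASS = 'FAST';
   ELSE CLASS = 'SLOW';
   /* Abbreviation form of the same test; CLASS2 is a hypothetical name */
   IF SEEK LT 15 THEN CLASS2 = 'FAST';
   ELSE CLASS2 = 'SLOW';
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;
```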
A special form of the IF statement is used for subsetting a dataset, that is, selecting or excluding particular observations.
DATA CDROM;
INPUT MFG $ TYPE $ SEEK TRANSFER;
IF SEEK < 15;
The statement IF SEEK < 15; is equivalent to:
i) IF SEEK < 15 THEN OUTPUT;
ii) IF SEEK >= 15 THEN DELETE;
i) * ... ;
ii) /* ... */
DATA CDROM;
* Read in variables;
INPUT MFG $ TYPE $ TRANSFER SEEK;
/* ignore next statement
SEEKMIN = SEEK/60000;
*/
DATA CDROM;
INPUT MFG $ TYPE $ SEEK TRANSFER;
IF SEEK < 15;
DROP MFG TYPE;
DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
The resulting dataset will contain observations 1, 4 and 5, with only the variables SEEK and TRANSFER, and will look like:
SEEK  TRANSFER
7.3   105
13.5  530
5.5   1000
A PROC step runs a predefined SAS procedure, which may be either a statistical or a utility procedure. The procedure processes the most recently created dataset unless another dataset is specified with the "DATA=" option.
PROC <procedure_name>; [procedure_statement];
VAR <variable_name>;
BY <variable_name>;
You should be familiar with the following procedures:
PROC CORR [options]; [VAR <variable_name>...;]
PROC MEANS [options]; [VAR <variable_name>...;]
PROC UNIVARIATE [options]; [VAR <variable_name>...;]
PROC PRINT [options]; [VAR <variable_name>...;]
PROC SORT [options]; BY <variable_name>...;
PROC PLOT [options]; PLOT <dep_var_name>*<indep_var_name>='*' [/ options];
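The procedures listed above can be sketched together on the CDROM data used earlier. This is an illustrative outline rather than a required sequence: the explicit DATA=CDROM options and the RUN statements are additions for clarity, and the choice of variables in each VAR and PLOT statement is an assumption.

```sas
DATA CDROM;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;

/* Sort by manufacturer so that BY-group processing can be used later */
PROC SORT DATA=CDROM;
   BY MFG;
RUN;

/* List selected variables */
PROC PRINT DATA=CDROM;
   VAR MFG SEEK TRANSFER;
RUN;

/* Summary statistics, computed separately for each manufacturer */
PROC MEANS DATA=CDROM;
   VAR SEEK TRANSFER;
   BY MFG;
RUN;

/* Correlation between the two numeric variables */
PROC CORR DATA=CDROM;
   VAR SEEK TRANSFER;
RUN;

/* Scatter plot: TRANSFER on the vertical axis, SEEK on the horizontal axis */
PROC PLOT DATA=CDROM;
   PLOT TRANSFER*SEEK='*';
RUN;
```

The PROC SORT step comes first because a BY statement in a later procedure expects the dataset to be sorted by the BY variable.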