> pnorm(1) - pnorm(-1)
[1] 0.6826895   #<-- Matches the well-known value 68%.
> pnorm(2) - pnorm(-2)
[1] 0.9544997   #<-- Matches the well-known value 95%.
> pnorm(3) - pnorm(-3)
[1] 0.9973002   #<-- Matches the well-known value 99.7%.
> pnorm(4) - pnorm(-4)
[1] 0.9999367   #<-- Very close to 1.

You can also compute the first three of these probabilities in one line as
> pnorm(c(1, 2, 3)) - pnorm(c(-1, -2, -3))
[1] 0.6826895 0.9544997 0.9973002

Here is how to compute these probabilities using a SAS script:
* cdf stands for cumulative distribution function;
data probs;
p = cdf("normal", 1) - cdf("normal", -1); output;
p = cdf("normal", 2) - cdf("normal", -2); output;
p = cdf("normal", 3) - cdf("normal", -3); output;
proc print;
run;

Here is the SAS output:
Obs    p
1      0.68269
2      0.95450
3      0.99730
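These probabilities do not depend on the mean or standard deviation, because the interval is measured in standard deviations. The following R sketch illustrates this; the helper name within_k_sd and the example parameters (mean 100, standard deviation 15) are illustrative assumptions, not part of the original notes.

```r
# Probability that a N(mu, sigma) variable falls within k standard
# deviations of its mean. within_k_sd is a hypothetical helper name.
within_k_sd <- function(k, mu = 0, sigma = 1) {
  pnorm(mu + k * sigma, mean = mu, sd = sigma) -
    pnorm(mu - k * sigma, mean = mu, sd = sigma)
}

# Standard normal: the same quantities as pnorm(k) - pnorm(-k).
std <- within_k_sd(1:4)
print(std)

# Arbitrary example parameters (assumed): the probabilities are unchanged.
other <- within_k_sd(1:4, mu = 100, sigma = 15)
print(other)
```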
> qnorm(0.975)
[1] 1.959964

Similarly we find that a 99% confidence interval for z ∼ N(0, 1) is [-2.58, 2.58].
qnorm(c(0.975, 0.995))

Output:
[1] 1.959964 2.575829

Here is the corresponding SAS script and output:
data confint;
val = quantile("normal", 0.975); output;
val = quantile("normal", 0.995); output;
proc print;
run;

Output:
Obs    val
1      1.95996
2      2.57583
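The probabilities 0.975 and 0.995 come from splitting the leftover probability equally between the two tails: a central interval at level 1 - alpha uses the 1 - alpha/2 quantile. A short R sketch of that step, where z_crit is a hypothetical helper name introduced only for illustration:

```r
# Two-sided critical value of the standard normal for a given level.
# z_crit is a hypothetical helper name, not a built-in R function.
z_crit <- function(level) {
  alpha <- 1 - level        # total probability left in the two tails
  qnorm(1 - alpha / 2)      # upper endpoint of the central interval
}

z <- z_crit(c(0.95, 0.99))
print(z)

# Check: the interval [-z, z] has the requested probability.
coverage <- pnorm(z) - pnorm(-z)
print(coverage)
```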
proc boxplot;
plot scores * dummy / boxstyle = schematic;

The variable dummy is defined in the data step as 1 or some other arbitrary constant.
boxplot(scores)

Here are the resulting SAS boxplot and R boxplot. Note that neither SAS nor R distinguishes between mild and extreme outliers. For both boxplots, all outliers are shown with an O symbol.
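If the distinction between mild and extreme outliers is needed, it can be computed by hand. The sketch below assumes the common convention of inner fences at 1.5 and outer fences at 3 hinge spreads beyond the hinges. The scores vector is made-up illustration data, not the dataset used in these notes.

```r
# Made-up example data (assumption, for illustration only).
scores <- c(25, 52, 58, 61, 64, 66, 70, 73, 75, 79, 82, 140)

five <- fivenum(scores)          # min, lower hinge, median, upper hinge, max
lower_hinge <- five[2]
upper_hinge <- five[4]
h_spread <- upper_hinge - lower_hinge

# Distance of each score beyond the nearer hinge (0 if inside the box).
beyond <- pmax(lower_hinge - scores, scores - upper_hinge, 0)

# Assumed convention: mild = beyond the inner fence but not the outer fence.
mild    <- scores[beyond > 1.5 * h_spread & beyond <= 3 * h_spread]
extreme <- scores[beyond > 3 * h_spread]

# Every point that R's boxplot would flag, without the mild/extreme split.
flagged <- boxplot.stats(scores)$out

print(mild)
print(extreme)
print(flagged)
```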
proc univariate noprint;
histogram scores / endpoints = (0, 20, 40, 60, 80, 100);

or

proc univariate noprint;
histogram scores / endpoints = (0 to 100 by 20);

R function call for drawing the histogram:
hist(scores, breaks=seq(0, 100, 20))

Here are the resulting SAS histogram and R histogram.
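To check which bin each score falls into without drawing anything, hist can be called with plot = FALSE. In this sketch the scores vector is made-up illustration data. By default R's bins are closed on the right, so a score of exactly 20 is counted in the first bin.

```r
# Made-up example data (assumption, for illustration only).
scores <- c(20, 38, 55, 62, 71, 78, 80, 85, 91, 100)

# Same breaks as in the call above, but return the bin counts
# instead of drawing the histogram.
h <- hist(scores, breaks = seq(0, 100, 20), plot = FALSE)

print(h$breaks)   # bin endpoints
print(h$counts)   # number of scores in each bin
```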
* Create dataset.
data kids;
infile kids.txt;
input name, gender, age, firstobs=2;
proc print data=kids
proc means mean;

The input file is c:\datasets\kids.txt, which contains
Name Gender Age
Sally F 12
Alex M 11
Jason M 9
Molly F 10

Ans: Here is the corrected version:
* Create dataset;
data kids;
infile "c:/datasets/kids.txt" firstobs=2;
input name $ gender $ age;
proc print data=kids;
proc means mean;
run;
kids = read.table("c:/datasets/kids.txt", header=TRUE)
cat("kids data frame:\n")
kids
cat("Average age of kids:\n")
mean(kids$Age)

The R cat function prints its arguments to the console exactly as given. Newline characters are written as \n.
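To try the R script without creating the file, the same four records can be read from a string through the text argument of read.table. This is a sketch for experimentation only; the per-gender summary with tapply is an addition that is not in the original script.

```r
# The contents of kids.txt, held in a string instead of a file.
kids_txt <- "Name Gender Age
Sally F 12
Alex M 11
Jason M 9
Molly F 10"

kids <- read.table(text = kids_txt, header = TRUE)

cat("kids data frame:\n")
print(kids)

cat("Average age of kids:\n")
avg_age <- mean(kids$Age)
print(avg_age)

# Addition (not in the original script): mean age by gender.
by_gender <- tapply(kids$Age, kids$Gender, mean)
print(by_gender)
```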
> x = rnorm(200)
> qqnorm(x)
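For a sample that really is normal, the points of the normal Q-Q plot should lie close to a straight line. A small sketch of how to check this: the seed is an arbitrary choice made here for reproducibility, and qqline adds a reference line through the first and third quartiles.

```r
set.seed(1)          # arbitrary seed (assumption), for reproducibility
x <- rnorm(200)

qq <- qqnorm(x)      # draws the plot and returns the plotted coordinates
qqline(x)            # reference line through the first and third quartiles

# Correlation between theoretical and sample quantiles;
# values near 1 indicate points close to a straight line.
r <- cor(qq$x, qq$y)
print(r)
```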
Discipline | Dependent Variable | Independent Variables |
---|---|---|
History of Science | Height of adult child | Height of father, height of mother |
Psychology | Aggressiveness of moonlighting employees | Age, gender, self-esteem, history of aggression, supervisor abuse, perception of injustice |
Geography | Predicted population density using satellite maps | Proportion of low density population areas, proportion of high density population areas |
Music | Entropy of composition | Year of birth of composer |
Accounting | Negative personality rating of accountant | Age, gender, education, income |
Engineering | Heat rate of gas turbine engine | Rotation rate, inlet temperature, exhaust gas temperature, cycle pressure ratio, air flow rate |
Management | Vice president's attitude towards improving company efficiency | Level of CEO leadership, level of congruence between VP and CEO |
Law | Likelihood of changing a verdict from not guilty to guilty after deliberations | Gender of juror, expert testimony in case (yes or no) |
Education | SAT-Mathematics score | Scores on PSAT test, whether student received coaching, number of math courses taken in high school, GPA in math courses |
Mental health | Adjustment to community | Demographic (4 variables), diagnostic (7 variables), treatment (4 variables), community (6 variables) |