Normal Plots

We know that a dataset whose histogram is bell-shaped is said to have a normal distribution.
However, in practice, it is often difficult to look at a histogram and determine how close its distribution is to normal.
An easier way to tell if a dataset is normally distributed is to look at the normal plot. If the normal plot is close to a straight line, the distribution of the dataset is close to normal.

To define normal scores using Van der Waerden's method, find the z-scores that divide the area under the standard normal curve into n+1 equal areas of 1/(n+1) each. For example, the normal scores when n = 5 are defined by the bins (-∞, -0.97], (-0.97, -0.43], (-0.43, 0.00], (0.00, 0.43], (0.43, 0.97], (0.97, ∞). Each of these bins has area 1/(5+1) = 0.1667. This means that the normal scores for a dataset with n = 5 are -0.97, -0.43, 0.00, 0.43, and 0.97.
Equivalently, the i-th normal score is the z-score with area i/(n+1) to its left: the intervals (-∞, -0.97], (-∞, -0.43], (-∞, 0.00], (-∞, 0.43], (-∞, 0.97] have areas 1/6 = 0.1667, 1/3 = 0.3333, 1/2 = 0.5000, 2/3 = 0.6667, 5/6 = 0.8333, respectively.
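As a rough check of the values above, here is a minimal Python sketch that computes Van der Waerden normal scores as the z-scores with area i/(n+1) to their left. The function name van_der_waerden_scores is an illustrative choice, not something from the text; NormalDist comes from Python's standard statistics module.

```python
from statistics import NormalDist

def van_der_waerden_scores(n):
    """Return the n z-scores whose left-tail areas are 1/(n+1), ..., n/(n+1)."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

scores = van_der_waerden_scores(5)
print([round(z, 2) for z in scores])
```

For n = 5 the rounded values should agree with the scores listed above.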

A normal plot, or Q-Q plot, is formed by plotting the normal scores defined in the previous section on the y-axis vs. the actual sorted data values on the x-axis.
If the normal plot is close to a straight line, we conclude that the dataset is close to normal.
For example, consider the dataset
3 60 98 145 201
The sorted dataset values (x-axis) are plotted against the normal scores (y-axis)
-0.97 -0.43 0.00 0.43 0.97.
Here is the resulting normal plot:
The normal plot is approximately a straight line, so we conclude that the original dataset is close to normally distributed.
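The points of this normal plot can be reproduced with a short Python sketch. Judging straightness by the correlation between the data values and the normal scores is an assumption added here for illustration, not a method described in the text; the helper pearson_r is likewise a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist, mean

data = sorted([3, 60, 98, 145, 201])
n = len(data)

# Van der Waerden normal scores: z-scores with left-tail area i/(n+1).
scores = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Points of the normal plot: data value on the x-axis, normal score on the y-axis.
points = list(zip(data, scores))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(data, scores)

for x, y in points:
    print(f"{x:>4}  {y:6.2f}")
print(f"correlation = {r:.4f}")
```

A correlation very close to 1 means the plotted points lie nearly on a straight line, which matches the visual conclusion above.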

Example: Obtain the normal plot of the NBS-10 data.
Select Analyze >> Descriptive Statistics >> Q-Q Plots.
Move Difference to the Variables box, and select Van der Waerden's as the Proportion Estimation Formula.

What if the dataset is not normal?
The following normal plots compare a normal dataset with other datasets that deviate from normality in various ways.
In these normal plots, the actual data points are plotted on the x-axis and the expected normal scores (Van der Waerden's method) are plotted on the y-axis.

Data are normal. All the points of the normal plot fall roughly on the reference line.
Data are skewed to the left. The data values are further from the center on the left and closer to it on the right than they would be if they were normal.
Data are skewed to the right. The data values are closer to the center on the left and further from it on the right than they would be if they were normal.
The distribution of the data has thick tails. The data values are further from the center on both the left and the right than they would be if they were normal.
The distribution of the data has thin tails. The data values are closer to the center on both the left and the right than they would be if they were normal.
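To make the "closer" and "further" comparisons concrete, the sketch below standardizes a small right-skewed dataset and prints each value next to its Van der Waerden normal score. The dataset is hypothetical (invented for this illustration), and comparing standardized values with normal scores is one possible way to read the plot, assumed here rather than taken from the text.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical right-skewed dataset (not from the text).
data = sorted([1, 2, 4, 8, 16, 32, 64])
n = len(data)

# Van der Waerden normal scores: z-scores with left-tail area i/(n+1).
scores = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Standardize the data so it is on the same scale as the normal scores.
m, s = mean(data), stdev(data)
standardized = [(x - m) / s for x in data]

print("standardized  normal score")
for z, score in zip(standardized, scores):
    print(f"{z:12.2f}  {score:12.2f}")
```

In this example the smallest standardized value sits closer to 0 than the smallest normal score, while the largest sits further from 0 than the largest normal score, which is the right-skew pattern described above.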