> I must confess that, while I have some understanding about how to
> interpret a normal probability plot, I have absolutely no idea how to
> construct one for a particular data set. As a teacher who may have used
> them to justify using the t-test, this makes me very uncomfortable,
> especially when being asked how this plot was constructed and having to
> profess ignorance.
>
> Is it possible to explain how to do this in an email? Or is there a
> website that I can visit?
Here's a simple data set that my notes say came from the Minitab Reference Manual, release 10.5. (I don't have the manual here.)

X: .1, .9, 1.1, 1.8, 2.3

Refer to these data points, in order, as x_i, with i = 1, 2, 3, 4, 5.
Now sketch a picture of the normal curve. If the X values are normally distributed, you might expect the five of them to fall at, say, the 10th, 30th, 50th, 70th, and 90th percentiles (i.e., at the z-values with cum probs of .1, .3, .5, .7, and .9). Use a normal table or the TI-83 to look up the z-values for these percentiles; I get about -1.28, -.52, 0, .52, and 1.28. Then pair the i-th smallest x with the i-th z-value and construct, by hand, an ordinary "x-y plot" of the five points (x_i, z_i). That's a normal probability plot.
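The table lookup above can be checked with a minimal Python sketch, assuming Python 3.8 or later: `NormalDist().inv_cdf` from the standard `statistics` module plays the role of the TI-83's invNorm. The variable names are just for this illustration.

```python
from statistics import NormalDist

# Data from the example, already in increasing order: x_1 <= ... <= x_5
xs = [0.1, 0.9, 1.1, 1.8, 2.3]

std_normal = NormalDist()  # mean 0, standard deviation 1

# Cumulative probabilities for the 10th, 30th, 50th, 70th, and 90th percentiles
probs = [0.1, 0.3, 0.5, 0.7, 0.9]

# inv_cdf turns a cumulative probability into a z-value (like invNorm on the TI-83)
zs = [std_normal.inv_cdf(p) for p in probs]

# Pair the i-th smallest x with the i-th z-value; each pair is one plotted point
points = list(zip(xs, zs))
for x, z in points:
    print(f"{x:4.1f}  {z:6.2f}")
```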
Now compare the hand-drawn result with the following Minitab plot; they should be pretty similar.
MTB > set c1
DATA> .1 .9 1.1 1.8 2.3
DATA> end
MTB > nscore c1 c2
MTB > name c1 'X' c2 'Nscore'
MTB > print c1-c2
[Minitab character plot: Nscore on the vertical axis (ticks at -0.80, 0.00, 0.80) against X on the horizontal axis (ticks at 0.40, 0.80, 1.20, 1.60, 2.00), with one x per data point.]
MTB > GPro.
MTB > nooutfile
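Readers without Minitab can rough out a similar character plot in Python. This is only a sketch: `char_plot` is a hypothetical helper written for this illustration, not Minitab's plotting routine, and it plots the z-values from the hand-drawn example (cum probs .1 through .9) rather than Minitab's Nscores, so the vertical spacing differs slightly from Minitab's.

```python
from statistics import NormalDist

def char_plot(xs, ys, width=40, height=9):
    """Hypothetical helper: crude text scatter plot, ys up the side, xs along the bottom.

    Assumes the xs are not all equal and the ys are not all equal
    (otherwise the scaling below divides by zero).
    """
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y_hi - y) / (y_hi - y_lo) * (height - 1))  # row 0 is the top
        grid[row][col] = "x"
    # "|" marks the vertical axis; the final "+---" line is the horizontal axis
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]

xs = [0.1, 0.9, 1.1, 1.8, 2.3]
probs = [0.1, 0.3, 0.5, 0.7, 0.9]  # percentiles from the hand-drawn example
zs = [NormalDist().inv_cdf(p) for p in probs]

lines = char_plot(xs, zs)
print("\n".join(lines))
```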
One last matter... For a data set of size n = 5, as given above, think about a formula that produces the percentiles in this example:
i:   1    2    3    4    5
j:  .1   .3   .5   .7   .9
A little thought shows that j = (i - .5)/n. Various groups have chosen slightly different formulas for j, yielding slightly different normal scores for the plot. According to my notes, Minitab uses (i - 3/8)/(n + 1/4) and Data Desk uses (i - 1/3)/(n + 1/3). Each of these choices yields slightly different cum probs, and hence slightly different z-values. Here are the scores from my example, the Nscores generated by Minitab (copied from the Minitab output above), and the corresponding Data Desk scores:
i:                1      2      3      4      5
my example:   -1.28  -0.52   0.00   0.52   1.28
Minitab:      -1.18  -0.50   0.00   0.50   1.18
Data Desk:    -1.15  -0.49   0.00   0.49   1.15
But I think you will see that each of these three choices gives essentially the same plot.
Hope I've got that right, and that it helps--
==============================================
Bruce King
Department of Mathematics and Computer Science
Western Connecticut State University
181 White Street
Danbury, CT 06810
(firstname.lastname@example.org)