Richard L. Scheaffer, University of Florida
Chief Faculty Consultant, AP Statistics
In recent years, much discussion has taken place around the role of computers in teaching introductory (pre-calculus) statistics. Since the AP Statistics Course Description has been out, this discussion has broadened to include a new audience of high school teachers. I have an abiding interest in teaching introductory statistics and had something to do with that Course Description, and so I will add my thoughts and opinions to this discussion.
In recent times the argument has moved from "some technology v. no technology" to "computer technology v. graphing calculator technology." This is a positive step for those of us who have been around awhile, for we still have colleagues who insist that students remember the "shortcut" formulas for calculating certain statistics, and who make them work through numerous examples by hand. These hand calculation skills will be of little help on an AP exam. All formulas used on the exam are given in the form that was thought to be most meaningful for understanding the underlying concept, not for simplifying calculations.
Can a student do well in introductory statistics without ever touching a computer? Yes. Can a student perform well on the AP Statistics exam without having computer experience? Yes. In fact, a student who understands statistics can do well on an AP exam with just a scientific calculator and may do OK with NO calculator. Calculation is not the key to success here! The policy is, though, that students are expected to have a graphing calculator for the exam.
The AP Statistics Course Description recommends, however, that students get some experience with modern statistical software sometime during the course. Why do I think this is a sound and reasonable policy for a course of the type we are promoting here?
The AP course emphasizes data collection, summarization and analysis as the basis for decision making under uncertainty. It is designed to be a course about the practice of modern statistics, but taught so that the practitioners understand the underlying concepts that are at work. Since we are not going to prove any theorems, the only way for students to understand these concepts is to provide them with empirical evidence. That empirical evidence comes about most efficiently and effectively through the use of a computer. I will embellish this general comment in just two areas, exploratory data analysis and simulation in inference.
Exploratory data analysis is much more than drawing a boxplot or two, which can be done on a graphing calculator (albeit without scales on the axes). It is sometimes defined as the art of seeing into the data through revelation, residuals, re-expression, and resistance. Revelation comes about first by looking at various plots of the data (stemplots, boxplots, dotplots, scatterplots, matrix scatterplots, three-dimensional plots, etc.). Modern software has a host of plots most students have never seen before and allows for the tailoring of these plots to emphasize certain features of the data. Also, the plots might be linked so that a potential influential observation highlighted in a scatterplot will show up on a histogram or a stemplot, or in the data set itself, allowing connections to be made. Exploration is only of interest, however, on real data sets, many of which are too large to be entered into a graphing calculator (although this is only a temporary problem).
Residuals have to do with fitting a model to the data and looking at the differences between the model predictions and the observed data points. Modern computer software allows rapid fitting of a wide variety of models, and will automatically store the residuals and standardized residuals for future analysis, including the exploratory plots mentioned above. A key concept of model fitting is that of influential observations. Points that do not fit the pattern can be isolated, moved or deleted quickly and the effect on the fit of the model can be readily observed. In fact, some software programs allow the regression line to move about on the screen while points are being added, deleted, or moved. Such dynamic demonstrations are very effective in teaching concepts.
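The effect of a single influential observation on a fitted line can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: Python stands in for "modern software," the data are made up, and the helper names fit_line and residuals are hypothetical. It fits a least-squares line with and without one point that does not follow the pattern.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Observed value minus the value predicted by the line, for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data: the first eight points lie close to y = 2x;
# the last point sits far from that pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 10.0]

a_all, b_all = fit_line(xs, ys)
a_trim, b_trim = fit_line(xs[:-1], ys[:-1])

print(f"all points:      slope = {b_all:.2f}, intercept = {a_all:.2f}")
print(f"without (20,10): slope = {b_trim:.2f}, intercept = {a_trim:.2f}")

# Residuals from the full fit; the first eight points are badly
# described by a line that has been pulled toward the unusual point.
for x, res in zip(xs, residuals(xs, ys, a_all, b_all)):
    print(f"x = {x:>2}: residual = {res:+.2f}")
```

With these made-up numbers the slope is close to 2 when the unusual point is left out and falls to roughly 0.4 when it is included, which is the kind of change a dynamic display lets students see as the point is moved.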
Re-expression means transformations. Data can be transformed by a wide array of built-in functions, stored and used in model fitting, often with just a single command, by modern statistical software. The linking of data sets allows for dynamic changes to, say, a scatterplot to be viewed on the screen while the transformation is taking place. This is one of the most effective ways I've ever found to show students what a power transformation does.
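What a power transformation does can also be shown numerically. The sketch below assumes Python and made-up curved data (y equal to x squared); the helper names pearson_r and re_express are hypothetical, and treating power 0 as the logarithm is a common convention adopted here as an assumption. It reports how close to a straight line the scatterplot becomes under several re-expressions of y.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def re_express(values, power):
    """Power transformation of positive values; power 0 means natural log."""
    if power == 0:
        return [math.log(v) for v in values]
    return [v ** power for v in values]


# Made-up curved data: y grows like the square of x.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

# Try several powers and see which one straightens the relationship.
for power in (1, 0.5, 0, -1):
    r = pearson_r(xs, re_express(ys, power))
    print(f"power {power:>4}: r = {r:.4f}")
```

For these data the square root (power 0.5) straightens the plot completely, while the other powers leave some curvature. Correlation is used here only as a rough numerical stand-in for looking at the plot itself.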
Resistance means making use of statistics that protect the analysis against unusual data points, like using the median rather than the mean as a measure of center. Modern software includes a variety of techniques for fitting resistant models, including resistant regression lines and time series smoothing.
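The difference between a resistant and a non-resistant measure of center shows up with one unusual value. The following is a minimal sketch using Python's standard statistics module; the scores are made up and the helper name center_summary is hypothetical.

```python
import statistics


def center_summary(values):
    """Return (mean, median) of the values."""
    return statistics.mean(values), statistics.median(values)


scores = [52, 55, 57, 58, 60, 61, 63]   # made-up data
with_outlier = scores + [600]           # e.g. a 60 mistyped as 600

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    mean, median = center_summary(data)
    print(f"{label:>12}: mean = {mean:.2f}, median = {median:.2f}")
```

Adding the single stray value moves the mean from 58 to 125.75, while the median moves only from 58 to 59.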
This leads me to the next section, the use of simulation in inference. Everyone has a favorite technique for illustrating the Central Limit Theorem for means of random samples through a simulation. At this point, these are cumbersome to carry out on a graphing calculator, although simple ones can be done. These types of simulations can easily be extended to simulations of the behavior of confidence intervals for a mean. Suppose we want to illustrate how the sampling distribution (and confidence interval) for the median compares to that for the mean. Easily done on most computers. Similarly, suppose we want to illustrate the sampling distribution of the maximum of a sample, or the correlation coefficient, or the sample standard deviation. All of these are easily accomplished with modern software, and all add to the learning experience of the students.
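Simulations of this kind take only a few lines in a general-purpose language. The sketch below is one possible version under stated assumptions: Python's standard library stands in for statistical software, the population is exponential with mean 1 (chosen only because it is skewed), the sample size and number of repetitions are arbitrary, and the function names sampling_distribution and ci_coverage are hypothetical. It simulates the sampling distributions of the mean, the median, and the maximum, and then checks how often a nominal 95% interval for the mean covers the true mean.

```python
import math
import random
import statistics


def sampling_distribution(statistic, sample_size, reps, rng):
    """Draw `reps` random samples from an exponential population with
    mean 1 and return the value of `statistic` for each sample."""
    values = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        values.append(statistic(sample))
    return values


def ci_coverage(sample_size, reps, rng, true_mean=1.0, z=1.96):
    """Fraction of intervals (sample mean +/- z * s / sqrt(n)) that
    contain the true mean of the exponential population."""
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        center = statistics.fmean(sample)
        margin = z * statistics.stdev(sample) / math.sqrt(sample_size)
        if center - margin <= true_mean <= center + margin:
            hits += 1
    return hits / reps


rng = random.Random(2024)  # fixed seed so a run can be repeated

for name, stat in [("mean", statistics.fmean),
                   ("median", statistics.median),
                   ("maximum", max)]:
    dist = sampling_distribution(stat, sample_size=30, reps=2000, rng=rng)
    print(f"{name:>8}: center = {statistics.fmean(dist):.3f}, "
          f"spread = {statistics.stdev(dist):.3f}")

print(f"coverage of nominal 95% intervals: {ci_coverage(30, 2000, rng):.3f}")
```

Passing the statistic in as a function is a deliberate choice: replacing statistics.median with the sample standard deviation or any other summary requires no other change. Because the population is skewed and the samples are of moderate size, the observed coverage may fall somewhat short of 95%, which is itself a useful point for class discussion.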
Now, I can hear the argument that much of the material mentioned above is not in the AP outline. We do not have to do sampling distributions for the median or for a maximum, for example. True enough. But the idea of an AP course is to make it an enriching experience for the student. Having some experience with medians and maxima, for example, will help the understanding of the concept of sampling distribution and will show how techniques generalize to a larger class of applications. (Students who study French learn something about English. Students who study physics learn a little about applied mathematics.)
There is, in addition to the above, the small point that computer experience will help the student with his or her college work in statistics. A student may get by in high school by completing all data analysis assignments on a calculator, but that is not going to happen in college.
The real question, then, is "What do you want your students to get out of the course?" If a passing grade on the AP exam is the only objective, then the computer is not essential. If the goal is to present an interesting, lively, enriching modern course that will help students pass the AP exam and understand concepts that will improve their practice of statistics in the future, then the computer becomes essential, in my opinion. The question is not one of using a sledgehammer on a tack. It is one of understanding when, why and how to use a tack as opposed to when, why and how to use a staple or a spike.
For reference, see Hoaglin and Moore, Perspectives on Contemporary Statistics, MAA Notes no. 21, 1992.