The Math Forum

Topic: [ap-stat] Confounding variables
Replies: 3   Last Post: Nov 13, 2005 2:35 PM

Paul Velleman

Posts: 1,607
Registered: 12/6/04
[ap-stat] Re: Confounding variables
Posted: Nov 13, 2005 8:21 AM

A lurking variable doesn't just affect the apparent relationship
between two variables. An interaction term might do that but wouldn't
be lurking. A lurking variable directly affects both X and Y and
thereby makes it appear that X and Y are directly related to each
other when, without the lurking variable, they would not be related,
or not to that extent or in that direction.

My favorite is the strong positive association between the number of
firefighters at a fire and the amount of damage. Perhaps you
shouldn't call the fire department.
The lurking variable is the size of the blaze, which "causes" both
the damage and the number of firefighters.
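
A quick simulation makes the firefighter story concrete. This is only
a sketch under made-up assumptions: the variable names, the
coefficients, and the noise levels are all hypothetical, and blaze
size is the only thing driving either variable. Neither firefighters
nor damage appears in the other's formula.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n_fires = 2000

# Hypothetical model: blaze size drives both variables;
# neither one has any direct effect on the other.
size = [rng.uniform(1, 10) for _ in range(n_fires)]
firefighters = [3 * s + rng.gauss(0, 2) for s in size]
damage = [10 * s + rng.gauss(0, 8) for s in size]

r_raw = pearson(firefighters, damage)
r_given_size = pearson(residuals(firefighters, size),
                       residuals(damage, size))

print(f"firefighters vs damage:               r = {r_raw:.2f}")
print(f"same, after removing the size effect: r = {r_given_size:.2f}")
```

The raw correlation should come out strongly positive, while the
correlation between the two sets of residuals should sit near zero.
Once the size of the blaze is accounted for, there is nothing left for
the firefighters to "cause".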

It isn't necessary for lurking variables to be unobserved. Often they
are lying around in the open in the data or available in related
data. It is their joint effect on two variables, creating the
spurious appearance of a relationship, that makes them "lurk". The
best lurking variable stories hint at spurious causation: storks
"causing" births, firefighters "causing" damage, TVs "causing"
greater life expectancy, and so on. But cases where we actually had a
right to conclude causation are rare because they would only show up
in the context of a designed experiment and that would mean that
somehow the randomization had broken down. So often lurking variable
examples are cautionary tales warning us not to infer causation from

Confounded variables vary together so that one cannot tease apart
which is responsible for any observed effect. But only predictors (or
factors in an experiment) are said to be confounded. An external
variable that is correlated with our response but not associated
with our factors is not a confounder, because we will still be able to
observe the effect of the factors on the response. Confounding can
occur due to poor design in an experiment: offer both a low interest
rate and low fee to one group of customers and a higher interest rate
and higher fee to another, and you'll never be able to tell whether
customers were more motivated by the difference in interest rate or
the difference in fee. But confounding can also simply be
structural. Any study that takes place over time risks that something
else will happen during that time that will vary with the treatments
and could be responsible for the observed effect -- but we'll never
know. Confounding is related to collinearity in linear models. In
both, we can't separate the effects of two (or more) potential
predictors because of the structure of the data, but there needn't be
any particularly strong association with the response variable.
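
The interest rate and fee example can be sketched in a few lines. The
baseline of 50, the effect size of 10, and the noise level are
invented for illustration, as are the function names. The point is
only that under the confounded design a world where the rate does all
the work and a world where the fee does all the work produce exactly
the same data, whereas a crossed design tells them apart.

```python
import random

def simulate(design, rate_effect, fee_effect, seed=1):
    """Hypothetical response for each (low_rate, low_fee) customer."""
    rng = random.Random(seed)  # same seed gives the same noise each call
    return [50 + rate_effect * r + fee_effect * f + rng.gauss(0, 5)
            for r, f in design]

def mean_by_cell(design, ys):
    """Average response for each (low_rate, low_fee) combination."""
    cells = {}
    for cell, y in zip(design, ys):
        cells.setdefault(cell, []).append(y)
    return {cell: sum(v) / len(v) for cell, v in sorted(cells.items())}

# Confounded design: low rate always comes with low fee.
confounded = [(1, 1)] * 200 + [(0, 0)] * 200
# Crossed design: all four rate/fee combinations are offered.
crossed = [(r, f) for r in (0, 1) for f in (0, 1)] * 100

# Two different "truths": only the rate matters, or only the fee matters.
same_data = simulate(confounded, 10, 0) == simulate(confounded, 0, 10)
print("confounded design gives identical data:", same_data)

rate_only = mean_by_cell(crossed, simulate(crossed, 10, 0))
fee_only = mean_by_cell(crossed, simulate(crossed, 0, 10))
for cell in rate_only:
    print(cell, round(rate_only[cell], 1), round(fee_only[cell], 1))
```

In the crossed design the two scenarios disagree in the (1, 0) and
(0, 1) cells, which are exactly the cells the confounded design never
observes.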

So, a hint: lurking variables are most common in observational
studies. If we are *observing* two variables over time, for example,
then an external change (seasonal, political, economic, or whatever)
could affect both of them and thereby be a lurking variable. They are
much less common in designed experiments because we randomize to
avoid such things. Lurking variables will show up only if we fail to
randomize completely or correctly. If we are *controlling* a
treatment and observing a response over time, an external change
wouldn't lurk because it couldn't affect our treatment variable.
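
Here is a rough sketch of why randomization defuses a lurking
variable. Everything in it is an assumption made for illustration:
"external" stands in for a seasonal or economic change, X has no
effect on Y at all, and the coefficients are arbitrary. In the
observational version the external change pushes on X as well as Y; in
the experimental version X is assigned by coin flip.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical external change that affects the response.
external = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * e + rng.gauss(0, 1) for e in external]  # X plays no part in Y

# Observational study: the external change also moves X.
x_observed = [e + rng.gauss(0, 1) for e in external]
# Designed experiment: X is set by a coin flip, so nothing external can move it.
x_assigned = [rng.choice((0, 1)) for _ in range(n)]

r_observed = pearson(x_observed, y)
r_assigned = pearson(x_assigned, y)

print(f"observed X vs Y: r = {r_observed:.2f}")
print(f"assigned X vs Y: r = {r_assigned:.2f}")
```

The observed X should show a clear positive correlation with Y even
though it has no effect, while the randomly assigned X should show
essentially none.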

Confounding, however, can show up in an experiment either through
poor design or just because there is no reasonable way to avoid it.
Things happen that are not under our control, but might be confounded
with the factors, making it impossible to tell whether our treatment
or the external change was responsible for the response. The best
defense is to record as much supplementary information as possible
(temperature, precipitation, economic conditions, whatever might
matter) so that these can be used as covariates in our analysis. Even that might
not cure the problem because the external variables might still be
collinear with our factors and impossible to separate from the
predictors we care about.

(Of course, the concepts of multiple predictors, covariates, and
collinearity are beyond the scope of the AP syllabus, which is
probably part of what makes these topics confusing here.)

But in the final analysis (as it were), it really doesn't matter what
we call them. The important thing for students to recognize is that
variables other than those named in the study may be closely
associated with one or several variables. When we learn about a
study, we should be skeptical and think about possible external
variables. When we find (or think we may have found) some, we must
take extra care in interpreting the models, and we should be
especially cautious about inferring causation. If differentiating
lurking from confounding variables helps students to understand this,
then go for it. But if it becomes just another trivial distinction
for them to memorize, then I say to hell with it. Focus on what
happens to the models and how to interpret them with care when other
variables might affect the data.

-- Paul


ap-stat is an Electronic Discussion Group (EDG) of The College Board, 45 Columbus Avenue, New York, NY 10023-6992

© The Math Forum at NCTM 1994-2018. All Rights Reserved.