Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Topic: How tests can drop the ball.

Jerry P. Becker
Posted: Sep 19, 2000 5:06 PM

*********************************
From the New York Times on the Web, Wednesday, September 13, 2000. See
http://www.nytimes.com/2000/09/13/national/13LESS.html . Our thanks
to Carol Bohlin for bringing this article to our attention.
*********************************
LESSONS

How Tests Can Drop The Ball

By Richard Rothstein

MIKE PIAZZA, batting .332, could win this year's Most Valuable Player
award. He has been good every year, with a .330 career average, twice
a runner-up for m.v.p. and a member of each All-Star team since his
rookie season.

The Mets reward Piazza for this high achievement, at the rate of $13
million a year.

But what if the team decided to pay him based not on overall
performance but on how he hit during one arbitrarily chosen week? How
well do one week's at-bats describe the ability of a true .330 hitter?

Not very. Last week Piazza batted only .200. But in the second week
of August he batted .538. If you picked a random week this season,
you would have only a 7-in-10 chance of choosing one in which he hit
.250 or higher.
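A minimal sketch of why one week says so little, assuming each at-bat is an independent trial with the same .330 chance of a hit (a simple binomial model; real at-bats depend on the pitcher, the park and the hitter's health). The 25 at-bats per week is likewise an assumption, not a figure from the article, and the sketch is not how the 7-in-10 figure above was obtained; it only shows how far a short sample can swing.

```python
from math import comb


def prob_average_at_least(cutoff: float, true_avg: float = 0.330, at_bats: int = 25) -> float:
    """P(batting average over `at_bats` at-bats >= cutoff), assuming each
    at-bat is an independent hit with probability `true_avg` (binomial model)."""
    total = 0.0
    for hits in range(at_bats + 1):
        if hits / at_bats >= cutoff:
            total += comb(at_bats, hits) * true_avg**hits * (1 - true_avg) ** (at_bats - hits)
    return total


if __name__ == "__main__":
    # Larger samples cluster more tightly around the true .330 average.
    for n in (25, 100, 500):
        print(f"{n:>3} at-bats: P(average >= .250) = {prob_average_at_least(0.250, at_bats=n):.3f}")
```

Raising `at_bats` shows the same hypothetical .330 hitter's average settling down as the sample grows, which is the contrast the column draws between one week and a full season.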

Are standardized-test scores, on which many schools rely heavily to
make promotion or graduation decisions, more indicative of true
ability than a ballplayer's weekly average?

Not really. David Rogosa, a professor of educational statistics at
Stanford University, has calculated the "accuracy" of tests used in
California to abolish social promotion. (New York uses similar tests.)

Consider, Dr. Rogosa says, a fourth-grade student whose "true"
reading score is exactly at grade level (the 50th percentile). The
chances are better than even (58 percent) that this student will
score either above the 55th percentile or below the 45th on any one
test.
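Dr. Rogosa's figures come from the publisher's technical data, which are not reproduced here. As a rough illustration of how a reliability coefficient turns into a statement like the 58 percent above, here is a minimal sketch under the classical test-theory assumption that an observed score is a true score plus independent, normally distributed error, with reliability equal to the share of observed-score variance that comes from true scores. The reliability values are assumptions, not Harcourt's published numbers, and the sketch is not claimed to be Dr. Rogosa's method.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # observed scores standardized to mean 0, SD 1


def prob_outside_band(reliability: float, true_pct: float = 0.50,
                      low_pct: float = 0.45, high_pct: float = 0.55) -> float:
    """P(a student's observed percentile lands outside [low_pct, high_pct])
    when the true score sits at `true_pct` of the true-score distribution.

    Assumed model: X = T + E with T ~ N(0, reliability) and
    E ~ N(0, 1 - reliability) independent, so X ~ N(0, 1).
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    true_score = STD.inv_cdf(true_pct) * sqrt(reliability)
    observed_given_true = NormalDist(true_score, sqrt(1.0 - reliability))
    low_cut, high_cut = STD.inv_cdf(low_pct), STD.inv_cdf(high_pct)
    inside = observed_given_true.cdf(high_cut) - observed_given_true.cdf(low_cut)
    return 1.0 - inside


if __name__ == "__main__":
    for rel in (0.85, 0.90, 0.95):  # assumed reliabilities, for illustration only
        print(f"reliability {rel:.2f}: P(outside 45th-55th) = {prob_outside_band(rel):.3f}")
```

Under these assumptions a reliability of 0.95 already puts the chance of landing outside the 45th-55th band at roughly 57 percent, and lower reliabilities push it higher.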

Results for students at other levels of true performance are also
surprisingly inconsistent. So if students are held back, required to
attend summer school or denied diplomas largely because of a single
test, many will be punished unfairly.

About half of fourth-grade students held back for scores below the
30th percentile on a typical reading test will actually have "true"
scores above that point. On any particular test, nearly 7 percent of
students with true scores at the 40th percentile will likely fail,
scoring below the 30th percentile.

Are Americans prepared to require large numbers of students to repeat
a grade when they deserve promotion?

Professor Rogosa's analysis is straightforward. He has simply
converted technical reliability information from test publishers
(Harcourt Educational Measurement, in this case) to more
understandable "accuracy" guides.

Test publishers calculate reliability by analyzing thousands of
student tests to estimate chances that students who answer some
questions correctly will also answer others correctly. Because some
students at any performance level will miss questions that most
students at that level get right, test makers can estimate the
reliability of each question and of an entire test.
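The column does not say exactly which statistic Harcourt reports. One widely used internal-consistency measure of the kind described is Cronbach's alpha (for right/wrong items it equals the Kuder-Richardson KR-20 coefficient), which compares the variance of individual items with the variance of total scores. A minimal sketch on a small, made-up response matrix (1 = correct, 0 = wrong); the data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha for a students-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    For 0/1 items this is the same as KR-20.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(n_items)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 6 students x 4 items, 1 = correct, 0 = wrong.
EXAMPLE = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(EXAMPLE):.3f}")
```

Because every student in this toy matrix who gets a harder item right also gets the easier ones right, the items are highly consistent and alpha comes out to 5/6 (about 0.83). Publishers work from thousands of student tests and many more items.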

Typically, districts and states use tests marketed as having high
reliability. Yet few policy makers understand that seemingly high
reliability assures only rough accuracy: for example, that students at
the true 80th percentile will almost always score higher than students
at the true 20th percentile.
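A minimal sketch of why high reliability still leaves close calls uncertain, using an assumed normal true-score-plus-error model (observed score = true score + independent error, reliability = true-score share of the variance). It computes the chance that, on a single test, a student at one true percentile outscores a student at a lower true percentile. The reliability of 0.90 is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def prob_ranked_correctly(reliability: float, higher_pct: float, lower_pct: float) -> float:
    """P(on one test, a student truly at `higher_pct` outscores one truly at `lower_pct`).

    Assumed model: X = T + E with T ~ N(0, reliability), E ~ N(0, 1 - reliability),
    and independent errors for the two students.
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    true_gap = sqrt(reliability) * (STD.inv_cdf(higher_pct) - STD.inv_cdf(lower_pct))
    error_gap_sd = sqrt(2.0 * (1.0 - reliability))  # SD of E1 - E2
    return STD.cdf(true_gap / error_gap_sd)


if __name__ == "__main__":
    rel = 0.90  # assumed reliability
    print(f"80th vs 20th: {prob_ranked_correctly(rel, 0.80, 0.20):.4f}")
    print(f"55th vs 45th: {prob_ranked_correctly(rel, 0.55, 0.45):.4f}")
```

With the assumed reliability of 0.90, the 80th-versus-20th comparison comes out above 0.999, while a 55th-versus-45th comparison is right only about 70 percent of the time.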

But when test results are used for high-stakes purposes like
promotion or graduation decisions, there should be a different
concern: How well do they identify students who are truly below a
cutoff point like the 30th percentile? As Dr. Rogosa has shown, a
single administration of a test may do a poor job of this.

Surprisingly, there has not yet been a wave of lawsuits by parents of
children penalized largely because of a single test score. As more
parents learn about tests' actual accuracy, litigation regarding
high-stakes decisions is bound to follow. Districts and states will
then have to abandon an unfair reliance on single tests to evaluate
students.

When Mike Piazza comes to bat, he may face a pitcher who fools him
more easily than most pitchers do, or fools him more easily on that
day. Piazza may not have slept well the night before, the lights may
bother him, or he may be preoccupied by a problem at home. On
average, over a full season, the distractions do not matter much, and
the Mets benefit from his overall ability.

Likewise, when a student takes a test, performance is affected by
random events. He may have fought with his sister that morning. A
test item may stimulate daydreams not suggested by items in similar
tests, or by the same test on a different day. Despite a teacher's
warning to eat a good breakfast, he may not have done so.

If students took tests over and over, the accuracy of their average
scores would improve, just as Mike Piazza's full-season batting average
more accurately reflects his hitting prowess. But school is not baseball;
if students took tests every day, there would be no time left for learning.
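Averaging helps in a predictable way. A standard result, the Spearman-Brown formula, gives the reliability of the average of k parallel tests; this minimal sketch plugs that into an assumed normal true-score-plus-error model to show how the chance that a truly median student lands outside the 45th-55th percentile band shrinks as more tests are averaged. The single-test reliability of 0.95 and the assumption of truly parallel, independent tests are illustrative, and percentiles are measured against the distribution of averaged scores.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def spearman_brown(reliability: float, k: int) -> float:
    """Reliability of the average of k parallel test forms (Spearman-Brown formula)."""
    return k * reliability / (1 + (k - 1) * reliability)


def prob_outside_band(reliability: float, low_pct: float = 0.45, high_pct: float = 0.55) -> float:
    """P(a truly median student's observed percentile lands outside [low_pct, high_pct]),
    under the assumed model X = T + E, T ~ N(0, reliability), E ~ N(0, 1 - reliability)."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    # A truly median student has true score T = 0, so X given T is N(0, 1 - reliability).
    observed_given_true = NormalDist(0.0, sqrt(1.0 - reliability))
    inside = (observed_given_true.cdf(STD.inv_cdf(high_pct))
              - observed_given_true.cdf(STD.inv_cdf(low_pct)))
    return 1.0 - inside


if __name__ == "__main__":
    single = 0.95  # assumed reliability of one test
    for k in (1, 2, 4, 8):
        rel_k = spearman_brown(single, k)
        print(f"average of {k} test(s): reliability {rel_k:.3f}, "
              f"P(outside 45th-55th) = {prob_outside_band(rel_k):.3f}")
```

The gain comes entirely from the error averaging out; the student's true score never changes, which is the same reason a full season describes Piazza better than a week.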

So when making high-stakes decisions, like whether students should be
promoted or attend summer school, giving great weight to a single
test is not only bad policy but extraordinarily unfair. Courts are
unlikely to permit it much longer.
**********************************************
--
Jerry P. Becker
Dept. of Curriculum & Instruction
Southern Illinois University
Carbondale, IL 62901-4610 USA
Phone: (618) 453-4241 [O]
(618) 457-8903 [H]
Fax: (618) 453-4244
E-mail: jbecker@siu.edu