The Math Forum





Topic: nyt>Grading Standardized Tests


Roberta M. Eisenberg

Posts: 23
Registered: 8/31/09
nyt>Grading Standardized Tests
Posted: Sep 28, 2009 7:15 AM

Bobbi Eisenberg


Reading Incomprehension

Published: September 27, 2009
LAST week, Education Secretary Arne Duncan acknowledged standardized
tests are flawed measures of student progress. But the problem is not
so much the tests themselves; it's the people scoring them.


[Illustration: Tucker Nichols]

Many people remember those tests as lots of multiple-choice questions
answered by marking bubbles with a No. 2 pencil, but today's exams
nearly always include the sort of "open-ended" items where
students fill up the blank pages of a test booklet with their own
thoughts and words. On many tests today, a good number of points come
from such open-ended items, and that's where the real trouble begins.

Multiple-choice items are scored by machines, but open-ended items
are scored by subjective humans who are prone to errors. I know
because I was one of them. In 1994, I was a graduate student looking
for part-time work. After a five-minute interview I got the job of
scoring fourth-grade statewide reading comprehension tests. The
for-profit testing company that hired me paid almost $8 an hour, not bad
money for me at the time.

One of the tests I scored had students read a passage about bicycle
safety. They were then instructed to draw a poster that illustrated a
rule that was indicated in the text. We would award one point for a
poster that included a correct rule and zero for a drawing that did not.

The first poster I saw was a drawing of a young cyclist, a helmet
tightly attached to his head, flying his bike over a canal filled
with flaming oil, his two arms waving wildly in the air. I stared at
the response for minutes. Was this a picture of a helmet-wearing
child who understood the basic rules of bike safety? Or was it meant
to portray a youngster killing himself on two wheels?

I was not the only one who was confused. Soon several of my fellow
scorers, pretty much people off the street like me, were
debating my poster, some positing that it clearly showed an
understanding of bike safety while others argued that it most
certainly did not. I realized then (an epiphany confirmed over a
decade and a half of experience in the testing industry) that the
score any student would earn mostly depended on which temporary
employee viewed his response.

A few years later, still a part-time worker, I had a similar
experience. For one project our huge group spent weeks scoring ninth-
grade movie reviews, each of us reading approximately 30 essays an
hour (yes, one every two minutes), for eight hours a day, five days a
week. At one point the woman beside me asked my opinion about the
essay she was reading, a review of the X-rated movie "Debbie Does
Dallas." The woman thought it deserved a 3 (on a 6-point scale), but
she settled on that only after weighing the student's strong writing
skills against the "inappropriate" subject matter. I argued the
essay should be given a 6, as the comprehensive analysis of the movie
was artfully written and also made me laugh my head off.

All of the 100 or so scorers in the room soon became embroiled in the
debate. Eventually we came to the "consensus" that the essay
deserved a 6 ("genius"), or a 4 (well-written but "naughty"), or
a zero ("filth"). The essay was ultimately given a zero.

This kind of arbitrary decision is the rule, not the exception. The
years I spent assessing open-ended questions convinced me that large-
scale assessment was mostly a mad scramble to score tests, meet
deadlines and rake in cash.

The cash, though, wasn't bad. It was largely for this reason that I
eventually became a project director for a private testing company.
The scoring standards were still bleak. A couple of years ago I
supervised a statewide reading assessment test. My colleague and I
were relaxing at a pool because we believed we'd already finished
scoring all of the tens of thousands of student responses. Then a
call from the home office informed us that a couple of dozen unscored
tests had been discovered.

Because our company's deadline for returning the tests was that day,
my colleague and I had to score them even though we were already well
into happy hour. We spent the evening listening to a squeaky-voiced
secretary read student answers to us over a scratchy speakerphone
line, while we made decisions that could affect somebody's future.

These are the kinds of tests, after all, that can help determine
government financing for schools. There is already much debate over
whether the progress that Secretary Duncan hopes to measure can be
determined by standardized testing at all. But in the meantime, we
can give more thought to who scores these tests. We could start by
requiring that scoring be done only by professionals who have made a
commitment to education, rather than by people like me.

Todd Farley is the author of the forthcoming "Making the Grades: My
Misadventures in the Standardized Testing Industry."

A version of this article appeared in print on September 28, 2009, on
page A23 of the New York edition.



© The Math Forum at NCTM 1994-2018. All Rights Reserved.