From Education Week [American Education's Newspaper of Record],
Wednesday, October 3, 2012, Volume 32, Issue 6, p. 23. See
Engineering Good Math Tests
By Hugh Burkhardt
Narrow math tests inevitably drive down real standards because
accountability pressures principals and teachers to teach to the test.
Conversely, well-engineered tests of the math we actually want people
to study and learn raise standards. It may not surprise you that
high-performing countries such as Singapore have better mathematics -
as defined in the Common Core State Standards - on their tests than
the United States does. More surprisingly, good tests are less
expensive in real terms.
There are worrying signs that the actual common-core assessments will
be too close to "business as usual," albeit computerized. If
so, most U.S. students and future citizens will be condemned to
further mediocrity in mathematics.
The need for better tests is accepted by business, industry, and
government. In 2009, President Barack Obama called on "our
nation's governors and state education chiefs to develop standards and
assessments that don't simply measure whether students can fill in a
bubble on a test, but whether they possess 21st-century skills like
problem-solving and critical thinking, entrepreneurship, and
Since then, the states have led the development of common standards in
mathematics that embody this broader vision, and two consortia of
states, the Smarter Balanced Assessment Consortium, or SBAC, and the
Partnership for Assessment of Readiness for College and Careers, or
PARCC, have been funded to develop assessments aligned with the
standards. Much progress has been made.
Everyone accepts that, when used as the linchpin of accountability,
tests are not "just" measurement, but often direct the
efforts of school employees and dominate what is taught in classrooms.
The SBAC "content specification" for the common-core math
assessment (which I helped write) features problem-solving and
modeling with mathematics, reasoning, and critiques of reasoning,
alongside the concepts and skills needed to make these possible.
Crucially, it also includes many examples of assessment tasks that
show how these principles have been realized in math examinations in
the United States and around the world. Examples are harder to
misinterpret than descriptions. Teachers, students, and citizens
understand that items on the tests represent the types of tasks
students must learn to do.
The feedback to SBAC on this content specification has been
overwhelmingly positive. So what is the problem?
A strong undertow of fear appears to be pulling the system back to the
familiar. This is a test of our courage - a test our tests may
There is growing concern that test implementation will be a third-rate
realization of the common core - that the design and
"engineering" will not be good enough. The problems seem to
be caused by a mixture of fear and lack of experience and by a
decision-making structure unsuitable for innovation. State assessment
directors are fearful of cost and litigation if their well-oiled
testing systems, already sometimes controversial, have to change.
High-quality examinations that cover the common core and meet
international standards are outside their experience and their zone of
In high-performing countries, mathematics-curriculum experts have
final say on the problems and scoring of the examinations.
Psychometricians are technical advisers. In the United States, the
practice has been turned upside down: Psychometricians too often have
the final say on the items in a test, while the mathematics experts
play a secondary role. SBAC and PARCC continue this upside-down
tradition that values technical measurement above accountability for
teaching and learning the core mathematics in the standards.
What is the problem with current tests? Multiple-choice tests and
their latest variant - computer-adaptive tests - measure with many
very short items. The grain size of these items is much smaller than
the basic concepts of mathematics. The items have a very indirect
relationship to the targets of instruction: the math in the standards.
In mathematical reasoning and problem-solving, the whole is more than
the sum of the parts. This is recognized in English/language arts,
where we assess substantial pieces of reading and writing.
To find out if students can do mathematics, we need to find how well
they can create, critique, and explain substantial chains of
reasoning. Multiple-choice tests cannot handle this, nor can their
computer-based variants. When you look at the
"technology-enhanced items" designed to assess "depth
of knowledge," you find that potentially rich tasks have been
broken into sequences of short items. This ignores the real target:
chains of student reasoning that may take diverse paths and be
expressed with words, sketch diagrams, and symbols in diverse ways.
Mathematics is not treated as a coherent body of mathematical content
and practices, but as fragments indirectly related to the target
knowledge. This makes a test that defines the targets of instruction
It is easy to do better. You ask students to tackle tasks that
represent the kinds of performance that you really want them to be
able to do, not proxy tasks that are easy to assess. As with writing,
you have them scored by trained human beings using specific rubrics
for each task that award points for the core elements of performance.
You audit the process to ensure reliable scoring. This is the way
examinations are run in other advanced countries.
A well-made test matches the depth and balance of the learning
targets. This involves selecting an appropriate balance of short items
and substantial performance tasks so that teachers who teach to the
test, as most teachers will, are led to deliver a balanced curriculum
that reflects the standards. This needs a "mathematics board,"
a body whose members are experts in math education and mathematics.
The consortia should establish such panels for task selection and test
Where will the tasks come from? Designing accessible assessment tasks
that demand substantial chains of reasoning is a challenging area of
educational design. Test vendors have little experience, and the
skills do not come quickly. However, there is a large international
literature of well-engineered tasks across this range that can be
licensed for use in tests. (Disclosure: A project in which I am
involved, the Mathematics Assessment Resource Service, or MARS, is
one not-for-profit source.)
And what of cost? Vendors charge a dollar or two for traditional
tests, and they only need a class period of testing time. People are
rightly concerned about "wasting" teaching and learning time.
Yet ask teachers how much time they spend on otherwise unproductive
test preparation. Typical responses are that test prep for state tests
takes 20 days a year. That's more than 10 percent of teachers' time
and, worse, more than 10 percent of the students' learning time. This
is the real cost of aiming at a cheap target.
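The "more than 10 percent" figure can be checked with a rough calculation. The sketch below assumes a 180-day instructional year, which the article does not state; the 20 days of test prep and the two days of teacher scoring are the article's own figures, and the function name is purely illustrative.

```python
# Assumption: a 180-day instructional year (not stated in the article).
SCHOOL_DAYS_PER_YEAR = 180


def share_of_year(days, school_days=SCHOOL_DAYS_PER_YEAR):
    """Return the fraction of the school year that `days` represents."""
    return days / school_days


# 20 days of test prep is the typical figure teachers report in the article.
test_prep_share = share_of_year(20)

# 2 days of scoring training and scoring is the article's teacher-scoring model.
scoring_share = share_of_year(2)

print(f"Test prep: {test_prep_share:.1%} of the school year")
print(f"Scoring:   {scoring_share:.1%} of the school year")
```

Under that assumption, test prep takes roughly a ninth of the year, while teacher scoring takes about a tenth of that. A different school-year length changes the exact percentages but not the order of magnitude.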
Good tests cost a bit more than computer-based tests. How much depends
on how you manage them. One inexpensive model is to make scoring
training and actual scoring part of each teacher's job. This is
high-quality professional development, showing teachers what is valued
in math performance and what other students can do. If this takes two
days a year, you are still well ahead on the test-prep clock, with
many more days for real teaching and learning than with artificial
What about test prep for good tests? With "tests worth teaching
to," that is something you want. Test tasks are valuable learning
experiences. The test itself is not a waste of learning time; it is
instead exactly the task for which teaching prepares you.
Hugh Burkhardt has, since 1982, led a series of assessment
projects with test providers in the United States and the United
Kingdom who sought to align their mathematics tests with learning
goals. He is based at the University of Nottingham's Shell Center in
England, where he works with the Mathematics Assessment Project of the
Mathematics Assessment Resource Service, and the University of
California, Berkeley. He founded the International Society for Design
and Development in Education and chairs the advisory board of its
e-journal, Educational Designer.