Date: Oct 7, 2012 3:37 PM

Author: Jerry P. Becker

Subject: [ncsm-members] Engineering Good Math Tests

*****************************

From Education Week [American Education's Newspaper of Record],

Wednesday, October 3, 2012, Volume 32, Issue 6, p. 23. See

http://www.edweek.org/ew/articles/2012/10/03/06burkhardt.h32.html?tkn=WTOFJ6xjinoyM6BBw1BUuskmalCVVBBXrnse&cmp=clp-edweek#comments

******************************

Commentary

Engineering Good Math Tests

By Hugh Burkhardt

Narrow math tests inevitably drive down real standards because

accountability pressures principals and teachers to teach to the

test. Conversely, well-engineered tests of the math we actually want

people to study and learn raise standards. It may not surprise you

that high-performing countries such as Singapore have better

mathematics - as defined in the Common Core State Standards - on

their tests than the United States does. More surprisingly, good

tests are less expensive in real terms.

There are worrying signs that the actual common-core assessments will

be too close to "business as usual," albeit computerized. If so, most

U.S. students and future citizens will be condemned to further

mediocrity in mathematics.

The need for better tests is accepted by business, industry, and

government. In 2009, President Barack Obama called on "our nation's

governors and state education chiefs to develop standards and

assessments that don't simply measure whether students can fill in a

bubble on a test, but whether they possess 21st-century skills like

problem-solving and critical thinking, entrepreneurship, and

creativity."

Since then, the states have led the development of common standards

in mathematics that embody this broader vision, and two consortia of

states, the Smarter Balanced Assessment Consortium, or SBAC, and the

Partnership for Assessment of Readiness for College and Careers, or

PARCC, have been funded to develop assessments aligned with the

standards. Much progress has been made.

Everyone accepts that, when used as the linchpin of accountability,

tests are not "just" measurement, but often direct the efforts of

school employees and dominate what is taught in classrooms. The SBAC

"content specification" for the common-core math assessment (which I

helped write) features problem-solving and modeling with mathematics,

reasoning, and critiques of reasoning, alongside the concepts and

skills needed to make these possible.

Crucially, it also includes many examples of assessment tasks that

show how these principles have been realized in math examinations in

the United States and around the world. Examples are harder to

misinterpret than descriptions. Teachers, students, and citizens

understand that items on the tests represent the types of tasks

students must learn to do.

The feedback to SBAC on this content specification has been

overwhelmingly positive. So what is the problem?

A strong undertow of fear appears to be pulling the system back to

the familiar. This is a test of our courage - a test our tests may

fail.

There is growing concern that test implementation will be a

third-rate realization of the common core - that the design and

"engineering" will not be good enough. The problems seem to be caused

by a mixture of fear and lack of experience and by a decision-making

structure unsuitable for innovation. State assessment directors are

fearful of cost and litigation if their well-oiled testing systems,

already sometimes controversial, have to change. High-quality

examinations that cover the common core and meet international

standards are outside their experience and their zone of comfort.

In high-performing countries, mathematics-curriculum experts have

final say on the problems and scoring of the examinations.

Psychometricians are technical advisers. In the United States, the

practice has been turned upside down: Psychometricians too often have

the final say on the items in a test, while the mathematics experts

play a secondary role. SBAC and PARCC continue this upside-down

tradition that values technical measurement above accountability for

teaching and learning the core mathematics in the standards.

What is the problem with current tests? Multiple-choice tests and

their latest variant - computer-adaptive tests - measure with many

very short items. The grain size of these items is much smaller than

the basic concepts of mathematics. The items have a very indirect

relationship to the targets of instruction: the math in the

standards. In mathematical reasoning and problem-solving, the whole

is more than the sum of the parts. This is recognized in

English/language arts, where we assess substantial pieces of reading

and writing.

To find out if students can do mathematics, we need to find how well

they can create, critique, and explain substantial chains of

reasoning. Multiple-choice tests cannot handle this, nor can their

computer-based variants. When you look at the "technology-enhanced

items" designed to assess "depth of knowledge," you find that

potentially rich tasks have been broken into sequences of short

items. This ignores the real target: chains of student reasoning that

may take diverse paths and be expressed with words, sketch diagrams,

and symbols in diverse ways. Mathematics is not treated as a coherent

body of mathematical content and practices, but as fragments

indirectly related to the target knowledge. This makes a test that

defines the targets of instruction invalid.

It is easy to do better. You ask students to tackle tasks that

represent the kinds of performance that you really want them to be

able to do, not proxy tasks that are easy to assess. As with writing,

you have them scored by trained human beings using specific rubrics

for each task that award points for the core elements of performance.

You audit the process to ensure reliable scoring. This is the way

examinations are run in other advanced countries.

A well-made test matches the depth and balance of the learning

targets. This involves selecting an appropriate balance of short

items and substantial performance tasks so that teachers who teach to

the test, as most teachers will, are led to deliver a balanced

curriculum that reflects the standards. This needs a "mathematics

board," a body whose members are experts in math education and

mathematics. The consortia should establish such panels for task

selection and test balancing.

Where will the tasks come from? Designing accessible assessment tasks

that demand substantial chains of reasoning is a challenging area of

educational design. Test vendors have little experience, and the

skills do not come quickly. However, there is a large international

literature of well-engineered tasks across this range that can be

licensed for use in tests. (Disclosure: A project in which I am

involved - the Mathematics Assessment Resource Service, or MARS - is one

not-for-profit source.)

And what of cost? Vendors charge a dollar or two for traditional

tests, and they only need a class period of testing time. People are

rightly concerned at "wasting" teaching and learning time. Yet ask

teachers how much time they spend on otherwise unproductive test

preparation. Typical responses are that test prep for state tests

takes 20 days a year. That's more than 10 percent of teachers' time

and, worse, more than 10 percent of the students' learning time. This

is the real cost of aiming at a cheap target.

Good tests cost a bit more than computer-based tests. How much

depends on how you manage them. One inexpensive model is to make

scoring training and actual scoring part of each teacher's job. This

is high-quality professional development, showing teachers what is

valued in math performance and what other students can do. If this

takes two days a year, you are still well ahead on the test-prep

clock, with many more days for real teaching and learning than with

artificial tests.
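
The time comparison above can be checked with a little arithmetic. The sketch below is illustrative only: the 20 days of test prep and the two days of scoring are the figures quoted in the commentary, while the 180-day school year and the names used (SCHOOL_YEAR_DAYS, share_of_year) are assumptions introduced here, not figures from the article.

```python
# Assumption: a 180-day school year (not stated in the commentary).
SCHOOL_YEAR_DAYS = 180


def share_of_year(days: float, year_days: int = SCHOOL_YEAR_DAYS) -> float:
    """Return the given number of days as a fraction of the school year."""
    return days / year_days


test_prep_days = 20  # test-prep figure quoted in the commentary
scoring_days = 2     # teacher scoring/training figure quoted in the commentary

prep_share = share_of_year(test_prep_days)
scoring_share = share_of_year(scoring_days)

# Difference between the two figures, i.e. the days freed up if
# unproductive test prep is replaced by two days of scoring.
net_days = test_prep_days - scoring_days

print(f"Test prep: {prep_share:.1%} of the school year")
print(f"Scoring:   {scoring_share:.1%} of the school year")
print(f"Net days available for teaching: {net_days}")
```

Under that assumed year length, 20 days is roughly 11 percent of the year, consistent with the "more than 10 percent" claim, and two days of scoring is roughly 1 percent. A different year length changes the percentages but not the 18-day difference.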

What about test prep for good tests? With "tests worth teaching to,"

that is something you want. Test tasks are valuable learning

experiences. The test itself is not a waste of learning time; it is

instead exactly the task for which teaching prepares you.

----------------------------------------

Hugh Burkhardt has, since 1982, led a series of assessment projects

with test providers in the United States and the United Kingdom who

sought to align their mathematics tests with learning goals. He is

based at the University of Nottingham's Shell Center in England,

where he works with the Mathematics Assessment Project of the

Mathematics Assessment Resource Service, and the University of

California, Berkeley. He founded the International Society for Design

and Development in Education and chairs the advisory board of its

e-journal, Educational Designer.

*********************************************

--

Jerry P. Becker

Dept. of Curriculum & Instruction

Southern Illinois University

625 Wham Drive

Mail Code 4610

Carbondale, IL 62901-4610

Phone: (618) 453-4241 [O]

(618) 457-8903 [H]

Fax: (618) 453-4244

E-mail: jbecker@siu.edu