The Math Forum

Topic: A further response to posting on Calculator Use

Jerry P. Becker

Posted: Aug 31, 2013 7:11 PM

From Hugh Burkhardt, who writes: Your readers may be interested in a
paper on the strengths and limitations of technology in assessment:
Section 4 of "High stakes assessment in support of policy"
and references therein.

The essential problem is that current AI cannot handle the diversity
of the long chains of autonomous student reasoning involved in
"really doing mathematics". (As long as you stick to short items,
there is no problem.) That limitation applies equally to "integrating learning

If you look at the things that AI people (rightly) claim as great
achievements, like a robot that will vacuum clean a room, this is not
surprising. The big successes are in systems with rigid rules, like

What follows is Section 4 of the paper "High stakes assessment in
support of policy" given at

4. The roles of IT

There is an understandable enthusiasm in various places for
computer-based assessment, although students who have experienced
computer-based tests of mathematics are not always so enthusiastic
(see Pead 2010, pp. 187-193). It appears to offer inexpensive testing
with instant reporting of results. Here we shall look at the power of
computer-based systems for the various phases of assessment: managing
the assessment system, presenting tasks to students, providing a
natural working medium for the student, capturing the student's
responses, scoring those responses, monitoring scoring, and reporting
the results. We shall see that IT, at least in its current state, is
invaluable for some of these purposes, useful for others, and very
weak for yet others.

* Managing the assessment system: Computers can be a powerful aid to
the processes involved in large-scale assessment, even with
conventional written tests. Scanning student responses saves paper
shipping and checking. Presenting responses to scorers on screen and
collecting their scores, item by item, within a task allows scorers
to work quickly, often at home, and collects data for reporting and
analysis. Inserting standard responses that check scorer reliability
facilitates monitoring. Most large scale test providers use such
systems - and, crucially, they present no obvious problems for a
wider range of task types.

* Presenting tasks to students: The potential gain of presenting
tasks on-screen is that it allows a wider variety of tasks to be
delivered in an examination setting. Video can be used to present
problem contexts much more vividly. Investigative 'microworlds' in
science or mathematics can help in assessing the processes of problem
solving and scientific reasoning, enabling students to explore,
analyze, infer and check the properties of a system that is new to

* Providing a natural working medium for the student: This is an
aspect that is often overlooked but, if students are to be able
really to show what they can do, the mode of working in examinations
must be reasonably natural and familiar. In language arts or history,
where reading and writing text dominate, word processors provide a
natural medium for working, and for constructing written responses.
This is a medium that is familiar to most students. However, in
mathematics and science, where paper and pencil jottings, sketch
graphs and diagrams, tables and mathematical notation are a central
part of the way most people think about problems, computers are a
clumsy and inhibiting medium. Inputting diagrams, fractions and
algebra is slow - a distraction from the problem and an unproductive
use of test time. The specialized software that is available takes
time to learn, implying changes in curriculum, and standards (see

* Capturing the students' responses: This is straightforward in the
case of multiple-choice or short constructed responses (such as a
number or a few words). It is problematic for richer, more open tasks
for the reasons explained in the previous point. Currently then, the
optimum way of capturing student responses to substantial tasks seems
to be through scanning their papers.

* Scoring those responses: Automatic scoring of student responses to
multiple-choice questions and simple, short answer constructed
responses to short items is effective and economical. While progress
is being made in scoring more complex responses, major challenges
remain to machine scoring of responses to complex tasks in
mathematics and science which, generally, involve sketch diagrams,
algebra, etc, set out in no particular sequence or pattern. There is
an ongoing danger that the administrative attractions of automatic
scoring tempt school systems to sharply limit the variety of task
types and the aspects of student performance that can be credited - a
prime example of the degradation of assessment through sacrificing
validity to statistical reliability and cost.

A different, formative role for automatic assessment is to use
computers to search for patterns in students' responses that reveal
how they are thinking about a mathematical concept (Stacey et al.,
2009). It is too complex a task for teachers to go much beyond
tallying the number of items correct and observing major common errors.
However, with the right set of questions, a computer can report
diagnostic information to teachers that goes well beyond a measure of
how much a student knows. Moreover, this information can be provided
to teachers and students immediately, ready for input into the next

* Monitoring scoring: We have noted the role of computers in
managing and monitoring on-screen human scoring by injecting standard
responses from time to time. Computer scoring of essays has been used
to alert a second scorer, a valid and less expensive alternative to
double scoring all responses. (We know of no comparable development
for complex tasks in science or mathematics.)

* Reporting the results: Computers are an essential tool for
handling and reporting data for large scale assessments. However,
their limitations in the range of data they can capture mean that
there is currently no substitute for returning responses to teachers
and students, on screen if need be.

Most commercial computer-based assessment systems offer extensive
summary reports and statistical analyses of scores. These are popular
with school management, and are a major selling point. The ability to
return scored papers to students is less common though not, in
principle, impossible.
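
The monitoring ideas in the list above (seeding standard responses to check scorer reliability, and using a machine score to alert a second scorer) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout, the tolerance, the 0.75 threshold and the sample scores are all hypothetical and do not come from the paper or from any real scoring system.

```python
from collections import defaultdict


def scorer_agreement(records, tolerance=0):
    """Per scorer, the fraction of seeded standard responses that were
    scored within `tolerance` points of the agreed reference score.

    `records` is an iterable of (scorer_id, given_score, reference_score)
    tuples, one per seeded response (hypothetical layout).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for scorer_id, given, reference in records:
        totals[scorer_id] += 1
        if abs(given - reference) <= tolerance:
            hits[scorer_id] += 1
    return {scorer: hits[scorer] / totals[scorer] for scorer in totals}


def needs_second_scorer(human_score, machine_score, max_gap=1):
    """Flag a response for a second human scorer when the human and
    machine scores differ by more than `max_gap` points."""
    return abs(human_score - machine_score) > max_gap


# Illustrative data only: two scorers, two seeded responses each.
seeded = [
    ("scorer_a", 4, 4),
    ("scorer_a", 3, 3),
    ("scorer_b", 2, 4),
    ("scorer_b", 3, 3),
]
agreement = scorer_agreement(seeded)
# Scorers below an (assumed) 75% agreement rate are queued for review.
flagged = [scorer for scorer, rate in agreement.items() if rate < 0.75]
```

Note that a check of this kind only measures consistency against reference scores; it says nothing about whether the tasks themselves are valid assessments of mathematics.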

For designers of a high-quality assessment system, the principle is
clear: Use IT for those things where it is strong and avoid it for
those where it is weak. Look skeptically at the enthusiastic claims
for computer-based testing and scoring systems, especially where
their warrants for success come from other subject areas, and ask
whether they can assess the full range of types of performance in
mathematics required by CCSS. Test their assertions by asking them to
score some real samples of student work on complex tasks.

Sophisticated testing using batteries of multiple choice questions
can capture a large body of evidence from each student. "Adaptive
testing" improves this process by selecting the next question based
on previous answers. This can be valuable as part of the assessment
regime, particularly for "diagnostic testing" and rapid coverage of
the content curriculum, but currently it cannot test a student's
ability to autonomously tackle a substantial, worthwhile mathematical
problem, requiring an extended chain of reasoning, without being led
step-by-step through the solution with strong hints as to which
mathematical technique to apply at each step.
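
As a rough illustration of "selecting the next question based on previous answers", the sketch below uses a simple staircase rule: move up one difficulty level after a correct answer and down one after an incorrect answer. This is an assumption made for the example; operational adaptive tests typically use more elaborate statistical models, and the item bank, level numbering and simulated student here are hypothetical.

```python
def next_difficulty(current, was_correct, lowest, highest):
    """Staircase rule: one level harder after a correct answer,
    one level easier after an incorrect one, clamped to the bank's range."""
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))


def run_adaptive_test(item_bank, answer_fn, start, length):
    """Administer up to `length` items.

    `item_bank` maps consecutive integer difficulty levels to lists of
    item ids; `answer_fn(item_id, level)` returns True for a correct answer.
    """
    remaining = {level: list(items) for level, items in item_bank.items()}
    lowest, highest = min(remaining), max(remaining)
    level = start
    history = []
    for _ in range(length):
        if not remaining[level]:
            break  # no unused items left at this level
        item = remaining[level].pop(0)
        correct = answer_fn(item, level)
        history.append((item, level, correct))
        level = next_difficulty(level, correct, lowest, highest)
    return history


# Hypothetical bank and a simulated student who succeeds up to level 2.
bank = {1: ["a1", "a2"], 2: ["b1", "b2"], 3: ["c1", "c2"]}
history = run_adaptive_test(bank, lambda item, level: level <= 2,
                            start=2, length=4)
```

The simulated run oscillates between levels 2 and 3, which locates the student's level quickly. Every item is still a short, isolated question, though, which is exactly the limitation the paragraph above describes.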

The danger, though, is that economic pressures will drive
computer-based assessment to deliver what is cheap and easy:
multiple-choice and short constructed answer tests with the same
narrow, fragmented focus that makes so many current 'mathematics'
tests invalid as assessments of mathematics.

Wider implications of IT

We conclude this section with some broader questions that need robust
answers before the use of IT in the assessment of mathematics is
extended. Since they imply changes in both curriculum and standards
as well as assessment, they go beyond the main focus of this paper.
So we will be brief.

The limitations in the usefulness of computers in assessing
mathematics are ironic because computers are central to doing
mathematics outside the classroom in everything from simple business
calculations to research in many subjects, including pure
mathematics. This is not yet reflected in schools where computers and
calculators are currently a useful supplement, rather than a powerful
replacement for traditional routine procedures. Current curricula and
tests mean that most students lack any fluency in the use of
spreadsheets, computer algebra systems, graphers, dynamic
geometry packages and (the ultimate mathematical computing tool)
programming. These tools would enable them to realize the power of
the computer to develop and support their mathematical thinking. But
these aspects of mathematics are not yet integral to most curricula,
or to CCSS. This suggests the following questions for the future:

* Can computer-based assessments incorporate the authentic use of
computers as a mathematical tool? If students are fluent in the use
of spreadsheets and the other tools just mentioned, then the computer
will become a more natural medium for working, and assessment tasks
can be set, to be answered using these authentic mathematical
computing tools. This will require changes to the taught curriculum
to include the practical use of computers in mathematics - a
worthwhile end in itself. Students will learn transferable
mathematical IT skills with relevance beyond their school's brand of
online test platform.

* Would this help to improve assessment of mathematics? Paper-based
tasks frequently present a blank space for writing and attract there
a range of response elements including sketch graphs and diagrams,
tables and mathematical notation. While all of these can be handled
on a computer, students must either be proficient in using these
input devices before the test, or the devices must be very, very
simple (and hence constrained in what can be entered). There are
dangers: presenting the student with the appropriate tool at each
stage of the problem (e.g. a graph tool where a graph is expected)
can easily reduce an open task to a highly-scaffolded exercise which
does not assess the students' ability to autonomously choose and
apply the best tools and processes for the problem, or to develop
extended chains of reasoning. Inputting answers or other elements of
the response is an additional distraction to the students' thinking.
There are examples of 'microworlds' that are specifically designed to
capture student working but this area of development is still at an
early stage.

* What would you put in an 'essential software toolkit' that
students would be expected to become sufficiently familiar with to
use during assessments? We have listed above a range of candidate
tools, now used in a minority of schools. (Students will still need
desk space for paper and pencil sketching.) For each we should ask:
Would these tools embody transferable mathematical computing skills?
How, and to what extent, should these be introduced into the
curriculum in typical schools? This is ultimately a societal
decision, as it has been over the many decades it has been dodged.

* In what ways would standards need to change to encourage the use
of such tools in curriculum and assessment? With the faithful
implementation of CCSS still to work on, this question is premature.
However, if the gross mismatch between the way mathematics is done
inside and outside school is to be addressed, it should be central to
the next revision. Meanwhile, we must focus on what can usefully be
achieved without such change.

* How can computers help in formative assessment? Here we can adopt
a more positive tone. Given the recognition in CCSS of the importance
of modeling, spreadsheets and other computer tools offer rich
possibilities for helping students develop their reasoning skills and
mathematical practices. At the simplest level, spreadsheets provide a
context for exploring relationships, between variables and with data,
that develops insight and provides a 'semi-concrete' bridge between
arithmetic and the greater abstraction of traditional algebra as a
modeling tool. More generally, the potential for use, in combination,
of online discussion boards, tablet PCs and wireless internet access,
to foster a collaborative classroom environment opens up new
possibilities (Webb, 2010).
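
The 'semi-concrete' bridge between arithmetic and algebra mentioned in the last question can be illustrated with a small Python sketch that mimics a spreadsheet column. The first function builds the column arithmetically, the way a student would "fill down" a formula such as "cell above + 3"; the second states the same relationship as an algebraic rule. The function names and the numbers are hypothetical, chosen only for illustration.

```python
def fill_down(start, step, rows):
    """Mimic a spreadsheet column in which each cell is 'cell above + step'."""
    column = [start]
    for _ in range(rows - 1):
        column.append(column[-1] + step)
    return column


def closed_form(start, step, n):
    """The same relationship written as an algebraic rule: start + step * n."""
    return start + step * n


# Row n of the filled-down column should agree with the algebraic rule.
column = fill_down(start=2, step=3, rows=6)
matches = all(closed_form(2, 3, n) == value for n, value in enumerate(column))
```

Comparing the filled-down values with the rule row by row is the kind of exploration a spreadsheet invites: the student sees the pattern in concrete numbers before meeting it as a formula.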

It is clear that what emerges from further work on these questions is
likely to suggest changes in standards and curricula, as well as in
assessment, that belong in the future.

Pead (2010) discusses in further detail many of the points made in
this section. An adapted extract from this, describing a project to
develop a computer-delivered test using rich problem-solving tasks
appears in this issue (Pead, 2012).

References


Black, P. & Wiliam, D. (2012). The reliability of assessments. pp.
214-239 in John Gardner (ed.), Assessment and Learning. London: Sage.

Black, P., Harrison, C., Hodgen, J., Marshall, B. and Serret, N.
(2011) Can teachers' summative assessments produce dependable results
and also enhance classroom learning? Assessment in Education (in

Black, P. (2010) Assessment of and for Learning: improving the
quality and achieving a positive interaction. Invited paper presented
to the June 2010 meeting of representatives of the EU education
ministers. Brussels: European Union

Burkhardt, H. (2009) On Strategic Design. Educational Designer, 1(3).
(Retrieved 22 June 2012).

Dweck, C. S. (2000). Self-theories: their role in motivation,
personality and development. Philadelphia, PA: Psychology Press.

Elmore, R. (1999) Improving The Instructional Core, Harvard Graduate
School of Education.

Hayward, L., Dow, W. and Boyd, B. (2008) Sharing the Standard?
Project Report to Scottish Government. Edinburgh: Scottish Education

He, Qingping & Opposs, Dennis (2010) A Quantitative Investigation
into Public perceptions of Reliability in Examination Results in
England. Coventry: Office of Qualifications and Examinations
Regulation. Available for download on:

Pead, D. (2012). World Class Tests: Summative Assessment of
Problem-solving Using Technology. Educational Designer, 2(5)

Pead, D. (2010). On Computer-Based Assessment of Mathematics (PhD
Thesis). The University of Nottingham.

Pellegrino, J.W., Chudowsky, N. and Glaser, R. (2001) Knowing what
students know: the science and design of educational assessment.
Washington, D.C.: National Academy Press. Ch. 6, pp. 253-5.

Stacey, K., Price, B., Steinle, V., Chick, H., Gvozdenko, E. (2009)
SMART Assessment for Learning. Paper presented at the ISDDE
Conference in Cairns, Australia.

Stobart, G. (2001). The validity of National Curriculum Assessment.
British Journal of Educational Studies 49 (1) 26-29.

Stanley, G., MacCann, R., Gardner, J., Reynolds, L. and Wild, I.
(2009). Review of teacher assessment: evidence of what works best and
issues for development. Oxford: Oxford University Centre for
Educational Development.

Webb, M. (2010) Beginning teacher education and collaborative
formative e-assessment. Assessment and Evaluation in Higher
Education. 35(5) 597-618.

About the ISDDE Working Group on Examinations and Policy

This appendix sketches some of the experience of the authors of this
paper. It was developed from discussions of the working group, which
included, as well as the authors: Rita Crust, Frank Davis, Robert
Floden, Louis Gomez, Vinay Kathotia, Jean-Francois Nicaud, Matthew
Rascoff and Betsy Taleporos.

Paul Black worked as a physicist for twenty years before moving to a
chair in science education. He was Chair of the Task Group on
Assessment and Testing, which advised the UK Government on the design
of the National Curriculum assessment system. He has served on three
assessment advisory groups of the USA National Research Council, as
Visiting Professor at Stanford University, and as a member of the
Assessment Reform Group. He was Chief Examiner for A-Level Physics
for the largest UK examining board, and led the design of Nuffield
A-Level Physics. With Dylan Wiliam, he did the meta-analysis of
research on formative assessment that sparked the current realization
of its potential for promoting student learning.

Hugh Burkhardt has directed a wide range of assessment-related Shell
Centre projects in both the US and the UK - often working with test
providers to improve the validity of their examinations. He is a
director of MARS, the Mathematics Assessment Resource Service, which
brings together the products and expertise of this work to help
education systems. This often links high-stakes assessment with
curriculum and professional development. Currently the team's
Mathematics Assessment Project, led by Malcolm Swan, is developing
tools for formative assessment and testing to support school systems
that are implementing CCSS. Hugh was the founding Chair of ISDDE.

Phil Daro was chair of the writing group that designed the Common
Core State Standards for Mathematics. That he was chosen for this
role reflects his wide range of experience as consultant and designer
at all levels from the classroom to state school systems - for
example, as a director of Balanced Assessment for the Mathematics
Curriculum, California and American Mathematics Projects, New
Standards Project in both Mathematics and ELA, and the current
Mathematics Assessment Project. He currently directs the development
of a middle school mathematics program inspired by the Japanese
curriculum, works on advancing the design and use of leadership tools
for change at every level of the educational system, and consults
with states and school districts on their accountability systems and
mathematics programs. He has served on national boards and committees
including: NAEP Validity Committee; RAND Mathematics Education
Research Panel; College Board Mathematics Framework Committee;
ACHIEVE Technical (Assessment) Advisory Group, Mathematics Work
Group; Technical Advisory Committee to National Goals Panel for World
Class Standards, National Governors Association; Commission organized
by Council of Chief State School Officers; Mathematical Sciences
Education Board of the National Research Council; and many others. He
is Vice-Chair of ISDDE.

Ian Jones is a Royal Society Research Fellow who worked with the
Shell Centre team on the mismatch between intentions and outcomes in
the design of high-stakes tests, particularly the UK Grade 10 GCSE

Glenda Lappan has led the design of a sequence of middle grades
mathematics projects in curriculum and professional development. She
is currently the Director of the Connected Mathematics Project and
Co-PI for the NSF-funded Center for the Study of Mathematics
Curriculum. She has served as a Program Director at the National
Science Foundation. She was President of the National Council of
Teachers of Mathematics during the development and release of the
NCTM Principles and Standards for School Mathematics. She is past
Chair of the Conference Board of the Mathematical Sciences and Vice
Chair of the US National Commission on Mathematics Instruction. From
1996 to 2003, she was appointed by the Secretary of Education to
serve on the National Education Research Policy and Priorities Board
for the Department of Education. Glenda shared a 2008 ISDDE Prize
with Elizabeth Phillips, for her work on Connected Mathematics.

Daniel Pead has worked in the design and development of educational
software, including small applets for mathematics education,
multimedia products, and computer-based assessment. He directs the IT
work of the Shell Centre team, which has included work on a number of
assessment projects, notably the World Class Tests of Problem Solving
in Mathematics, Science and Technology for the UK Government. A
recurring interest is how to produce computer-based materials which
support and encourage good teaching and assessment practice, ensuring
that the technology is a means to an end, not an end in itself. He
has recently completed a substantial study of computer-based
assessment in mathematics. He is the Secretary of ISDDE.

Max Stephens' current research interests are, on the one hand, in
student assessment and school improvement and, complementing this, in
studying how quite young students begin to move beyond calculation
with numbers and become able to make profound generalisations long
before they have met formal algebra in high school. As manager of
Mathematics at the body which is now the Victorian Curriculum &
Assessment Authority, he was closely involved with the design and
implementation of extended assessment tasks in mathematics for the
Victorian Certificate of Education. He is a past president of the
Australian Association of Mathematics Teachers.

Jerry P. Becker
Dept. of Curriculum & Instruction
Southern Illinois University
625 Wham Drive
Mail Code 4610
Carbondale, IL 62901-4610
Phone: (618) 453-4241 [O]
(618) 457-8903 [H]
Fax: (618) 453-4244
