The essential problem is that current AI cannot handle the diversity of the long chains of autonomous student reasoning involved in "really doing mathematics". (As long as you stick to short items, there is no problem.) That limitation applies equally to "integrating learning systems".
If you look at the things that AI people (rightly) claim as great achievements, like a robot that will vacuum clean a room, this is not surprising. The big successes are in systems with rigid rules, like chess.
**********************************
4. The roles of IT
There is an understandable enthusiasm in various places for computer-based assessment, although students who have experienced computer-based tests of mathematics are not always so enthusiastic (see Pead 2010, pp. 187-193). It appears to offer inexpensive testing with instant reporting of results. Here we shall look at the power of computer-based systems for the various phases of assessment: managing the assessment system, presenting tasks to students, providing a natural working medium for the student, capturing the student's responses, scoring those responses, monitoring scoring, and reporting the results. We shall see that IT, at least in its current state, is invaluable for some of these purposes, useful for others, and very weak for yet others.
* Managing the assessment system: Computers can be a powerful aid to the processes involved in large-scale assessment, even with conventional written tests. Scanning student responses saves paper shipping and checking. Presenting responses to scorers on screen and collecting their scores, item by item, within a task allows scorers to work quickly, often at home, and collects data for reporting and analysis. Inserting standard responses that check scorer reliability facilitates monitoring. Most large-scale test providers use such systems - and, crucially, they present no obvious problems for a wider range of task types.
* Presenting tasks to students: The potential gain of presenting tasks on-screen is that it allows a wider variety of tasks to be delivered in an examination setting. Video can be used to present problem contexts much more vividly. Investigative 'microworlds' in science or mathematics can help in assessing the processes of problem solving and scientific reasoning, enabling students to explore, analyze, infer and check the properties of a system that is new to them.
* Providing a natural working medium for the student: This is an aspect that is often overlooked but, if students are really to be able to show what they can do, the mode of working in examinations must be reasonably natural and familiar. In language arts or history, where reading and writing text dominate, word processors provide a natural medium for working, and for constructing written responses. This is a medium that is familiar to most students. However, in mathematics and science, where paper and pencil jottings, sketch graphs and diagrams, tables and mathematical notation are a central part of the way most people think about problems, computers are a clumsy and inhibiting medium. Inputting diagrams, fractions and algebra is slow - a distraction from the problem and an unproductive use of test time. The specialized software that is available takes time to learn, implying changes in curriculum and standards (see below).
* Capturing the students' responses: This is straightforward in the case of multiple-choice or short constructed responses (such as a number or a few words). It is problematic for richer, more open tasks for the reasons explained in the previous point. Currently then, the optimum way of capturing student responses to substantial tasks seems to be through scanning their papers.
* Scoring those responses: Automatic scoring of student responses to multiple-choice questions and simple short-answer constructed responses to short items is effective and economical. While progress is being made in scoring more complex responses, major challenges remain for machine scoring of responses to complex tasks in mathematics and science which, generally, involve sketch diagrams, algebra, etc., set out in no particular sequence or pattern. There is an ongoing danger that the administrative attractions of automatic scoring tempt school systems to sharply limit the variety of task types and the aspects of student performance that can be credited - a prime example of the degradation of assessment through sacrificing validity to statistical reliability and cost.
A different, formative role for automatic assessment is to use computers to search for patterns in students' responses that reveal how they are thinking about a mathematical concept (Stacey et al., 2009). It is too complex a task for teachers to go much beyond tallying the number of items correct and observing major common errors. However, with the right set of questions, a computer can report diagnostic information to teachers that goes well beyond a measure of how much a student knows. Moreover, this information can be provided to teachers and students immediately, ready for input into the next lesson.
* Monitoring scoring: We have noted the role of computers in managing and monitoring on-screen human scoring by injecting standard responses from time to time. Computer scoring of essays has been used to alert a second scorer, a valid and less expensive alternative to double scoring all responses. (We know of no comparable development for complex tasks in science or mathematics.)
* Reporting the results: Computers are an essential tool for handling and reporting data for large-scale assessments. However, their limitations in the range of data they can capture mean that there is currently no substitute for returning responses to teachers and students, on screen if need be.
Most commercial computer-based assessment systems offer extensive summary reports and statistical analyses of scores. These are popular with school management, and are a major selling point. The ability to return scored papers to students is less common though not, in principle, impossible.
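The monitoring idea described above, in which standard responses with agreed scores are injected into each scorer's workload, can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the names `ScoreRecord`, `seeded_agreement` and `flag_scorers`, and the 0.8 agreement threshold are all hypothetical and are not taken from any particular scoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRecord:
    """One score given by one human scorer to one response (hypothetical layout)."""
    scorer: str
    response_id: str
    score: int


def seeded_agreement(records, reference_scores, tolerance=0):
    """Per scorer, the fraction of seeded standard responses scored
    within `tolerance` points of the agreed reference score."""
    hits = {}
    totals = {}
    for rec in records:
        if rec.response_id not in reference_scores:
            continue  # an ordinary student response, not a seeded check
        totals[rec.scorer] = totals.get(rec.scorer, 0) + 1
        if abs(rec.score - reference_scores[rec.response_id]) <= tolerance:
            hits[rec.scorer] = hits.get(rec.scorer, 0) + 1
    return {scorer: hits.get(scorer, 0) / n for scorer, n in totals.items()}


def flag_scorers(agreement, threshold=0.8):
    """Scorers whose agreement rate falls below the (assumed) threshold."""
    return sorted(s for s, rate in agreement.items() if rate < threshold)


# Illustrative data only: two seeded responses and one ordinary response.
reference = {"seed-1": 3, "seed-2": 1}
records = [
    ScoreRecord("A", "seed-1", 3),
    ScoreRecord("A", "stu-17", 2),
    ScoreRecord("A", "seed-2", 1),
    ScoreRecord("B", "seed-1", 1),
    ScoreRecord("B", "seed-2", 1),
]
agreement = seeded_agreement(records, reference)
print(agreement)                # {'A': 1.0, 'B': 0.5}
print(flag_scorers(agreement))  # ['B']
```

Nothing in this check depends on the type of task being scored, which is consistent with the point made above that such management systems present no obvious problems for a wider range of task types.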
For designers of a high-quality assessment system, the principle is clear: Use IT for those things where it is strong and avoid it for those where it is weak. Look skeptically at the enthusiastic claims for computer-based testing and scoring systems, especially where their warrants for success come from other subject areas, and ask whether they can assess the full range of types of performance in mathematics required by CCSS. Test their assertions by asking them to score some real samples of student work on complex tasks.
Sophisticated testing using batteries of multiple-choice questions can capture a large body of evidence from each student. "Adaptive testing" improves this process by selecting the next question based on previous answers. This can be valuable as part of the assessment regime, particularly for "diagnostic testing" and rapid coverage of the content curriculum, but currently it cannot test a student's ability to autonomously tackle a substantial, worthwhile mathematical problem, requiring an extended chain of reasoning, without being led step-by-step through the solution with strong hints as to which mathematical technique to apply at each step.
The danger, though, is that economic pressures will drive computer-based assessment to deliver what is cheap and easy: multiple-choice and short constructed answer tests with the same narrow, fragmented focus that makes so many current 'mathematics' tests invalid as assessments of mathematics.
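To make concrete what "selecting the next question based on previous answers" involves, here is a minimal sketch of an adaptive test using a simple up/down "staircase" rule. It is an assumption-laden illustration: operational adaptive tests typically rely on item response theory rather than this rule, and the item bank, the five difficulty levels and the names `next_difficulty` and `run_adaptive_test` are hypothetical.

```python
def next_difficulty(current, was_correct, lowest=1, highest=5):
    """Staircase rule: one level harder after a correct answer,
    one level easier after an incorrect one, kept within bounds."""
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))


def run_adaptive_test(item_bank, answer_fn, start=3, length=5):
    """Administer up to `length` items.

    item_bank: dict mapping difficulty level -> list of item ids
    answer_fn: callable taking an item id, returning True if answered correctly
    Returns a list of (item_id, level, correct) tuples.
    """
    unused = {level: list(items) for level, items in item_bank.items()}
    lowest, highest = min(item_bank), max(item_bank)
    level = start
    history = []
    for _ in range(length):
        if not unused.get(level):
            break  # no unused item left at this level
        item = unused[level].pop(0)
        correct = answer_fn(item)
        history.append((item, level, correct))
        level = next_difficulty(level, correct, lowest, highest)
    return history


# Illustrative bank: two short items at each of five difficulty levels.
bank = {1: ["a1", "a2"], 2: ["b1", "b2"], 3: ["c1", "c2"],
        4: ["d1", "d2"], 5: ["e1", "e2"]}


def simulated_student(item_id):
    """Simulated student who answers items up to level 3 correctly."""
    return item_id[0] in "abc"


for row in run_adaptive_test(bank, simulated_student):
    print(row)
```

Every item in such a bank is still a short item scored right or wrong. The adaptation chooses which short item comes next; it does not change the kind of performance being sampled, which is the limitation described above.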
Wider implications of IT
We conclude this section with some broader questions that need robust answers before the use of IT in the assessment of mathematics is extended. Since they imply changes in both curriculum and standards as well as assessment, they go beyond the main focus of this paper. So we will be brief.
The limitations in the usefulness of computers in assessing mathematics are ironic because computers are central to doing mathematics outside the classroom in everything from simple business calculations to research in many subjects, including pure mathematics. This is not yet reflected in schools where computers and calculators are currently a useful supplement to, rather than a powerful replacement for, traditional routine procedures. Current curricula and tests mean that most students lack any fluency in the use of spreadsheets, computer algebra systems, graphers, dynamic geometry packages and (the ultimate mathematical computing tool) programming. These tools would enable them to realize the power of the computer to develop and support their mathematical thinking. But these aspects of mathematics are not yet integral to most curricula, or to CCSS. This suggests the following questions for the future:
* Can computer-based assessments incorporate the authentic use of computers as a mathematical tool? If students are fluent in the use of spreadsheets and the other tools just mentioned, then the computer will become a more natural medium for working, and assessment tasks can be set, to be answered using these authentic mathematical computing tools. This will require changes to the taught curriculum to include the practical use of computers in mathematics - a worthwhile end in itself. Students will learn transferable mathematical IT skills with relevance beyond their school's brand of online test platform.
* Would this help to improve assessment of mathematics? Paper-based tasks frequently present a blank space for writing and so attract a range of response elements including sketch graphs and diagrams, tables and mathematical notation. While all of these can be handled on a computer, students must either be proficient in using these input devices before the test, or the devices must be very, very simple (and hence constrained in what can be entered). There are dangers: presenting the student with the appropriate tool at each stage of the problem (e.g. a graph tool where a graph is expected) can easily reduce an open task to a highly-scaffolded exercise which does not assess the students' ability to autonomously choose and apply the best tools and processes for the problem, or to develop extended chains of reasoning. Inputting answers or other elements of the response is an additional distraction to the students' thinking. There are examples of 'microworlds' that are specifically designed to capture student working but this area of development is still at an early stage.
* What would you put in an 'essential software toolkit' that students would be expected to become sufficiently familiar with to use during assessments? We have listed above a range of candidate tools, now used in a minority of schools. (Students will still need desk space for paper and pencil sketching.) For each we should ask: Would these tools embody transferable mathematical computing skills? How, and to what extent should these be introduced into the curriculum in typical schools? This is ultimately a societal decision, as it has been over the many decades it has been dodged.
* In what ways would standards need to change to encourage the use of such tools in curriculum and assessment? With the faithful implementation of CCSS still to work on, this question is premature. However, if the gross mismatch between the way mathematics is done inside and outside school is to be addressed, it should be central to the next revision. Meanwhile, we must focus on what can usefully be achieved without such change.
* How can computers help in formative assessment? Here we can adopt a more positive tone. Given the recognition in CCSS of the importance of modeling, spreadsheets and other computer tools offer rich possibilities for helping students develop their reasoning skills and mathematical practices. At the simplest level, spreadsheets provide a context for exploring relationships, between variables and with data, that develops insight and provides a 'semi-concrete' bridge between arithmetic and the greater abstraction of traditional algebra as a modeling tool. More generally, the potential for use, in combination, of online discussion boards, tablet PCs and wireless internet access, to foster a collaborative classroom environment opens up new possibilities (Webb, 2010).
It is clear that what emerges from further work on these questions is likely to suggest changes in standards and curricula, as well as in assessment, that belong in the future.
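The 'semi-concrete' bridge between arithmetic and algebra mentioned above can be illustrated with the kind of table a student might build in a spreadsheet, written here in Python for compactness. The taxi-fare scenario, the numbers and the function names are hypothetical, chosen only to show the idea: filling a column by arithmetic, then noticing that constant differences point to a linear rule.

```python
def fare_table(base, rate, distances):
    """Rows of (distance, fare), like two spreadsheet columns
    where each fare cell is computed as base + rate * distance."""
    return [(d, base + rate * d) for d in distances]


def first_differences(values):
    """Differences between successive entries in a column."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical context: a fixed charge of 3 plus 2 per kilometre.
table = fare_table(3, 2, range(0, 5))
fares = [fare for _, fare in table]

print(table)                     # [(0, 3), (1, 5), (2, 7), (3, 9), (4, 11)]
print(first_differences(fares))  # [2, 2, 2, 2]
```

The constant difference of 2 is the step from a column of numbers to the general statement fare = 3 + 2 x distance, which is where the algebra as a modeling tool begins.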
Pead (2010) discusses in further detail many of the points made in this section. An adapted extract from this, describing a project to develop a computer-delivered test using rich problem-solving tasks, appears in this issue (Pead, 2012).
Black, P. and Wiliam, D. (2012). The reliability of assessments, pp. 214-239 in John Gardner (ed.) Assessment and Learning. London: Sage.
Black, P., Harrison, C., Hodgen, J., Marshall, B. and Serret, N. (2011). Can teachers' summative assessments produce dependable results and also enhance classroom learning? Assessment in Education (in press).
Black, P. (2010). Assessment of and for Learning: improving the quality and achieving a positive interaction. Invited paper presented to the June 2010 meeting of representatives of the EU education ministers. Brussels: European Union.
Stobart, G. (2001). The validity of National Curriculum Assessment. British Journal of Educational Studies 49 (1) 26-29.
Stanley, G., MacCann, R., Gardner, J., Reynolds, L. and Wild, I. (2009). Review of teacher assessment: evidence of what works best and issues for development. Oxford: Oxford University Centre for Educational Development. http://oucea.education.ox.ac.uk/research/publications/
Webb, M. (2010). Beginning teacher education and collaborative formative e-assessment. Assessment and Evaluation in Higher Education 35 (5) 597-618.
About the ISDDE Working Group on Examinations and Policy
This appendix sketches some of the experience of the authors of this paper. It was developed from discussions of the working group, which included, as well as the authors: Rita Crust, Frank Davis, Robert Floden, Louis Gomez, Vinay Kathotia, Jean-Francois Nicaud, Matthew Rascoff and Betsy Taleporos.
Paul Black worked as a physicist for twenty years before moving to a chair in science education. He was Chair of the Task Group on Assessment and Testing, which advised the UK Government on the design of the National Curriculum assessment system. He has served on three assessment advisory groups of the USA National Research Council, as Visiting Professor at Stanford University, and as a member of the Assessment Reform Group. He was Chief Examiner for A-Level Physics for the largest UK examining board, and led the design of Nuffield A-Level Physics. With Dylan Wiliam, he did the meta-analysis of research on formative assessment that sparked the current realization of its potential for promoting student learning.
Hugh Burkhardt has directed a wide range of assessment-related Shell Centre projects in both the US and the UK - often working with test providers to improve the validity of their examinations. He is a director of MARS, the Mathematics Assessment Resource Service, which brings together the products and expertise of this work to help education systems. This often links high-stakes assessment with curriculum and professional development. Currently the team's Mathematics Assessment Project, led by Malcolm Swan, is developing tools for formative assessment and testing to support school systems that are implementing CCSS. Hugh was the founding Chair of ISDDE.
Phil Daro was chair of the writing group that designed the Common Core State Standards for Mathematics. That he was chosen for this role reflects his wide range of experience as consultant and designer at all levels from the classroom to state school systems - for example, as a director of Balanced Assessment for the Mathematics Curriculum, California and American Mathematics Projects, New Standards Project in both Mathematics and ELA, and the current Mathematics Assessment Project. He currently directs the development of a middle school mathematics program inspired by the Japanese curriculum, works on advancing the design and use of leadership tools for change at every level of the educational system, and consults with states and school districts on their accountability systems and mathematics programs. He has served on national boards and committees including: NAEP Validity Committee; RAND Mathematics Education Research Panel; College Board Mathematics Framework Committee; ACHIEVE Technical (Assessment) Advisory Group, Mathematics Work Group; Technical Advisory Committee to National Goals Panel for World Class Standards, National Governors Association; Commission organized by Council of Chief State School Officers; Mathematical Sciences Education Board of the National Research Council; and many others. He is Vice-Chair of ISDDE.
Ian Jones is a Royal Society Research Fellow who worked with the Shell Centre team on the mismatch between intentions and outcomes in the design of high-stakes tests, particularly the UK Grade 10 GCSE examination.
Glenda Lappan has led the design of a sequence of middle grades mathematics projects in curriculum and professional development. She is currently the Director of the Connected Mathematics Project and Co-PI for the NSF-funded Center for the Study of Mathematics Curriculum. She has served as a Program Director at the National Science Foundation. She was President of the National Council of Teachers of Mathematics during the development and release of the NCTM Principles and Standards for School Mathematics. She is past Chair of the Conference Board of the Mathematical Sciences and Vice Chair of the US National Commission on Mathematics Instruction. From 1996 to 2003, she was appointed by the Secretary of Education to serve on the National Education Research Policy and Priorities Board for the Department of Education. Glenda shared a 2008 ISDDE Prize with Elizabeth Phillips, for her work on Connected Mathematics.
Daniel Pead has worked in the design and development of educational software, including small applets for mathematics education, multimedia products, and computer-based assessment. He directs the IT work of the Shell Centre team, which has included work on a number of assessment projects, notably the World Class Tests of Problem Solving in Mathematics, Science and Technology for the UK Government. A recurring interest is how to produce computer-based materials which support and encourage good teaching and assessment practice, ensuring that the technology is a means to an end, not an end in itself. He has recently completed a substantial study of computer-based assessment in mathematics. He is the Secretary of ISDDE.
Max Stephens' current research interests are, on the one hand, in student assessment and school improvement and, complementing this, in studying how quite young students begin to move beyond calculation with numbers and become able to make profound generalisations long before they have met formal algebra in high school. As manager of Mathematics at the body which is now the Victorian Curriculum & Assessment Authority, he was closely involved with the design and implementation of extended assessment tasks in mathematics for the Victorian Certificate of Education. He is a past president of the Australian Association of Mathematics Teachers.