Purpose Drives Test Design
Determining the purpose is the first step in developing a new assessment system. As underscored during an electrifying working session this week with a school leadership team embarking on a redesign of their local benchmark assessment system, this first step is worth the time and effort. Exploring the expectations and needs of all subject area teams to reach a common understanding of how results will be used leads to greater agreement on design decisions and provides direction for efficient, purpose-driven test development.
An academic test is a measurement instrument intended to elicit evidence of invisible cognitive processes in any area of study or discipline, often referred to as skills or knowledge. Measurement is the function, not the purpose. We don’t give tests just to give tests - at least we shouldn’t! Giving a test because it is required, and then never doing anything with the results, is a waste of precious time with our students. The purpose of a test is defined by how the results are used. As I often say, the devil is in the details, so let’s dig into some details.
Some practical uses of the results from formative assessments may be to…
reflect on the efficacy of instructional approaches, with the intention to adjust future instructional strategies for similar concepts.
ascertain student mastery of developmental skills or knowledge to tailor instruction of more complicated, dependent skills or knowledge.
identify groups of students that may benefit from additional instructional support or resources.
group students for immediate, subsequent instructional activities designed to expand and deepen understanding of concepts.
gauge student progress within a course or course progression to adjust the course or course progression design.
gauge impact of curriculum materials and/or a program during an implementation cycle to adjust the following cycle(s).
provide an indication of student progress toward end of year expectations to allow for acceleration and remediation during the school year.
No one test can do it all. Each use listed above carries implications for test design so that the results have the necessary scope and precision. Purpose drives design, in terms of what is included on the test (blueprint), how the test elicits evidence of cognitive processes (item type and difficulty), and how often the test is administered (frequency of results). To illustrate with practical examples, three scenarios are explored in more detail below, along with the test design implications that flow from each purpose.
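To make the idea of a blueprint concrete, here is a minimal sketch in Python. The domain names and item counts are invented for illustration; the point is simply that the mix of items on a test is a deliberate, inspectable design decision.

```python
# A minimal sketch of a test blueprint: content domains mapped to the number
# of items each contributes. Domain names and counts are invented examples.
blueprint = {
    "number_sense": 8,
    "fractions": 6,
    "measurement_and_data": 4,
    "geometry": 2,
}

total_items = sum(blueprint.values())

# Express the blueprint as the share of the test devoted to each domain,
# so design conversations can focus on whether the emphasis matches the purpose.
for domain, count in blueprint.items():
    share = count / total_items
    print(f"{domain}: {count} items ({share:.0%} of the test)")
```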
Scenario 1: Formative tests that ascertain student mastery of developmental skills or knowledge to adjust future instructional strategies for all students to reach mastery.
Assessing students for mastery of concepts before adjusting subsequent instruction is best served by an assessment blueprint focused on a single concept, or a set of highly interrelated concepts, and designed to engage with the concept(s) being measured in different ways. Results indicating that a student hit the mark are less useful in this scenario than information about how and why a student has not reached mastery. Item type should be driven by the concepts and constructs the test seeks to measure, with an emphasis on similar constructs presented in different ways and at different difficulty levels to uncover misconceptions. Frequency of such a formative assessment should be driven by the presentation of concepts within instructional units, and embedded so that adjustments to subsequent instruction can be made before the unit concludes. That means these short, focused formative assessments may be frequent and given as needed.
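One way to get at the how and why behind a miss, rather than a simple right/wrong tally, is to look at which wrong answers students chose. The sketch below is a simple distractor count in plain Python, using invented items, responses, and an answer key; a distractor chosen by many students often points to a shared misconception.

```python
from collections import Counter

# Invented response data for illustration: each student's chosen option per item.
responses = {
    "item_1": ["B", "C", "C", "A", "C", "B", "C"],
    "item_2": ["D", "D", "A", "D", "B", "D", "D"],
}
answer_key = {"item_1": "B", "item_2": "D"}

# For each item, tally the incorrect choices and report the most common one.
for item, choices in responses.items():
    wrong = [c for c in choices if c != answer_key[item]]
    if not wrong:
        print(f"{item}: no incorrect responses")
        continue
    distractor, count = Counter(wrong).most_common(1)[0]
    print(f"{item}: most common wrong answer is {distractor} "
          f"({count} of {len(choices)} students)")
```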
Scenario 2: Formative tests to provide a course-based benchmark for teachers, or teacher teams, to compare progress between student groups.
Formative tests used by teachers to compare classes, within or between school years, are most useful if the blueprint reflects the content covered in the course up to the point of test administration, such as at the end of a larger unit of study. If an additional intention for these test results is to evaluate the efficacy of instructional practices, then intentional differences in instruction leading up to the test administration are ideal, as they create treatment/non-treatment groups for comparing instructional strategies. Item type should be driven by the concepts and constructs the test seeks to measure, with validation efforts focused on identifying items that behave reliably across student groups. Frequency of such a test should be driven by the instructional unit structure, and will therefore depend on the length of time dedicated to conceptual units.
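As a rough illustration of checking whether items behave consistently across student groups, the sketch below compares each item's percent correct between two class groups and flags large gaps for review. The data and the flagging threshold are invented assumptions; real validation work would use proper differential item functioning methods, so treat this only as a screening sketch.

```python
# Invented item-level results for two class groups: 1 = correct, 0 = incorrect.
group_a = {
    "item_1": [1, 1, 0, 1, 1, 1],
    "item_2": [1, 0, 0, 1, 0, 1],
}
group_b = {
    "item_1": [1, 1, 1, 0, 1, 1],
    "item_2": [0, 0, 1, 0, 0, 0],
}

FLAG_GAP = 0.25  # review threshold; an assumption chosen for illustration


def percent_correct(scores):
    return sum(scores) / len(scores)


# Flag items whose percent correct differs sharply between groups.
# A large gap is a prompt to review the item, not proof that it is flawed.
for item in group_a:
    gap = abs(percent_correct(group_a[item]) - percent_correct(group_b[item]))
    status = "review" if gap > FLAG_GAP else "ok"
    print(f"{item}: gap = {gap:.2f} -> {status}")
```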
Scenario 3: Formative tests to be used to provide an indication of student progress toward end of year expectations.
End of year assessments survey the content covered throughout the entire course; therefore, the blueprint of a formative assessment intended for this purpose should also include a representative sample of content from the entire course. Items contributing to the test should also reflect the item types on the end of year assessment. These design choices mitigate test construction differences that would interfere with the correlation between the formative and end of year tests. Such a formative assessment, given once or twice leading up to the end of year assessment, provides enough actionable data to track progress throughout a school year; there are diminishing returns with more frequent administrations for this purpose. Student performance should be expected to increase throughout the year, but how much growth to expect requires validation strategies that include the end of year test results.
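One common validation check for this purpose is whether formative scores track end of year scores. The sketch below computes a Pearson correlation between the two using invented paired scores; a weak relationship would suggest the formative test is not functioning well as an indicator of end of year performance.

```python
import math

# Invented paired scores for illustration: each student's formative score
# and end of year score, in the same order.
formative = [12, 18, 9, 22, 15, 20, 11]
end_of_year = [48, 62, 40, 75, 55, 70, 44]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


print(f"Formative vs. end of year correlation: {pearson(formative, end_of_year):.2f}")
```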
Each of these scenarios illustrates a different test design flowing from how the results are intended to be used. It is key to be explicit about purpose from the outset of creating a new assessment system and to maintain that purpose as long as the system is in place. Creative new ways to use results to “get more out of the test” do not make tests more useful. Instead, such creativity erodes the relationship between purpose and design, degrading trust and confidence in the assessment system.
A point in every direction is the same as no point at all. ~Harry Edward Nilsson