States have begun to shift focus from annual summative assessments for students toward shorter assessments that are administered periodically throughout the school year. The Innovative Assessment Demonstration Authority (IADA) program allowed participating states to redesign and replace their traditional summative assessments with more innovative assessment systems in selected districts.

The five approved IADA programs—Louisiana, New Hampshire, North Carolina, and two programs in Georgia—all chose to implement a system of interim assessments. Because they are administered at regular intervals during the school year, interim assessments can be tailored to reflect instruction and learning that students recently experienced. These assessments help track student progress, inform instruction, and generate proficiency scores.

Because interim assessments have a relatively immediate impact on individual students, ensuring their quality is both more complicated and more important than it is for standalone summative assessments. Below, we provide several concrete strategies to help state educators verify the quality of the data generated by interim assessments, as well as the processes associated with their administration.

The Appeal of Interim Assessments

For roughly two decades, states have collected annual assessment data to satisfy the requirements of the No Child Left Behind (NCLB) Act and its successor, the Every Student Succeeds Act (ESSA). States meet this requirement by administering once-per-year tests, typically in the spring, targeting specific grades and subjects. Test scores are used for school accountability, and in some cases for teacher- or even student-level accountability. Educators often question the value of summative end-of-year assessments, arguing that a single score cannot fully capture student performance, that scores arrive too late to act on, that the information provided is not specific enough to guide instruction, and that preparing for and administering the test takes away valuable learning time.

Certain states have implemented interim assessment systems that are better aligned with instructional practice. For example, Florida recently announced that it will replace its statewide summative assessment program with a series of interim assessments referred to as “progress monitoring,” and the New Hampshire Performance Assessment for Competency Education (PACE) system is embedded in routine instruction and administered when it is most relevant to the curriculum. Scores generated through PACE are available immediately after each assessment element, and students’ summative annual performance determinations are derived by aggregating the results of multiple interim assessments.

Assuring the Quality of Interim Assessments

The increased frequency of these new assessments clearly raises the stakes for quality assurance. Whenever errors occur in a high-stakes assessment program, the substantial effort needed to correct them and report new scores to students, parents, educators, school administrators, and other stakeholders inevitably produces significant reporting delays and financial costs. As a result, the public often loses faith in the assessment, whether that loss of confidence is warranted or not.

Because most documented errors are clerical, such as transposed or miscopied data during complex psychometric processing, our experience teaches us that the most rigorous quality assurance strategy is independent third-party replication of that processing. Though often associated with end-of-year summative statewide testing, this and other quality assurance strategies also hold great promise for interim alternatives at many phases of the assessment process, including:

  • Administration

    States are only just beginning to map out the administrative logistics associated with interim assessments in general, let alone how educators can use such assessments to inform near-term instruction and curricular planning. At its simplest, the shift means moving from a single test administration window to three or more windows, or even to administering tests on demand. Each of these scenarios increases the risk of errors, especially because interim student-level results must be aggregated for the annual reporting required under ESSA.

  • Field Testing

    Interim assessment reporting often involves automating much of the psychometric analysis (e.g., item parameter estimation, equating, scaling, scoring) that typically occurs in the spring and summer after summative tests are administered. Developing automated student reporting for interim assessments therefore typically requires complex field testing and pre-equating procedures to generate pre-established item parameters for fixed or adaptive versions of an operational exam; the first sketch following this list shows one simple check on such a parameter bank.

  • Generating Student Scores

    While pre-equating doesn’t reduce the psychometric work needed to generate student test scores, it does move this complex process to a time before operational tests are administered. Under this scenario, a critical quality assurance activity is monitoring item parameters over time to ensure that they do not change (drift) in ways that undermine the accuracy, reliability, or validity of the assessments (see the second sketch following this list).

  • Aggregating Assessment Results

    Interim assessments also introduce new psychometric complexity by requiring score aggregations to support annual determinations for each student. Though these aggregations sometimes involve only simple summation, they are often much more complex, such as when prior scores inform the computer adaptive algorithms for each successive interim assessment. Independently replicating these aggregations, as in the final sketch following this list, is a natural quality assurance check.

  • Generating Assessment Reports

    If states provide reports after each interim assessment to guide instruction and curriculum development, or even for student assignment to remediation or extension programs, the stakes can increase relative to end-of-year summative assessments. After all, summative assessment results often arrive after the student has changed grades, teachers, and sometimes even schools, and are thus rarely used to make student-level decisions.
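
To make these quality assurance ideas more concrete, the three sketches below illustrate, in highly simplified form, the kinds of independent checks a replication team might run. They are illustrative only: the file layouts, column names, parameter ranges, thresholds, and cut scores are hypothetical rather than those of any actual state program. The first sketch screens a pre-equated item parameter bank, of the kind produced through field testing, for missing or implausible values before it is used to score an operational interim form.

```python
import csv

# Hypothetical plausibility ranges for 2PL item parameters; a real program
# would set these based on its own calibration history.
A_RANGE = (0.2, 3.0)   # discrimination
B_RANGE = (-4.0, 4.0)  # difficulty

def screen_parameter_bank(path):
    """Flag items in a pre-equated parameter file with missing or
    out-of-range values before the bank is used operationally."""
    problems = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # assumes columns: item_id, a, b
            item = row.get("item_id", "<unknown>")
            try:
                a, b = float(row["a"]), float(row["b"])
            except (KeyError, ValueError):
                problems.append((item, "missing or non-numeric parameter"))
                continue
            if not A_RANGE[0] <= a <= A_RANGE[1]:
                problems.append((item, f"discrimination {a:.2f} out of range"))
            if not B_RANGE[0] <= b <= B_RANGE[1]:
                problems.append((item, f"difficulty {b:.2f} out of range"))
    return problems

if __name__ == "__main__":
    for item, issue in screen_parameter_bank("item_bank.csv"):
        print(f"{item}: {issue}")
```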
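
The second sketch is a simplified stand-in for parameter drift monitoring: it compares banked difficulty estimates against re-estimates from operational data and flags items whose difficulty has shifted by more than a hypothetical half-logit threshold. Operational programs would apply more formal drift analyses and their own flagging criteria.

```python
# Flag items whose operational difficulty estimates have drifted from the
# banked (pre-equated) values. The 0.5-logit threshold is hypothetical.
DRIFT_THRESHOLD = 0.5

def flag_drifting_items(banked_b, operational_b):
    """banked_b and operational_b map item_id -> difficulty estimate."""
    flagged = []
    for item_id, b_bank in banked_b.items():
        b_op = operational_b.get(item_id)
        if b_op is None:
            continue  # item not administered in this window
        drift = b_op - b_bank
        if abs(drift) > DRIFT_THRESHOLD:
            flagged.append((item_id, round(drift, 2)))
    return sorted(flagged, key=lambda pair: abs(pair[1]), reverse=True)

# Example with made-up values
banked = {"M101": -0.40, "M102": 0.85, "M103": 1.10}
observed = {"M101": -0.35, "M102": 1.55, "M103": 1.05}
print(flag_drifting_items(banked, observed))  # [('M102', 0.7)]
```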
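
The final sketch mirrors the third-party replication strategy described above. An independent team recomputes each student's annual determination from the interim results and compares it with the determination reported by the primary vendor, flagging mismatches for investigation. The aggregation rule shown here, a simple average of interim scale scores mapped to hypothetical cut scores, stands in for whatever business rules a state has actually documented.

```python
# Hypothetical cut scores mapping an aggregated scale score to an
# achievement level; a real replication would implement the state's rules.
CUTS = [(700, "Advanced"), (650, "Proficient"), (600, "Approaching")]

def annual_determination(interim_scores):
    """Aggregate interim scale scores (simple average here) and map the
    result to an achievement level."""
    aggregate = sum(interim_scores) / len(interim_scores)
    for cut, level in CUTS:
        if aggregate >= cut:
            return level
    return "Beginning"

def replicate(student_records):
    """Compare independently computed determinations against the values
    reported by the primary vendor; return mismatches for investigation."""
    mismatches = []
    for student_id, interim_scores, reported_level in student_records:
        computed = annual_determination(interim_scores)
        if computed != reported_level:
            mismatches.append((student_id, computed, reported_level))
    return mismatches

# Example with made-up records: (student_id, interim scale scores, vendor level)
records = [
    ("S001", [645, 655, 662], "Proficient"),
    ("S002", [588, 601, 597], "Proficient"),  # would be flagged
]
print(replicate(records))  # [('S002', 'Beginning', 'Proficient')]
```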

Helping Educators Effectively Use the Data

Educators must understand how to use interim assessment data effectively for instruction and curriculum planning, including how to take the limitations of the data into account when making decisions. Accurate interpretation and use of the data are vital validity concerns whenever decisions about students’ instruction are based on the results.

Experience teaches us that errors can have a devastating impact on large-scale summative assessment programs. Interim assessments raise the stakes even higher, primarily because individual students are often immediately affected by the results. Rigorous quality assurance procedures, including independent psychometric processing, become more vital than ever for ensuring that these innovative assessment programs deliver on their considerable promise. HumRRO’s expert psychometric and quality assurance services can assist clients in establishing validity and improving the quality of their programs.

About the Authors:

Art Thacker, Ph.D.

Principal Scientist

Yvette Nemeth, Ph.D.

Senior Staff Scientist