Using Natural Language Processing to Bring Efficiencies to Educational Alignment Processes

State education assessment programs are required to conduct alignment studies and include the results in their U.S. Department of Education peer review submissions. These time-intensive validity investigations typically involve human judgment on how well test items match a content standard and/or cognitive level of complexity, or how one set of content standards maps to another.

Broadly speaking, alignment involves comparing one set of text statements, such as item stems or tasks, to another set, such as content standards. Content experts evaluate these separate pieces of text and use their judgment to determine the degree of similarity, or alignment, between them.

The task of comparing a large number of text statements can quickly become overwhelming. Imagine, for example, mapping the National Assessment of Educational Progress (NAEP) content standards to a particular state’s content standards. The number of comparisons would easily reach many thousands. Or consider evaluating an item bank of thousands of items against a set of content standards. Again, this would involve thousands of comparisons. The cognitive load placed on content experts often leads to fatigue, raising a clear risk of judgment errors.

Fortunately, advances in natural language processing (NLP) offer a means to bring efficiencies and cost savings to traditional, highly labor-intensive alignment processes. One such advance is the growing set of Large Language Models (LLMs), including ChatGPT and many open-source alternatives, that can be used to classify, label, or compare text statements quickly and efficiently.

We do not propose that NLP algorithms perform the alignment in place of content experts. Judgments like these should always involve human experts. However, NLP-based methods can identify the text statements that most closely resemble each other. The idea is to use NLP to make big problems smaller: content experts then review a short list of candidate standards that may be similar, rather than the entire collection of standards.
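
To make the scale of that reduction concrete, the short sketch below compares the number of pairwise judgments in an exhaustive review with the number in a shortlisted review. The function name `review_workload` and all of the sizes are hypothetical, chosen only for illustration; they are not figures from any actual alignment study.

```python
from typing import Tuple

def review_workload(n_source: int, n_target: int, shortlist_size: int) -> Tuple[int, int]:
    """Return (exhaustive, shortlisted) counts of pairwise judgments.

    Exhaustive: every source statement is compared with every target statement.
    Shortlisted: each source statement is compared only with its top candidates.
    """
    exhaustive = n_source * n_target
    shortlisted = n_source * min(shortlist_size, n_target)
    return exhaustive, shortlisted

# Hypothetical sizes (assumptions for illustration only).
full, reduced = review_workload(n_source=150, n_target=200, shortlist_size=5)
print(f"Exhaustive review:  {full} pairs")
print(f"Shortlisted review: {reduced} pairs ({reduced / full:.1%} of the full set)")
```

How short the list can safely be is an empirical question: a shortlist that is too small risks hiding the correct standard from the expert, so the size would need to be checked against expert judgments.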

HumRRO experts have shared research and insights over the past five years aimed at leveraging NLP methods to bring new levels of efficiency to what have traditionally been human-driven processes:

Identifying related occupations in the O*NET program

Advancing the credentialing industry via natural language processing

Predicting knowledge, skills, and ability ratings using job descriptions

Harnessing Large Language Models in automated item generation efforts

Our experts are now applying similar NLP concepts to challenges in education, such as alignment, standard setting, and identifying item enemies. Below, we briefly describe how alignment and standard setting methods benefit from NLP techniques.

Leveraging NLP in Educational Test Alignment Processes

Large Language Models are increasingly capturing the attention and imaginations of social scientists, and we are finding new ways almost daily to apply them. Here are a few ways HumRRO can leverage the power of NLP and LLMs in the education context, specifically in the area of alignment:

Creating crosswalks between different content standards. States often map one set of content standards to another set of standards. For example, a 2016 National Center for Education Statistics report maps Next Generation Science Standards to the NAEP science framework standards. In this study, the research team began with a content mapping activity, which involved forming subsets of standards that could then be evaluated by content experts.

In this context, NLP tools can organize text statements into groups based on their similarity, for example with text classification methods, or rank-order text pairs by an NLP similarity index. As with the NCES content mapping activity, the result would be a subset of text pairs for experts to evaluate, significantly reducing the human level of effort and potentially the overall cost of the task.
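
One way to picture rank-ordering text pairs by a similarity index is the minimal sketch below. It uses TF-IDF word weights and cosine similarity, a simple bag-of-words stand-in that we chose for illustration; it is not the method used in the NCES study, and an LLM-based embedding could take its place. The standard statements, their IDs, the stopword list, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import product

# Illustrative stopword list (an assumption); a real study would choose its own.
STOPWORDS = {"a", "an", "and", "how", "in", "it", "of", "on", "the", "to",
             "when", "through", "among"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (a dict of token -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    # Smoothed inverse document frequency: rarer words get larger weights.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    return [{tok: n * idf[tok] for tok, n in Counter(toks).items()} for toks in token_lists]

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank_pairs(set_a, set_b):
    """Score every cross-set pair and return (score, id_a, id_b), best first."""
    vectors = tfidf_vectors(list(set_a.values()) + list(set_b.values()))
    vec_a = dict(zip(set_a, vectors[: len(set_a)]))
    vec_b = dict(zip(set_b, vectors[len(set_a):]))
    scored = [(cosine(vec_a[a], vec_b[b]), a, b) for a, b in product(set_a, set_b)]
    return sorted(scored, reverse=True)

# Hypothetical statements written for this example (not actual standards text).
set_a = {
    "A1": "Develop a model to describe the cycling of water through Earth systems",
    "A2": "Analyze data on the motion of an object when forces act on it",
}
set_b = {
    "B1": "Describe how water cycles among land, ocean, and atmosphere",
    "B2": "Explain how balanced and unbalanced forces change the motion of objects",
    "B3": "Identify the parts of a plant cell",
}

ranked = rank_pairs(set_a, set_b)
for score, a, b in ranked:
    print(f"{a} - {b}: {score:.3f}")
```

Experts would then start from the top of the ranked list. Note that exact word matching treats "cycling" and "cycles" as different words, which is one reason a richer language model may produce better candidate pairs than this sketch.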

Comparing test items to state content standards. State assessment programs must demonstrate in their peer review submissions that their test items match the state’s content standards. This task typically involves content experts reviewing items (either on a test form or in the entire bank) and evaluating the task demand against the state content standards. Experts then decide which standard each test item best aligns with.

The challenge here is that content experts must review items against hundreds if not thousands of content standards. Very often, though, information in the item text itself indicates which standard the item best aligns with. Rather than having content experts compare every item to every standard, text similarity methods can match each test item with the most likely subset of standards, thus reducing the search space. This would bring substantial time and cost efficiencies to the alignment process.
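
Reducing the search space in this way can be sketched as a top-k lookup: score one item against every standard and keep only the best few for expert review. In the sketch below, the function names `shortlist` and `count_cosine`, the standards, their codes, and the item text are all hypothetical. The similarity function is passed in as a parameter, on the assumption that a simple word-count cosine would eventually be swapped for something stronger, such as an embedding-based measure.

```python
import heapq
import math
import re
from collections import Counter

def count_cosine(text_a, text_b):
    """Cosine similarity between the raw word-count vectors of two texts."""
    a = Counter(re.findall(r"[a-z]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z]+", text_b.lower()))
    dot = sum(n * b[tok] for tok, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

def shortlist(item_text, standards, k=2, similarity=count_cosine):
    """Return the k standards most similar to the item as (code, score) pairs."""
    scored = ((similarity(item_text, text), code) for code, text in standards.items())
    return [(code, score) for score, code in heapq.nlargest(k, scored)]

# Hypothetical standards and item written for this example.
standards = {
    "S1": "Add and subtract fractions with unlike denominators",
    "S2": "Find the area of a rectangle with fractional side lengths",
    "S3": "Interpret data shown in a line plot",
    "S4": "Multiply multi-digit whole numbers",
}
item = ("Maria has 3/4 cup of flour and uses 1/3 cup. "
        "Subtract the fractions to find how much flour is left.")

for code, score in shortlist(item, standards, k=2):
    print(f"{code}: {score:.3f}")
```

The expert still makes the final alignment decision; the shortlist only determines which standards are looked at first.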

Comparing test items to levels of cognitive complexity and Performance Level Descriptors (PLDs). It is sometimes useful to determine which performance level test items align with, as in the study "The Relationship between Item Developer Alignment of Items to Range Achievement-Level Descriptors and Item Difficulty: Implications for Validating Intended Score Interpretations." This work began with test developers comparing test items to standards and also comparing them to range achievement level descriptors.

This is yet another example where text similarity approaches can assist test developers. As in the examples above, NLP-based approaches can compare the language contained in the test items against state standards or the achievement level descriptors. This would significantly reduce the human level of effort and allow test developers to focus on a subset of standards instead of comparing items to an entire set of standards.
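
As a rough sketch of the same idea applied to achievement level descriptors, the example below orders the performance levels by how much vocabulary each descriptor shares with an item. It uses Jaccard overlap of word sets, a deliberately simple measure chosen for illustration and not the method used in the study cited above. The descriptors, level names, item text, and function names are hypothetical.

```python
import re

def token_set(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap: shared tokens divided by all distinct tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_levels(item_text, descriptors):
    """Return (level, score) pairs ordered from most to least similar."""
    item_tokens = token_set(item_text)
    scores = {level: jaccard(item_tokens, token_set(text))
              for level, text in descriptors.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical descriptors and item written for this example.
descriptors = {
    "Level 1": "Identify a main idea stated directly in a short text",
    "Level 2": "Summarize the main idea and key details of a text",
    "Level 3": "Analyze how two texts develop a shared theme using evidence",
}
item = "Which sentence best summarizes the main idea and key details of the passage?"

for level, score in rank_levels(item, descriptors):
    print(f"{level}: {score:.3f}")
```

Shared vocabulary is only a proxy for cognitive demand, so a ranking like this is best treated as a starting point for the test developer's review rather than as a classification.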

The Bottom Line

These examples illustrate areas in traditional alignment processes where NLP methods can provide efficiencies and, in turn, lower costs for state assessment programs. That said, we are not advocating that such methods wholly replace human judgment in alignment studies. Instead, we advocate for using NLP and other AI methods in combination with human judgment to create more efficient—and potentially, more effective and accurate—processes than can be accomplished by human experts alone.

Authors:

Harold Doran, Ed.D.

Vice President

Steve Ferrara, Ph.D.
