The English social scientist Gregory Bateson once said, “Without context, words and actions have no meaning at all.” His pioneering perspective on interdisciplinary thinking highlights an important truth about educational assessment: far too often, we test student knowledge, aptitude, and achievement using methods that ignore the contexts in which those skills play out in real life.

Recognizing this truth is important because even when educational assessments don’t provide contextual information, students still generate it themselves. For example, students are likely to think about the situations and places where they use a particular skill during their everyday lives when deciding how to respond to a test question. Those “contexts” may differ greatly depending on a student’s background and experiences, which may lead to decreased assessment validity and subgroup score differences.

Closing this “context gap” has become a priority for educators and assessors. In our view, solutions can be found by exploring best practices generated within another high-stakes assessment arena—pre-employment testing.

For decades, an assessment methodology known as the situational judgment test (SJT) has proven to be both a reliable, valid hiring tool and an engaging experience for candidates. While the use of SJTs has recently expanded into domains such as credentialing, they are rarely seen within the educational assessment field. Based on our experience and lessons learned deploying the SJT methodology across a number of high-stakes testing programs, we believe SJTs offer promising applications within the educational assessment arena.

What is a Situational Judgment Test (SJT)?

SJTs assess individual judgment and behavioral tendencies. The basic structure includes a scenario describing a challenging situation that requires judgment to resolve, followed by several response options reflecting different actions that could be taken in response to the challenge.
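To make the scenario-plus-options structure concrete, here is a minimal data model for an SJT item. The class names, fields, scenario text, and effectiveness ratings are all illustrative assumptions, not part of any actual HumRRO assessment.

```python
# Hypothetical data model for an SJT item: a scenario plus several
# response options, each carrying a (fictional) mean SME effectiveness
# rating on a 1-7 scale. All names and values are illustrative.

from dataclasses import dataclass, field

@dataclass
class ResponseOption:
    text: str
    sme_effectiveness: float  # mean SME rating, e.g., on a 1-7 scale

@dataclass
class SJTItem:
    scenario: str
    options: list[ResponseOption] = field(default_factory=list)

item = SJTItem(
    scenario="A teammate repeatedly misses deadlines, putting the project at risk.",
    options=[
        ResponseOption("Discuss the issue privately with the teammate.", 6.2),
        ResponseOption("Escalate immediately to the supervisor.", 3.8),
        ResponseOption("Redo the teammate's work yourself.", 2.5),
        ResponseOption("Ignore it and hope the pattern stops.", 1.1),
    ],
)
print(len(item.options))  # 4
```

Depending on the response format chosen, a test-taker might pick one option, pick the best and worst, or rate every option's effectiveness.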

SJTs are administered in textual, video, or animated formats. For example, HumRRO has created custom animated assessments for clients using the SJT methodology.

One of the key strengths of the SJT methodology is flexibility in both the “what” and the “how” of assessment strategy:

  • SJTs have been used to measure a wide range of competencies—everything from judgment and problem solving to interpersonal skills, leadership, and conflict management.
  • When SJTs target knowledge, they go beyond simply measuring what a person knows and capture whether that person can actually apply their knowledge in meaningful real-world settings.
  • Depending on the assessment goal, test-takers can be asked a variety of questions about the scenario, including:
    • What would you do?
    • What is the most effective response?
    • What is the least effective response?
    • How effective is each response option?

SJTs have a strong record of predictive validity, often adding incremental validity over and above cognitive ability measures, and they tend to show relatively small to moderate subgroup score differences.

SJTs in Educational Assessment

Though they likely represent only the proverbial tip of the iceberg, the potential applications of SJT methodology within educational assessment are numerous.

SJT Lessons Learned

While the flexibility of the SJT method is a clear strength, it requires assessment developers to “plan before they leap” by carefully thinking through a number of design, scoring, and psychometric issues. The lessons HumRRO has learned developing dozens of SJTs for a wide variety of assessments can help navigate these waters. Recently, HumRRO scientists highlighted the most critical issues to consider and lessons learned in a blog, Evidence- and Experienced-Based Best Practices: Situational Judgment Tests.

SJTs involve unique development challenges that are quite different from those associated with creating more traditional multiple-choice tests:

  • SJTs do not have traditional “right” or “wrong” answers. While this enhances realism, it also adds complexity to the process of developing a scoring rubric. Because subject matter experts (SMEs) collectively define the level of effectiveness for each response option, it is critical to recruit SMEs who have a deep understanding of the constructs and content the SJT is designed to target.
  • SJT response formats, such as “what would you do” and “pick the best/worst option,” have implications for scoring and equating, and this often involves tradeoffs. For example, having test-takers rate the effectiveness of each response option generates the maximum amount of information from an item, yet that type of scoring can complicate efforts to equate multiple forms relative to simpler scoring systems.
  • SJTs have the potential to be scored using partial-credit rubrics where examinees receive full credit for identifying the most effective option and partial credit for selecting the “next best” choice. However, the ability to deploy such rubrics rests on the accuracy and differentiation of the SME judgments that underlie them. HumRRO has developed a rigorous process that integrates training, practice ratings, and group discussion to ensure that the SME judgments generated for the assessments we develop are highly calibrated.
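The partial-credit idea in the bullet above can be sketched as a small scoring function. This is an illustrative scheme, not HumRRO's actual rubric; the effectiveness ratings and credit values are invented for the example.

```python
# Hypothetical partial-credit scoring for a single SJT item.
# In practice, SME panels would supply the effectiveness ratings;
# the values below are invented for illustration.

def score_response(selected, sme_effectiveness, full=1.0, partial=0.5):
    """Award full credit for choosing the most effective option and
    partial credit for choosing the second most effective one."""
    ranked = sorted(sme_effectiveness, key=sme_effectiveness.get, reverse=True)
    if selected == ranked[0]:
        return full
    if selected == ranked[1]:
        return partial
    return 0.0

# Example: mean SME effectiveness ratings (1-7 scale) for options A-D
item_ratings = {"A": 6.4, "B": 5.1, "C": 3.2, "D": 1.8}

print(score_response("A", item_ratings))  # most effective -> 1.0
print(score_response("B", item_ratings))  # next best -> 0.5
print(score_response("D", item_ratings))  # -> 0.0
```

Note that the whole scheme depends on the SME ratings being well calibrated: if SMEs cannot reliably separate the top two options, the full/partial distinction becomes arbitrary.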
  • Within-person score standardization can be an effective way to remove the impact of idiosyncratic response styles on the scoring process. However, there are limited options for equating across multiple test forms when within-person standardization is employed, so this type of scoring is better suited for static forms.
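As a sketch of the within-person standardization described above: each examinee's ratings are z-scored against that examinee's own mean and standard deviation, so two people with the same pattern of judgments but different scale use end up with matching profiles. The data are invented for illustration.

```python
# Sketch of within-person standardization for rate-each-option SJTs.
# Each examinee's raw effectiveness ratings are z-scored against that
# examinee's own mean and SD, removing idiosyncratic scale-use
# differences (leniency, extremity) before items are scored.

from statistics import mean, pstdev

def standardize_within_person(ratings):
    """Z-score one examinee's ratings relative to their own ratings."""
    m, s = mean(ratings), pstdev(ratings)
    if s == 0:
        return [0.0] * len(ratings)  # examinee used one scale point throughout
    return [(r - m) / s for r in ratings]

# Two examinees with the same rank-ordering but different scale use:
lenient = [7, 6, 7, 5, 6, 4]   # clusters at the top of the scale
moderate = [5, 4, 5, 3, 4, 2]  # same pattern, lower anchor

print(standardize_within_person(lenient))
print(standardize_within_person(moderate))  # same profile, within rounding
```

Because each examinee is standardized against their own responses, scores from different forms are no longer on a common raw-score metric, which is why equating across forms becomes difficult.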
  • Because SJTs tend to be inherently multidimensional, commonly used statistical “rules of thumb” for assessing item quality may not always be appropriate for evaluating SJTs. SJT realism often comes at a psychometric cost: simple factor structures and high item-total correlations remain stubbornly elusive. That means validity arguments may need to call on different lines of evidence, such as the construct validity that is built into assessments through careful and rigorous development processes.

States and school districts are under increasing pressure to measure an ever-wider array of student capabilities and achievements with high levels of accuracy and fidelity. In that quest there are no silver bullets or easy answers, but for those seeking to explore “new horizons” in educational assessment, SJTs may be a good place to start.

About the Authors:

Beth Bynum, Ph.D.
Manager

Gavan O’Shea, Ph.D.
Manager, Creative Services