Researchers at HumRRO have produced one of the first practitioner-oriented guides to developing situational judgment tests (SJTs). Drawing on the scientific literature and their own extensive research and real-world experience developing and implementing SJTs in high-stakes assessment contexts for public- and private-sector clients, Deborah L. Whetzel, Ph.D., Taylor S. Sullivan, Ph.D., and Rodney A. McCloy, Ph.D., wrote “Situational Judgment Tests: An Overview of Development Practices and Psychometric Characteristics,” published in the journal Personnel Assessment and Decisions. According to ScholarWorks@BGSU, the article has been downloaded nearly 400 times in the United States and internationally since its publication in March.
SJTs assess individual judgment by presenting examinees with problems to solve via scenarios, each accompanied by a list of plausible response options. Examinees then evaluate how well each response option addresses the problem described in the scenario.
The paper discusses a variety of issues that affect SJTs, including reliability, validity, group differences, presentation modes, faking, and coaching, and provides best-practice guidance for practitioners.
“Consistent with HumRRO’s mission to give back to the profession, we are sharing experience- and evidence-based conclusions and suggestions for improving the development of SJTs,” said Sullivan.
It is clear from both psychometric properties and examinee response behavior that not all SJT designs are equally effective, and not all designs may be appropriate for all intended uses and assessment goals. To help practitioners and researchers alike, the authors provide best practices for developing SJTs:
SJT Best-Practice Guidelines
The use of critical incidents to develop SJT scenarios enhances their realism.
Specific scenarios rely on fewer assumptions, yielding higher levels of validity.
Brief scenarios reduce candidate reading load, which may reduce group differences.
Avoid sensitive topics and balance diversity of characters.
Avoid overly simplistic scenarios that yield only one plausible response.
Avoid overly complex scenarios that provide more information than needed.
Generate response options that have a range of effectiveness levels.
If developing a construct-based SJT, be careful about option transparency.
List only one action in each response option (avoid double-barreled responses).
Distinguish between active bad (do something wrong) and passive bad (do nothing).
Check for tone (use of loaded words can give clues as to effectiveness).
Use knowledge-based (“should do”) instructions for high-stakes settings. (Candidates will engage in impression management, responding based on what they think should be done even if they would personally act differently.)
Use behavioral tendency (“would do”) instructions if assessing non-cognitive constructs, such as personality.
Use a format where examinees rate each option, as this method provides the most information for a given scenario, yields higher reliability, and elicits the most favorable candidate reactions.
Single-response SJTs are easily classified into dimensions and have reliability and validity comparable to other SJTs, but they can have higher reading load given each scenario is associated with a single response.
Empirical and rational keys have similar levels of reliability and validity.
Rational keys based on SME input are used most often.
Develop “overlength” forms (more scenarios and options per scenario than you will need) and score only those items that function properly.
Use 10–12 raters with a diversity of perspectives to establish the scoring key. Outliers may skew results if fewer raters are used.
Use means and standard deviations to select options (means will provide effectiveness levels; standard deviation will provide level of SME agreement).
Coefficient alpha (internal consistency) is not appropriate for multidimensional SJTs.
Use a split-half approach, with a Spearman-Brown correction, assuming the SJT content is balanced.
Because SJTs have small incremental validity over cognitive ability and personality, consider using them in tandem with other assessments to boost validity.
SJTs have been used effectively in military settings for selection and promotion.
SJTs likely measure a general personality factor.
SJTs correlate with other constructs, such as cognitive ability and personality.
SJTs have smaller racial group differences than cognitive ability tests.
Women perform slightly better than men on SJTs on average.
Behavioral tendency instructions have smaller group differences than knowledge instructions.
Rating all options has lower group differences than ranking or selecting best/worst.
Avatar- and video-based SJTs have several advantages, including higher face validity and lower group differences, but they may have lower reliability because richer media can introduce construct-irrelevant contextual information.
Using avatars may be less costly, but developers should consider the uncanny valley effect when using three-dimensional human images.
Faking does affect rank ordering of candidates and who is hired.
Particularly in high-stakes settings, knowledge-based (“should do”) instructions appear to mitigate faking better than behavioral tendency (“would do”) instructions.
SJTs generally appear less vulnerable to faking than traditional personality measures.
Use scoring adjustments, such as key stretching and within-person standardization, to reduce the effect of coaching examinees on how to maximize SJT responses.
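The within-person standardization mentioned in the final guideline can be sketched in a few lines. All examinee names and ratings below are invented for illustration: each person's option ratings are re-expressed relative to that person's own mean and standard deviation, so uniform score inflation (e.g., from coaching candidates to rate everything as highly effective) drops out while the relative ordering of options is preserved.

```python
import statistics

# Hypothetical raw effectiveness ratings (1-7) given by two examinees
# to the same four response options; examinee B rates everything high,
# inflating raw scores without changing the relative profile.
raw = {
    "examinee_A": [2, 5, 3, 6],
    "examinee_B": [5, 7, 6, 7],  # uniformly elevated ratings
}

def within_person_standardize(ratings):
    """Z-score one examinee's ratings around their own mean and SD,
    removing person-level elevation and scale-use differences.
    Assumes the ratings are not all identical (nonzero SD)."""
    m = statistics.mean(ratings)
    s = statistics.pstdev(ratings)
    return [(r - m) / s for r in ratings]

standardized = {who: within_person_standardize(r) for who, r in raw.items()}
for who, z in standardized.items():
    print(who, [f"{v:+.2f}" for v in z])
```

After standardization, both examinees' profiles carry the same mean (0) and spread (1), so scores reflect relative judgments among options rather than overall rating elevation.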
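Two of the quantitative guidelines above reduce to simple statistics. The sketch below (all option names, ratings, and cutoffs are illustrative, not drawn from the article) shows how SME rating means and standard deviations can drive option selection, and how a half-test correlation is stepped up with the Spearman-Brown formula; it assumes Python 3.10+ for `statistics.correlation`.

```python
import statistics

# --- SME scoring-key statistics (means and standard deviations) ---
# Hypothetical data: 10 SMEs rate each response option's
# effectiveness on a 1-7 scale.
sme_ratings = {
    "ask_supervisor":    [6, 7, 6, 6, 5, 7, 6, 6, 7, 6],
    "ignore_problem":    [1, 2, 1, 2, 1, 1, 2, 1, 1, 2],
    "confront_coworker": [3, 5, 2, 6, 1, 4, 2, 5, 3, 6],  # low agreement
}

for option, ratings in sme_ratings.items():
    mean = statistics.mean(ratings)  # effectiveness level -> key weight
    sd = statistics.stdev(ratings)   # SME (dis)agreement
    flag = "drop/revise" if sd > 1.0 else "retain"  # illustrative cutoff
    print(f"{option:18s} mean={mean:.1f} sd={sd:.2f} -> {flag}")

# --- Split-half reliability with Spearman-Brown correction ---
# Hypothetical per-examinee total scores on odd- vs. even-numbered scenarios.
odd_half  = [22, 30, 25, 28, 35, 27, 31, 24]
even_half = [20, 32, 24, 27, 33, 29, 30, 22]

r_half = statistics.correlation(odd_half, even_half)  # Python 3.10+
r_full = 2 * r_half / (1 + r_half)                    # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```

The mean supplies each option's effectiveness level for the key; a large standard deviation signals SME disagreement, flagging the option for revision or removal before the form is finalized.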