Ask any test developer about their greatest challenges, and two words you’ll likely hear are “form assembly.” Indeed, for any testing program that does not involve adaptive or “on-the-fly” administration, drafting a test form from a pool of eligible items is a critical step.

What makes the process so challenging is the need to balance multiple goals, including:

  • Meeting the testing program’s content blueprint requirements, such as assigning three items from content area A.
  • Meeting the testing program’s statistical requirements, such as a targeted level of item difficulty, a target test characteristic curve, or a target amount of test information at a pass–fail cut point.
  • Meeting the testing program’s long-term form plan, such as a rule specifying that 20 percent of the assigned items must have appeared on a previous form.
  • Achieving a level of difficulty and/or precision similar to that of previous exams or other exams in the current testing cycle.

In a nutshell, the task is cognitively demanding and fraught with potential for human error. To further complicate matters, many of the testing program’s goals may be directly (or indirectly) at odds with one another. For example, a client may want to include older “legacy” items with weaker statistical properties while still maintaining high levels of measurement precision on its exams.

Conflicts like these may not be immediately apparent, but over time they can become noticeable barriers, leading to excessive effort, rework, or even failure to satisfy all requirements at once. Meeting the requirements becomes even more complex when they must be applied across several forms concurrently, as many testing programs build multiple forms at the same time.

Enter Automated Test Assembly

Automated test assembly (ATA) couples computing power with algorithm-based approaches to manage the complexities and competing goals of test assembly. At HumRRO, our industrial-organizational (I-O) psychologists use an ATA approach that structures the testing program’s requirements as a “problem” that is then “solved” by an integer programming algorithm, as Willem J. van der Linden described in his 2005 book, Linear Models for Optimal Test Design.
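To make the “problem” framing concrete, a single-form version can be written as a 0-1 integer program in the general style of van der Linden’s models. This is an illustrative sketch, not HumRRO’s production model, and every symbol is an assumption for illustration: x_i = 1 if item i is placed on the form and 0 otherwise, N is the pool size, I_i(θ*) is item i’s information at a cut score θ*, n is the test length, V_k is the set of items in content area k with required count n_k, and E is a set of enemy pairs.

```latex
\begin{aligned}
\max \quad & \sum_{i=1}^{N} I_i(\theta^{*})\, x_i \\
\text{subject to} \quad & \sum_{i=1}^{N} x_i = n, \\
& \sum_{i \in V_k} x_i = n_k \quad \text{for each content area } k, \\
& x_i + x_j \le 1 \quad \text{for each enemy pair } (i, j) \in E, \\
& x_i \in \{0, 1\} \quad \text{for } i = 1, \dots, N.
\end{aligned}
```

Because the objective and every constraint are linear in the x_i, an integer programming solver can search the feasible forms systematically. Other blueprint rules, such as requiring that at least 20 percent of the items come from a set P of previously administered items, fit the same pattern as further linear constraints (here, \sum_{i \in P} x_i \ge 0.2\, n).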

The test blueprint essentially defines boundary conditions (for example, each form must include exactly 100 items) within which every viable solution must fall. The ATA algorithm then identifies the most favorable solution within those boundary conditions according to the program’s goals, such as maximizing test-level reliability. This approach allows a systematic search of the viable solutions, iteratively improving on the stated goals.
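For a very small pool, the “boundary conditions plus objective” idea can be shown by brute force: list every candidate form, discard those that violate the blueprint, and keep the one with the most information at the cut score. This is a minimal sketch under stated assumptions, not HumRRO’s implementation. It substitutes exhaustive enumeration for an integer programming solver (enumeration grows combinatorially and is practical only for toy pools), it uses 2PL test information at a hypothetical cut of θ = 0.5 as a stand-in for “test-level reliability,” and the pool, parameters, and the function `assemble_form` are hypothetical.

```python
import itertools
import math
from collections import Counter

# Hypothetical item pool: (item_id, content_area, a, b) under a 2PL model.
POOL = [
    ("i1", "A", 1.2, -0.3), ("i2", "A", 0.7, 0.5), ("i3", "A", 1.6, 0.4),
    ("i4", "A", 0.9, 1.5), ("i5", "B", 1.1, 0.2), ("i6", "B", 1.4, 0.6),
    ("i7", "B", 0.5, -1.0), ("i8", "B", 1.0, 0.9),
]


def info(theta, a, b):
    """2PL item information at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def assemble_form(pool, length, blueprint, theta_cut):
    """Return (form, information) for the feasible form with maximum
    information at theta_cut, or (None, -inf) if no form is feasible.

    Boundary conditions: exactly `length` items and exactly blueprint[area]
    items per content area (blueprint counts are assumed to sum to `length`).
    """
    best, best_info = None, -math.inf
    for form in itertools.combinations(pool, length):
        counts = Counter(area for _, area, _, _ in form)
        if any(counts[area] != n for area, n in blueprint.items()):
            continue  # violates the content blueprint
        total = sum(info(theta_cut, a, b) for _, _, a, b in form)
        if total > best_info:
            best, best_info = form, total
    return best, best_info


form, total = assemble_form(POOL, length=4, blueprint={"A": 2, "B": 2}, theta_cut=0.5)
print([item_id for item_id, *_ in form], round(total, 3))
```

A production ATA run would hand the same feasibility rules and objective to an integer programming solver, which can find the optimum without checking every combination.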

Over the years, we have designed and deployed custom ATA programs for a range of high-stakes assessments sponsored by organizations such as the Society for Human Resource Management (SHRM), the Federation of State Boards of Physical Therapy (FSBPT), the U.S. Department of State, and the U.S. Department of Defense. This experience allows us to set up and customize ATA processes for new clients efficiently and cost-effectively.

Benefits of ATA

Using ATA instead of a manual approach to test assembly brings notable efficiencies to a testing program. Perhaps most importantly, the ATA algorithm systematically searches the solution space, allowing the testing program to produce test forms of equal or higher quality than those assembled manually. This systematic search also helps testing programs make effective use of shallow or idiosyncratic item pools.

Beyond simply improving manual assembly processes, our applications of ATA have resulted in a variety of benefits for our clients’ testing programs, such as:

  • Balancing constraints (content, statistical) and goals. An ATA algorithm can account for even the most complicated testing blueprint while searching for an optimal solution. This is critical because testing blueprints often have competing goals that are difficult to recognize—let alone resolve—through a manual process.
  • Assembling multiple parallel forms simultaneously. ATA produces greater consistency across assembled forms on the characteristics that matter to the testing program, such as difficulty, precision, or estimates of group differences.
  • Quicker reaction time to unforeseen issues. Some testing program problems cannot be forecast. In the unfortunate event that a form is breached or otherwise compromised, an ATA algorithm can quickly construct a new form that is parallel to the compromised one, buying the testing program much-needed time.
  • Avoiding enemies. If a testing program has well-documented item enemy information in its bank, an ATA algorithm can use that information to proactively prevent items that share similar content from appearing on the same form.
  • Adaptable to changing test blueprints. An ATA algorithm can be set up to adapt to changing content specifications. For example, if a content area is being phased out over time, the blueprint input can be updated each assembly cycle without making substantive changes to the algorithm.
  • Minimizing “cherry-picking.” Assuming a sufficient item pool, ATA enables the assembly of extra forms to ensure a single administration has not depleted all the “good” items.
  • Addressing content review. With minimal modifications, an ATA algorithm can be re-tooled to make item replacements that address content cueing or other issues identified in content review. If information about “enemy items” is not available during assembly, this is an efficient alternative because the ATA algorithm can minimize the amount of new content introduced into the form while resolving the identified issues. As a result, a testing program can keep the form reviews it already has in place, such as reviews by subject matter experts (SMEs), while making that process more efficient.
  • Customized outputs. An ATA program can provide report-ready outputs specific to the needs of the testing program. These might include plots, frequency tables, and descriptive statistics to communicate results and substantiate the quality of the assembled forms. Such outputs can also characterize the quality of the unassigned items, allowing ATA to help streamline critical checks of the long-term trajectory of the item bank.
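Several of the benefits above map directly onto constraints and objectives. The sketch below is a hedged illustration rather than a production method: it builds two non-overlapping forms at once, forbids hypothetical enemy pairs from sharing a form, and uses a maximin objective (maximize the information of the weaker form at the cut score), one common way to push simultaneously built forms toward high and comparable precision. The pool, enemy pairs, 2PL parameters, cut score, and the functions `feasible`, `form_info`, and `assemble_two_forms` are all hypothetical, and exhaustive enumeration stands in for an integer programming solver.

```python
import itertools
import math
from collections import Counter

# Hypothetical pool: item_id -> (content_area, a, b) under a 2PL model.
POOL = {
    "i1": ("A", 1.2, -0.3), "i2": ("A", 0.7, 0.5), "i3": ("A", 1.6, 0.4),
    "i4": ("A", 0.9, 1.5), "i5": ("B", 1.1, 0.2), "i6": ("B", 1.4, 0.6),
    "i7": ("B", 0.5, -1.0), "i8": ("B", 1.0, 0.9),
}
# Hypothetical enemy pairs: items that must never appear on the same form.
ENEMIES = {frozenset({"i3", "i6"}), frozenset({"i1", "i5"})}


def info(theta, a, b):
    """2PL item information at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def feasible(form, blueprint):
    """Check the content blueprint and the enemy constraint for one form."""
    counts = Counter(POOL[i][0] for i in form)
    if any(counts[area] != n for area, n in blueprint.items()):
        return False  # violates the content blueprint
    # Enemy constraint: no enemy pair may appear together on one form.
    return not any(frozenset(pair) in ENEMIES
                   for pair in itertools.combinations(form, 2))


def form_info(form, theta):
    """Test information of a form at theta."""
    return sum(info(theta, *POOL[i][1:]) for i in form)


def assemble_two_forms(length, blueprint, theta_cut):
    """Build two non-overlapping feasible forms that maximize the weaker
    form's information at theta_cut (maximin objective)."""
    items = sorted(POOL)
    best, best_value = None, -math.inf
    for f1 in itertools.combinations(items, length):
        if not feasible(f1, blueprint):
            continue
        rest = [i for i in items if i not in f1]  # no item on both forms
        for f2 in itertools.combinations(rest, length):
            if not feasible(f2, blueprint):
                continue
            value = min(form_info(f1, theta_cut), form_info(f2, theta_cut))
            if value > best_value:
                best, best_value = (f1, f2), value
    return best, best_value


forms, value = assemble_two_forms(length=4, blueprint={"A": 2, "B": 2}, theta_cut=0.5)
print(forms, round(value, 3))
```

Because the blueprint is just an input dictionary, updating it between assembly cycles (as in the “changing test blueprints” point) requires no change to the search itself.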

Straightforward and More Effective

Most professionals responsible for fielding non-adaptive testing programs can relate to the issues and complexities of manual form assembly. However, complicated problems do not always necessitate complicated solutions. In fact, ATA algorithms provide a surprisingly straightforward and elegant way of framing and systematically managing the form assembly problem. The approach simply requires analysts to state the problem in a way that allows a well-established integer programming algorithm to do the legwork for them.

In short, ATA programs allow testing programs to “work smarter, not harder.” The results speak for themselves: higher-quality, more parallel exams constructed in less time and with better documentation.

About the Authors:

Sean Baldwin, Ph.D.

Senior Staff Scientist

Genevieve Ainslie, Ph.D.

Senior Scientist

Taylor Sullivan, Ph.D.