Lawrence M. Rudner, GMAC vice president of research and development, gives a peek under the hood of the GMAT.
By Lawrence M. Rudner
(This explanation of computer adaptive testing is part of a series of occasional articles taking a peek under the hood of the GMAT.)
Rather than using the computer as an electronic page turner, the GMAT uses the computer’s processing power to analyze each examinee’s responses during the test session. By having the computer estimate the final score after each question and use that estimate to select subsequent questions, GMAC is able to provide a test that is shorter as well as more valid, reliable, and secure. Here’s a look at the logic and advantages of computer adaptive testing.
Logic
An individual’s test begins with a randomly selected question of average difficulty, drawn from a large pool of test questions. Subsequent questions are then selected from the pool with the following basic steps:
1. The examinee responds to the question.
2. The computer estimates the examinee’s final score from his responses and the difficulty of the limited number of questions he has received. Correct responses to relatively hard questions will result in higher estimated scores. Incorrect responses to relatively easy questions will result in lower estimated scores.
3. The computer then evaluates all eligible questions covering the necessary content to determine which will be the most informative questions to administer next, given the examinee’s current estimated score.
4. One of the best “next questions” is administered next. Typically, the best “next questions” will be relatively harder as the estimated score gets higher, and relatively easier as the estimated score gets lower.
5. Steps 1 through 4 are repeated until the required number of questions has been administered to ensure test accuracy and reliability.
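The loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GMAC's actual algorithm. It assumes a Rasch (one-parameter logistic) response model, an ability scale centered at 0, a grid-based estimate with a standard normal prior, and a random pick among the five most informative unused questions. Content-coverage rules are left out, and every name here (`p_correct`, `estimate_ability`, `pick_next`, `run_cat`) is hypothetical.

```python
import math
import random

def p_correct(theta, b):
    """Chance of a correct answer under a Rasch model (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses):
    """Posterior-mean ability on a grid, with a standard normal prior.

    responses is a list of (difficulty, answered_correctly) pairs.
    """
    grid = [g / 10.0 for g in range(-40, 41)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # prior weight at this grid point
        for b, correct in responses:
            p = p_correct(t, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

def information(theta, b):
    """Rasch item information; largest when difficulty matches ability."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def pick_next(theta, pool, used, rng, top_k=5):
    """Choose at random among the top_k most informative unused questions."""
    eligible = [i for i in range(len(pool)) if i not in used]
    eligible.sort(key=lambda i: information(theta, pool[i]), reverse=True)
    return rng.choice(eligible[:top_k])

def run_cat(pool, answer, n_items, rng):
    """Administer n_items questions; answer(difficulty) -> bool is the examinee."""
    used, responses = set(), []
    theta = 0.0  # start at average ability, so the first question is of average difficulty
    for _ in range(n_items):
        item = pick_next(theta, pool, used, rng)   # select one of the best next questions
        used.add(item)
        responses.append((pool[item], answer(pool[item])))  # examinee responds
        theta = estimate_ability(responses)        # re-estimate after every response
    return theta, responses
```

Calling `run_cat` with a pool of difficulties, a function that simulates the examinee, a test length, and a `random.Random` instance returns the final ability estimate along with the questions administered. Picking at random among several good candidates, rather than always the single best one, is one simple way to keep examinees from all seeing the same questions.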
By estimating the examinee’s final score after each response, the computer tailors the test based on both the difficulty of the previously administered questions and the examinee’s responses. With the right pool of test questions and the right question selection algorithm, computer adaptive testing (CAT) can be much more efficient than a traditional, fixed-question test in which all examinees answer the same set of questions. A fixed test of that kind must have questions spanning the entire score range. As a result, the test must be longer, and every test taker sees some questions that are much too difficult and some that are much too easy.
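A rough calculation shows why off-target questions make a fixed test longer. The sketch below again assumes a Rasch model, which is an assumption for illustration and not something the GMAT is stated to use. Under that model, a question tells us the most about an examinee when its difficulty matches the examinee's ability, and less and less as the two move apart. The function `targeted_items_needed` and the 30-question `fixed_form` are hypothetical.

```python
import math

def item_information(theta, b):
    """Information one Rasch question of difficulty b gives at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def targeted_items_needed(theta, fixed_difficulties):
    """Number of perfectly targeted questions matching the fixed form's information."""
    fixed_total = sum(item_information(theta, b) for b in fixed_difficulties)
    per_targeted_item = item_information(theta, theta)  # 0.25, the maximum possible
    return math.ceil(fixed_total / per_targeted_item)

# Hypothetical fixed form: 30 questions spread evenly from very easy to very hard.
fixed_form = [-3.0 + 6.0 * i / 29 for i in range(30)]

for theta in (-2.0, 0.0, 2.0):
    print(theta, targeted_items_needed(theta, fixed_form))
```

For each ability level, the printed count is smaller than 30, because many of the fixed form's questions are too easy or too hard to be informative for that examinee. The comparison is idealized: a real adaptive test does not know the examinee's ability in advance and has to home in on it, so the actual saving is smaller than this upper bound suggests.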