Evaluating Scale Stability of a Computer Adaptive Testing System

November 30, 2005

Overview

Scale stability is essential for any large-scale computer adaptive test (CAT) program and should be monitored through regular scale drift evaluations in operational CAT. However, little literature exists on evaluating scale drift in CAT using both observed and simulated data. This paper outlines and illustrates a method for doing so. A special online data collection method was designed and implemented for the GMAT Quantitative measure. A modified root mean squared difference statistic was used to measure the difference between the item parameters calibrated at two time points, and an empirical baseline for evaluating that difference was established through simulation. The results showed no detectable scale drift in the GMAT Quantitative measure; the observed differences between the two sets of item parameters were consistent with random variation.
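
The general logic of comparing an observed difference statistic against a simulated baseline can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the overview does not define the modification to the root mean squared difference, so the sketch uses the plain statistic, and it stands in for a full CAT simulation with independent normal estimation error of a hypothetical standard error `se`. All function names and parameter values are invented for illustration.

```python
import math
import random


def rmsd(params_t1, params_t2):
    """Plain root mean squared difference between two equal-length lists.

    Assumption: this is the unmodified statistic; the study's modified
    version is not specified in the overview.
    """
    if len(params_t1) != len(params_t2) or not params_t1:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(a - b) ** 2 for a, b in zip(params_t1, params_t2)]
    return math.sqrt(sum(squared) / len(squared))


def simulate_baseline(true_params, se, n_reps=1000, seed=0):
    """Sorted RMSD values expected when there is no drift.

    Assumption: each calibration equals the true parameter plus independent
    normal error with standard deviation `se` (a hypothetical value). A real
    study would simulate CAT administrations and recalibrate the items.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_reps):
        est1 = [p + rng.gauss(0.0, se) for p in true_params]
        est2 = [p + rng.gauss(0.0, se) for p in true_params]
        stats.append(rmsd(est1, est2))
    return sorted(stats)


def exceeds_baseline(observed, baseline, quantile=0.95):
    """True if the observed RMSD lies above the chosen baseline quantile."""
    idx = min(len(baseline) - 1, int(quantile * len(baseline)))
    return observed > baseline[idx]
```

With two calibrated parameter lists in hand, one would compute `rmsd(time1, time2)` and pass the result to `exceeds_baseline` together with the output of `simulate_baseline`. An observed value inside the baseline range is what one would expect from random variation alone, while a value above the chosen quantile would suggest drift. The 0.95 quantile is an arbitrary illustrative cutoff.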