Differential item functioning methods are widely used in linear tests to detect potentially biased items, based on pairwise comparisons between focal and reference groups. In this paper, the authors redefine item bias for computerized adaptive tests (CAT) from the perspective of adverse impact.
In item response theory (IRT)-based CAT, pre-calibrated and scaled operational item parameters are used to select items and then to score the test. Item bias has traditionally been defined as a difference between the item characteristic curves (ICCs) across subpopulation groups. The focus of this paper is whether the use of the operational item parameters is fair to a subpopulation group. Impact is defined as the difference between a group's ICC and the ICC implied by the operational item parameters. More specifically, an item is biased if examinees in a subpopulation group with a given ability do not have the same conditional probability of answering correctly as examinees with the same ability in the population (all groups combined in calibrating the operational item parameters). When these probabilities differ, the group is impacted positively or negatively. Test fairness may be established if none of the administered items has differential impact on the subpopulation groups.
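The notion of differential item impact above can be illustrated with a minimal sketch. Assuming a two-parameter-logistic (2PL) IRT model (the abstract does not specify the model or the flagging statistic, so the function names, parameters, and threshold logic here are hypothetical), one simple screening quantity is the largest gap between a group's ICC and the ICC given by the operational item parameters over an ability grid:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: probability of a correct
    response at ability theta, with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def max_dii(a_oper, b_oper, a_group, b_group, theta_grid=None):
    """Largest absolute gap between the group ICC and the operational
    ICC over an ability grid -- a simple, hypothetical screening
    statistic for differential item impact (DII)."""
    if theta_grid is None:
        theta_grid = np.linspace(-4.0, 4.0, 161)
    gap = icc_2pl(theta_grid, a_group, b_group) - icc_2pl(theta_grid, a_oper, b_oper)
    return float(np.max(np.abs(gap)))

# Hypothetical item: operational parameters vs. a focal-group
# recalibration whose difficulty is shifted by 0.4 logits.
d = max_dii(a_oper=1.2, b_oper=0.0, a_group=1.2, b_group=0.4)
print(f"max |ICC gap| = {d:.3f}")  # larger gap -> stronger impact
```

An item whose group ICC coincides with the operational ICC yields a gap of zero; in practice a flagging rule would compare such a statistic against a sampling-error-based threshold rather than a fixed cutoff.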
Statistics for flagging differential item impact (DII) are discussed with an example from a U.S.-based CAT, the Graduate Management Admission Test®. The method also applies to other IRT-based tests.