Demystifying the GMAT: Understanding the Restriction in Range Problem
The Demystifying the GMAT series of articles is brought to you by the research and development department of GMAC and written by Lawrence M. Rudner, vice president of research and development and chief psychometrician at GMAC.
A key question regarding the use of any measure as part of the admission process is the extent to which the measure is valid, i.e., the extent to which theory and evidence support using the measure as an admission tool. One of the most common metrics of validity is the multiple correlation coefficient. For example, one can compute the multiple correlation of first year grades with a composite of variables such as admission test score, undergraduate grade point average, interview ratings, and work experience ratings. But such simple calculations do not work in practice without a correction. In this note, I describe the “restriction in range” problem, explain why it undermines a simple correlation, and outline what can be done about it.
The following scatterplot shows admission test scores on the x-axis and first year grades on the y-axis. Ignore the color and shape of the markers for the moment and note the nice positive relationship. In this example the correlation is .70 and the shape of the scatterplot is a tilted ellipse.
However, these data are not realistic. Selective programs can never have a full data set because applicants who were not admitted do not have first year grades. To be more realistic, consider just the red squares. These data points represent the applicants who were admitted, while the blue diamonds represent those who were not. The scatterplot for just the admitted students is more like a rectangle than an ellipse, and the correlation for just those test takers is only .20. That is, the correlation among the admitted students is much lower than the correlation for the entire applicant pool. This is because the range on the x-axis has been restricted. Looking at just the admitted students is therefore problematic: the correlation is artificially attenuated, making the measure look less effective. Further, the correlation among the admitted students addresses the wrong question. If the test is to be used for admissions, then the appropriate correlation is across all applicants, not just a subset.
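The attenuation described above can be reproduced with a small simulation. The sketch below is illustrative only and does not use the data behind the scatterplot: it assumes standardized, normally distributed scores and grades with a population correlation of .70, and it assumes that the top 20 percent of applicants are admitted on test score alone. The function names and the 20 percent cutoff are hypothetical choices, so the restricted correlation it prints will not match the .20 in the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_applicants(n=20000, rho=0.70, seed=1):
    """Draw (test score, first year grade) pairs with population correlation rho.

    Assumption: both variables are standard normal, which real scores
    and grades are not.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        grade = rho * score + math.sqrt(1.0 - rho ** 2) * noise
        pairs.append((score, grade))
    return pairs


def admit_top(pairs, fraction=0.20):
    """Keep the top `fraction` of applicants, ranked by test score alone."""
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    return ranked[: int(len(ranked) * fraction)]


applicants = simulate_applicants()
admitted = admit_top(applicants)

r_all = pearson([s for s, _ in applicants], [g for _, g in applicants])
r_admitted = pearson([s for s, _ in admitted], [g for _, g in admitted])

print(f"all applicants: r = {r_all:.2f}")
print(f"admitted only:  r = {r_admitted:.2f}")
```

Making the hypothetical cutoff more selective narrows the range of admitted scores further and pushes the restricted correlation lower still.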
Correcting for Restriction in Range
The question then is how one can estimate the correlation across all applicants using the data from the restricted sample. There are several statistical solutions, most of which require just one additional piece of information: the test score variance of all applicants to the program. That unrestricted variance is then compared to the restricted test score variance for the admitted students to adjust the observed correlation for the restricted sample. Once the individual correlations have been adjusted, one then runs a multiple regression using the correlation matrix, rather than the raw data, as the input. This restriction in range issue and appropriate correction approaches are commonly taught in statistics classes.
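As one concrete illustration, the sketch below implements a correction commonly taught for direct selection on the predictor, usually called Thorndike's Case 2. This note does not say which correction GMAC applies, so treat the choice of formula as an assumption. The function names and the numbers in the usage lines are hypothetical. The second function shows how a multiple correlation can be computed from correlations alone in the two-predictor case; in a real analysis the correlations involving the other predictors would also need an appropriate adjustment before being used this way.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction for direct selection on the predictor.

    r_corrected = r * k / sqrt(1 - r^2 + r^2 * k^2),
    where k = sd_unrestricted / sd_restricted for the test score.
    """
    k = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * k) / math.sqrt(1.0 - r ** 2 + (r ** 2) * (k ** 2))


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors, from correlations only.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical inputs: observed r of .20 among admitted students, with an
# applicant-pool standard deviation twice that of the admitted group.
corrected = correct_for_range_restriction(0.20, sd_unrestricted=100.0, sd_restricted=50.0)
print(f"corrected r = {corrected:.2f}")  # about 0.38
```

Note that the formula takes standard deviations, so variances must be square-rooted first. When the two standard deviations are equal, the correction leaves the observed correlation unchanged.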
If your program no longer has the applicant pool data, or if it is not easy to compute test score variances across all applicants, then GMAC can provide you with the variances of all test takers who sent scores to your program. Or, better yet, you can participate in GMAC’s Validity Study Service. Participating in this free service will help you better understand the relative strengths of different admissions criteria, including the GMAT® exam, and which combinations of criteria work best for your program.