Methodology and Statistics
International Conference, 24 - 27 September 2000
Hotel Bor and Castle Hrib, Preddvor, Slovenia

On the statistical interpretation of score differences
within an IRT framework

Gerhard H. Fischer
Department of Psychology, University of Vienna

Measurement of individual change based on test scores is a notoriously difficult problem that, within the framework of classical test theory, has resisted a convincing solution (cf. Cronbach & Furby, Psychological Bulletin, 1970; Williams & Zimmerman, Applied Psychological Measurement, 1996). Modern Item Response Theory, however, allows new approaches to that problem. Building on work by Klauer (Psychometrika, 1991), Liu (Applied Psychological Measurement, 1993), and Fischer (Psychometrika, 1995), this paper presents uniformly best hypothesis tests and confidence intervals for score differences in repeated measurement designs, allowing a new assessment and interpretation of individual change in test scores. The tests and confidence intervals are 'exact' in the sense that they are based on the exact conditional distribution of the score difference, given the total score of the testee, and thus do not need any asymptotic approximations.
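
The idea of conditioning on the total score can be illustrated with a minimal sketch. It is restricted to the dichotomous Rasch Model, the simplest special case, and is not the procedure of the paper itself. The assumptions are: the same items are given at both time points, the item difficulties (betas) are known from calibration, and change is a constant shift (delta) of the person parameter between occasions. Under these assumptions the person parameter cancels from the conditional distribution of the first raw score given the total of both raw scores. All function names are hypothetical, and the p-value shown is a plain non-randomized one-sided tail probability, not the uniformly best test.

```python
import math


def elementary_symmetric(eps):
    """Return gamma_0..gamma_k, the elementary symmetric functions of eps."""
    gamma = [1.0]
    for e in eps:
        nxt = gamma + [0.0]
        for j in range(1, len(nxt)):
            nxt[j] += e * gamma[j - 1]
        gamma = nxt
    return gamma


def conditional_score_dist(betas, total, delta=0.0):
    """Distribution of the time-1 raw score r1 given r1 + r2 = total.

    Assumes a Rasch Model with item difficulties `betas` at both occasions
    and a person parameter shifted by `delta` at time 2. The weight of r1 is
    gamma[r1] * gamma[r2] * exp(delta * r2); the person parameter cancels.
    """
    k = len(betas)
    gamma = elementary_symmetric([math.exp(-b) for b in betas])
    weights = {}
    for r1 in range(max(0, total - k), min(k, total) + 1):
        r2 = total - r1
        weights[r1] = gamma[r1] * gamma[r2] * math.exp(delta * r2)
    norm = sum(weights.values())
    return {r1: w / norm for r1, w in weights.items()}


def one_sided_p_value(betas, r1, r2):
    """P(time-1 score <= r1 | total) under no change (delta = 0).

    Small values indicate a gain larger than expected by chance.
    """
    dist = conditional_score_dist(betas, r1 + r2, delta=0.0)
    return sum(p for s, p in dist.items() if s <= r1)


if __name__ == "__main__":
    # Hypothetical three-item test with equal difficulties.
    print(conditional_score_dist([0.0, 0.0, 0.0], total=3))
    print(one_sided_p_value([0.0, 0.0, 0.0], r1=0, r2=3))
```

With equal item difficulties the conditional distribution under no change reduces to a hypergeometric distribution, which gives a convenient check of the sketch. Evaluating one_sided_p_value over all score pairs of a calibrated test would yield a table of significant score differences of the kind the abstract recommends.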

The assumption on which these derivations rest is that the test or scale used has been calibrated under, and conforms to, a Partial Credit Model (Masters, Psychometrika, 1982), which includes the Rating Scale Model (Andrich, 1978) and the Rasch Model (Rasch, 1960) as special cases. Therefore, the range of applications, both in ability testing and in other uses of 'scales', e.g., in clinical psychology, is fairly large. It is suggested that tables of significant score differences become part of the standard material (e.g., test handbooks) accompanying test publications.

Key words: Test score differences, measurement of change, repeated measurements