Methodology and Statistics
International Conference, 24–27 September 2000
Hotel Bor and Castle Hrib, Preddvor, Slovenia
On the statistical interpretation of score differences
within an IRT framework
Gerhard H. Fischer
Department of Psychology, University of Vienna
Measurement of individual change based on test scores is a notoriously
difficult problem which, within the framework of classical test theory, has
resisted a convincing solution (cf. Cronbach & Furby, Psychological Bulletin,
1970; Williams & Zimmerman, Applied Psychological Measurement, 1996). Modern
Item Response Theory, however, allows new approaches to that problem. Building
on work by Klauer (Psychometrika, 1991), Liu (Applied Psychological
Measurement, 1993), and Fischer (Psychometrika, 1995), uniformly best
hypothesis tests and confidence intervals for score differences in repeated
measurement designs are presented, allowing a new assessment and
interpretation of individual change in test scores. The tests and confidence
intervals are 'exact' in the sense that they are based on the exact
conditional distribution of the score difference, given the total score of
the testee, and thus do not need any asymptotic approximations.
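As an illustration of what 'exact' means here, the following is a minimal
sketch for the dichotomous Rasch Model only, not the procedure of the paper
itself. It assumes known item difficulties at the two time points (the
arguments beta1 and beta2 are hypothetical calibration values) and a change
parameter eta added to the person parameter at the second time point. Under
these assumptions the person parameter cancels out of the distribution of the
first raw score given the total score, which is then proportional to a
product of elementary symmetric functions of the item parameters. All
function names are made up for this sketch, and the p-value shown is a plain
non-randomized one-sided tail probability.

```python
import math


def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] for the item parameters eps."""
    gamma = [1.0]
    for e in eps:
        new = gamma + [0.0]
        # Adding one item: gamma_r(new) = gamma_r(old) + e * gamma_{r-1}(old).
        for r in range(len(gamma), 0, -1):
            new[r] += e * gamma[r - 1]
        gamma = new
    return gamma


def conditional_score_distribution(beta1, beta2, total, eta=0.0):
    """P(R1 = r1 | R1 + R2 = total) under the Rasch Model.

    beta1, beta2: assumed item difficulties at time points 1 and 2.
    eta: assumed amount of change in the person parameter (0 = no change).
    """
    g1 = elementary_symmetric([math.exp(-b) for b in beta1])
    g2 = elementary_symmetric([math.exp(-b) for b in beta2])
    delta = math.exp(eta)
    weights = {}
    for r1 in range(len(g1)):
        r2 = total - r1
        if 0 <= r2 < len(g2):
            weights[r1] = g1[r1] * g2[r2] * delta ** r2
    norm = sum(weights.values())
    return {r1: w / norm for r1, w in weights.items()}


def one_sided_p_value(beta1, beta2, r1_obs, r2_obs):
    """Exact P(R2 - R1 >= observed gain | total score) under eta = 0."""
    dist = conditional_score_distribution(beta1, beta2, r1_obs + r2_obs)
    # With the total fixed, a gain at least as large means R1 <= r1_obs.
    return sum(p for r1, p in dist.items() if r1 <= r1_obs)
```

For example, one_sided_p_value([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0, 3) gives
the exact probability of a gain from 0 to 3 on two three-item tests of equal
difficulty when no change has taken place. The paper's setting is the more
general Partial Credit Model, which this sketch does not cover.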
The assumption on which these derivations rest is that the test or scale used
has been calibrated with, and conforms to, a Partial Credit Model (Masters,
Psychometrika, 1982), which comprises the Rating Scale Model (Andrich, 1978)
and the Rasch Model (Rasch, 1960) as special cases. Therefore, the range of
applications, both in ability testing and in other uses of 'scales', e.g., in
clinical psychology, is fairly large.
It is suggested that tables of significant score differences become part of
the standard material (e.g., test handbooks) accompanying test publications.
Key words: Test score differences, measurement of change, repeated measurements
