Estimation of a Ramsay-Curve IRT Model Using the Metropolis-Hastings Robbins-Monro Algorithm
Scott Monroe & Li Cai
IMPS 2012, Lincoln, Nebraska
Outline
1. Introduction and Motivation
2. Review of Ramsay Curves
3. Review of the MH-RM Algorithm
4. Simulation Study
5. Empirical Application
6. Conclusion
Ramsay-Curve IRT

The need to estimate the latent trait distribution has received increasing attention in recent years. One approach is Ramsay-curve IRT (RC-IRT; Woods & Thissen, 2006), which uses Bock and Aitkin's (1981) EM algorithm for estimation. However, the EM algorithm does not yield the observed information matrix upon convergence, so no standard errors are available.

A solution: use the Metropolis-Hastings Robbins-Monro (MH-RM; Cai, 2010) algorithm.
Review of Ramsay Curves

For full details, see Woods and Thissen (2006). Given observed latent trait scores, RCs can estimate the distribution's shape. Essentially, the method uses B-spline regression with a small number of parameters.

[Figure: Ramsay-curve density estimate overlaid on an empirical histogram, θ from -6 to 6.]
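As a rough illustration of the spline machinery (not Woods and Thissen's exact parameterization), a density can be represented as a normalized, squared B-spline expansion; squaring is one simple device for keeping the curve nonnegative. The knot placement, basis size, and squaring trick below are illustrative assumptions:

```python
import numpy as np

def bspline_basis(x, knots, degree, i):
    """Evaluate the i-th B-spline basis function via the Cox-de Boor recursion."""
    if degree == 0:
        return np.where((knots[i] <= x) & (x < knots[i + 1]), 1.0, 0.0)
    left = 0.0
    if knots[i + degree] > knots[i]:
        left = ((x - knots[i]) / (knots[i + degree] - knots[i])
                * bspline_basis(x, knots, degree - 1, i))
    right = 0.0
    if knots[i + degree + 1] > knots[i + 1]:
        right = ((knots[i + degree + 1] - x) / (knots[i + degree + 1] - knots[i + 1])
                 * bspline_basis(x, knots, degree - 1, i + 1))
    return left + right

def ramsay_density(grid, coefs, knots, degree=3):
    """Squared B-spline expansion, normalized to integrate to 1 on the grid."""
    basis = np.array([bspline_basis(grid, knots, degree, i)
                      for i in range(len(coefs))])
    f = (coefs @ basis) ** 2                  # squaring keeps the density nonnegative
    dx = grid[1] - grid[0]
    return f / (f.sum() * dx)                 # Riemann-sum normalization
```

With a handful of coefficients this yields a smooth, nonnegative density over the θ range, which is the sense in which the method needs only a small number of parameters.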
A Review of MH-RM

Metropolis-Hastings Robbins-Monro (Cai, 2010) is motivated by Fisher (1925): the gradient of the observed log likelihood equals the expectation of the gradient of the complete log likelihood.

Iterative approach:
- Use an MH sampler to form a Monte Carlo approximation of the expectation.
- Use this approximation to update the parameter estimates.
- Since the approximations are noisy, use the RM method to filter; the filtering operates through decreasing gain constants.
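A toy sketch of the two steps, using a deliberately simple latent-variable model rather than an IRT model: y_i = θ + z_i + e_i with z_i, e_i standard normal, so the observed-data MLE of θ is the sample mean. The model, step size, gain sequence, and iteration count are all illustrative assumptions:

```python
import numpy as np

def mh_step(z, y, theta, rng, step=1.0):
    """One random-walk Metropolis-Hastings sweep over all latent z_i."""
    prop = z + step * rng.standard_normal(z.shape)
    logp = lambda zz: -0.5 * (y - theta - zz) ** 2 - 0.5 * zz ** 2
    accept = np.log(rng.random(z.shape)) < logp(prop) - logp(z)
    return np.where(accept, prop, z)

def mh_rm(y, n_iter=2000):
    """MH-RM for the toy model y_i = theta + z_i + e_i, z_i, e_i ~ N(0, 1)."""
    rng = np.random.default_rng(0)
    theta, z = 0.0, np.zeros_like(y)
    for k in range(1, n_iter + 1):
        z = mh_step(z, y, theta, rng)    # MH draw of the latent variables
        grad = np.mean(y - theta - z)    # complete-data score (scaled by 1/n)
        theta += (2.0 / k) * grad        # Robbins-Monro update, decreasing gain
    return theta
```

The complete-data score is trivial to compute once z is imputed; the decreasing gain 2/k progressively damps the Monte Carlo noise so the iterates settle at the MLE.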
MH-RM Estimation in a Picture

- Stage 1 (iterations 1-800): move to the neighborhood of the MLE.
- Stage 2 (iterations 801-1000): find starting values for Stage 3 by averaging the estimates across the Stage 2 iterations.
- Stage 3 (iteration 1001 to convergence): obtain the MLE by using decreasing gain constants to average out the noise.

[Figure: trace plot of a parameter estimate across iterations, with the stage boundaries marked at iterations 800, 1000, and 1550.]
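The Stage 3 intuition, that a 1/k gain sequence averages out the Monte Carlo noise, can be seen in a two-line identity: the RM recursion with gain 1/k is algebraically the running mean of its noisy inputs.

```python
def rm_average(noisy_values):
    """RM recursion x_k = x_{k-1} + (1/k) * (v_k - x_{k-1}).

    With gain 1/k this reproduces the running mean of v_1..v_k exactly,
    which is why decreasing gains average out the noise in Stage 3.
    """
    x = 0.0
    for k, v in enumerate(noisy_values, start=1):
        x += (v - x) / k
    return x
```

For example, feeding it the sequence 1, 2, 3, 4 returns their mean, 2.5.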
Standard Errors in MH-RM

Louis (1982): the observed information can be expressed as a function of the complete log likelihood, involving its gradient and Hessian. With MH-RM, the pieces used in the approximation can be calculated by:
- recursive approximation, as in Cai (2010); or
- Monte Carlo approximation, as in Diebolt and Ip (1996).
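In one common form of Louis's identity, writing ℓ(θ; y, z) for the complete-data log likelihood and S(θ) = ∂ℓ/∂θ for its gradient, the observed information is

```latex
\mathcal{I}_{\mathrm{obs}}(\theta)
  = \mathbb{E}\!\left[-\frac{\partial^2 \ell(\theta; y, z)}{\partial\theta\,\partial\theta'} \,\middle|\, y\right]
  - \mathbb{E}\!\left[S(\theta)S(\theta)' \,\middle|\, y\right]
  + \mathbb{E}\!\left[S(\theta) \,\middle|\, y\right]
    \mathbb{E}\!\left[S(\theta) \,\middle|\, y\right]'
```

At the MLE the conditional expectation of the score vanishes, so the last term drops out. The remaining conditional expectations are exactly the pieces that MH-RM can approximate recursively or by Monte Carlo.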
Simulation Study

Simulation study conditions:
  Sample size:           1000
  Number of items:       25
  Slope distribution:    N(1.7, 0.3)
  Location distribution: N(0, 1), truncated at ±2
  True g(θ|η):           Normal, Skewed, and Bimodal
  Replications:          100
  Estimation methods:    MH-RM and BA-EM
True g(θ|η) Used in the Simulation Study

[Figure: the three simulation densities (Normal, Skewed, Bimodal), θ from -6 to 6.]

The Skewed and Bimodal densities were constructed as mixtures of normals. Latent trait scores were drawn by rejection sampling. Item responses were simulated in accordance with the 2PL model.
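The data-generating steps above can be sketched as follows: rejection sampling from a mixture-of-normals density, then 2PL responses with P(y = 1) = logistic(aθ + c). The particular mixture, envelope, and parameter values are illustrative assumptions, not the study's exact settings:

```python
import numpy as np

def rejection_sample(pdf, n, lo=-6.0, hi=6.0, rng=None):
    """Draw n values from pdf on [lo, hi] by rejection from a uniform envelope."""
    rng = rng or np.random.default_rng(0)
    ceil = 1.1 * pdf(np.linspace(lo, hi, 1001)).max()   # envelope height
    draws = []
    while len(draws) < n:
        x = rng.uniform(lo, hi, n)
        keep = rng.uniform(0.0, ceil, n) < pdf(x)       # accept under the curve
        draws.extend(x[keep])
    return np.array(draws[:n])

def simulate_2pl(theta, a, c, rng=None):
    """Dichotomous 2PL responses with slopes a and intercepts c."""
    rng = rng or np.random.default_rng(1)
    p = 1.0 / (1.0 + np.exp(-(np.outer(theta, a) + c)))  # persons x items
    return (rng.random(p.shape) < p).astype(int)

def gauss(x, mean, sd=1.0):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Illustrative bimodal latent density: equal mixture of two unit-variance normals.
bimodal = lambda x: 0.5 * gauss(x, -1.5) + 0.5 * gauss(x, 1.5)
```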
Parameter Recovery

[Figure: scatterplots of log-likelihood values for MH-RM (x-axis) versus BA-EM (y-axis), for the Normal, Skewed, and Bimodal conditions.]

Both MH-RM and EM perform maximum marginal likelihood estimation, so we shouldn't expect substantial differences in estimation performance.
Parameter Recovery

[Figure: scatterplots of the average within-replication RMSE for the item parameters, MH-RM (x-axis) versus BA-EM (y-axis), for the Normal, Skewed, and Bimodal conditions.]

Takeaway: MH-RM and BA-EM produce comparable results.
Ramsay Curve Recovery: Normal g(θ|η)

[Figure: estimated RCs by replication shown in different colors; true g(θ|η) shown in black; θ from -6 to 6.]

Ramsay Curve Recovery: Skewed g(θ|η)

[Figure: estimated RCs by replication shown in different colors; true g(θ|η) shown in black; θ from -6 to 6.]

Ramsay Curve Recovery: Bimodal g(θ|η)

[Figure: estimated RCs by replication shown in different colors; true g(θ|η) shown in black; θ from -6 to 6.]
Standard Error Estimation

[Figure: for the item parameters, average standard errors (x-axis) plotted against Monte Carlo standard deviations (y-axis), for the Normal, Skewed, and Bimodal conditions.]

Note: for all three g(θ|η) shapes, the MC standard deviations are slightly larger.
Standard Error Estimation

Coverage probabilities (based on 100 replications):

              68% coverage          95% coverage
  g(θ|η)      a     c     all       a     c     all
  Normal      0.66  0.60  0.63      0.94  0.92  0.93
  Skewed      0.63  0.63  0.63      0.93  0.93  0.93
  Bimodal     0.67  0.64  0.66      0.95  0.93  0.94

a = slope; c = intercept; all = slopes and intercepts
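Coverage of this kind can be computed per parameter in a few lines; `coverage` is a hypothetical helper, not code from the study. Here z = 1.0 gives the nominal 68% intervals and z = 1.96 the 95% intervals:

```python
import numpy as np

def coverage(estimates, ses, true_value, z=1.96):
    """Fraction of replications whose interval estimate ± z*SE covers the truth."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(ses, dtype=float)
    covered = (est - z * se <= true_value) & (true_value <= est + z * se)
    return float(np.mean(covered))
```

For example, with estimates 1.0, 1.1, and 2.0, each with SE 0.1 and a true value of 1.0, the 95% intervals of the first two cover the truth and the third does not, giving coverage 2/3.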
Example: National Comorbidity Survey

Background: The National Comorbidity Survey (NCS, 1994) was a nationwide household survey of the United States. Ten four-category Likert items measured current emotional distress.

Example items: During the past 30 days, how often did you...
- worry too much about things?
- feel exhausted for no good reason?
- feel hopeless about the future?

Response options: never (0), rarely (1), sometimes (2), often (3).
Distress Scale Results: Ramsay Curve

[Figure: estimated Distress density versus a normal density, θ from -6 to 6.]

Note: there appears to be a sizeable group of respondents at θ = 3.
Distress Scale Results: Slope Estimates and Standard Errors

  Item             1     2     3     4     5     6     7     8     9     10
  Slope (RC)       1.99  1.96  2.08  2.82  2.50  2.51  2.86  2.32  2.35  2.47
  Slope (Normal)   1.83  1.81  1.90  2.55  2.31  2.29  2.58  2.14  2.14  2.24
  SE (RC)          .06   .07   .05   .07   .06   .07   .08   .06   .06   .07
  SE (Normal)      .08   .09   .07   .08   .07   .08   .10   .07   .07   .08

The RC rows (shown in blue on the slide) correspond to RC estimation; the Normal rows (shown in black) correspond to IRT estimation with a normal latent distribution.
Discussion

Overall, the results support the validity and utility of the MH-RM implementation. Embedding RC-IRT in MH-RM (as opposed to EM) enables computation of the observed information matrix. This, in turn, facilitates:
- test assembly
- limited-information goodness-of-fit testing
- differential item functioning analysis
References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.

Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm. Psychometrika, 75, 33-57.

Cai, L. (2012). flexMIRT: Flexible multilevel item factor analysis and test scoring [Computer software]. Seattle, WA: Vector Psychometric Group, LLC.

Diebolt, J., & Ip, E. H. S. (1996). Stochastic EM: Method and application. In W. Gilks, S. Richardson, & D. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 259-273). London: Chapman and Hall.

Fisher, R. A. (1925). Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society, 22, 700-725.

Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44(2), 226-233.

Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49(3), 359-381.

R Development Core Team. (2010). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Available from http://www.r-project.org/

Woods, C. M., & Lin, N. (2008). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102-117.

Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71, 281-301.
Acknowledgements 34 / 34 This research is supported by grants from the Institute of Education Sciences (R305B080016 and R305D100039) and the National Institute on Drug Abuse (R01DA026943 and R01DA030466). Thanks to Larry Thomas and Mark Hansen for providing datasets used in the empirical applications.