GELPS Year in Review 2024

December 13, 2024 · GELPS Blog

The year 2024 marked significant progress in GELPS’s research program, with advances in psychometric methodology, validation research, automated scoring technology, and fairness analysis. This post reviews the main research achievements of the 2024 agenda. Ongoing research continues to refine these procedures as empirical evidence accumulates and as best practices in language assessment evolve. We regularly incorporate peer-reviewed advances in psychometrics, computational linguistics, and educational measurement into our operational procedures, with the goal of producing scores that are reliable and valid for their intended uses across all test-taker populations.

Advances in Automated Scoring Validation

In 2024, GELPS completed a large-scale validation study comparing automated scores with human ratings for both speaking and writing tasks, using a sample of 15,000 test-takers. Exact agreement exceeded 60% and adjacent agreement exceeded 95% for both modalities, levels comparable to the agreement observed between pairs of human raters. These scoring procedures will continue to evolve as validity evidence accumulates and as we receive feedback from the broader measurement community.
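Exact and adjacent agreement are simple proportions over paired scores. The sketch below shows one way to compute them; the function name, the integer score scale, and the sample scores are hypothetical, and it assumes the common convention that adjacent agreement (scores within one point) includes exact matches.

```python
from typing import Sequence


def agreement_rates(auto: Sequence[int], human: Sequence[int]) -> tuple[float, float]:
    """Return (exact, adjacent) agreement between automated and human scores.

    Exact agreement: the two scores are identical.
    Adjacent agreement: the scores differ by at most one scale point
    (this count includes exact matches, the usual convention).
    """
    if len(auto) != len(human) or len(auto) == 0:
        raise ValueError("score lists must be non-empty and the same length")
    diffs = [abs(a - h) for a, h in zip(auto, human)]
    exact = sum(d == 0 for d in diffs) / len(diffs)
    adjacent = sum(d <= 1 for d in diffs) / len(diffs)
    return exact, adjacent


if __name__ == "__main__":
    # Hypothetical scores on an integer rating scale.
    auto = [3, 4, 2, 5, 3, 4, 1, 3]
    human = [3, 3, 2, 5, 4, 4, 3, 3]
    exact, adjacent = agreement_rates(auto, human)
    print(f"exact={exact:.3f} adjacent={adjacent:.3f}")
```

The same function can be applied to two human raters' scores, which is how the human-human benchmark for comparison would be obtained.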

The validation study also examined whether the automated scoring models perform differently across demographic subgroups. Differential prediction analyses tested whether the models systematically over- or under-predict human ratings for specific groups, and results indicated minimal differential prediction.
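One simple way to screen for systematic over- or under-prediction is a standardized mean difference between automated and human scores within each subgroup. The sketch below assumes this statistic, standardized by the pooled standard deviation of the two score sets; it is an illustration rather than the specific analysis GELPS ran (regression-based differential prediction models are a common alternative), and the function name and data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def subgroup_smd(auto, human, groups):
    """Standardized mean difference (automated minus human) per subgroup.

    Positive values mean the automated model scores the group higher than
    human raters do (over-prediction); negative values mean under-prediction.
    Each subgroup needs at least two scored responses.
    """
    by_group = defaultdict(lambda: ([], []))
    for a, h, g in zip(auto, human, groups):
        by_group[g][0].append(a)
        by_group[g][1].append(h)

    result = {}
    for g, (a_scores, h_scores) in by_group.items():
        pooled_sd = math.sqrt((variance(a_scores) + variance(h_scores)) / 2)
        if pooled_sd == 0:
            result[g] = float("nan")  # undefined when neither score set varies
        else:
            result[g] = (mean(a_scores) - mean(h_scores)) / pooled_sd
    return result


if __name__ == "__main__":
    # Hypothetical paired scores for two subgroups.
    auto = [3, 4, 5, 3, 4, 5]
    human = [3, 4, 5, 2, 3, 4]
    groups = ["A", "A", "A", "B", "B", "B"]
    for group, d in sorted(subgroup_smd(auto, human, groups).items()):
        print(f"{group}: {d:+.2f}")
```

Values close to zero for every subgroup are what "minimal differential prediction" would look like under this statistic; the flagging threshold is a policy choice not stated in this post.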

Psychometric Research Publications

GELPS research findings were published in two peer-reviewed journals in 2024: a study of concurrent validity in Language Testing and an investigation of predictive validity in the Journal of Educational Measurement. Because both studies passed external peer review, they subject GELPS’s psychometric properties to independent scrutiny.

Fairness and DIF Research Program

The 2024 research agenda included a comprehensive examination of differential item functioning (DIF) across multiple demographic variables. DIF analyses were conducted for all operational items using the Mantel-Haenszel procedure and logistic regression methods, and items exhibiting moderate or large DIF were reviewed by content experts.
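The Mantel-Haenszel procedure matches test-takers on total score and pools the resulting 2×2 tables (reference or focal group by correct or incorrect) into a common odds ratio, alpha_MH, often reported on the ETS delta scale as MH D-DIF = -2.35 ln(alpha_MH). The sketch below computes both; the function names and counts are hypothetical. The A/B/C labels use only the magnitude cut-points of 1.0 and 1.5 from the ETS convention, whereas operational classification also requires significance tests, which this sketch omits.

```python
import math
from typing import Iterable, Tuple

# One score stratum: (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel(strata: Iterable[Stratum]) -> tuple[float, float]:
    """Return (alpha_MH, MH D-DIF) pooled over score strata.

    alpha_MH > 1 (negative MH D-DIF) means the item favors the reference group
    among test-takers with the same total score.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("MH odds ratio is undefined (zero numerator or denominator)")
    alpha = num / den
    delta = -2.35 * math.log(alpha)
    return alpha, delta


def magnitude_category(delta: float) -> str:
    """Magnitude-only approximation of ETS DIF categories (no significance test)."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"  # large


if __name__ == "__main__":
    # Hypothetical counts for two total-score strata.
    strata = [(20, 10, 15, 15), (30, 10, 20, 20)]
    alpha, delta = mantel_haenszel(strata)
    print(f"alpha_MH={alpha:.2f}, MH D-DIF={delta:.2f}, magnitude={magnitude_category(delta)}")
```

Under this convention, the items routed to content experts would be those falling in the B (moderate) or C (large) categories.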

Methodological Innovations

Research in 2024 also explored methodological innovations, including Bayesian approaches to ability estimation in computerized adaptive testing (CAT), machine learning methods for DIF detection, and natural language processing techniques for evaluating response quality in novel task types. The aim of this work is to improve both measurement accuracy and fairness across diverse linguistic and cultural populations.
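Expected a posteriori (EAP) estimation is one widely used Bayesian approach to ability estimation in CAT: the ability estimate is the mean of the posterior distribution given the responses so far. The sketch below assumes a two-parameter logistic (2PL) IRT model, a standard normal prior, and a fixed quadrature grid; this post does not specify which model or prior GELPS uses, and the function name and item parameters are hypothetical.

```python
import math


def eap_estimate(responses, a_params, b_params, n_points=61, lo=-4.0, hi=4.0):
    """EAP ability estimate and posterior SD under a 2PL model with a N(0, 1) prior.

    responses: 0/1 item scores; a_params / b_params: discrimination and
    difficulty of each administered item. The posterior is evaluated on an
    evenly spaced grid of ability values from lo to hi.
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]

    weights = []
    for theta in grid:
        prior = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        likelihood = 1.0
        for u, a, b in zip(responses, a_params, b_params):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))  # 2PL probability of a correct response
            likelihood *= p if u == 1 else 1.0 - p
        weights.append(prior * likelihood)

    total = sum(weights)
    post_mean = sum(t * w for t, w in zip(grid, weights)) / total
    post_var = sum((t - post_mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return post_mean, math.sqrt(post_var)


if __name__ == "__main__":
    # Hypothetical parameters for three administered items.
    a = [1.2, 0.8, 1.5]
    b = [-0.5, 0.0, 0.7]
    theta, sd = eap_estimate([1, 1, 0], a, b)
    print(f"theta={theta:.2f}, posterior SD={sd:.2f}")
```

In a CAT, the estimate would be recomputed after each response and used to select the next item, with the posterior SD serving as a natural stopping criterion. For long tests, accumulating log-likelihoods instead of products avoids numerical underflow.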

Collaboration with External Researchers

GELPS expanded its collaboration with external researchers in 2024, providing de-identified data sets for independent research and supporting doctoral dissertation research through the GELPS Doctoral Dissertation Award program. Feedback from this research community continues to shape how our procedures evolve.