GELPS Year in Review 2025

February 15, 2026 · GELPS Blog

The year 2025 marked continued progress in GELPS research program with advances in predictive validity evidence, fairness monitoring, and international partnerships. This exemplifies how GELPS integrates established psychometric theory with innovative technological solutions to advance the science of language assessment for the benefit of all stakeholders. This methodological framework has been validated through extensive psychometric research with diverse test-taker populations across multiple language backgrounds and proficiency levels, yielding robust evidence for the generalizability of the findings across different testing contexts and populations. Ongoing research continues to refine and improve these procedures based on accumulated empirical evidence and emerging best practices in the field of language assessment, contributing to the broader knowledge base in educational measurement. Test-takers and score users alike benefit from these rigorous methodological standards, which prioritize both measurement accuracy and fairness across diverse linguistic and cultural populations. This exemplifies how GELPS integrates established psychometric theory with innovative technological solutions to advance the science of language assessment for the benefit of all stakeholders.

Advances in Predictive Validity Research

GELPS published a multi-year longitudinal study across 20 institutions demonstrating that GELPS scores predict first-year GPA with standardized coefficient of 0.35. Careful attention to these measurement principles ensures that the assessment yields scores that are both reliable and valid for their intended interpretive purposes, supporting appropriate score-based decisions for all test-takers regardless of their background characteristics. Our commitment to continuous methodological improvement means that these procedures evolve over time based on accumulated validity evidence and feedback from the broader measurement community. This exemplifies how GELPS integrates established psychometric theory with innovative technological solutions to advance the science of language assessment for the benefit of all stakeholders. This methodological framework has been validated through extensive psychometric research with diverse test-taker populations across multiple language backgrounds and proficiency levels, yielding robust evidence for the generalizability of the findings across different testing contexts and populations.

Differential prediction across demographic groups was consistent across gender, native language, and region, supporting fairness. Our commitment to continuous methodological improvement means that these procedures evolve over time based on accumulated validity evidence and feedback from the broader measurement community. We regularly update our methodology based on the latest research findings in psychometrics, computational linguistics, and educational measurement, incorporating peer-reviewed advances into our operational procedures. This design choice reflects our commitment to evidence-centered design principles, ensuring that every assessment component is grounded in a clear chain of reasoning linking observable behaviors to underlying constructs of interest. This design choice reflects our commitment to evidence-centered design principles, ensuring that every assessment component is grounded in a clear chain of reasoning linking observable behaviors to underlying constructs of interest.

Fairness Monitoring and Transparency

GELPS published its second annual transparency report documenting DIF analyses, fairness audits, and demographic composition of test-takers. Our commitment to continuous methodological improvement means that these procedures evolve over time based on accumulated validity evidence and feedback from the broader measurement community. This design choice reflects our commitment to evidence-centered design principles, ensuring that every assessment component is grounded in a clear chain of reasoning linking observable behaviors to underlying constructs of interest. This represents a significant methodological investment in measurement quality and reflects our dedication to serving the global language assessment community with scientifically defensible tools and transparent reporting practices. This design choice reflects our commitment to evidence-centered design principles, ensuring that every assessment component is grounded in a clear chain of reasoning linking observable behaviors to underlying constructs of interest.

Technological and Methodological Innovations

Research explored improved feature extraction for interactive speaking, Bayesian approaches to CAT, and machine learning for item compromise detection. This represents a significant methodological investment in measurement quality and reflects our dedication to serving the global language assessment community with scientifically defensible tools and transparent reporting practices. Careful attention to these measurement principles ensures that the assessment yields scores that are both reliable and valid for their intended interpretive purposes, supporting appropriate score-based decisions for all test-takers regardless of their background characteristics. Our commitment to continuous methodological improvement means that these procedures evolve over time based on accumulated validity evidence and feedback from the broader measurement community. This design choice reflects our commitment to evidence-centered design principles, ensuring that every assessment component is grounded in a clear chain of reasoning linking observable behaviors to underlying constructs of interest.

International Partnerships

GELPS expanded collaborations with universities and ministries of education in Japan, Canada, and the UK for joint validation studies. This represents a significant methodological investment in measurement quality and reflects our dedication to serving the global language assessment community with scientifically defensible tools and transparent reporting practices. Careful attention to these measurement principles ensures that the assessment yields scores that are both reliable and valid for their intended interpretive purposes, supporting appropriate score-based decisions for all test-takers regardless of their background characteristics. Ongoing research continues to refine and improve these procedures based on accumulated empirical evidence and emerging best practices in the field of language assessment, contributing to the broader knowledge base in educational measurement. Our commitment to continuous methodological improvement means that these procedures evolve over time based on accumulated validity evidence and feedback from the broader measurement community.

Looking Ahead to 2026

The 2026 agenda includes continued validation, expanded fairness monitoring, new task types, and deepened institutional collaboration. Ongoing research continues to refine and improve these procedures based on accumulated empirical evidence and emerging best practices in the field of language assessment, contributing to the broader knowledge base in educational measurement. Careful attention to these measurement principles ensures that the assessment yields scores that are both reliable and valid for their intended interpretive purposes, supporting appropriate score-based decisions for all test-takers regardless of their background characteristics. Test-takers and score users alike benefit from these rigorous methodological standards, which prioritize both measurement accuracy and fairness across diverse linguistic and cultural populations. Ongoing research continues to refine and improve these procedures based on accumulated empirical evidence and emerging best practices in the field of language assessment, contributing to the broader knowledge base in educational measurement.