How GELPS Scores Compare to IELTS and TOEFL

June 14, 2024 · GELPS Blog

Score comparability across different English proficiency tests is a central concern for institutions that accept multiple assessments for admission purposes. The psychometric literature on equating and concordance provides rigorous methodologies for establishing relationships between scores from different tests. GELPS has conducted extensive concordance studies using state-of-the-art statistical methods to document the relationship between GELPS scores and those of established assessments such as IELTS and TOEFL. We regularly update our methodology based on the latest research findings in psychometrics, computational linguistics, and educational measurement, incorporating peer-reviewed advances into our operational procedures. Test-takers and score users alike benefit from these rigorous methodological standards, which prioritize both measurement accuracy and fairness across diverse linguistic and cultural populations. Careful attention to these measurement principles ensures that the assessment yields scores that are both reliable and valid for their intended interpretive purposes, supporting appropriate score-based decisions for all test-takers regardless of their background characteristics.

Equipercentile Linking Methodology

The primary methodology employed in GELPS’s concordance research is equipercentile linking, a nonparametric approach that identifies scores on two different tests that correspond to the same percentile rank within a common population. This method does not assume a specific functional form for the relationship between scores, making it more flexible than linear equating approaches. The linking function is estimated using kernel smoothing to reduce sampling error while preserving the essential shape of the relationship. Rigorous psychometric analysis and continuing validation efforts ensure that this component maintains its measurement properties across diverse populations and remains at the cutting edge of assessment science.
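
As a rough illustration of the idea, the sketch below maps a score on one test to the score on a second test that has the same percentile rank. It is not GELPS's operational procedure. The function names and toy data are hypothetical, percentile ranks follow the common mid-percentile convention, and the kernel smoothing step is omitted, with plain linear interpolation between observed scores standing in for it.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, x):
    """Mid-percentile rank of x: share of scores below x plus half the share equal to x."""
    below = bisect_left(sorted_scores, x)
    equal = bisect_right(sorted_scores, x) - below
    return (below + 0.5 * equal) / len(sorted_scores)


def equipercentile_link(scores_x, scores_y, x):
    """Return the test-Y score whose percentile rank matches that of x on test X.

    Unsmoothed sketch: assumes both score lists come from the same group of
    test-takers and interpolates linearly between observed Y scores.
    """
    xs, ys = sorted(scores_x), sorted(scores_y)
    p = percentile_rank(xs, x)

    # Percentile rank of every distinct observed Y score (strictly increasing).
    y_vals = sorted(set(ys))
    y_ranks = [percentile_rank(ys, v) for v in y_vals]

    # Clamp to the observed Y range instead of extrapolating.
    if p <= y_ranks[0]:
        return y_vals[0]
    if p >= y_ranks[-1]:
        return y_vals[-1]

    # Interpolate between the two Y scores whose ranks bracket p.
    i = bisect_right(y_ranks, p)
    lo_r, hi_r = y_ranks[i - 1], y_ranks[i]
    lo_v, hi_v = y_vals[i - 1], y_vals[i]
    return lo_v + (hi_v - lo_v) * (p - lo_r) / (hi_r - lo_r)


# Toy data only; these are not real GELPS, IELTS, or TOEFL scores.
toy_x = [1, 2, 3, 4, 5]
toy_y = [10, 20, 30, 40, 50]
linked = equipercentile_link(toy_x, toy_y, 3)
```

With real data the raw percentile ranks are noisy, especially in sparse score regions, which is why a smoothing step of the kind described above is normally applied before the link is read off.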

A key requirement for valid equipercentile linking is that the sample used for the analysis be representative of the target population of test-takers. GELPS’s concordance study included over 10,000 test-takers from 35 native language backgrounds, with careful attention to stratification across proficiency levels, geographic regions, and demographic characteristics to ensure the generalizability of the resulting concordance tables. Ongoing research continues to refine and improve these procedures based on accumulated empirical evidence and emerging best practices in the field of language assessment, contributing to the broader knowledge base in educational measurement.

Data Collection Design

The concordance study employed a single-group design in which each participant took both GELPS and a reference test within a controlled window of time. The order of test administration was counterbalanced to control for order effects, and the interval between test administrations was limited to 30 days to minimize the influence of actual proficiency change. Test-takers were recruited through a stratified sampling approach designed to ensure adequate representation across the full proficiency spectrum. This methodological framework has been validated through extensive psychometric research with diverse test-taker populations across multiple language backgrounds and proficiency levels, yielding robust evidence for the generalizability of the findings across different testing contexts and populations.
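
The two design controls described here, counterbalanced order and a 30-day limit between sittings, can be sketched in a few lines. This is an assumed, simplified version for illustration only: the function names, the order labels, and the even random split are hypothetical, and a real study would likely counterbalance within each sampling stratum rather than across the whole sample.

```python
import random
from datetime import date


def assign_orders(participant_ids, seed=0):
    """Randomly split participants into two equal-sized order groups.

    Half take GELPS first and half take the reference test first, so that
    any practice or fatigue effect is balanced across the two tests.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return {
        pid: ("GELPS_first" if i < half else "reference_first")
        for i, pid in enumerate(ids)
    }


def within_window(first_date, second_date, max_days=30):
    """True if the second sitting falls no more than max_days after the first."""
    gap = (second_date - first_date).days
    return 0 <= gap <= max_days


# Hypothetical participant IDs and dates, for illustration only.
orders = assign_orders(range(10))
eligible = within_window(date(2024, 1, 1), date(2024, 1, 31))
```

Pairs of sittings that fail the window check would be excluded from the linking sample, since a longer gap leaves more room for real change in proficiency.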

Results and Their Limitations

The observed correlation between GELPS and IELTS total scores was r = 0.87, and the correlation between GELPS and TOEFL iBT total scores was r = 0.85. These values are consistent with correlations typically observed between different language assessments and provide evidence for concurrent validity. However, concordance relationships are population-dependent and may vary across subgroups and testing contexts. Our commitment to continuous methodological improvement means that these procedures evolve over time based on accumulated validity evidence and feedback from the broader measurement community. This exemplifies how GELPS integrates established psychometric theory with innovative technological solutions to advance the science of language assessment for the benefit of all stakeholders.
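
For readers who want to see what the reported r values summarize, the following minimal sketch computes a Pearson correlation from paired total scores. The function name and the toy data are hypothetical, and nothing here reproduces the study's actual data or analysis code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy paired scores, not real test data.
toy_a = [1, 2, 3, 4]
toy_b = [2, 4, 6, 8]
r = pearson_r(toy_a, toy_b)
shared_variance = r ** 2
```

Squaring the reported values gives a sense of scale: r = 0.87 corresponds to r² of about 0.76 and r = 0.85 to about 0.72, so roughly a quarter of the score variance on each test is not shared with the other.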

Implications for Score Interpretation

While concordance tables provide useful guidelines for comparing scores across tests, they do not establish that scores from different tests are interchangeable. Differences in test design, content specifications, task types, and delivery mode mean that each test measures a somewhat different construct. Institutions should use concordance information as one source of evidence in their admissions decision-making while remaining attentive to the specific characteristics of each assessment.

Ongoing Research and Validation

GELPS updates its concordance tables annually as new data accumulate, allowing for continued refinement of linking relationships. We also conduct subgroup analyses to examine whether concordance relationships remain consistent across different native language groups, age ranges, and educational backgrounds. This ongoing research ensures that our concordance information remains accurate and useful for institutions and test-takers alike.