The most common method of describing performance test scores in clinical and forensic reports is the use of qualitative descriptors (e.g., average, normal, above average, impaired) (Guilmette, Hagan, & Giuliano, 2008). However, as any graduate student in applied psychology can confirm, different assessment supervisors often apply different qualitative labels to the same test scores. This inconsistency can create contradictions and confusion not only for psychology trainees but also for referral sources, consumers, other clinicians, and the courts.
The inconsistent application of qualitative descriptors for performance test scores is not new. For example, in his 1995 presidential address to Division 22 (Rehabilitation Psychology) of the American Psychological Association, Bruce Caplan stated, “Terms such as ‘moderately impaired’ and ‘within normal limits’ frequently lack quantitative referents and thus are subject to differing interpretations across individuals and contexts” (Caplan, 1995, p. 236). Hebben and Milberg (2009) also asserted that labels such as average or below average are imprecise and are applied idiosyncratically among clinicians.
In the most thorough empirical study of test score labeling in the literature, Guilmette et al. (2008) surveyed American Academy of Clinical Neuropsychology (AACN) board-certified neuropsychologists, who were asked to assign a descriptive label (e.g., superior, average, normal, impaired) to each of 12 standard scores (SS), ranging from 50 to 130, obtained on a memory test within a brief case scenario. The 110 survey respondents assigned a mean of 14 different descriptive labels to each of the 12 standard scores (SD = 5.4, range = 6–23). The lower the standard score, the greater the inconsistency among the participants. These data confirmed the lack of uniformity in assigning qualitative descriptors to specific test scores.
Although some labeling systems, such as the Wechsler (Wechsler, 1981, 1997, 2008) and Heaton (Heaton et al., 1991, 2004) classifications, have existed within psychology for years, the field has not reached consensus on which system clinicians should adopt to promote greater consistency. Further complicating uniform labeling, many test publishers assign idiosyncratic labels to standard scores, so clinicians may feel compelled to use one descriptor for a given standard score on one test and a different descriptor for the same standard score on another.
AACN Consensus Conference on Uniform Test Score Labeling
In the first concerted effort by a major professional organization to make recommendations for uniform labeling of performance test scores, the AACN Board of Directors approved the creation of a consensus panel and conference in 2017 to address this issue. The consensus panel comprised 22 psychologists/neuropsychologists from 17 states, the District of Columbia, and Canada, representing diverse backgrounds (e.g., adult/pediatric, culture, work setting, forensics). Beginning at the 2018 AACN annual meeting, the panel considered multiple test score label options and, over the next two years, received feedback from the AACN membership via its listserv. The panel ultimately reached consensus, and its recommendations were published in The Clinical Neuropsychologist (TCN) in 2020 (Guilmette et al., 2020). Because TCN is a neuropsychology journal, our recommendations may not have reached other applied psychology disciplines. Thus, the purpose of this article is to summarize AACN’s recommendations in the hope that other psychologists will adopt them, resulting in greater consistency across psychological disciplines, which we believe is the mark of a maturing profession. For more detailed information about the process and rationale behind the consensus conference statement, readers are encouraged to consult the original article (Guilmette et al., 2020).
General guidelines and limitations of the AACN recommendations
- Recommendations were made for ability test scores only, and not for self-report questionnaires or rating scales.
- Recommendations are not mandates or requirements, even for members of AACN, and thus differ from practice standards. Rather, they represent suggested “best practices,” derived from input from multiple sources, intended to provide guidance to others within the profession.
- Recommendations were meant to simplify test score descriptors to improve communication.
- Test scores cannot be “impaired” and thus should not be labeled as such. Only functions or abilities can be impaired.
- Test score labels should be based on the rarity or frequency of scores relative to a normal curve distribution, and not based on pathology.
- The labeling of scores is different from the interpretation of scores. The AACN recommendations are in no way meant to interfere with or restrict clinical judgment. Recommended test score labels are meant to be descriptive and not interpretive.
Recommendations for test score labels of tests with normal distributions
The consensus conference panel weighed multiple factors in recommending labels for test scores that are normally distributed in the population. One consideration was that labels be as simple and unambiguous as possible. In addition, the consensus conference panel held:
“a strong belief that test score labels should be free of terms that appear judgmental, biased or would be viewed as representing a clinical conclusion, and, instead should reflect only a score position within the normal distribution. Specifically, the intent was that score labels not appear to convey the separate process of clinical interpretation, which is the necessary step in determination of impairment or deficit.” (Guilmette et al., 2020, p. 443)
The consensus conference panel also considered adopting one of the existing classification systems, such as the Wechsler or Heaton system, outright but concluded that these systems did not conform to the overall objectives established by the consensus conference.
Table 1 lists the AACN recommended labels for normally distributed test scores that fall within specific ranges. The consensus conference panel chose to retain the seven-category structure of the Wechsler system, with some changes to the descriptors, because that structure was considered familiar to most clinicians and thus could be more easily applied in clinical practice.
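For clinicians who generate reports with scoring software, the Table 1 bands can be applied mechanically. The following Python sketch is our own illustration, not part of the AACN statement: the function names are hypothetical, the sketch assumes whole-number scores as they appear in score tables, and the percentile helper assumes that the test’s norms follow a normal curve with a mean of 100 and an SD of 15.

```python
from bisect import bisect_right
from statistics import NormalDist

# Table 1 descriptive labels, ordered from the lowest band to the highest.
LABELS = [
    "Exceptionally Low",
    "Below Average",
    "Low Average",
    "Average",
    "High Average",
    "Above Average",
    "Exceptionally High",
]

# Lower bound of every band except the lowest, taken directly from Table 1.
# Standard-score and T-score bands are listed separately rather than
# converted from one another, because Table 1 defines each column directly.
SS_LOWER_BOUNDS = [70, 80, 90, 110, 120, 130]
T_LOWER_BOUNDS = [30, 37, 43, 57, 63, 70]


def label_standard_score(ss: float) -> str:
    """Return the Table 1 label for a standard score (mean 100, SD 15)."""
    # bisect_right counts how many lower bounds the score meets or exceeds,
    # which is exactly the index of the score's band in LABELS.
    return LABELS[bisect_right(SS_LOWER_BOUNDS, ss)]


def label_t_score(t: float) -> str:
    """Return the Table 1 label for a T score (mean 50, SD 10)."""
    return LABELS[bisect_right(T_LOWER_BOUNDS, t)]


def percentile_from_standard_score(ss: float) -> float:
    """Percentile rank implied by a standard score, assuming normal norms."""
    return 100 * NormalDist(mu=100, sigma=15).cdf(ss)


if __name__ == "__main__":
    for score in (65, 85, 100, 112, 131):
        pct = percentile_from_standard_score(score)
        print(f"SS {score}: {label_standard_score(score)} (percentile rank about {pct:.1f})")
```

With these bands, a standard score of 85 is labeled Low Average, and rounding the percentile helper’s output at each band boundary reproduces the Corresponding Percentiles column (e.g., a standard score of 80 falls at about the 9th percentile). The label describes only where the score falls; whether it reflects impairment remains a matter of clinical interpretation.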
Recommendations for test score labels of tests with non-normal or skewed distributions
Psychologists may use a variety of ability tests that have non-normal or skewed distributions in the neurotypical population (e.g., Boston Naming Test, Montreal Cognitive Assessment, clock drawing, and figure drawings). These tests generally show very little variability among neurotypical adults. Consequently, the descriptive labels designed for normally distributed scores are less meaningful when applied to distributions that are not symmetrical.
Rather than reporting such results as standard scores, as one typically would with normally distributed tests, the consensus conference panel recommended reporting percentile ranks, because percentile ranks are more comparable and meaningful than transformed scores when the distribution is skewed. The panel further recommended that the qualitative descriptors used for normally distributed tests be applied at the lower end, but not the upper end, of non-normal distributions. The rationale is that labeling a score high average or above average may be misleading, because these tests are designed to identify deficits rather than exceptional performance and are subject to ceiling and floor effects. Thus, for scores at or above the 25th percentile (i.e., in the average range or higher), the panel recommended the descriptive terms normal range, within normal expectations, or within normal limits. For scores below the 25th percentile, the panel recommended the same qualitative descriptors used for normally distributed test scores: low average score for the 9th through 24th percentiles, below average score for the 2nd through 8th percentiles, and exceptionally low score below the 2nd percentile.
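The percentile-based scheme for skewed distributions reduces to a few thresholds. The Python sketch below is our own illustration under stated assumptions: the function name is hypothetical, percentile ranks are assumed to be whole numbers read from a norm table, and the caller chooses which of the three panel-endorsed phrases to use for the normal range.

```python
# Phrases the panel offered for scores at or above the 25th percentile.
NORMAL_RANGE_LABELS = ("normal range", "within normal expectations", "within normal limits")


def label_skewed_percentile(percentile: int, normal_label: str = "within normal limits") -> str:
    """Describe a percentile rank from a test with a skewed distribution.

    Assumes whole-number percentile ranks (0-100), as typically read from norm tables.
    """
    if not 0 <= percentile <= 100:
        raise ValueError(f"percentile must be between 0 and 100, got {percentile}")
    if normal_label not in NORMAL_RANGE_LABELS:
        raise ValueError(f"normal_label must be one of {NORMAL_RANGE_LABELS}")
    if percentile >= 25:
        # No "high average" or "above average" labels at the upper end: these
        # tests are built to detect deficits, not exceptional performance.
        return normal_label
    if percentile >= 9:
        return "low average score"
    if percentile >= 2:
        return "below average score"
    return "exceptionally low score"


if __name__ == "__main__":
    for pct in (60, 15, 5, 1):
        print(pct, "->", label_skewed_percentile(pct))
```

Unlike the Table 1 sketch, there is deliberately no label above the normal range: every score at or above the 25th percentile receives the same descriptor.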
Performance validity tests (PVTs), procedures designed to assess suboptimal effort and malingering, are also not normally distributed. The consensus conference panel received a great deal of input from the AACN membership regarding the labeling of PVT scores, likely because of their forensic implications. The panel recommended a three-tiered approach to labeling PVT scores that intentionally avoids interpretation based on the score alone: clinicians should describe individual PVT scores, rather than the overall test profile (which would be interpretive), as falling within the valid range, within the intermediate range, or within the invalid range.
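The three-tiered PVT scheme can likewise be sketched in Python. This is our own illustration, not an AACN procedure: the consensus statement summarized here does not specify cutoffs, so the two thresholds below are hypothetical parameters that a clinician would take from each test’s own validation literature, and the sketch assumes that lower scores indicate less valid performance. The function returns a description of a single score only.

```python
from enum import Enum


class PvtRange(Enum):
    """The three descriptive ranges for an individual PVT score."""
    VALID = "within the valid range"
    INTERMEDIATE = "within the intermediate range"
    INVALID = "within the invalid range"


def classify_pvt_score(score: float, invalid_max: float, intermediate_max: float) -> PvtRange:
    """Place one PVT score in a descriptive range.

    invalid_max and intermediate_max are HYPOTHETICAL, test-specific cutoffs
    supplied by the caller; this sketch assumes lower scores are less valid.
    The result describes the score, not the examinee or the overall profile.
    """
    if invalid_max >= intermediate_max:
        raise ValueError("invalid_max must be below intermediate_max")
    if score <= invalid_max:
        return PvtRange.INVALID
    if score <= intermediate_max:
        return PvtRange.INTERMEDIATE
    return PvtRange.VALID


if __name__ == "__main__":
    # Purely illustrative cutoffs on a hypothetical 0-15 scale, not from any real test.
    for s in (8, 11, 14):
        print(s, "->", classify_pvt_score(s, invalid_max=10, intermediate_max=12).value)
```

Keeping the cutoffs as parameters reflects the fact that they differ from test to test; the labels themselves stay fixed.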
Summary
The 2020 AACN consensus conference statement on test score labeling was the first effort within organized psychology or neuropsychology to provide much-needed consistency in reporting psychological and neuropsychological ability test scores. Although the recommendations came from a neuropsychological professional organization, they are applicable to any ability testing conducted by clinical, counseling, school, rehabilitation, or forensic psychologists. We recognize that not all applied psychologists will adopt the AACN recommendations. We hope, however, that this summary informs clinicians who were unfamiliar with the consensus conference panel’s recommendations and encourages them to consider adopting them in their practices and with their trainees.
For more information on the American Academy of Clinical Neuropsychology’s recommendations on uniform labeling, see Guilmette et al. (2020).
References
Caplan, B. (1995). Choose your words. Rehabilitation Psychology, 40(3), 233-240. https://doi.org/10.1037/h0092829
Guilmette, T. J., Hagan, L., & Giuliano, A. J. (2008). Assigning qualitative descriptors to test scores in neuropsychology: Forensic implications. The Clinical Neuropsychologist, 22, 122-139. https://doi.org/10.1080/13854040601064559
Guilmette, T. J., Sweet, J. J., Hebben, N., Koltai, D., Mahone, E. M., Spiegler, B. J., Stucky, K., Westerveld, M., & Conference Participants. (2020). American Academy of Clinical Neuropsychology consensus conference statement on uniform labeling of performance test scores. The Clinical Neuropsychologist, 34(3), 437-453. https://doi.org/10.1080/13854046.2020.1722244
Heaton, R. K., Grant, I., & Matthews, C. G. (1991). Comprehensive norms for the extended Halstead-Reitan battery: Demographic corrections, research findings, and clinical applications. Odessa, TX: Psychological Assessment Resources.
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan battery: Demographically adjusted neuropsychological norms for African-American and Caucasian adults professional manual. Lutz, FL: Psychological Assessment Resources.
Hebben, N., & Milberg, W. (2009). Essentials of neuropsychological assessment (2nd ed.). New York, NY: John Wiley and Sons.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised. New York, NY: The Psychological Corporation.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale (3rd ed.). San Antonio, TX: Psychological Corporation.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale (4th ed.). San Antonio, TX: Psychological Corporation.
Table 1. AACN Recommended Descriptive Test Score Labels for Tests with Normal Distributions
| Standard Scores | T Scores | Corresponding Percentiles | Descriptive Labels |
|---|---|---|---|
| 130 and higher | 70 and higher | 98 and higher | Exceptionally High |
| 120-129 | 63-69 | 91-97 | Above Average |
| 110-119 | 57-62 | 75-90 | High Average |
| 90-109 | 43-56 | 25-74 | Average |
| 80-89 | 37-42 | 9-24 | Low Average |
| 70-79 | 30-36 | 2-8 | Below Average |
| Below 70 | Below 30 | Less than 2 | Exceptionally Low |
Thomas J. Guilmette, PhD, ABPP-CN
Board Certified in Clinical Neuropsychology
Correspondence: tguilmet@providence.edu
Leigh D. Hagan, PhD, ABPP
Board Certified in Forensic Psychology
Correspondence: lhagan@leighhagan.com
Eric Y. Drogin, JD, PhD, ABPP
Board Certified in Forensic Psychology
Correspondence: edrogin@bwh.harvard.edu