This article is part of a Special Section of On Board with Professional Psychology that focuses on the intersection of professional psychology and Artificial Intelligence (AI). Learn more about ABPP’s Artificial Intelligence Task Force.
According to recent estimates, 14% of the world's population, more than 1 billion people, has a mental health condition, and this proportion has grown faster than the population itself (World Health Organization, 2025). In the U.S., the rise in mental health disorders and the associated demand for services has contributed to what has been labeled a national mental health crisis, reflecting the stark reality that a substantial proportion of individuals who need mental health services are unable to receive them (Stringer, 2024). These challenges affect all areas of professional psychology, including specialties focused on assessment.
Clinical neuropsychology is one specialty in which these issues are prominent. With fewer than 6,000 clinical neuropsychologists practicing in the U.S. (Morrison, 2021), demand for neuropsychological evaluation (NPE) far outpaces supply. Traditional approaches to NPE compound these access limitations with labor-intensive workflows involving several hours of standardized test administration, hand scoring, and preparation of lengthy evaluation reports (Sweet et al., 2021). Complicating the situation further, many widely used cognitive tests still rely on paper-and-pencil formats that yield only coarse data, such as total correct or completion time, and advances in adaptive testing and modern psychometrics remain underutilized (Parsons & Duffield, 2020).
Although telehealth visits and computerized measures have modestly expanded access, such incremental changes are insufficient to address workforce shortages or the inefficiencies inherent in traditional NPE. Access challenges extend across practice settings and patient populations, limiting the availability of timely psychological assessment and producing unacceptably long wait times for pediatric, adult, and lifespan providers alike. Generative artificial intelligence (AI) offers a promising path forward. Beyond efficiency, it has the potential to enrich data collection, accelerate test innovation, and enhance training. This commentary highlights three areas in which generative AI is poised to offer the greatest immediate benefit to assessment-focused specialties within professional psychology: test administration and scoring, test development, and training simulations.
Test Administration and Scoring
The interactive capability of generative AI presents an opportunity to extend digital approaches to test administration and scoring. Over half of the tests in a typical NPE battery require verbal responses, and all include verbal presentation of test instructions along with standardized and unstandardized prompts or cues to address examinee questions and redirect off-task behavior. One option would involve an ambient audio monitoring system that captures examinees' verbal responses in real time, scores them according to standardized instructions and normative data, automatically generates score reports, and stores item-level responses in a local clinical dataset.
Digital versions of audio stimuli exist for several measures, and automated screening batteries have been implemented on tablets. These tools, however, typically do not clarify instructions or provide redirection. A logical extension leveraging current AI capabilities would combine automated presentation of instructions and stimuli with the ability to answer common questions and redirect off-task responses as they arise.
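To make the capture-score-store portion of this idea concrete, the following is a minimal sketch of how such a pipeline might be organized. It is illustrative only: the speech-to-text step, the scoring rule (a simple count of unique acceptable responses, as in a verbal fluency task), the clarification handler, and all function and file names are assumptions rather than features of any existing test platform.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical speech-to-text step; in practice this would call a
# speech-recognition service or local model (an assumption, not a
# specific product).
def transcribe(audio_path: str) -> list[str]:
    raise NotImplementedError("Plug in a speech-to-text backend here.")

def score_fluency(responses: list[str], acceptable: set[str]) -> dict:
    """Score a verbal-fluency-style item: count unique acceptable words."""
    normalized = [r.strip().lower() for r in responses]
    correct = sorted(set(normalized) & acceptable)
    return {
        "raw_responses": responses,
        "correct_responses": correct,
        "raw_score": len(correct),
    }

def run_item(audio_path: str, acceptable: set[str], store: Path) -> dict:
    """Capture, score, and store item-level data for one test item."""
    record = score_fluency(transcribe(audio_path), acceptable)
    record["timestamp"] = datetime.now(timezone.utc).isoformat()
    # Item-level responses are appended to a local clinical dataset
    # (JSON Lines here purely for illustration).
    with store.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

def clarification_prompt(question: str, prompts: dict[str, str]) -> str | None:
    """Return a pre-approved standardized prompt for a recognized question.

    `prompts` maps keywords to approved prompt text (placeholders here);
    returning None signals that the question should go to the clinician.
    """
    q = question.lower()
    for keyword, prompt in prompts.items():
        if keyword in q:
            return prompt
    return None
```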
These AI enhancements to computerized test administration would offer benefits that extend beyond NPE. Structured assessments are widely used across clinical psychology and comprise core job functions in the majority of specialty areas certified by the American Board of Professional Psychology. Automating routine aspects of administration and scoring would reduce examiner burden, improve standardization, and expand access across specialties. The clinician's role would remain central for data integration and interpretation, but much of the administrative workload could be streamlined.
Test Development
Traditional test development is slow and expensive. Before a new test can be administered to human participants, extensive work is required to generate and refine its items. Content experts create a large pool of potential items relevant to the target construct, and the items are screened to eliminate redundancy and to ensure appropriate characteristics such as readability and difficulty. Once an initial pool is developed, the items are administered to human participants to establish psychometric properties such as internal consistency and other indices of reliability. Normative data must then be gathered that reflect the population with whom the test will be used. The process is time-intensive and costly.
Given the speed with which generative AI can produce language and its ability to modulate tone, style, and linguistic features, there is enormous potential for its use in new test development. Initial work has shown promising results using large language models to generate items for self-report measures of personality and related constructs (Kowal et al., 2025), and there is evidence that early item revision can be automated (Russell-Lasalandra et al., 2024). With such a system, a large item pool could be rapidly prototyped and refined before administration to human participants to establish basic psychometric properties. Early research focused primarily on self-report measures (Götz et al., 2024), with recent extension into high-stakes assessment used in personnel selection (Kowal et al., 2025). It seems reasonable that the approach could extend to cognitive test development, given that many cognitive tests include verbal stimuli and responses.
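As an illustration of what such a pipeline might look like, the sketch below pairs a placeholder item-generation step with a simple lexical redundancy screen. The generate_items function, the similarity threshold, the word-count cutoff, and the example construct label are all hypothetical; the screening logic is a generic near-duplicate filter, not the method used in any of the studies cited above.

```python
from difflib import SequenceMatcher

# Hypothetical call to a large language model; any text-generation
# backend could stand in here.
def generate_items(construct: str, n: int) -> list[str]:
    raise NotImplementedError("Plug in an LLM-based item generator here.")

def too_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag near-duplicate items using simple lexical similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def screen_items(candidates: list[str], max_words: int = 15) -> list[str]:
    """Drop overly long items and near-duplicates from a candidate pool."""
    retained: list[str] = []
    for item in candidates:
        if len(item.split()) > max_words:
            continue  # crude readability proxy: keep items short
        if any(too_similar(item, kept) for kept in retained):
            continue
        retained.append(item)
    return retained

# Example usage with a hypothetical construct label:
# pool = screen_items(generate_items("cognitive flexibility", n=200))
```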
A complementary application involves using generative AI to simulate human responses to tests under development. One approach might use digital twins, virtual counterparts of human participants matched on key demographic and psychological characteristics (Sun et al., 2023). Preliminary work in marketing research indicates that survey responses produced by digital twins closely approximate human responses, with conclusions based on simulated and real-world data aligning 95% of the time (Korst et al., 2025). Although there are limits to how closely these responses mirror those of human participants, digital twins may offer an efficient and economically viable starting point. Combined with AI-generated item pools, this approach could yield preliminary psychometric insights at a fraction of the cost of traditional test development. In short, AI systems for rapid prototyping may greatly accelerate early-stage psychological test development while reducing overhead and streamlining data collection.
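To illustrate the kind of preliminary psychometric insight this could yield, the sketch below computes Cronbach's alpha from a matrix of simulated item responses. The simulate_twin_responses function is a stand-in for whatever digital-twin system produces the data; only the alpha calculation itself is standard.

```python
import numpy as np

# Hypothetical digital-twin response generator: rows are simulated
# respondents, columns are items (e.g., 1-5 Likert ratings).
def simulate_twin_responses(n_twins: int, n_items: int) -> np.ndarray:
    raise NotImplementedError("Plug in a digital-twin simulation here.")

def cronbach_alpha(responses: np.ndarray) -> float:
    """Internal consistency for an items-in-columns response matrix."""
    n_items = responses.shape[1]
    item_variances = responses.var(axis=0, ddof=1)
    total_variance = responses.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Example usage:
# data = simulate_twin_responses(n_twins=500, n_items=20)
# print(f"Preliminary alpha: {cronbach_alpha(data):.2f}")
```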
Training in NPE
A final way in which AI may contribute to NPE is by enhancing clinical training. Two key evaluation elements, the neurobehavioral status exam (NBSE) and standardized test administration, are well suited to AI-supported simulation. Promising early results have been reported from the use of AI simulation to teach foundational clinical interviewing skills (Garrison, 2024), offering an important proof of concept. A logical next step would be to extend such systems to the NBSE, which involves structured inquiry into cognitive and somatic complaints, mental status examination, and elicitation of functional and contextual information that guides test selection. Learning these skills typically involves a progression from foundational knowledge gained in graduate coursework to observation and supervised practice in clinical settings. An AI-based NBSE simulation platform could give trainees additional opportunities to rehearse and refine these skills across a wide range of simulated patients. Patient profiles might incorporate demographic variability, diverse presenting concerns, and a spectrum of relevant clinical conditions. Trainees could initiate sessions remotely, engage in virtual clinical interactions, and receive real-time or post-session feedback aligned with predefined learning objectives.
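One way such a platform might be organized is sketched below: a structured patient profile drives a simulated interview, each exchange is logged, and a post-session step checks the transcript against predefined learning objectives. The profile fields, the simulated_patient_reply function, and the keyword-based objective check are illustrative assumptions, not components of an existing training system.

```python
from dataclasses import dataclass, field

@dataclass
class PatientProfile:
    """Structured description used to drive a simulated NBSE patient."""
    age: int
    presenting_concern: str
    relevant_history: list[str] = field(default_factory=list)

@dataclass
class SessionLog:
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)

# Hypothetical language-model call that answers in character for the profile.
def simulated_patient_reply(profile: PatientProfile, trainee_question: str) -> str:
    raise NotImplementedError("Plug in an LLM role-play backend here.")

def run_turn(profile: PatientProfile, log: SessionLog, trainee_question: str) -> str:
    """Record one trainee question and the simulated patient's answer."""
    reply = simulated_patient_reply(profile, trainee_question)
    log.transcript.append(("trainee", trainee_question))
    log.transcript.append(("patient", reply))
    return reply

def score_session(log: SessionLog, objectives: list[str]) -> dict[str, bool]:
    """Post-session feedback stub: mark which learning objectives were addressed.

    A real system would use a rubric or model-based rater; here we only
    check whether an objective keyword appeared in any trainee question.
    """
    trainee_text = " ".join(t for speaker, t in log.transcript if speaker == "trainee").lower()
    return {obj: obj.lower() in trainee_text for obj in objectives}
```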
The process of learning standardized test administration also involves a combination of knowledge acquisition and hands-on experience. In the typical approach, students practice test administration with friends, relatives, and/or classmates, which raises ethical concerns and is subject to the availability and willingness of others. An AI examinee could offer a scalable alternative. Using a digital training portal, students could practice administering standardized tests across simulated examinee profiles. The system might begin with straightforward examinee styles and gradually introduce more challenging examinee behavior or responses associated with cognitive impairment. As with the NBSE simulator, the system could record the training sessions for supervisory review and include automated performance evaluation based on fidelity to standardized procedures or another rubric.
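A simple version of the automated fidelity evaluation mentioned above might compare what the trainee actually said against the standardized instruction script and flag large deviations for supervisory review. The similarity threshold below is an assumption, and raw text similarity is only a rough proxy; a deployed system would rely on a validated rubric.

```python
from difflib import SequenceMatcher

def instruction_fidelity(spoken: str, scripted: str) -> float:
    """Rough fidelity score: word-level similarity between spoken and scripted text."""
    return SequenceMatcher(None, spoken.lower().split(), scripted.lower().split()).ratio()

def flag_deviations(turns: list[tuple[str, str]], threshold: float = 0.90) -> list[dict]:
    """Flag administration turns whose wording drifts from the standard script.

    `turns` pairs the trainee's transcribed speech with the scripted
    instruction it was supposed to match.
    """
    flags = []
    for spoken, scripted in turns:
        score = instruction_fidelity(spoken, scripted)
        if score < threshold:
            flags.append({"spoken": spoken, "scripted": scripted, "fidelity": round(score, 2)})
    return flags
```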
Conclusion
Generative AI will not replace the expertise of board-certified psychologists, but it offers promising possibilities that may extend our reach. Although this commentary has focused on positive outcomes of such uses of AI, several practical realities warrant careful consideration. Automation can enhance clinical practice only to the extent that it supports clinical decision making rather than supplanting it. AI models remain vulnerable to bias, and the algorithms underlying particular decisions or recommendations are often neither accessible nor readily interpretable (Yu et al., 2024). Ethical considerations, especially those related to privacy and data security, must be addressed. We present these limitations as preliminary counterpoints to our decidedly optimistic perspective and encourage continued dialogue, including consideration of a follow-up article in this journal focused on these and other challenges related to clinical AI.
In summary, generative AI offers promising opportunities for improving test administration, test development, and training across professional psychology specialties. With streamlined processes and greater scalability, AI systems can facilitate modernization of assessment methods while maintaining scientific and clinical rigor. Given thoughtful attention to ethical concerns and explicit inclusion of human values, AI tools can be developed that support, rather than replace, human clinical expertise.
References
Garrison, L. (2024, November 8-9). AI enhanced clinical psychology training: A future forward approach. Texas Psychological Association Annual Convention, Fort Worth, TX.
Götz, F. M., Maertens, R., Loomba, S., & van der Linden, S. (2024). Let the algorithm speak: How to use neural networks for automatic item generation in psychological scale development. Psychological Methods, 29(3), 494–518. https://doi.org/10.1037/met0000540
Korst, J., Puntoni, S., & Toubia, O. (2025). How gen AI is transforming market research. Harvard Business Review, 103(3), 1–15. https://hbr.org/2025/05/how-gen-ai-is-transforming-market-research
Kowal, J. M., Bryant, K. H., Segall, D., & Kantrowitz, T. (2025). Harnessing generative AI for assessment item development: Comparing AI-generated and human-authored items. International Journal of Selection and Assessment, 33(3), e70021. https://doi.org/10.1111/ijsa.70021
Morrison, C. (2021). How many neuropsychologists are there in the U.S.? The Clinical Neuropsychologist, 35(4), 792. https://doi.org/10.1080/13854046.2021.1900401
Parsons, T., & Duffield, T. (2020). Paradigm shift toward digital neuropsychology and high-dimensional neuropsychological assessments: Review. Journal of Medical Internet Research, 22(12), e23777. https://doi.org/10.2196/23777
Russell-Lasalandra, L. L., Christensen, A. P., & Golino, H. (2024). Generative psychometrics via AI-GENIE: Automatic item generation and validation via network-integrated evaluation. PsyArXiv. https://doi.org/10.31234/osf.io/fgbj4
Stringer, H. (2024). Mental health care is in high demand. Psychologists are leveraging tech and peers to meet the need. Monitor on Psychology, 55(1). https://www.apa.org/monitor/2024/01/trends-pathways-access-mental-health-care
Sun, T., He, X., & Li, Z. (2023). Digital twin in healthcare: Recent updates and challenges. Digital Health, 9, 20552076221149651. https://doi.org/10.1177/20552076221149651
Sweet, J. J., Klipfel, K. M., Nelson, N. W., & Moberg, P. J. (2021). Professional practices, beliefs, and incomes of U.S. neuropsychologists: The AACN, NAN, SCN 2020 practice and “salary survey.” The Clinical Neuropsychologist, 35(1), 7–80. https://doi.org/10.1080/13854046.2020.1849803
World Health Organization. (2025). World mental health today: Latest data. https://iris.who.int/bitstream/handle/10665/382343/9789240113817-eng.pdf
Yu, K. H., Healey, E., Leong, T. Y., Kohane, I. S., & Manrai, A. K. (2024). Medical artificial intelligence and human values. The New England Journal of Medicine, 390(20), 1895–1904. https://doi.org/10.1056/NEJMra2214183

Jeremy Davis, PsyD, MBA, ABPP
Board Certified in Clinical Neuropsychology
Correspondence: davisj20@uthscsa.edu

Anthony Rios, PhD
Correspondence: anthony.rios@utsa.edu