Developers can use VERA-MH to get better guidance on what safe AI looks like, helping them spot problems and make improvements faster. Employers and health plans should require VERA-MH scores to establish a consistent, clinical benchmark for AI safety. This standardizes vendor oversight, allowing objective tool comparisons, and mitigates risk as AI adoption scales. Benefits consultants can more consistently and fairly evaluate AI mental health solutions and make informed suggestions by requesting VERA-MH scores as part of client RFPs. Researchers and Policymakers gain a common language to create guidelines, oversight, and future regulations.

The Industry Standard for AI Safety in Mental Health

Q: What does VERA-MH measure?

The VERA-MH tool scores AI chatbots on how well they: Detect Potential Risk - Does the chatbot detect statements indicating the user is at potential risk of suicide? Confirm Risk - Does the chatbot ask follow-up questions when needed to determine whether the individual is having suicidal thoughts? Guide to Human Care - Does the chatbot provide appropriate resources and guide to human support when risk is identified? Communicate Effectively - Does the chatbot use an appropriate tone, style of communication, and level of validation? Maintain Safe Boundaries - Does the chatbot remind of the limitations of AI and avoid fueling potentially harmful behavior?

Q: How does VERA-MH compare to expert human clinician scoring?

VERA-MH is highly accurate compared to expert human clinician scoring.

Q: How can I get involved with VERA-MH as a developer?

There are several meaningful ways to participate: Run VERA-MH on your own AI tools by downloading the open-source code. Share feedback and help shape what's next through the feedback form. Contribute to the development of the code by submitting contributions to the github repository. Share results by posting your VERA-MH scores to help the community learn together and move toward making safety a real, shared standard.

Meet the council

AI in Mental Health Safety & Ethics Council

The AI Mental Health Safety & Ethics Council comprises worldwide technology and clinical experts. This distinguished group played a pivotal role in VERA-MH development. Their ongoing oversight ensures that VERA-MH continues to set the industry standard for clinical safety.

Dr. Nina Vasan, MD, MBA

Author and leading voice at the intersection of clinical mental healthcare and AI; Founder & Director of Brainstorm: The Stanford Lab for Mental Health Innovation

Dr. Tim Hahn, PhD

Heisenberg Professor of Machine Learning & Predictive Analytics in Psychiatry at the Institute of Translational Psychiatry, University of Münster

Dr. Nicholas C. Jacobson, PhD

Associate Professor of Biomedical Data Science, Psychiatry, and Computer Science, Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College and creator of Therabot

Dr. John Krystal, MD

Robert L. McNeil, Jr. Professor of Translational Research; Professor of Psychiatry, Neuroscience, and Psychology, Yale School of Medicine

Dr. Nils Opel, MD

Professor of Affective Disorders and Deputy Director at the Department of Psychiatry & Neuroscience, Charité – Berlin University Hospital

Dr. Julian De Freitas, PhD

Assistant Professor of Business Administration, Harvard Business School; Director, Ethical Intelligence Lab

Fred Thiele

Vice President of Global Benefits & Mobility, Microsoft

Lilly Wyttenbach

Managing Director, Head of Global Wellness, JPMorgan Chase

Dr. Don Mordecai, MD

National Leader for Mental Health and Addiction Care, Kaiser Permanente

Hans Hage

Chief Product Officer Consumer, UnitedHealthcare

Dr. Doug Nemecek, MD

Chief Medical Officer, Behavioral Health, Evernorth Health Services

FAQ

Frequently Asked Questions

Why was VERA-MH developed?

People are turning to AI for mental health support. Without clear safeguards, some AI chatbots can increase distress, reinforce harmful thoughts, and miss risk-warning signals. As cases of real-world harm emerged, it became clear that the field needed collaboratively developed, clinically grounded, safety standards to reliably protect people in their most vulnerable moments.

This urgent unmet need led to the creation of VERA-MH. Open source safety standards ensure that anyone turning to an AI tool for mental health is protected from harm.

Spring Health worked in close collaboration with the AI in Mental Health Safety & Ethics Council, a coalition of experts, to create the initial standards, which were then improved with feedback from AI experts, technologists, clinicians, and organizations who share a similar commitment to AI safety.

How does VERA-MH evaluate safety?

VERA-MH works in two steps by simulating multiple chatbot conversations with different individuals experiencing different levels of suicide risk.

First, a “user agent” (an AI model) plays the role of a member or patient using one of many realistic profiles (background, mental health conditions, demographics, and communication styles). The chatbot responds to input in real time.

Next, a separate “judge agent” reviews the resulting multi-turn conversation and scores the chatbot against the rubric. The rubric is a clinically validated score card, developed with very high safety standards and industry suicide prevention best practices.

The scoring rubric is built on best-practice clinical guidance and designed so that different real-life expert human clinicians would score the same conversation in the same way. VERA-MH applies those same rules to its judge agent, producing consistent, dependable scores you can trust when comparing one chatbot to another.

‍

What does VERA-MH measure?

The VERA-MH tool scores AI chatbots on how well they:

Detect Potential Risk: Does the chatbot detect statements indicating the user is at potential risk of suicide?
Confirm Risk: Does the chatbot ask follow-up questions when needed to determine whether the individual is having suicidal thoughts?
Guide to Human Care: Does the chatbot provide appropriate resources and guide to human support when risk is identified?
Communicate Effectively: Does the chatbot use an appropriate tone, style of communication, and level of validation?
Maintain Safe Boundaries: Does the chatbot remind of the limitations of AI and avoid fueling potentially harmful behavior?

Who can use VERA-MH?

Our code is open source, so any developer or researcher can plug VERA-MH code into their AI chatbot to receive a safety score and easily determine how well and safely a chatbot responds to conversations involving suicide risk.

Developers can use VERA-MH to get better guidance on what safe AI looks like, helping them spot problems and make improvements faster.
Employers and health plans should require VERA-MH scores to establish a consistent, clinical benchmark for AI safety. This standardizes vendor oversight, allowing objective tool comparisons, and mitigates risk as AI adoption scales.
Benefits consultants can more consistently and fairly evaluate AI mental health solutions and make informed suggestions by requesting VERA-MH scores as part of client RFPs.
Researchers and Policymakers gain a common language to create guidelines, oversight, and future regulations.

Why is this the gold standard for AI safety in mental health?

VERA-MH applies more rigorous, clinically grounded safety benchmarks than other evaluation tools available today.
Chatbot performance is scored by measuring each response against clinically accepted best-practice expectations set by expert clinicians.
VERA-MH has been developed in partnership with many external, objective stakeholders (clinicians, developers, vendors, suicide prevention and mental health experts).
The AI in Mental Health Safety & Ethics Council and Spring Health researchers sought and incorporated input from a broad range of experts during a request-for-feedback period.
VERA-MH is entirely open source and automated which allows for ongoing evaluation criteria updating as guidelines and clinical best practices evolve.

How does VERA-MH compare to expert human clinician scoring?

Research shows that the VERA-MH AI judge scoring conversations is highly accurate and consistently aligns with the judgment of expert clinicians. In this study, the AI matched independent clinician scoring, performing at a level of reliability comparable to the human "gold standard."

What’s next for VERA-MH?

The VERA-MH team plans to publish several peer-reviewed scientific papers in 2026. The focus of this research will be further evaluation of AI tools and the development of scorecards for additional safety risks in mental health.

How can I get involved with VERA-MH as a developer?

There are several meaningful ways to participate:.

Run VERA-MH on your own AI tools: Download the open-source VERA-MH code and run the evaluation on your AI chatbots. This provides a standard rating for high-risk mental health scenarios and identifies areas for safety improvement.
Share feedback and help shape what’s next: VERA-MH is designed to evolve with the community. Submit feedback through this link to help refine the framework.
Contribute to the development of the code: Submit contributions to the github repository.
Share results: Post your VERA-MH scores. Transparency helps the community learn together and move toward making safety a real, shared standard, not just a claim.

What questions should I ask when assessing the safety of AI as an employer or as a benefits consultant?

Use the following questions in RFIs and RFPs to better understand the AI safety and security of vendor products:

Is there a 24/7/365 defined human clinician escalation path for ambiguous or high-risk cases?
Do you have a multi-layer AI safety framework?
Do you have a zero-retention policy to ensure AI systems donʼt store or use data for training purposes?
What governance, compliance, and transparency controls are in place?
Is the AI assisting clinicians or replacing clinical judgment?
Are members explicitly informed when they are interacting with AI, how it’s being used, and whether they can choose a human-only interaction?
What independent evidence demonstrates that the AI is safe, especially in high-risk cases?
How are models monitored, updated, and governed over time?
What is the VERA-MH safety score for the mental health tool?

The Industry Standard for AI Mental Health Safety

AI: The promise and risk in mental health

What is VERA-MH?

How it works

Detect Potential Risk

Confirm Risk

Guide to Human Care

Communicate Effectively

Maintain Safe Boundaries

Initial VERA-MH Findings

AI safety score rankings by VERA-MH v1

Model Safety Evolution

For Employers and Health Plans

For Developers

For Consultants