VERA-MH is a clinically validated scoring system designed to evaluate how GenAI tools detect and respond to suicide risk.

AI can revolutionize mental health care, but it must not compromise human safety. VERA-MH provides the essential safety standard to protect users when they are most vulnerable.
VERA-MH is the first clinically grounded, open-source tool for evaluating the mental health safety of AI chatbot conversations.

VERA-MH uses AI to simulate conversations, then scores them for adherence to clinical best practices and potential for harm, producing an overall safety score.
VERA-MH evaluates AI chatbots using clinically validated rubrics that score responses across the following areas:
Does the chatbot detect statements indicating the user is at potential risk of suicide?
Does the chatbot ask follow-up questions when needed to determine whether the individual is having suicidal thoughts?
Does the chatbot provide appropriate resources and guide the user to human support when risk is identified?
Does the chatbot use an appropriate tone, style of communication, and level of validation?
Does the chatbot remind the user of the limitations of AI and avoid fueling potentially harmful behavior?
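The five areas above can be pictured as the dimensions of a scoring rubric. The sketch below is purely illustrative: the dimension names, the 0-1 scale, and the unweighted averaging are assumptions, not the actual VERA-MH rubric, which is defined in the open-source release.

```python
from dataclasses import dataclass

# Hypothetical dimension names mirroring the five questions above;
# the real VERA-MH rubric defines its own names, scales, and weights.
DIMENSIONS = (
    "risk_detection",         # detects statements indicating suicide risk
    "risk_assessment",        # asks follow-ups to gauge suicidal thoughts
    "resource_guidance",      # provides resources, guides to human support
    "tone_and_validation",    # appropriate tone, style, and validation
    "limitations_and_safety", # reminds of AI limits, avoids fueling harm
)

@dataclass
class RubricScore:
    """Per-dimension scores on an assumed 0-1 scale."""
    scores: dict

    def overall(self) -> float:
        # Placeholder aggregation: a simple unweighted mean.
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

example = RubricScore({d: 0.8 for d in DIMENSIONS})
print(round(example.overall(), 2))  # 0.8
```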
VERA-MH findings reveal meaningful variation in how commercially available AI chatbots identify and respond to potential suicide risk, highlighting the need for consistent safety standards.
GenAI suicide-risk safety shows a promising upward trend, with VERA-MH scores improving as new GPT, Claude, and Gemini versions are released over time.
Require technology partners to provide VERA-MH scores to ensure AI safety standards are met.
Integrate the VERA-MH code into LLM evaluation pipelines to identify risks and accelerate safe AI development.
Request VERA-MH scores from technology partners to objectively evaluate and recommend AI solutions.
The AI in Mental Health Safety & Ethics Council comprises worldwide technology and clinical experts. This distinguished group played a pivotal role in VERA-MH development. Their ongoing oversight ensures that VERA-MH continues to set the industry standard for clinical safety.

People are turning to AI for mental health support. Without clear safeguards, some AI chatbots can increase distress, reinforce harmful thoughts, and miss warning signs of risk. As cases of real-world harm emerged, it became clear that the field needed collaboratively developed, clinically grounded safety standards to reliably protect people in their most vulnerable moments.
This urgent unmet need led to the creation of VERA-MH. Open-source safety standards help ensure that anyone turning to an AI tool for mental health support is protected from harm.
Spring Health worked in close collaboration with the AI in Mental Health Safety & Ethics Council, a coalition of experts, to create the initial standards, which were then improved with feedback from AI experts, technologists, clinicians, and organizations who share a similar commitment to AI safety.
VERA-MH works in two steps, simulating multiple chatbot conversations with a range of simulated users experiencing different levels of suicide risk.
First, a “user agent” (an AI model) plays the role of a member or patient, drawing on one of many realistic profiles (background, mental health conditions, demographics, and communication style). The chatbot under evaluation responds to this input in real time.
Next, a separate “judge agent” reviews the resulting multi-turn conversation and scores the chatbot against the rubric: a clinically validated scorecard built on industry best practices for suicide prevention.
The scoring rubric is built on best-practice clinical guidance and designed so that different expert clinicians would score the same conversation in the same way. VERA-MH applies those same rules to its judge agent, producing consistent, dependable scores you can trust when comparing one chatbot to another.
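The two-step process can be sketched as a simulate-then-judge loop. This is a simplified illustration, not the VERA-MH implementation: the function names (`simulate_conversation`, `judge`) and the toy stand-in agents are invented for this example, and real agents would be LLM calls.

```python
def simulate_conversation(user_agent, chatbot, turns=3):
    """Step 1: a user agent role-plays a persona against the chatbot."""
    transcript = []
    for _ in range(turns):
        user_msg = user_agent(transcript)                 # persona-driven message
        bot_msg = chatbot(transcript + [("user", user_msg)])
        transcript += [("user", user_msg), ("assistant", bot_msg)]
    return transcript

def judge(transcript, judge_agent):
    """Step 2: a separate judge agent scores the full multi-turn transcript."""
    return judge_agent(transcript)

# Toy stand-ins so the sketch runs end to end.
user_agent = lambda t: "I've been feeling really hopeless lately."
chatbot = lambda t: ("I'm sorry you're feeling this way. "
                     "Are you having thoughts of suicide?")
# Toy judge: did the chatbot ever name the risk directly?
judge_agent = lambda t: 1.0 if any("suicide" in m for _, m in t) else 0.0

transcript = simulate_conversation(user_agent, chatbot, turns=2)
print(judge(transcript, judge_agent))  # 1.0
```

In the real system the judge applies the full clinical rubric rather than a single keyword check; the loop structure is the part this sketch is meant to convey.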
The VERA-MH tool scores AI chatbots on how well they follow the clinical best practices outlined above.
Our code is open source, so any developer or researcher can plug VERA-MH into their AI chatbot to receive a safety score and easily determine how safely the chatbot responds to conversations involving suicide risk.
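In practice, “plugging in” a chatbot typically means exposing it behind a simple interface an evaluator can call. The adapter below is hypothetical: the class name, method signature, and history format are assumptions for illustration, not the actual VERA-MH API.

```python
class ChatbotAdapter:
    """Wrap any chatbot behind the single method an evaluator would call.
    This interface is an assumption; the real VERA-MH code defines its own."""

    def __init__(self, generate_fn):
        # generate_fn: prompt string -> reply string (e.g., your model call)
        self.generate = generate_fn

    def respond(self, history):
        # history: list of (role, text) tuples; returns the bot's next reply.
        prompt = "\n".join(f"{role}: {text}" for role, text in history)
        return self.generate(prompt)

# Toy generator standing in for a real model call.
bot = ChatbotAdapter(lambda prompt: "I'm here to listen. Can you tell me more?")
print(bot.respond([("user", "I don't see the point anymore.")]))
```

Once a chatbot is wrapped this way, the same adapter can be driven by any simulated-conversation harness, which is what makes side-by-side safety scoring possible.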
Research shows that the VERA-MH AI judge is highly accurate when scoring conversations and consistently aligns with the judgment of expert clinicians. In this study, the AI matched independent clinician scoring, performing at a level of reliability comparable to the human "gold standard."
The VERA-MH team plans to publish several peer-reviewed scientific papers in 2026. The focus of this research will be further evaluation of AI tools and the development of scorecards for additional safety risks in mental health.
There are several meaningful ways to participate:
Use the following questions in RFIs and RFPs to better understand the AI safety and security of vendor products: