AI psychologist

I sit at an unusual intersection of clinical psychology and computer science. When colleagues from either field ask me whether AI belongs in therapy, the best answer I have is: it depends on the type of system, the clinical task, and the level of risk involved.

Where AI therapists do a good job

The empathy findings surprised me. Even though AI empathy is, strictly speaking, shallow empathy, licensed clinicians preferred the LLM chatbot’s answers 78.6% of the time and rated them as more empathetic and higher quality on average than physicians’ responses. 1 2

Vignette-based assessment of risk and psychopathology is also on par with trained clinicians in some domains. GPT-based models’ estimates of suicide risk on standardised vignettes are similar to mental health professionals’ judgments, although AI can overestimate psychological pain and underestimate resilience. 3 While large language models generally match or exceed clinicians on several diagnostic and triage vignettes, AI still struggles with complex, heterogeneous real-world cases. 4 5

AI is good at structured screening. A recent systematic review found that voice analysis (acoustic features) for depression symptoms can discriminate depressed from non-depressed participants with AUCs between 0.71 and 0.93, which is comparable to established screening tools. 6 7 This kind of signal processing and pattern recognition is well beyond what clinicians are trained to do.
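For readers on the computer science side, here is a minimal sketch of what such a screening pipeline can look like, assuming a labelled corpus of speech recordings. The MFCC-mean features, logistic regression classifier, and function names are illustrative stand-ins, not the exact pipelines evaluated in the cited review.

```python
"""Minimal sketch: voice-based depression screening from acoustic features.

Assumes a labelled set of speech recordings supplied elsewhere; the feature
choice (mean MFCCs) and classifier are illustrative only.
"""
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def acoustic_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Summarise one recording as mean MFCCs (a crude stand-in for richer prosodic features)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)


def screening_auc(recordings: list[tuple[str, int]]) -> float:
    """Cross-validated AUC for discriminating depressed (1) from non-depressed (0) speakers."""
    X = np.vstack([acoustic_features(path) for path, _ in recordings])
    y = np.array([label for _, label in recordings])
    probs = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )[:, 1]
    return roc_auc_score(y, probs)
```

The review’s reported AUCs of 0.71 to 0.93 refer to pipelines of roughly this shape, with far more careful feature engineering and validation than this sketch implies; a good screening score still does not make the tool a diagnostic instrument.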

AI delivers health information mostly well. It provides broadly correct psychoeducation, explains diagnoses and treatments in plain language, and can walk people through basic CBT-style skills practice. 8 5 Sustained, low-stakes, supportive conversations, like “I’m anxious about this exam” or “I feel lonely at night”, are an area where current systems do reasonably well and are often rated as helpful and empathic by users.

As access to mental health services is still generally poor, AI can help to fill this gap. Roughly 18–19% of adolescents (12–17 years old) meet criteria for a past-year major depressive episode, but only about 40–50% of those teens receive any mental health treatment. In other words, half or more get no help at all. 9 The picture is even worse in low- and middle-income countries, where 70–75% never receive mental health support. That is, three out of four people with a mental disorder never see a mental health care worker. 10 11 These are not abstract statistics. These are people who have no support at all, unless AI can step in to narrow the gap, at least at the low-intensity end of the problem.

When AI is not a good therapist

AI is not ready for complex clinical assessment. Nor for risk management.

While large language models do well on structured or relatively clean tasks, they perform inconsistently on complex, comorbid, or high-stakes cases. These tools can mishandle nuanced cultural or gender factors, underestimate or misjudge suicide risk in some scenarios, and sometimes produce overly confident but badly wrong recommendations. 3 4 5

To make this concrete, consider a patient presenting with low mood who also has a recent loss, financial stress, and a history of trauma. A clinician reads the hesitation before answering, the flat affect, the psychomotor slowing, and recalibrates the risk estimate in real time. An LLM receives words, and those words often do not carry the extra information the clinician relied on. The model may conclude “mild depression, monitor” when a human would have escalated.

These errors are not edge cases, but are structurally likely when complexity outpaces the information available in pure text.

Unfortunately, these errors are hard to predict and change depending on how the system is prompted. The exact wording used and the model version in play both matter, and these are factors clinicians cannot reliably anticipate or control.

Bias, harm, avoidance

Bias is also a challenge. Studies find that LLMs generate different diagnostic impressions and treatment suggestions when demographic details are changed, and empathy ratings and quality of advice vary across racial and diagnostic groups. 4 5 Accuracy drops and problematic errors increase for underrepresented groups and non-English contexts unless models are explicitly tuned and evaluated for fairness.

Harm is not hypothetical. Recent evaluations and case reports show that chatbots can endorse clearly harmful ideas in a substantial minority of scenarios. For some patients, intense chatbot use itself worsens delusions or mania. This can occur through over-validation of user beliefs and failure to challenge distorted thinking. 5 12 At population scale, even single-digit percentages of problematic conversations translate into many people being nudged in a terribly wrong direction.

There is also a quieter failure mode: over-reliance and stagnation. Users may begin to substitute AI interactions for human support, reinforcing avoidance of difficult interpersonal situations. This can stabilise distress in the short term while maintaining the underlying problem, leaving people stuck rather than improving over time.

Multi-modal AI as therapist

Even with multimodal models (text, voice, image), the interpretation of non-verbal cues remains a challenge. While some AI systems can process audio and video to track vocal prosody, facial micro-expressions, or gaze, they still lack the integration required to interpret these signals in a clinical context. While the popular claim that “93% of communication is nonverbal” is a misinterpretation of Mehrabian’s study, the underlying principle is sound: when verbal and nonverbal signals conflict, people weight nonverbal cues heavily in judging emotional meaning.

In practice, this means AI misses dissociation cues, flat affect in depression, or the psychomotor slowing that a trained clinician reads as a risk signal before a patient has put it into words. These are not subtle refinements; in high-stakes assessments they can be the most diagnostically important information in the room. While AI is no longer technically blind to these channels, it is profoundly “short-sighted”.

The right place for an AI psychologist

Winnicott wrote about “good enough” parenting, the idea that caregiving does not need to be perfect; it just needs to be reliably adequate and safe over time. That framing translates well to AI therapy. The key question is whether it is safer than no support at all. AI can function as a “good enough” first point of contact for many people, provided the boundaries are honest and clearly maintained, and it is deployed within a broader system that catches the cases it cannot safely handle.

The “good enough” AI therapist

“Good enough” AI here means that human oversight for escalation is always there, along with clear acknowledgment of uncertainty and transparency about what the system can and cannot do. Current research suggests that AI does well with psychoeducation, note drafting, summarisation, and structured skill coaching, but not autonomous diagnosis or formulation, let alone risk management. 4 8 5
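To show what “human oversight for escalation” can mean at the software level, here is a deliberately naive sketch: every user turn passes through a risk gate before the model is allowed to answer. The function names and the keyword screen are hypothetical placeholders; a real deployment would use validated risk-detection models and clinical escalation protocols, not a phrase list.

```python
# A deliberately naive sketch of a risk gate in front of a chatbot turn.
# `generate_reply` and `notify_on_call_clinician` are hypothetical callables the
# surrounding service would provide; the phrase list is illustrative, not a
# validated risk instrument.
from dataclasses import dataclass
from typing import Callable

RISK_PHRASES = ("kill myself", "end my life", "hurt myself")  # placeholder screen


@dataclass
class TurnResult:
    reply: str
    escalated: bool


def handle_turn(
    user_message: str,
    generate_reply: Callable[[str], str],
    notify_on_call_clinician: Callable[[str], None],
) -> TurnResult:
    """Route high-risk messages to a human instead of letting the model improvise."""
    if any(phrase in user_message.lower() for phrase in RISK_PHRASES):
        notify_on_call_clinician(user_message)
        return TurnResult(
            reply="I want to make sure you get support from a person right away. "
                  "I'm connecting you with someone now.",
            escalated=True,
        )
    return TurnResult(reply=generate_reply(user_message), escalated=False)
```

The design point is not the keyword list, which is obviously inadequate on its own, but the shape of the system: the model never gets the final word on a potentially high-risk turn.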

Rule-based systems like Woebot (shut down mid-2025) illustrate the promise and the limits of chat therapy. Woebot’s scripted CBT chatbot showed measurable reductions in depressive symptoms, but by the time of its shutdown, the founders’ own assessment was that AI had moved faster than their rule-based model could keep up, rather than regulation alone being the deciding factor. The ceiling may have been both architectural and bureaucratic. 13 14 In other words, even relatively simple, well-studied, rule-based tools are hard to sustain and govern, never mind fully general AI “therapists”.

Large language models are more flexible but bring new risks that are hard to quantify. For instance, fine-tuning a model to be a better therapist with parameter-efficient fine-tuning (LoRA) can significantly degrade its safety alignment, leading to unpredictable harmful outputs even when the fine-tuning data are clinically sound. 15 16 Methods like safety-aligned LoRA (SaLoRA) try to preserve a fixed safety module while adapting other parameters, but they exist precisely because the default approach strips away guardrails.
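For readers less familiar with parameter-efficient fine-tuning, this is roughly what a LoRA setup looks like with Hugging Face’s peft library; the base model name and hyperparameters below are illustrative, not taken from the cited studies. The point of those findings is that this kind of adaptation can quietly erode safety behaviour, so safety evaluations need to be re-run after tuning rather than assumed to carry over.

```python
# Minimal sketch of LoRA fine-tuning with the Hugging Face peft library.
# The model name and hyperparameters are examples only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example base model

lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections typically adapted
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter trains; the base weights stay frozen

# ...train on therapy-dialogue data here, then re-run safety and red-team
# evaluations on the adapted model before any clinical-adjacent use.
```

The trainable adapter is a tiny fraction of the base model’s parameters, which is what makes this kind of tuning cheap, and also what makes it easy to ship a “better therapist” without anyone re-checking the safety behaviour of the combined model.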

Feeling safe is not the same as improving

Disclosure to AI is generally faster and more complete than to a human therapist. Disclosing to AI can be just as emotionally and relationally beneficial as disclosing to a human partner, and people often feel less fear of judgment when talking to a chatbot. 17 18 At the same time, they still report greater trust in human partners, an interesting asymmetry, and one that matters clinically. Lower inhibition is what gets someone through the door, but trust is what allows the deeper, more uncomfortable work of therapy to actually happen. AI may lower the threshold to disclose, but it may not earn the foundation that supports change. For some, this will be fine: a lower bar to accessing support is still a good thing.

For others, particularly those with relational trauma or attachment problems, an AI may feel safe precisely because it asks nothing of them. Unlike a human therapist, it carries no risk of rejection or disappointment. That felt safety can get people talking who otherwise wouldn’t. But in therapy, some of that “unsafe” friction is the mechanism of change itself. Learning to tolerate vulnerability with another person, repairing ruptures in the therapeutic relationship, being seen by someone who could reject you but doesn’t: these are therapeutic in themselves. The problem is not that AI feels safe. It is that feeling safe is not the same as changing.

Does AI belong in therapy then?

AI is not a therapist yet, but it is also not without value.

The real question is not whether it can replace a human therapist entirely, but where it reduces harm and where it introduces it. Current evidence suggests that, for now, that line falls roughly between information delivery and supportive, low-stakes conversation on one side, and complex clinical assessment and risk management on the other. 4 8 5 On the low-intensity side, AI can widen much-needed access, offer skills and psychoeducation, and provide “good enough” companionship and support for many people who currently get nothing. On the high-stakes side, it remains brittle, prone to wandering off-topic, biased, and hard to govern in ways that clinicians, regulators, and clients would accept as safe.

In practice, however, this boundary is messy. Low-stakes interactions can escalate quickly, users do not reliably self-triage, and current systems do not consistently detect when a conversation has suddenly become high-risk.

That distinction is worth taking seriously.

Want to talk to a human psychologist? 🚫🤖

We are here to support you.