Vectara HHEM

Detect hallucinations in RAG outputs using a dedicated classification model

Vectara HHEM is profiled here as an evaluation tool for engineering teams. Read about its features, pricing, and how it compares to related options in the tools directory.

Evaluation · Open Source


Description

HHEM (the Hughes Hallucination Evaluation Model) is a specialized hallucination detection model built by Vectara and named in memory of Vectara colleague Simon Mark Hughes. Vectara was founded by Amr Awadallah, who previously co-founded Cloudera and led it to a $5.3 billion acquisition; following Awadallah's departure in December 2025, the company is led by co-founder Tallat M Shafaat. Vectara built HHEM to address a specific limitation in RAG evaluation: LLM-as-judge approaches such as Ragas Faithfulness with GPT-4 can take up to 35 seconds per evaluation and produce inconsistent scores. HHEM is instead a pure classification model. It takes a premise (the source text) and a hypothesis (the generated text) as inputs, returns a calibrated factual consistency score between 0.0 and 1.0, and processes a 2,000-token input in 1.5 seconds on a standard CPU.
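Because HHEM is a pair-in, score-out classifier, a RAG pipeline typically consumes it by scoring each (premise, hypothesis) pair and flagging low scores. The sketch below shows only that consumption pattern. The function `toy_consistency_score` is a hypothetical stand-in based on simple token overlap and is not HHEM; in practice you would swap in the real model (for example, the HHEM-2.1-Open weights on Hugging Face). The 0.5 cutoff is also an assumption, not a documented HHEM setting.

```python
from typing import Callable, Iterable


def toy_consistency_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for HHEM (NOT the real model).

    Returns the fraction of hypothesis tokens that also appear in the premise,
    which mimics only the (premise, hypothesis) -> score in [0.0, 1.0] interface.
    """
    premise_tokens = set(premise.lower().split())
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 1.0  # an empty hypothesis asserts nothing unsupported
    supported = sum(1 for tok in hyp_tokens if tok in premise_tokens)
    return supported / len(hyp_tokens)


def flag_hallucinations(
    pairs: Iterable[tuple[str, str]],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff; tune it for your application
) -> list[tuple[float, bool]]:
    """Score each (premise, hypothesis) pair; flag scores below the threshold."""
    results = []
    for premise, hypothesis in pairs:
        score = score_fn(premise, hypothesis)
        results.append((score, score < threshold))
    return results


pairs = [
    ("The capital of France is Paris.", "The capital of France is Paris."),
    ("The capital of France is Paris.", "Berlin hosts the Olympics every year."),
]
for score, flagged in flag_hallucinations(pairs, toy_consistency_score):
    print(f"score={score:.2f} flagged={flagged}")
# score=1.00 flagged=False
# score=0.17 flagged=True
```

Keeping the scorer as a plain callable means a real classifier can replace the toy function without changing the flagging logic.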

Key Capabilities

Pure classification architecture: HHEM scores whether a generated output is supported by a source document without calling an LLM judge; Vectara reports that it outperforms GPT-3.5-Turbo and GPT-4 on the AggreFact-SOTA, RAGTruth-Summ, and RAGTruth-QA benchmarks
HHEM-2.1-Open (open weights): Available on Hugging Face and Kaggle with over 2 million downloads; runs on consumer hardware in under 600MB of RAM at 32-bit precision; English only
HHEM-2.3 (commercial): Accessible via the Vectara API, with multilingual support across eight languages (including Arabic, Chinese, Korean, Portuguese, and Spanish), an unlimited context window, and improved recall and precision over the open version
Hallucination Leaderboard: A public benchmark ranking frontier models by hallucination rate on summarization tasks, scored using HHEM and widely referenced as a community standard for comparing LLM reliability in RAG contexts
Vectara Hallucination Corrector (VHC): A companion product that corrects hallucinated content in RAG and agent outputs by replacing inaccurate generated text with claims supported by the retrieved context
Open-RAG-Eval framework: An open-source RAG evaluation library that uses HHEM alongside retrieval metrics, groundedness scoring, and citation accuracy checks for teams building full RAG evaluation pipelines
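The Hallucination Leaderboard's headline number can be sketched as a simple aggregate over per-summary HHEM scores. This is a minimal sketch under stated assumptions: each summary has already been scored, a summary counts as hallucinated when its score is strictly below 0.5, and the factual consistency rate is the complement of the hallucination rate. The `LeaderboardRow` type, `summarize_scores` function, and the example scores are hypothetical; consult the leaderboard's own documentation for its exact methodology.

```python
from dataclasses import dataclass


@dataclass
class LeaderboardRow:
    hallucination_rate: float        # percent of summaries scored below threshold
    factual_consistency_rate: float  # complement of hallucination_rate, in percent


def summarize_scores(scores: list[float], threshold: float = 0.5) -> LeaderboardRow:
    """Aggregate per-summary consistency scores into percentage rates.

    Assumption: a summary is counted as hallucinated when its score is
    strictly below `threshold`.
    """
    if not scores:
        raise ValueError("need at least one score")
    hallucinated = sum(1 for s in scores if s < threshold)
    rate = 100.0 * hallucinated / len(scores)
    return LeaderboardRow(hallucination_rate=rate, factual_consistency_rate=100.0 - rate)


# Hypothetical scores for one model's summaries (not real leaderboard data).
row = summarize_scores([0.92, 0.31, 0.77, 0.64, 0.48])
print(row)  # LeaderboardRow(hallucination_rate=40.0, factual_consistency_rate=60.0)
```

Reporting both rates is redundant by construction here; it simply mirrors the two views a reader may want when comparing models.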


Alternative tools

Gentrace
Testing and evaluation for generative AI applications
HELM
Reproducible, multi-scenario benchmarking of foundation models
lm-evaluation-harness
Standard framework for benchmarking language models
garak
Vulnerability scanner for large language models
DeepChecks
Validate ML models, LLM applications, and AI agent decisions across every development stage
Evidently AI
Evaluate, test, and monitor traditional ML models and LLM applications from one framework
