Vectara HHEM
Detect hallucinations in RAG outputs using a dedicated classification model
Description
HHEM, the Hughes Hallucination Evaluation Model, is a specialized hallucination detection model built by Vectara and named in memory of colleague Simon Mark Hughes. Vectara was founded by Amr Awadallah, who previously co-founded Cloudera and took it to a $5.3 billion acquisition, and is now led by co-founder Tallat M Shafaat following Awadallah's departure in December 2025. The company built HHEM to address a specific limitation in RAG evaluation: LLM-as-judge approaches such as Ragas-Faithfulness with GPT-4 can take up to 35 seconds per evaluation and produce inconsistent scores. HHEM is a pure classification model that takes a premise (the source text) and a hypothesis (the generated text) as inputs and returns a calibrated factual consistency score from 0.0 to 1.0, where a higher score means the hypothesis is better supported by the premise. It processes a 2,000-token input in about 1.5 seconds on a standard CPU.
Key Capabilities
Pure classification architecture: HHEM scores whether a generated output is supported by a source document without calling an LLM judge, and it outperforms GPT-3.5-Turbo and GPT-4 on the AggreFact-SOTA, RAGTruth-Summ, and RAGTruth-QA benchmarks
HHEM-2.1-Open (open weights): Available on Hugging Face and Kaggle with over 2 million downloads; runs on consumer hardware in under 600 MB of RAM at 32-bit precision; English-only
HHEM-2.3 (commercial): Accessible via the Vectara API, with multilingual support across eight languages including Arabic, Chinese, Korean, Portuguese, and Spanish, an unlimited context window, and improved recall and precision over the open version
Hallucination Leaderboard: A public benchmark ranking frontier models by hallucination rate on summarization tasks, scored using HHEM and widely referenced as a community standard for comparing LLM reliability in RAG contexts
Vectara Hallucination Corrector (VHC): A companion product that corrects hallucinated content in RAG and agent outputs by replacing inaccurate generated text with claims supported by the retrieved context
Open-RAG-Eval framework: An open-source RAG evaluation library that uses HHEM alongside retrieval metrics, groundedness scoring, and citation accuracy checks for teams building full RAG evaluation pipelines
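As a rough illustration of the premise/hypothesis scoring described above, the sketch below loads the open-weights HHEM-2.1-Open model from Hugging Face and flags generated sentences whose score falls below a cutoff. The model ID and the predict call follow the published Hugging Face model card as best understood here. The helper names (load_hhem_scorer, flag_unsupported, demo) and the 0.5 threshold are illustrative assumptions, not part of any Vectara API, and the right cutoff will depend on the application.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (premise, hypothesis)
Scorer = Callable[[Sequence[Pair]], List[float]]


def load_hhem_scorer() -> Scorer:
    """Return a function that scores (premise, hypothesis) pairs with HHEM-2.1-Open.

    Assumes `transformers` and `torch` are installed and that the model can be
    downloaded on first use. The import is local so the rest of this module
    works without those packages.
    """
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "vectara/hallucination_evaluation_model",
        trust_remote_code=True,  # the model ships its own predict() helper
    )

    def score(pairs: Sequence[Pair]) -> List[float]:
        # predict() returns one consistency score per pair, in [0.0, 1.0]
        return [float(s) for s in model.predict(list(pairs))]

    return score


def flag_unsupported(
    pairs: Sequence[Pair],
    score_fn: Scorer,
    threshold: float = 0.5,  # hypothetical cutoff, tune per use case
) -> List[Tuple[str, float]]:
    """Return (hypothesis, score) for every pair scoring below the threshold."""
    scores = score_fn(pairs)
    return [(hyp, s) for (_, hyp), s in zip(pairs, scores) if s < threshold]


def demo() -> None:
    """Example run; downloads the model, so it is not called automatically."""
    pairs = [
        ("The capital of France is Paris.", "Paris is the capital of France."),
        ("The capital of France is Paris.", "The capital of France is Berlin."),
    ]
    for hypothesis, s in flag_unsupported(pairs, load_hhem_scorer()):
        print(f"{s:.3f}  possibly unsupported: {hypothesis}")
```

Calling demo() runs the model on two sample pairs. Because the scorer is passed in as a plain function, the same flagging logic could wrap a different backend, such as a call to the commercial model, without changing flag_unsupported.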
Alternative tools
- Groq Cloud
LPU-powered inference cloud for real-time AI applications.
- Fireworks AI
High-performance inference cloud for open-source models at enterprise scale.
- Together AI
Full-stack AI cloud for inference, training, and fine-tuning.
- Replicate
Run open-source AI models through a single API.
