TruLens

Evaluate LLM applications with programmatic feedback functions

Testing · Open Source

Description

TruLens is an open-source Python evaluation framework originally created by TruEra and now maintained by Snowflake, which acquired TruEra in May 2024. TruEra was founded by ML interpretability researchers, and that lineage shows in TruLens's core abstraction: feedback functions that attach to any input, output, or intermediate trace step and return explainable float scores rather than opaque pass/fail results. For teams running on Snowflake Cortex, TruLens is the native evaluation layer: traces log directly to Snowflake, and feedback functions double as production guardrails.
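
To make the feedback-function idea concrete, here is a minimal plain-Python sketch. It does not use the TruLens API: `TraceRecord`, `apply_feedback`, and `keyword_coverage` are hypothetical names, and the keyword scorer is a toy stand-in for the LLM- or model-based scorers TruLens actually provides. It only illustrates the shape of the abstraction: select a value from the input, the output, or an intermediate step, score it, and return a float in [0.0, 1.0].

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical trace record: the app's input, its output, and named intermediate steps.
@dataclass
class TraceRecord:
    main_input: str
    main_output: str
    steps: dict[str, str] = field(default_factory=dict)


# Hypothetical alias: a selector pulls one text value out of a trace record.
Selector = Callable[[TraceRecord], str]

STOPWORDS = {"a", "an", "the", "is", "are", "of", "what", "who", "how", "in", "to", "and"}


def keyword_coverage(reference: str, candidate: str) -> float:
    """Toy scorer (hypothetical): share of reference keywords found in candidate."""
    ref = {w for w in re.findall(r"[a-z]+", reference.lower()) if w not in STOPWORDS}
    cand = set(re.findall(r"[a-z]+", candidate.lower()))
    return len(ref & cand) / len(ref) if ref else 0.0


def apply_feedback(fn: Callable[..., float], record: TraceRecord, *selectors: Selector) -> float:
    """Select values from the record, pass them to the scorer, clamp to [0.0, 1.0]."""
    score = float(fn(*(select(record) for select in selectors)))
    return min(1.0, max(0.0, score))


record = TraceRecord(
    main_input="What is the capital of France?",
    main_output="Paris is the capital of France.",
    steps={"retrieve": "France's capital city is Paris."},
)

# Score the final output against the input...
answer_score = apply_feedback(
    keyword_coverage, record, lambda r: r.main_input, lambda r: r.main_output
)
# ...or an intermediate step, reusing the same scorer with a different selector.
retrieval_score = apply_feedback(
    keyword_coverage, record, lambda r: r.main_input, lambda r: r.steps["retrieve"]
)
print(answer_score, retrieval_score)  # 1.0 1.0
```

Keeping selection separate from scoring is what lets one scorer be reused on any point in the trace; in TruLens itself that role is played by its own selector mechanism rather than the lambdas used here.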

Feedback functions and Metric class API: Programmatic evaluators that score inputs, outputs, and intermediate steps on a 0.0–1.0 scale, with chain-of-thought reasoning support for interpretable results
RAG Triad built-in metrics: Context Relevance, Groundedness, and Answer Relevance, a three-metric evaluation framework for RAG pipelines that has been widely adopted beyond TruLens itself
Snowflake Cortex native integration: Traces log directly to Snowflake; feedback functions run against Cortex Search and Cortex LLM Functions without additional configuration
Stack-agnostic instrumentation: Works with LangChain, LlamaIndex, LangGraph, and raw Python across OpenAI, Anthropic, HuggingFace, and Snowflake Cortex as feedback LLM providers
OpenTelemetry-native tracing: OTel span support makes TruLens trace data portable to Datadog, New Relic, and other observability stacks
Built-in local dashboard: Visualizes feedback scores and traces across evaluation runs via session.start_dashboard(), with no external service required
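
The RAG Triad can be sketched as three scores computed over one question, its retrieved contexts, and the generated answer. This is an illustration only: TruLens computes these metrics with LLM-based feedback providers, whereas the hypothetical `rag_triad` function below swaps in simple keyword overlap so it runs without any model or API key.

```python
from __future__ import annotations

import re

STOPWORDS = {"a", "an", "the", "is", "are", "of", "what", "who", "how", "in", "to", "and"}


def keywords(text: str) -> set[str]:
    """Lowercased alphabetic tokens, minus a small stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def coverage(reference: str, candidate: str) -> float:
    """Share of reference keywords that also appear in candidate, in [0.0, 1.0]."""
    ref = keywords(reference)
    return len(ref & keywords(candidate)) / len(ref) if ref else 0.0


def rag_triad(question: str, contexts: list[str], answer: str) -> dict[str, float]:
    """Hypothetical lexical proxy for the three RAG Triad metrics."""
    # Context Relevance: how well each retrieved chunk addresses the question (averaged).
    context_relevance = (
        sum(coverage(question, c) for c in contexts) / len(contexts) if contexts else 0.0
    )
    # Groundedness: how much of the answer is supported by the retrieved context.
    groundedness = coverage(answer, " ".join(contexts))
    # Answer Relevance: how well the answer addresses the question.
    answer_relevance = coverage(question, answer)
    return {
        "context_relevance": context_relevance,
        "groundedness": groundedness,
        "answer_relevance": answer_relevance,
    }


scores = rag_triad(
    "What is the capital of France?",
    ["Paris is the capital of France.", "The Eiffel Tower opened in 1889."],
    "Paris is the capital of France.",
)
print(scores)
# {'context_relevance': 0.5, 'groundedness': 1.0, 'answer_relevance': 1.0}
```

The off-topic second chunk halves Context Relevance, which is the retrieval problem that metric is meant to surface. Replacing the answer with "Paris is the capital of Germany." drops Groundedness to about 0.67, since "germany" has no support in the retrieved context.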

Alternative tools

Claude Code
Agentic coding tool that runs in your terminal
OpenAI Codex CLI
Terminal coding agent built on OpenAI reasoning models
Aider
AI pair programming in your terminal
Cline
Open-source AI coding agent for any editor
Braintrust Evals
Trace every step your LLM agent takes, from prompt to response
Giskard
Scan AI agents for vulnerabilities before and after deployment
