W&B Weave

Trace, evaluate, and monitor LLM applications systematically

W&B Weave is profiled here as an observability tool for engineering teams. Read about its features, pricing, and how it compares to related options in the tools directory.

Observability · Free

Description

Weave is the LLM evaluation and observability toolkit from Weights & Biases, the ML developer platform founded by Lukas Biewald, Chris Van Pelt, and Shawn Lewis and acquired by CoreWeave in 2025. Decorating a function with `@weave.op` traces each of its calls, including nested model calls. Weave then links those traces to versioned datasets and scorers and combines them into repeatable evaluations that quantify the effect of each prompt or model change. Weave shares an account with W&B Models, so teams that already track training experiments can extend the same workspace to their LLM application work.
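
To show what decorator-based call tracing does conceptually, here is a stdlib-only sketch. It is not Weave's implementation. In Weave itself you would call `weave.init("<project>")` and decorate functions with `@weave.op`. The names `traced`, `TRACE_LOG`, and `classify_sentiment` are hypothetical, and the in-memory list stands in for a trace backend. The sketch records each call's inputs, output or exception, latency, and a bytecode hash that serves as a crude code version.

```python
import functools
import hashlib
import time
from typing import Any, Callable

# Hypothetical in-memory stand-in for a trace backend (Weave sends calls to its server).
TRACE_LOG: list[dict[str, Any]] = []


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record inputs, output (or exception), latency, and a code version for every call."""
    # Hash of the compiled bytecode: changes when the function body changes.
    code_version = hashlib.sha256(fn.__code__.co_code).hexdigest()[:12]

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record: dict[str, Any] = {
            "op": fn.__name__,
            "code_version": code_version,
            "inputs": {"args": args, "kwargs": kwargs},
        }
        start = time.perf_counter()
        try:
            output = fn(*args, **kwargs)
            record["output"] = output
            return output
        except Exception as exc:
            record["exception"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so every call is logged.
            record["latency_s"] = time.perf_counter() - start
            TRACE_LOG.append(record)

    return wrapper


@traced
def classify_sentiment(text: str) -> str:
    # Hypothetical keyword rule standing in for an LLM call.
    return "positive" if "great" in text.lower() else "negative"


if __name__ == "__main__":
    classify_sentiment("Weave is great")
    print(TRACE_LOG[-1]["op"], TRACE_LOG[-1]["output"])  # classify_sentiment positive
```

The `finally` clause logs failed calls as well as successful ones. Failed calls are often the traces you most need when you debug an LLM pipeline.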

Key Capabilities:

Call tracing with automatic input, output, and code versioning
Evaluation framework with custom scorers and LLM-as-judge support
Dataset versioning for reproducible test sets
Playground for side-by-side model and prompt comparison
Online monitors and guardrail scorers for production traffic
Python and TypeScript SDKs with OpenTelemetry compatibility
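
The evaluation and dataset-versioning capabilities above follow a common pattern. You run a model over fixed test rows, apply scorer functions to each output, average the scores, and tag the result with a version of the dataset. Weave packages this pattern in its `weave.Evaluation` class. The stdlib-only sketch below only illustrates the idea under stated assumptions and does not reproduce Weave's API. The content-hash versioning, the `exact_match` and `is_short` scorers, and the lookup-table `baseline_model` are all hypothetical. LLM-as-judge scorers are omitted.

```python
import hashlib
import json
from statistics import mean
from typing import Any, Callable

Row = dict[str, Any]
Scorer = Callable[..., float]


def dataset_version(rows: list[Row]) -> str:
    """Hypothetical content hash: identical rows always yield the same version id."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def exact_match(output: str, expected: str, **_: Any) -> float:
    # 1.0 when the answer matches the reference, ignoring case and whitespace.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def is_short(output: str, **_: Any) -> float:
    # Simple rule-based scorer: reward answers of at most 20 characters.
    return 1.0 if len(output) <= 20 else 0.0


def baseline_model(question: str) -> str:
    # Hypothetical lookup table standing in for a prompt + LLM call.
    return {"capital of France?": "Paris"}.get(question, "unknown")


def evaluate(model: Callable[[str], str], rows: list[Row], scorers: list[Scorer]) -> dict[str, Any]:
    """Run the model on every row, score each output, and average per scorer."""
    per_scorer: dict[str, list[float]] = {s.__name__: [] for s in scorers}
    for row in rows:
        output = model(row["question"])
        for scorer in scorers:
            per_scorer[scorer.__name__].append(scorer(output=output, expected=row["expected"]))
    return {
        "dataset_version": dataset_version(rows),
        "scores": {name: mean(values) for name, values in per_scorer.items()},
    }


if __name__ == "__main__":
    rows = [
        {"question": "capital of France?", "expected": "Paris"},
        {"question": "capital of Japan?", "expected": "Tokyo"},
    ]
    print(evaluate(baseline_model, rows, [exact_match, is_short])["scores"])
    # {'exact_match': 0.5, 'is_short': 1.0}
```

Every result records the dataset version, so two runs are comparable only when their version ids match. That check is what lets a score change be attributed to a prompt or model change rather than to an edited test set.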

Alternative tools

HoneyHive
Evaluation and observability platform for AI agents
Sentry
Error tracking and performance monitoring for developers
SigNoz
Open-source, OpenTelemetry-native observability platform
Datadog
Unified observability for metrics, traces, and logs
Arize AX
Enterprise platform for AI observability and evaluation
OpenTelemetry
Vendor-neutral standard for traces, metrics, and logs
