Lilypad

Tagline: Python-native LLM versioning and tracing via a single decorator.

Lilypad is profiled here as an observability tool for engineering teams. Read about its features, pricing, and how it compares to related options in the tools directory.

Observability, Open Source


Description

Short Intro: Lilypad is an MIT-licensed open-source tool for versioning, tracing, and annotating LLM calls, built by William Bakst (ex-Google, ex-Stanford) as part of the Mirascope ecosystem. The Mirascope GitHub README states that Lilypad will remain open-source and available but is no longer the team's primary development focus; active work has moved to the Mirascope Python toolkit. Teams evaluating Lilypad for production use should weigh this maintenance status and assess it alongside Mirascope, the companion LLM toolkit Lilypad was built to complement.

Key Capabilities:

@trace decorator that automatically versions every LLM function call with its full execution context, including input data, model settings, and surrounding code
Framework-agnostic tracing that works with any Python LLM library, not just Mirascope
Non-deterministic function support extending tracing to embedding lookups and RAG pipeline steps
Playground for domain experts to edit prompt templates and review outputs without writing code
Version comparison, A/B testing, and rollback across prompt and code changes
Multi-provider support for OpenAI, Anthropic, Azure, AWS Bedrock, Gemini, Mistral, and Vertex AI
Human annotation and dataset management for continuous evaluation
Self-hostable with a local run option and an enterprise edition available on request
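The core idea behind decorator-based versioning (fingerprint the function's code, then record each call's inputs, settings, and output against that fingerprint) can be illustrated with a small, self-contained sketch. This is not Lilypad's implementation or API: the names `trace`, `TRACE_LOG`, `TraceRecord`, and the `.version` attribute are hypothetical, the "LLM call" is a local stub, and records are kept in memory rather than sent to a server. Lilypad's actual decorator and its parameters are described in the project's own documentation.

```python
import functools
import hashlib
import inspect
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TraceRecord:
    # Hypothetical record shape: one entry per traced call.
    function: str
    version: str
    inputs: dict
    output: Any
    duration_s: float


# Hypothetical in-memory store; a real tracing tool would export spans instead.
TRACE_LOG: list[TraceRecord] = []


def _version_hash(fn: Callable) -> str:
    """Fingerprint the function's code so any edit yields a new version id."""
    try:
        source = inspect.getsource(fn)
    except (OSError, TypeError):
        # Source unavailable (e.g. defined via exec); fall back to bytecode.
        code = fn.__code__
        source = repr((code.co_code, code.co_consts))
    return hashlib.sha256(source.encode()).hexdigest()[:12]


def trace(fn: Callable) -> Callable:
    """Hypothetical decorator: version the function once, log every call."""
    version = _version_hash(fn)
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Bind arguments and fill defaults so settings like `model` are captured.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        TRACE_LOG.append(
            TraceRecord(
                function=fn.__qualname__,
                version=version,
                inputs=dict(bound.arguments),
                output=output,
                duration_s=time.perf_counter() - start,
            )
        )
        return output

    wrapper.version = version  # hypothetical attribute for inspection
    return wrapper


@trace
def answer(question: str, model: str = "stub-model") -> str:
    # Stand-in for a real LLM call so the sketch runs offline.
    return f"[{model}] You asked: {question}"


answer("What is Lilypad?")
record = TRACE_LOG[-1]
print(record.function, record.version, record.inputs)
```

Hashing the code rather than relying on a manually bumped version number means an edit to the prompt text or surrounding logic automatically produces a new version id, which is what makes side-by-side comparison and rollback across code changes possible.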


Alternative tools

HoneyHive
Evaluation and observability platform for AI agents
Sentry
Error tracking and performance monitoring for developers
SigNoz
Open-source, OpenTelemetry-native observability platform
Datadog
Unified observability for metrics, traces, and logs
Arize AX
Enterprise platform for AI observability and evaluation
OpenTelemetry
Vendor-neutral standard for traces, metrics, and logs

