BentoML

Python framework for packaging and serving ML models in production.

BentoML is profiled here as a Deployment tool for engineering teams. Read about features, pricing, and how it compares to related options in the tools directory.

Deployment, Open Source

Description

Short Intro: BentoML is an open-source Python framework for deploying ML models as production REST APIs. Chaoyu Yang founded it in 2019 after five years at Databricks, where he watched enterprise teams struggle to move trained models into production serving. In February 2026, Modular AI, the company founded by Chris Lattner (creator of LLVM and Swift), acquired BentoML to integrate its packaging, adaptive batching, and Kubernetes orchestration into the MAX inference platform. The project remains under the Apache 2.0 license and continues to be actively maintained. Before the acquisition, more than 10,000 organizations, including over 50 Fortune 500 companies, used BentoML.

Key Capabilities:

REST API server generation from any model inference script using Python type hints
Automatic Docker container generation with reproducible dependency management
Adaptive batching, with a claimed throughput of up to 100x that of standard Flask-based model servers
Multi-model inference graph orchestration for multi-stage pipelines
LLM serving with vLLM backend and OpenAI-compatible API
RAG pipeline deployment with open-source embedding and language models
Image generation serving with Stable Diffusion and configurable batch processing
Agentic pipeline and embeddings serving
Deployment targets spanning AWS SageMaker, Lambda, GCP Cloud Run, Azure Functions, and Kubernetes
ComfyUI pipeline support for reproducible workflow execution
OpenTelemetry tracing with Jaeger, Zipkin, and OTLP support
gRPC server support alongside HTTP REST
RBAC, SSO, and audit logs for enterprise team access control
BentoCloud managed cloud service for teams that prefer not to self-host
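
To make the batching capability above more concrete, here is a minimal sketch of request micro-batching using only the Python standard library. It is not BentoML's implementation and does not use the BentoML API: the names `MicroBatcher`, `fake_model`, `max_batch_size`, and `max_wait_s` are hypothetical, and the batch size and wait window are fixed here, whereas an adaptive batcher would tune them at runtime. The idea shown is only that concurrent requests are queued and passed to the model together, so one model call serves several callers.

```python
import asyncio


class MicroBatcher:
    """Hypothetical helper: groups single requests into batched model calls."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn              # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size  # upper bound on items per model call
        self.max_wait_s = max_wait_s          # how long to wait for a batch to fill
        self.queue = asyncio.Queue()

    async def submit(self, item):
        """Called once per request; returns when that item's batch has run."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self):
        """Worker loop: take one item, then fill the batch until full or timed out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # One model call for the whole batch, then fan results back out.
            outputs = self.batch_fn([item for item, _ in batch])
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)


async def demo():
    batch_sizes = []

    def fake_model(inputs):
        # Stand-in for real inference; records how many items arrived together.
        batch_sizes.append(len(inputs))
        return [x * 2 for x in inputs]

    batcher = MicroBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(10)))
    worker.cancel()
    return results, batch_sizes


if __name__ == "__main__":
    results, batch_sizes = asyncio.run(demo())
    print(results, batch_sizes)
```

In this sketch, ten concurrent requests are served by a handful of model calls instead of ten. The trade-off is latency: a lone request may wait up to `max_wait_s` before the model runs.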

Alternative tools

Dokku: Self-hosted platform-as-a-service on your own server
Heroku: Managed platform for deploying apps with git push
Porter: Platform-as-a-service that runs in your own cloud account
Kamal: Deploy containerized apps to your own servers
Coolify: Self-hosted deployment platform for any server
Netlify: Git-driven platform for deploying modern web frontends
