Together AI
Full-stack AI cloud for inference, training, and fine-tuning
Description
Together AI is an AI infrastructure platform founded in June 2022 by Vipul Ved Prakash, Ce Zhang, Chris Ré, Percy Liang, and Tri Dao. The founding team's prior work includes FlashAttention, the HELM (Holistic Evaluation of Language Models) benchmark, and Stanford's Center for Research on Foundation Models. Headquartered in San Francisco and backed by $534M+ in funding, the platform gives developers serverless access to 100+ open-source models, along with dedicated clusters, batch inference, fine-tuning, and GPU infrastructure running in Together AI's own data centers in Maryland, Memphis, and Sweden. Cursor, Decagon, and Cartesia are among the customers running production inference on the platform.
Key Capabilities:
Serverless inference API across 100+ open-source models including Llama, Mistral, and DeepSeek
Together Reasoning Clusters for dedicated low-latency inference at up to 110 tokens per second
Batch inference for asynchronous workloads, scaling to 30 billion tokens per model
Dedicated deployments on purpose-built infrastructure with NVIDIA Blackwell GPU clusters
Instant and reserved GPU cluster access for training and fine-tuning
Full model pre-training and fine-tuning pipelines on custom datasets
GPU infrastructure for generative media workloads covering video, audio, and image models
Together Kernel Collection for performance optimization across cluster deployments
FlashAttention and ThunderKittens research baked into the inference stack
Model evaluation tools and code execution environments
Voice AI and agent infrastructure support
OpenAI-compatible API for drop-in integration
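Because the API follows the OpenAI request format, a standard chat-completion call can be pointed at Together AI by changing only the base URL and API key. The sketch below uses only the Python standard library and separates building the request from sending it. It assumes the base URL `https://api.together.xyz/v1`, Bearer-token authentication, and the OpenAI-style `/chat/completions` response shape; verify these against Together AI's current documentation. The function names are hypothetical helpers, not part of any official SDK.

```python
import json
import urllib.request

# Assumption: Together AI's OpenAI-compatible base URL; confirm in the official docs.
TOGETHER_BASE_URL = "https://api.together.xyz/v1"


def build_chat_request(model, messages, api_key, max_tokens=256):
    """Build (but do not send) an OpenAI-style chat completion request.

    build_chat_request is a hypothetical helper for illustration.
    """
    payload = {
        "model": model,          # a model ID from the Together AI catalog
        "messages": messages,    # list of {"role": ..., "content": ...} dicts
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{TOGETHER_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the first choice's message text.

    Assumes the OpenAI-compatible response layout: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

To make a live call, read the key from an environment variable (for example `TOGETHER_API_KEY`, an assumed name), pass a model ID taken from the platform's model list to `build_chat_request`, and hand the result to `send_chat_request`. Teams already using the official `openai` Python client can typically get the same effect by setting that client's `base_url` and `api_key` instead.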
Alternative tools
- GroqCloud
LPU-powered inference cloud for real-time AI applications.
- Fireworks AI
High-performance inference cloud for open-source models at enterprise scale.
- Replicate
Run open-source AI models through a single API.
