EPISODE · Apr 21, 2026 · 5 MIN
AI Inference Costs Are Crushing SaaS Gross Margins — Here's What to Do About It
from SaaS Metrics School · host Ben Murray
Is your AI SaaS company skating on thin ice because of exploding compute costs you're not tracking? In episode #365, Ben Murray tackles one of the most pressing financial challenges facing AI-first SaaS companies: the structural margin compression caused by LLM inference costs. Traditional SaaS was built on near-zero marginal cost per customer — that era is over. If you're building on top of AI, every prompt, query, and agentic workflow is a hard COGS line that scales with revenue, and if you're not managing it, it will quietly destroy your unit economics.

In this episode:
- Why AI-first SaaS companies are running 50–60% gross margins (vs. 70–80% for legacy SaaS), and what Bessemer data shows about AI supernovas with margins as low as 25%
- How inference and compute costs differ fundamentally from traditional SaaS COGS, and why they won't scale down the way hosting costs did
- Why token costs vary wildly (from $1–2 per million tokens to $30–180+ for frontier models) and how that variability makes feature-level economics a CFO priority
- 5 tactical ways to reduce LLM spend: model routing, prompt caching, context compaction, semantic caching, and batch processing
- How to set up your GL accounts and COGS tracking to allocate inference costs by feature, so you actually understand the economics of what you've built

Tune in before your next board meeting — because if you're not tracking AI inference costs at the feature level, you're flying blind on your most important unit economics.

Resources Mentioned
- The SaaS CFO: https://www.thesaascfo.com/
- Ray Rike — AI to ROI Newsletter: https://ai2roi.substack.com/
- Tomas Tunguz: https://tomtunguz.com/
- Fungies.io — 5 Ways to Save on LLM Costs: https://fungies.io
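To make the feature-level tracking idea concrete, here is a minimal sketch of rolling per-call token usage up into a per-feature COGS view. The model names, per-token prices, and usage figures are illustrative assumptions, not numbers from the episode; real pricing varies by provider and model tier.

```python
# Hypothetical sketch: allocate LLM inference spend to product features.
# All prices and usage numbers below are illustrative assumptions.

PRICE_PER_M_TOKENS = {
    "small-model": 1.50,      # cheap workhorse, in the ~$1-2 per million range
    "frontier-model": 30.00,  # frontier-class model, $30+ per million tokens
}

def inference_cost(model: str, tokens: int) -> float:
    """Dollar cost of the tokens consumed by one model."""
    return PRICE_PER_M_TOKENS[model] * tokens / 1_000_000

def cost_by_feature(usage_log):
    """Roll call-level usage up into a per-feature COGS view."""
    totals: dict[str, float] = {}
    for feature, model, tokens in usage_log:
        totals[feature] = totals.get(feature, 0.0) + inference_cost(model, tokens)
    return totals

# One month of aggregated usage: (feature, model, total tokens)
usage = [
    ("chat-assistant", "frontier-model", 40_000_000),
    ("email-summaries", "small-model", 120_000_000),
    ("chat-assistant", "small-model", 60_000_000),
]

print(cost_by_feature(usage))
# chat-assistant: 40M x $30/M + 60M x $1.50/M = $1,290.00
# email-summaries: 120M x $1.50/M = $180.00
```

Even this toy version shows why the routing tactic matters: the same feature served partly by a cheaper model carries a very different COGS line than one pinned to a frontier model.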