EPISODE · May 15, 2026 · 46 MIN

Why Your AI is Still a Demo: Lessons from Braintrust’s Field CTO

from The Neon Show · host Siddhartha Ahluwalia

85% of AI teams will hit a serious production failure this year. The only thing separating them from the 15% who don't? Evals.

After nearly two decades of building AI systems at Microsoft, Facebook, and Dropbox, Ameya Bhatawdekar is now Field CTO at Braintrust, the AI observability platform used by Airtable, Notion, Stripe, Dropbox, Vercel, Cloudflare, Lovable, and Replit.

We discuss a shift that most teams underestimate. The winners in AI are not just shipping faster. They are building systems that behave predictably, improve continuously, and earn user trust over time. As traditional monitoring breaks down in a probabilistic world, observability now requires learning how an AI system reasons, not just how it performs. This leads to a new paradigm where agents are no longer just executing tasks, but also analyzing and debugging other agents.

The episode also traces the evolution of machine learning itself. From feature engineering to deep learning to transformers, each leap increased capability and reduced control. Evaluation is now where control sits.

Ameya is clear on one point. Moving fast with weak evaluations feels like velocity, but it compounds into technical debt, unpredictable failures, and ultimately a loss of user trust. The teams that win are the ones that invest early in rigor, especially in understanding context, which is quickly becoming the hardest and most critical layer in AI systems.

If you are a founder or engineer moving beyond the demo phase and trying to build durable, high-quality AI systems, this episode will change how you think about shipping.

00:00 - Trailer
00:55 - What’s Braintrust?
05:01 - What agents are shipping today
07:54 - What evals look like in practice for Notion & Zapier
09:44 - Evals vs Classic monitoring
11:33 - Who is the Field CTO?
16:35 - What goes wrong when agents fail
18:26 - Agents analyzing other agents
24:17 - Evals are existential in vibecoding
25:52 - Ship fast with weak evals or slow with strong evals?
25:41 - What makes enterprises trust an LLM?
29:25 - Do AI startups know how good their product is?
30:23 - 3 ML systems: Microsoft, Dropbox, Meta
36:30 - How the 2017 transformer paper changed everything
38:20 - All algorithms are predicting the next word
43:40 - What LLMs will do in 1 year

-------------

India’s talent has built the world’s tech; now it’s time to lead it.

This mission goes beyond startups. It’s about shifting the center of gravity in global tech to include the brilliance rising from India.

What is Neon Fund?

We invest in seed and early-stage founders from India and the diaspora building world-class Enterprise AI companies. We bring capital, conviction, and a community that’s done it before.

Subscribe for real founder stories, investor perspectives, economist breakdowns, and a behind-the-scenes look at how we’re doing it all at Neon.

-------------

Check us out on:
Website: https://neon.fund/
Instagram: https://www.instagram.com/theneonshoww/
LinkedIn: https://www.linkedin.com/company/beneon/
Twitter: https://x.com/TheNeonShoww

Connect with Siddhartha on:
LinkedIn: https://www.linkedin.com/in/siddharthaahluwalia/
Twitter: https://x.com/siddharthaa7

-------------

This video is for informational purposes only. The views expressed are those of the individuals quoted and do not constitute professional advice.
