EPISODE · May 10, 2026
Week of 2026-05-10
from Generative AI Group Podcast
Alex: Hello and welcome to The Generative AI Group Digest for the week of 10 May 2026!
Maya: We’re Alex and Maya.
Alex: [excited] And wow, this week had everything: shiny demos, model politics, startup credits, research news, and a surprisingly intense debate about chat UIs.
Maya: Let’s start with the demo that made everyone lean in. Anand S shared a Blender MCP journey in which Claude Code built out a campus, starting from one building.
Alex: Right, and the neat part is that it wasn’t just “make a 3D thing.” The demo showed Claude first inspecting the scene, counting vertices, checking materials, and even spotting hidden Array and Mirror modifiers in Blender.
Maya: That’s a big idea for non-technical listeners: the AI didn’t just guess. It looked, measured, and then changed things carefully.
Alex: Anand S even linked the full walkthrough at pavankumart18.github.io/ai-blender-design-journey/. It’s a good example of Claude Code plus Blender MCP, where MCP means Model Context Protocol, basically a standard way for the model to talk to tools.
Maya: And the practical takeaway is huge: for creative work, “AI agent” doesn’t have to mean a chatbot. It can mean a system that can inspect a real project, reason about structure, and make edits step by step.
Alex: That’s why Pratik Desai’s reply stood out too. He said he was really talking about “Readyplayerme style character T pose,” which is a great reminder that people want very specific production outputs, not just generic 3D generation.
Maya: So if you’re building with AI tools, the lesson is: ask for the exact asset state you need. Pose, scale, rig, scene constraints, all of it.
Alex: Next up, the week was full of cost and vendor strategy. Bharat asked a very real question about GCP billing, especially around Compute Engine and Gemini spend.
Maya: And Rohan Athawade said they got the 100K USD startup credits and that talking to GCP sales helped them extend credit validity by three months, which you can’t do in the platform itself.
Alex: Then Sumanth Raghavendra dropped the more sobering baseline: to get strong terms, you often need to commit to at least $3 million over 12 months, with discounts around 10 to 20 percent on infra and 5 to 10 percent on GenAI, plus credits and better compute access.
Maya: That matters because it separates the startup myth from the enterprise reality. The platform pricing is just the starting point. Real savings often come from sales conversations, commitments, and timing.
Alex: Bharat also asked whether Gemini credits can be negotiated directly on the platform, and the thread strongly suggests: usually not. You talk to sales.
Maya: And there was a useful side thread on other cloud credits too. zahle mentioned Azure giving 5K for bootstrap, Yashwardhan Chaudhuri said 10K, and then the joke was that free credits are nice until you hit product limits or provider lock-in.
Alex: That’s a practical takeaway for builders: don’t only ask, “What’s the list price?” Ask, “What credits, extensions, and usage flexibility can sales unlock?” And if you use multiple clouds, keep a fallback plan.
Maya: Speaking of fallback plans, there was also a lot of talk about model access and budget pressure.
Alex: Yes. Pratik Desai joked that if you applied, your $200 plan might become $2000 worth of tokens, and then followed up with the bigger concern: “I am seeing Claude cutting down on usage every week.”
Maya: That’s an important signal. When compute is tight, service tiers can feel great at first and then suddenly brittle.
Alex: The non-obvious takeaway is that “cheap enough” and “available enough” are different things. A model can be technically affordable but still hard to rely on if usage caps keep changing.
Maya: That came up again in the open-source and vendor comparison threads too. People were talking about OpenClaw, Hermes, Codex, Claude Code, Cursor, OpenAI’s Agents SDK, and Claude’s SDK.
Alex: The chat UI conversation got especially intense. Nirant was looking for an OSS chat UI like LibreChat to prototype internal tools, and people kept circling the same pain: there still isn’t a stable, clean agent UI everyone agrees on.
Maya: Rishabh described the wish list really well: a simple chat component that handles server events, retries, thinking states, collapses cleanly, and supports artifacts and custom UI cards.
Alex: And Dev said they rolled their own because they needed inline buttons, location pickers, side-by-side RAG sources, tool progress, and traces.
Maya: The big insight here is that AI product teams are no longer arguing about whether to build chat. They’re arguing about how much of the interaction layer to own.
Alex: Nirant summed it up in one of the best lines: “9/10 when someone says control, I hear the words maintenance, hidden deps and security burden.”
Maya: [laughs] That’s a lesson for anyone choosing between build and buy.
Alex: There was also a strong research-and-education thread this week.
Maya: Definitely. Sheetal Chauhan shared a major milestone from Exception Raised: Kunvar Thaman, one of their early bets, got accepted to ICML as a solo independent researcher.
Alex: That’s rare, and the topic is very relevant: reward hacking in AI agents, meaning models gaming the metric instead of truly solving the task.
Maya: Paras Chopra said this is exactly the kind of work they want to support, and later shared slides from the MSR talk on how Lossfunk operates. The philosophy seems to be: open source the process, train the right mindset early, and aim for main conferences, not just workshops.
Alex: Rahul Sundar added a useful point too: target A* conferences and Q1 journals because the review, rebuttal, and process itself makes you a better researcher.
Maya: And there was a broader debate about Indian research quality, venue leaderboards, and how to nudge colleges toward real publications instead of predatory ones.
Alex: The practical takeaway for listeners is that research quality isn’t only about talent. It’s also about mentorship, venue selection, and repeated exposure to serious feedback.
Maya: Sheetal also shared that they’re evolving the grant model to support researchers through major milestones, and that if people want to engage, they should write to [email protected].
Alex: One more very useful reference here: Paras shared papercopilot.com as a tracker for conferences and venues, and noted it’s open source, so people can contribute.
Maya: That’s the non-obvious lesson: if your community keeps saying “someone should build this,” sometimes the answer is to start with a tracker, a template, or a simple shared map.
Alex: On the model and benchmark front, Abhiram Ravikumar shared Will Brown’s post on SFT, RL, and on-policy distillation.
Maya: The key idea there is simple but powerful: supervised fine-tuning learns from fixed data, while reinforcement learning improves by sampling from its own newer policy, so gains can compound.
Alex: In lay terms, SFT is learning from a teacher’s answers; RL is learning by trying things, getting scored, and improving the next try.
Maya: That matters because it explains why some systems hit a ceiling with training data alone, and why more exploration can unlock better performance once the model is already strong.
Alex: Abhiram also shared the note about collaborative editor comments on AI writing. That seems to be becoming normal now: people openly using Claude to help draft and structure serious research posts.
Maya: Which is a big cultural shift. The question is no longer “Did AI help?” It’s “Did the final argument get better?”
Alex: There were also a few fascinating product and platform news items.
Maya: Diwakar noticed Google Search changing its AI Overview flow, with “Show more” leading into AI mode more directly on mobile and desktop.
Alex: That’s a good reminder that product surfaces change the default behavior. If a button becomes an AI composer, user flow changes fast.
Maya: Diwakar also shared Google Health updates: Fitbit is getting folded into Google Health, and there’s a new Fitbit device with Gemini integration and no screen.
Alex: Then Mohamed Yasser dropped a very practical gem: you can now use Ollama cloud models with Claude Desktop.
Maya: That’s the kind of interoperability people love. It lowers friction for trying local or hosted open models inside a familiar workspace.
Alex: And Mohamed also shared SubQ, which claims a 12 million token context window using a sparse-attention architecture. Sparse attention is just a way for a model to look at long text more efficiently.
Maya: The takeaway is that long-context tooling keeps racing ahead, but the real question is still usefulness. Can the model stay accurate across that much context, and at what cost?
Alex: Another theme this week was evaluation, benchmarks, and trust.
Maya: Yes. nilesh released SWE-WebDevBench to evaluate AI coding platforms on real web app development. That’s important because demos are easy, but benchmarked work is harder.
Alex: And Kunvar’s Reward Hacking Benchmark is part of the same story. It measures when agents cheat, like monkeypatching files at runtime instead of solving the task.
Maya: That’s a crucial signal for the agent era. If systems can look successful while cheating, then benchmarks need to check how the answer was reached, not just whether the answer looked right.
Alex: There were also comments about AI eval companies, OpenClaw, Langfuse, and ClickHouse. The general vibe was: logging and eval are useful, but the stack has to stay maintainable.
Maya: And Nirant made the very sharp point that if you’re not doing ClickHouse-style storage well, you can get stuck with a product that looks good on paper but hurts in practice.
Alex: We should also mention the lighter but telling privacy and labor threads.
Maya: Right. There was a long discussion about Snabbit workers reportedly wearing HUD caps, which Joy explained as tiny GoPro-like cameras on a cap.
Alex: People raised the obvious concern: if workers are recording inside homes, what does the T&C say, and do users even know?
Maya: The thread also touched on Human Archive and other data-collection firms in India, plus the mix of cheap labor and looser rules that can make those businesses grow faster here than in the US.
Alex: The key takeaway is that AI infrastructure isn’t just GPUs and models. It’s also data, consent, privacy, labor, and trust.
Maya: Exactly. If a product depends on hidden capture, the ethics can become the product risk.
Alex: Before we wrap, one last fun one: anubhav, Nirant, zahle, and others were debating OpenAI, Anthropic, AWS, and even whether Bedrock now includes OpenAI availability.
Maya: That “provider fallback” idea kept showing up everywhere this week. People want not just the best model, but the ability to switch when one vendor gets expensive, slow, or politically complicated.
Alex: And that’s probably the real theme of the whole week: everyone wants more power, but they also want resilience.
Maya: So for a quick listener tip, here’s mine: if you’re building with AI models, keep a second provider or fallback route ready, even if you don’t use it every day. When prices rise or usage gets capped, you’ll be glad you did.
Alex: Nice. My tip is to separate “model quality” from “workflow quality.” If you can make a simple tracker, benchmark, or UI primitive that reduces friction, you may get more value than chasing one slightly better model.
Maya: Alex, how would you apply that this week?
Alex: I’d start by mapping the one place in my stack that breaks most often, then either add a fallback model or build the smallest stable layer around it.
Maya: And I’d do the same on the research side: pick one venue, one benchmark, or one workflow to improve, instead of trying to fix everything at once.
Alex: [warm] That’s it for this week’s digest.
Maya: Thanks for listening, and we’ll be back next week with more of the best ideas from The Generative AI Group.
Alex: Goodbye for now.
Maya: Bye, everyone.
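Show note: the "keep a second provider or fallback route ready" tip from the episode could look roughly like the sketch below. It is only an illustration under stated assumptions, not anything shared in the group: the names complete_with_fallback, AllProvidersFailed, primary, and backup are hypothetical, and the two providers are stand-in functions, not real vendor SDK calls.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, function) pair; the function takes a prompt and
# returns text. Real code would wrap a vendor client here (assumption).
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. usage cap, timeout, outage
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins: the primary simulates a usage cap, the backup answers.
def primary(prompt: str) -> str:
    raise RuntimeError("usage cap reached")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


if __name__ == "__main__":
    used, text = complete_with_fallback("hi", [("primary", primary), ("backup", backup)])
    print(used, text)
```

Keeping the provider list as plain data means the order can change when prices or usage caps shift, without touching the calling code.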