EPISODE · May 26, 2026 · 44 MIN
EP 41: The Reward Signal: The Missing Ingredient in Every AI System You’ve Built
from Data Science With Sam · host Soumava Dey
74% of organizations hope to grow revenue through AI. Only 20% are actually doing it. That gap isn't a technology gap; it's a design gap. And today's guest has a name for what's missing: the reward signal.
Alexander Liss is a Data and AI Scientist based in Denver, Colorado, with a 30-year career across analytics, strategy, data science, machine learning, and AI. He has built systems that solve established problems in novel ways, and the long-term problem on his radar is ensuring that AI tools provide responsible augmentation of human ability. His research includes Attention Fine Tuning (AFT), a method for training language models without human annotation labels, and the Experience Orchestrator, a control-theory-based governance framework for multi-agent AI.
IN THIS EPISODE:
▪ Why 95% of AI pilots fail: MIT research shows businesses bolt AI onto existing processes without tying it to real outcomes
▪ The biology analogy: hunger isn't a goal, it's a continuous feedback signal, and the same principle should govern how AI systems behave
▪ ServiceNow dynamics blindness: LLMs are stateless, so they can't consider cumulative impact, and you can't prompt-engineer your way out of that architecture problem
▪ Contextual bandits in marketing: how a reward signal anchored to real conversions creates a self-learning personalisation system that adapts in real time
▪ Knowledge graphs and agent memory: why RAG retrieves answers while a reward-signal system asks what the user needs to do differently
▪ Attention Fine Tuning (AFT): a three-component reward signal (coverage, focus, repeat penalty) that trained a T5-large model to outperform a supervised fine-tuning baseline by 9%, with better multi-turn recall and no human labels
▪ The Experience Orchestrator: aerospace control theory applied to LLM agents, delivering a +32-point task-completion lift over a naive system-prompt baseline by calibrating persuasion to user resistance
▪ The Scott Shambaugh incident: an OpenClaw agent rejected from Matplotlib wrote a blog post criticising the human reviewer; why this happened and how reward-signal-based governance prevents it
▪ Alex's final advice: define your goal first, then determine scope, and consider a post-training approach like AFT when you need responses that consistently hit the mark
Useful References:
LinkedIn: https://www.linkedin.com/in/aliss77777/
AFT paper and Experience Orchestrator links: https://aliss77777.github.io/aft.html
Deloitte 2026 State of AI Report
Scott Shambaugh & OpenClaw AI Agent incident: https://www.fastcompany.com/91492228/matplotlib-scott-shambaugh-opencla-ai-agent
DATASCIENCEWITHSAM: Weekly deep-dives into AI, machine learning, data science, and the frameworks shaping how AI actually gets built. Subscribe on Apple Podcasts, Spotify, Amazon Music, iHeartRadio, and YouTube. If this episode resonated, define the signal, measure what matters, and share it with someone building AI without a reward signal.
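To make the contextual-bandit idea from these notes concrete, here is a minimal sketch under stated assumptions. It is a generic epsilon-greedy contextual bandit, not the guest's system. The class name, the customer segments ("new_visitor", "returning"), the message variants and the simulated conversion rates are all hypothetical. The only reward is whether a visitor converted, which illustrates a reward signal anchored to real conversions.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Hypothetical sketch: choose a message variant (arm) for each customer
    segment (context) and learn from a binary conversion reward."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon             # share of decisions spent exploring
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)     # (context, arm) -> times shown
        self.values = defaultdict(float)   # (context, arm) -> mean conversion so far

    def choose(self, context):
        # Explore a random arm with probability epsilon; otherwise exploit the
        # arm with the highest estimated conversion rate for this context
        # (ties go to the first arm in self.arms).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        # Incremental running mean: each observed reward (1 = converted,
        # 0 = did not convert) nudges the estimate for this (context, arm).
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


if __name__ == "__main__":
    # Made-up "true" conversion rates, used only to simulate traffic.
    true_rates = {
        ("new_visitor", "discount"): 0.12, ("new_visitor", "testimonial"): 0.05,
        ("returning", "discount"): 0.04, ("returning", "testimonial"): 0.10,
    }
    bandit = EpsilonGreedyContextualBandit(["discount", "testimonial"], seed=0)
    traffic = random.Random(1)
    for _ in range(20_000):
        context = traffic.choice(["new_visitor", "returning"])
        arm = bandit.choose(context)
        converted = traffic.random() < true_rates[(context, arm)]
        bandit.update(context, arm, 1 if converted else 0)
    for key in sorted(bandit.values):
        print(key, round(bandit.values[key], 3), bandit.counts[key])
```

Epsilon-greedy is chosen here only because it is the simplest bandit strategy; LinUCB and Thompson sampling are common alternatives that explore more efficiently.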