AI Based Paper Discussions

PODCAST · technology

AI Based Paper Discussions

A RSS feed with papers which are von interest in AI Safety Control and Agent Behavior

  1. 15

    Anthrophic Mythos

    Anthrophic Mythos

  2. 14

    Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems

    Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems

  3. 13

    Public Finance in the Age of AI

    Public Finance in the Age of AI

  4. 12

    Neural Steering Vectors Reveal Dose and Exposure-Dependent Impacts of Human-AI Relationships

    Neural Steering Vectors Reveal Dose and Exposure-Dependent Impacts of Human-AI Relationships

  5. 11

    The Generative AI Paradox

    The Generative AI Paradox

  6. 10

    Training Agents to Self-Report Misbehavior

    Training Agents to Self-Report Misbehavior

  7. 9

    Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

    Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

  8. 8

    The More You Automate, the Less You See: Hidden Pitfalls of AI Scientist Systems

    The More You Automate, the Less You See: Hidden Pitfalls of AI Scientist Systems

  9. 7

    The Generative AI Paradox: How Synthetic Realities Erode Shared Epistemic Ground

    The Generative AI Paradox: How Synthetic Realities Erode Shared Epistemic Ground

  10. 6

    Evaluating Frontier Models for Stealth and Situational Awareness

    Evaluating Frontier Models for Stealth and Situational Awareness

  11. 5

    Evaluating Frontier Models for Stealth and Situational Awareness

    Evaluating Frontier Models for Stealth and Situational Awareness

  12. 4

    RASP Discovering Interpretable Algorithms by Decompiling Transformers into Human-Readable Programs

    Discovering Interpretable Algorithms by Decompiling Transformers into Human-Readable Programs

  13. 3

    Reducing Harmful Generative AI Outputs via Consensus Sampling

    Reducing Harmful Generative AI Outputs via Consensus Sampling

  14. 2

    AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

    AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

  15. 1

    Bridging Skill Gaps for the Future: New Jobs Creation in the AI Age

    The demand and supply of new skills—especially in IT and AI—are reshaping labor markets,impacting wages and hiring. About 1 in 10 job vacancies in advanced economies demands at least one newskill, often appearing first in the United States. The incidence is about half of that in emerging marketeconomies. These skills boost average wages and employment but deepen polarization, mostly benefittinghigh- and—through higher consumption of services—low-skilled workers, and potentially contributing to theshrinking of the middle class. Vacancies demanding AI skills post higher wages, but the diffusion of such skillsis linked to lower employment in occupations with high exposure and low complementarity to AI, posingchallenges for the youth. A Skill Imbalance Index reveals wide cross-country differences. Economies facingstrong demand should prioritize education and reskilling, while those facing strong supply should foster firms’absorption through innovation and access to credit.

  16. 0

    Reasoning Models Struggle to Control their Chains of Thought

    Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviorsand understanding the motivations of modern reasoning models. However, ifmodels can control what they verbalize in their CoT, it could undermine CoTmonitorability. To measure this undesirable capability — CoT controllability —we introduce the CoT-Control evaluation suite, which includes tasks that requiremodels to solve problems while adhering to CoT instructions, e.g., reasoning abouta genetics question without using the word “chromosome”. We show that reasoningmodels possess significantly lower CoT controllability than output controllability;for instance, Claude Sonnet 4.5 can control its CoT only 2.7% of the time but 61.9%when controlling its final output. We also find that CoT controllability is higherfor larger models and decreases with more RL training, test-time compute, andincreased problem difficulty. CoT controllability failures extend even to situationsin which models are given incentives (as opposed to direct requests) to evade CoTmonitors, although models exhibit slightly higher controllability when they aretold they are being monitored. Similarly, eliciting controllability by adversariallyoptimizing prompts does not meaningfully increase controllability. Our resultsleave us cautiously optimistic that CoT controllability is currently unlikely tobe a failure mode of CoT monitorability. However, the mechanism behind lowcontrollability is not well understood. Given its importance for maintaining CoTmonitorability, we recommend that frontier labs track CoT controllability in futuremodels.

Type above to search every episode's transcript for a word or phrase. Matches are scoped to this podcast.

Searching…

We're indexing this podcast's transcripts for the first time — this can take a minute or two. We'll show results as soon as they're ready.

No matches for "" in this podcast's transcripts.

Showing of matches

No topics indexed yet for this podcast.

Loading reviews...

ABOUT THIS SHOW

A RSS feed with papers which are von interest in AI Safety Control and Agent Behavior

HOSTED BY

Sigurd

CATEGORIES

URL copied to clipboard!