ibl.ai

PODCAST · technology

ibl.ai is a generative AI education platform based in NYC. This podcast, curated by its CTO, Miguel Amigot, focuses on high-impact trends and reports about AI.

  1. 100

    Microsoft: Education AI Toolkit – A Navigator for Education Institutions to Plan their AI Journey

    Summary of https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/bade/documents/products-and-services/en-us/education/Microsoft-Education-AI-Toolkit1.pdf

    This toolkit from Microsoft provides a comprehensive guide for education institutions to embark on their AI journey. It outlines a five-step implementation process that covers exploration, planning, data preparation, governance, and policy development, emphasizing responsible and ethical AI use. The resource also showcases real-world examples of how various Microsoft AI tools are being integrated globally for student success and institutional innovation, alongside insights into creating effective prompts and accessing professional learning opportunities.

    Exploration and planning: This initial phase involves practical steps for education leaders, engaging the community, and defining the institution's goals for AI implementation.

    Data and infrastructure prep: This step focuses on preparing the necessary data and infrastructure. It includes strengthening governance and policies, breaking down data silos, and implementing security measures to protect sensitive data for students and faculty during AI deployment.

    Pilot implementation: This phase involves offering professional learning opportunities to integrate AI tools into workflows and running a pilot program for the AI tools.

    Scale and optimize: In this stage, institutions introduce AI-driven administration and tools such as Microsoft 365 Copilot Chat, and gather feedback to optimize their use.

    Evaluate and review: The final step involves assessing the impact of AI, monitoring and analyzing its influence on the institution's goals and objectives, and iterating based on the results.

  2. 99

    Nature: Large Language Models Are Proficient in Solving and Creating Emotional Intelligence Tests

    Summary of https://www.nature.com/articles/s44271-025-00258-x Explores the emotional intelligence capabilities of Large Language Models (LLMs), specifically their ability to solve and create emotional intelligence tests. It highlights that several LLMs, including ChatGPT-4, consistently outperformed human averages on various established emotional intelligence assessments. The research also investigated LLMs' capacity to generate new, psychometrically sound test items, finding that these AI-created questions demonstrated comparable difficulty and a strong correlation with original human-designed tests. While some minor differences were observed in clarity, realism, and content diversity, the study ultimately suggests that LLMs can reason accurately about human emotions and their regulation, indicating their potential for use in socio-emotional applications and psychometric development. LLMs demonstrate superior performance in solving emotional intelligence tests compared to humans. Six widely used Large Language Models (LLMs), including ChatGPT-4, ChatGPT-o1, Gemini 1.5 flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3, collectively achieved an average accuracy of 81% on five standard emotional intelligence (EI) tests, significantly outperforming the human average of 56% reported in original validation studies. All tested LLMs scored more than one standard deviation above the human mean, with ChatGPT-o1 and DeepSeek V3 exceeding two standard deviations above it. LLMs are proficient at generating new, high-quality emotional intelligence test items. ChatGPT-4 successfully generated new test items (scenarios and response options) for five different ability EI tests, and these new versions demonstrated statistically equivalent test difficulty compared to the original tests when administered to human participants. 
Importantly, ChatGPT-4 did not simply paraphrase existing items; participants perceived a low level of similarity to any original test scenario in 88% of the newly created scenarios. LLM-generated tests exhibit psychometric properties largely comparable to original human-designed tests, though with some minor differences. While not all psychometric properties (such as perceived item clarity, realism, item content diversity, internal consistency, and correlations with vocabulary or other EI tests) were statistically equivalent between original and ChatGPT-generated versions, any differences observed were small (absolute Cohen’s d below 0.25), and none of the 95% confidence interval boundaries exceeded a medium effect size (d = ±0.50). Furthermore, original and ChatGPT-generated tests were strongly correlated (r=0.46), suggesting they measure similar constructs. LLMs show potential for "cognitive empathy" and consistent application of emotional knowledge. The findings support the idea that LLMs can generate responses consistent with accurate knowledge of emotional concepts, emotional situations, and their implications, indicating they fulfill the aspect of cognitive empathy. LLMs offer advantages such as processing emotional scenarios based on extensive datasets, which may lead to fewer errors, and providing consistent emotional knowledge unaffected by human variability like mood, fatigue, or personal preferences. LLMs can significantly aid psychometric test development but cannot fully replace human validation processes. The research highlights that LLMs like ChatGPT can be powerful tools for assisting in the psychometric development of standardized assessments, particularly in the domain of emotion, by generating complete tests with generally acceptable psychometric properties using few prompts. 
However, the study also notes that while valuable for creating an initial item pool, LLMs cannot replace the necessary pilot and validation studies to refine or eliminate poorly performing items.

  3. 98

    OpenAI: Multi-Agent Portfolio Collaboration with OpenAI Agents SDK

    Summary of https://cookbook.openai.com/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration

    This guide from OpenAI introduces a multi-agent collaboration system built using the OpenAI Agents SDK, specifically designed for complex tasks like investment research. It demonstrates a "hub-and-spoke" architecture where a central Portfolio Manager agent orchestrates specialized agents (Macro, Fundamental, Quantitative) as callable tools. The system leverages various tool types, including custom Python functions, managed OpenAI tools like Code Interpreter and WebSearch, and external MCP servers, to provide deep, high-quality analysis and scalable workflows. The document emphasizes modularity, parallelism, and auditability through structured prompts and tracing, offering a blueprint for building robust, expert-collaborative AI systems.

    Multi-Agent Collaboration is Essential for Complex Tasks: The core concept is that multiple autonomous LLM agents can coordinate to achieve overarching goals that would be difficult for a single agent to handle. This approach is particularly useful for complex systems, such as financial analysis, where different specialist agents (e.g., Macro, Fundamental, Quantitative) can each handle a specific subtask or expertise area.

    The "Agent as a Tool" Pattern is Highly Effective: This guide specifically highlights and uses the "agent as a tool" collaboration model. In this pattern, a central agent (the Portfolio Manager) orchestrates the workflow by calling other specialist agents as if they were tools for specific subtasks. This design maintains a single thread of control, simplifies coordination, ensures transparency, and allows for parallel execution of sub-tasks, which is ideal for complex analyses.
    Modular Design Fosters Specialization, Parallelism, and Maintainability: Breaking down a complex problem into specialized agents, each with a clear role, leads to deeper, higher-quality research because each agent can focus on its domain with the right tools and prompts. This modularity also makes the system easier to update, test, or improve without affecting other components, and allows independent agents to work concurrently, dramatically reducing task completion time.

    Flexible Integration of Diverse Tool Types Enhances Agent Capabilities: The OpenAI Agents SDK provides significant flexibility in defining and using various tool types. Agents can leverage custom Python functions for domain-specific logic, managed tools like Code Interpreter (for quantitative analysis) and WebSearch (for real-time information), and external MCP (Model Context Protocol) servers for standardized access to external data sources like Yahoo Finance.

    Structured Orchestration and Observability are Crucial for Robust Systems: The Head Portfolio Manager agent's system prompt is central to the workflow, encoding the firm's philosophy, clear tool usage rules, and a multi-step process. This ensures consistent, auditable, and high-quality outputs. Furthermore, OpenAI Traces provide detailed visibility into every agent and tool call, allowing for real-time monitoring, debugging, and full transparency of the workflow.
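The hub-and-spoke, "agent as a tool" pattern described in this episode can be sketched in plain Python. This is a minimal illustration only: it does not use the OpenAI Agents SDK, and the specialist functions, the `PortfolioManager` class, and the `analyze` helper are hypothetical stand-ins that return canned strings instead of calling a model.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical specialists: plain async functions standing in for LLM-backed agents.
async def macro_agent(question: str) -> str:
    await asyncio.sleep(0)  # placeholder for a model or API call
    return f"[macro] view on: {question}"

async def fundamental_agent(question: str) -> str:
    await asyncio.sleep(0)
    return f"[fundamental] view on: {question}"

async def quant_agent(question: str) -> str:
    await asyncio.sleep(0)
    return f"[quant] view on: {question}"

SpecialistTool = Callable[[str], Awaitable[str]]

class PortfolioManager:
    """Hub agent: keeps a single thread of control and calls specialists as tools."""

    def __init__(self, tools: Dict[str, SpecialistTool]) -> None:
        self.tools = tools
        self.trace: List[str] = []  # simple audit log of which tools were called

    async def run(self, question: str) -> str:
        names = list(self.tools)
        self.trace.extend(f"call:{name}" for name in names)
        # Independent specialists run concurrently; results come back in call order.
        results = await asyncio.gather(*(self.tools[name](question) for name in names))
        return "\n".join(results)

def analyze(question: str) -> str:
    manager = PortfolioManager(
        {"macro": macro_agent, "fundamental": fundamental_agent, "quant": quant_agent}
    )
    return asyncio.run(manager.run(question))

if __name__ == "__main__":
    print(analyze("How do rate cuts affect a hypothetical stock?"))
```

Keeping the specialists behind one dictionary of callables is what makes the design modular here: a specialist can be swapped or tested alone without touching the manager.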

  4. 97

    BCG: AI-First Companies Win the Future

    Summary of https://media-publications.bcg.com/BCG-Executive-Perspectives-AI-First-Companies-Win-the-Future-Issue1-10June2025.pdf

    This Boston Consulting Group (BCG) Executive Perspectives document, from June 2025, addresses how companies can become "AI-first" to achieve future success. It explains that the democratization of AI, shifting business economics, and the ability of AI-native firms to scale rapidly with lean teams necessitate this transformation. The report details five key characteristics of an AI-first organization: a wider competitive moat, a reshaped profit and loss (P&L) model, a decentralized tech foundation, an AI-first operating model, and specialized, scalable talent. It also provides five actionable steps for executives to begin their AI transformation journey, emphasizing a business-led AI agenda and the importance of demonstrating measurable impact.

    Wider Competitive Moat: Companies will increase their ability to capitalize on key assets such as brand, intellectual property (IP), and talent. Brand trust, direct relationships with customers, ownership of innovations (including patents, trademarks, and copyrights), and exclusive, high-quality data sets become crucial as AI democratizes access and commoditizes content and advice.

    Reshaped P&L Model: There will be high technology spending to support AI, with the value unlocked from efficiencies being reinvested. This involves a significant increase in tech spending (estimated 25-45%) and a decline in labor spending as AI reduces reliance on human-driven processes, ultimately boosting operating margins by redeploying value into growth priorities.

    Decentralized Tech Foundation: Business units will be empowered to lead AI adoption and deploy AI solutions with increased speed and independence, while IT provides and maintains enterprise-wide AI platforms, agent ecosystems, and the overall tech, data, and cyber foundation.
    AI-First Operating Model: Organizations will streamline their operations through reusable AI workflows and reduced duplication. This model shifts from traditional, people-centric processes supplemented by digital tools to processes built around AI agents, with human oversight for gap closure. This leads to flattened hierarchies, real-time governance, and an AI-embracing culture.

    Specialized, Scalable Talent: Companies will develop lean, high-performing teams with specialized skills, focusing roles on judgment, strategy, and human-AI collaboration. AI will automate routine tasks, reshaping roles and potentially reducing headcount, while increasing productivity for top performers and intensifying the competition for skilled AI-fluent talent who will command a premium.

  5. 96

    McKinsey: Seizing the Agentic AI Advantage – A CEO Playbook

    Summary of https://www.mckinsey.com/~/media/mckinsey/business%20functions/quantumblack/our%20insights/seizing%20the%20agentic%20ai%20advantage/seizing-the-agentic-ai-advantage.pdf McKinsey & Company report, "Seizing the Agentic AI Advantage," examines the current "gen AI paradox," where widespread adoption of generative AI has led to minimal organizational impact. The authors explain that AI agents, which are autonomous and goal-driven, can overcome this paradox by transforming complex business processes beyond simple task automation. The report outlines a strategic shift required for CEOs to implement agentic AI effectively, emphasizing the need to move from scattered experiments to integrated, large-scale transformations. This includes reimagining workflows around agents, establishing a new agentic AI mesh architecture, and addressing the human and governance challenges associated with deploying autonomous AI. Ultimately, the text argues that successful adoption of agentic AI will redefine how organizations operate, compete, and create value. The Generative AI Paradox: Despite widespread adoption, nearly eight in ten companies using generative AI (gen AI) report no significant bottom-line impact. This "gen AI paradox" stems from an imbalance where easily scaled "horizontal" enterprise-wide tools (like copilots and chatbots) provide diffuse, hard-to-measure gains, while more transformative "vertical" (function-specific) use cases remain largely stuck in pilot mode. Agentic AI as the Catalyst: AI agents offer a way to overcome this paradox by automating complex business processes. Unlike reactive gen AI tools, agents combine autonomy, planning, memory, and integration to become proactive, goal-driven virtual collaborators, unlocking potential far beyond mere efficiency gains. 
Reinventing Workflows is Crucial: Realizing the full potential of agentic AI requires more than simply plugging agents into existing workflows; it necessitates reimagining and redesigning those workflows from the ground up, with agents at the core. This involves reordering steps, reallocating responsibilities between humans and agents, and leveraging agents' strengths like parallel execution and real-time adaptability for transformative impact. New Architecture and Enablers for Scale: To effectively scale agents, organizations need a new AI architecture paradigm called the "agentic AI mesh". This composable, distributed, and vendor-agnostic framework enables agents to collaborate securely across systems while managing risks like uncontrolled autonomy and sprawl. Additionally, scaling requires critical enablers such as upskilling the workforce, adapting technology infrastructure, accelerating data productization, and deploying agent-specific governance mechanisms. The CEO's Mandate and Human Challenge: The primary challenge in scaling agentic AI is not technical but human: earning trust, driving adoption, and establishing proper governance for autonomous systems. CEOs must lead this transformation by concluding the experimentation phase, realigning AI priorities with strategic programs, redesigning AI governance, and launching high-impact agent-driven projects to redefine how their organizations operate.

  6. 95

    LEGO/The Alan Turing Institute: Understanding the Impacts of Generative AI Use on Children

    Summary of https://www.turing.ac.uk/sites/default/files/2025-05/combined_briefing_-_understanding_the_impacts_of_generative_ai_use_on_children.pdf Presents the findings of a research project on the impacts of generative AI on children, combining both quantitative survey data from children, parents, and teachers with qualitative insights gathered from school workshops. The research, guided by a framework focusing on children's wellbeing, explores how children use generative AI for activities like creativity and learning. Key findings indicate that nearly a quarter of children aged 8-12 have used generative AI, primarily ChatGPT, with usage varying by factors such as age, gender, and educational needs. The document also highlights parent, carer, and teacher concerns regarding potential exposure to inappropriate content and the impact on critical thinking skills, while noting that teachers are generally more optimistic about their own use of the technology than its use by students. The research concludes with recommendations for policymakers and industry to promote child-centered AI development, improve AI literacy, address bias, ensure equitable access, and mitigate environmental impacts. Despite a general lack of research specifically focused on the impacts of generative AI on children, and the fact that these tools have often not been developed with children's interests, needs, or rights in mind, a significant number of children aged 8-12 are already using generative AI, with ChatGPT being the most frequently used tool. The patterns of generative AI use among children vary notably based on age, gender, and additional learning needs. Furthermore, there is a clear disparity in usage rates between children in private schools (52% usage) and those in state schools (18% usage), indicating a potential widening of the digital divide. 
Children, parents, carers, and teachers share several significant concerns about generative AI. These include the risk of children being exposed to inappropriate or inaccurate information (cited by 82% and 77% of parents, respectively), the negative impact on children's critical thinking skills (a worry shared by 76% of parents/carers and 72% of teachers), environmental impacts, and potential bias in outputs. Teachers also report students submitting AI-generated work as their own. Despite these concerns, the research highlights potential benefits of generative AI, particularly its potential to support children with additional learning needs, an area children and teachers both support for future development. Teachers who use generative AI also report positive impacts on their own work, including increased productivity and improved performance on teaching tasks. To address the risks and realize the benefits, the report emphasizes the critical need for child-centred AI design, meaningful participation of children and young people in decision-making processes, improving AI literacy for children, parents, and teachers, and ensuring equitable access to both the tools and educational resources about them.

  7. 94

    OpenAI: Disrupting Malicious Uses of AI – June 2025

    Summary of https://cdn.openai.com/threat-intelligence-reports/5f73af09-a3a3-4a55-992e-069237681620/disrupting-malicious-uses-of-ai-june-2025.pdf Report detailing OpenAI's efforts to identify and counter various abusive activities leveraging their AI models. It presents ten distinct case studies of disrupted operations, including deceptive employment schemes, covert influence operations, cyberattacks, and scams. The report highlights how threat actors, often originating from China, Russia, Iran, Cambodia, and the Philippines, utilized AI for tasks ranging from generating social media content and deceptive resumes to developing malware and social engineering tactics. OpenAI emphasizes that threat actors' reliance on AI has paradoxically increased visibility into their malicious workflows, allowing for quicker disruption and sharing of insights with industry partners. OpenAI's mission is to ensure that artificial general intelligence (AGI) benefits all of humanity by deploying AI tools to solve difficult problems and defend against various abuses. This includes preventing AI use by authoritarian regimes, and combating covert influence operations (IO), child exploitation, scams, spam, and malicious cyber activity. OpenAI has successfully detected, disrupted, and exposed a range of abusive activities by leveraging AI as a force multiplier for their expert investigative teams. These malicious uses of AI include social engineering, cyber espionage, deceptive employment schemes (like the "IT Workers" case), covert influence operations (such as "Sneer Review," "High Five," "VAGue Focus," "Helgoland Bite," "Uncle Spam," and "STORM-2035"), cyber operations ("ScopeCreep," "Vixen," and "Keyhole Panda"), and scams (like "Wrong Number"). These malicious operations originated from various global locations, demonstrating a widespread threatscape. 
Four of the ten cases in the report likely originated from China, spanning social engineering, covert influence operations, and cyber threats. Other disruptions involved activities from Cambodia (task scam), the Philippines (comment spamming), and covert influence attempts potentially linked with Russia and Iran. Additionally, deceptive employment schemes showed behaviors consistent with North Korea (DPRK)-linked activity. Threat actors utilized AI to evolve and scale their operations, yet this reliance also increased their exposure and aided in their disruption. For example, AI was used for automating resume creation, generating social media content, translating messages for social engineering, and developing malware. Paradoxically, this integration of AI into their workflows provided OpenAI with insights, enabling quicker identification and disruption of these threats. AI investigations are an evolving discipline, and ongoing disruptions help refine defenses and contribute to a broader understanding of the AI threatscape. OpenAI emphasizes that each disrupted operation improves their understanding of how threat actors abuse their models, allowing them to refine their defenses and share findings with industry peers and authorities to strengthen collective defenses across the internet.

  8. 93

    Oakland University: The Memory Paradox – Why Our Brains Need Knowledge in an Age of AI

    Summary of https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5250447 Argues that human memory remains crucial even in the age of AI. It explores the neuroscience behind learning, detailing how the brain utilizes declarative and procedural memory systems and organizes knowledge into schemata and neural manifolds. The authors propose that cognitive offloading to digital tools, while seemingly efficient, can undermine these internal cognitive processes, potentially contributing to phenomena like the reversal of the Flynn Effect. They advocate for educational approaches that balance technology use with the active internalization of knowledge, suggesting that understanding the brain's natural learning mechanisms is key to designing effective education in the digital age. The central "Memory Paradox" is that in the age of generative AI and ubiquitous digital tools, increasing reliance on external aids to store or handle information can weaken human cognitive capacities by reducing the exercise of internal memory systems. Neuroscience explains that developing deep understanding, fluency, and intuition requires internalizing knowledge through repeated practice, allowing information to transition from the declarative memory system (facts and concepts) to the procedural memory system (skills and routines); excessive reliance on external tools prevents this crucial "proceduralization". Building robust internal mental frameworks, known as schemata, which are supported by optimized neural patterns called neural manifolds, is essential for organizing knowledge, enabling efficient thinking, detecting errors, and supporting critical thinking and creativity; constantly looking information up hinders the formation of these internal structures. 
Shifts in educational practices away from emphasizing memorization and explicit content instruction, coinciding with the rise of digital tools and cognitive offloading, are linked to the recent reversal of the Flynn Effect—the decline in IQ scores observed in developed countries—suggesting societal-level consequences for cognitive performance when internal memory is devalued. Effective learning in the digital age requires balancing the use of external technology to support internal cognitive work rather than replacing it. Strategies should promote active engagement, structured practice, memorization of foundational knowledge, and utilizing tools that encourage the brain's natural learning mechanisms like prediction error detection and schema formation.

  9. 92

    Pearson: Asking to Learn – What Student Queries to Generative AI Reveal About Cognitive Engagement

    Summary of https://plc.pearson.com/sites/pearson-corp/files/asking-to-learn.pdf Analyzing student queries to an AI-powered study tool reveals that while many questions focus on basic factual and conceptual knowledge, a significant portion demonstrates higher-order thinking skills, suggesting the tool can support deeper learning. Insights from this study are being used to develop features that encourage students to ask more complex questions. The authors emphasize that meaningfully integrating AI tools into learning can foster a richer, more active educational experience. A large-scale study analyzed 128,725 student queries from 8,681 unique users interacting with the "Explain" feature of an AI-powered study tool embedded in an eTextbook. The analysis focused on the open-ended nature of the Explain feature queries as insights into student thought processes. Using Bloom's Taxonomy, the analysis found that 80% of student inputs related to basic Factual or Conceptual knowledge, such as definitions or understanding connections. This aligns with the introductory biology course context. However, the data also showed that about one-third of inputs reflected more advanced cognitive complexity, and 20% were at levels suggesting higher-order thinking skills (Analyze and above), indicating potential for deeper learning beyond basic recall. The presence of higher-level queries suggests that many students are actively framing their inquiries rather than passively seeking information, pointing to the tool's potential to foster more advanced cognitive skills when thoughtfully integrated. Insights from the analysis have directly informed the development of a new "Go Deeper" feature which suggests follow-up questions targeting higher cognitive levels to encourage deeper engagement.

  10. 91

    Apple: The Illusion of Thinking – Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

    Summary of https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf Explores the capabilities and limitations of Large Reasoning Models (LRMs), which generate detailed thinking processes, compared to standard Large Language Models (LLMs). The authors use controllable puzzle environments like Tower of Hanoi and River Crossing to systematically evaluate performance as complexity increases. Findings indicate that LRMs outperform LLMs on medium-complexity tasks but both struggle and eventually fail at high complexities. Surprisingly, LRMs show a decrease in reasoning effort (measured by tokens) as problems become extremely difficult, and they exhibit limitations in executing precise algorithmic steps. Current Large Reasoning Models (LRMs) face a complete accuracy collapse beyond certain complexity levels when evaluated using controllable puzzle environments. This study found three distinct performance regimes based on problem complexity: standard LLMs perform better at low complexity, LRMs show an advantage at medium complexity, and both types of models fail at high complexity. LRMs exhibit a counter-intuitive scaling limit in their reasoning effort (measured by inference thinking tokens) relative to problem complexity. While reasoning effort initially increases with complexity, it declines as problems approach the complexity threshold where accuracy collapses, even when ample token budget is available. Analysis of the intermediate reasoning traces ("thoughts") reveals complexity-dependent reasoning patterns. For simple problems, LRMs often find correct solutions early but continue exploring incorrect alternatives, a phenomenon termed "overthinking". At moderate complexity, correct solutions tend to emerge later in the thinking process, after exploring incorrect paths. Beyond a certain high complexity threshold, models fail to generate any correct solutions within their thought process. 
The research questions the reliance on established mathematical and coding benchmarks for evaluating LRMs, noting issues like data contamination and lack of insight into reasoning traces. Controllable puzzle environments were adopted to allow for systematic variation of complexity while maintaining consistent logical structures and enabling detailed analysis of solutions and internal reasoning. Surprising limitations were uncovered in LRMs' ability to perform exact computation and follow explicit algorithms. For instance, providing the solution algorithm for the Tower of Hanoi puzzle did not improve performance or prevent the accuracy collapse. Models also demonstrated inconsistent reasoning, succeeding on some puzzles with higher move counts (like Tower of Hanoi with N=5 requiring 31 moves) but failing much earlier in others with lower required move counts (like River Crossing with N=3 having an 11-move solution).
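The Tower of Hanoi figure quoted in this episode (N=5 requiring 31 moves) follows from the standard recursive solution, which needs 2^N - 1 moves. The sketch below is the textbook algorithm plus a small move checker, offered as an illustration of what a controllable puzzle environment might verify; it is not the paper's simulator, and the function names and peg labels are assumptions.

```python
from typing import List, Tuple

Move = Tuple[str, str]  # (source peg, target peg)

def hanoi_moves(n: int, source: str = "A", target: str = "C", spare: str = "B") -> List[Move]:
    """Return the optimal move list for n disks using the classic recursion."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # park n-1 disks on the spare peg
        + [(source, target)]                         # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # bring the n-1 disks back on top
    )

def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Replay the moves and check every step is legal and all disks end on peg C."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move from the source peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))

if __name__ == "__main__":
    for n in range(1, 8):
        moves = hanoi_moves(n)
        print(n, len(moves), is_valid_solution(n, moves))
```

Because the move count doubles with each added disk, complexity can be dialed up one step at a time while the rules stay fixed, which is the property the episode attributes to puzzle-based evaluation.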

  11. 90

    OpenAI: A Practical Guide to Building Agents

    Summary of https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf Practical guide explains that agents are advanced systems utilizing large language models (LLMs) to independently perform multi-step workflows by leveraging tools. It identifies suitable applications for agents in scenarios involving complex decisions, unstructured data, or unwieldy rule-based systems, emphasizing that simpler LLM applications are not considered agents. The document outlines the fundamental components of an agent as an LLM, external tools for interaction, and explicit instructions. It also explores orchestration patterns, from single-agent systems to more complex multi-agent architectures, and stresses the importance of robust guardrails and planning for human intervention to ensure safe and reliable agent operation. Agents are LLM-powered systems capable of independently accomplishing complex, multi-step tasks by managing workflow execution and leveraging tools to interact with external systems. Agents are particularly well-suited for workflows involving complex decision-making, difficult-to-maintain rules, or heavy reliance on unstructured data, where traditional automation methods encounter friction. The foundational components of an agent include the Model (the LLM for reasoning), Tools (external functions/APIs to take action), and Instructions (explicit guidelines for behavior). Agent orchestration can follow Single-agent systems (using tools within a loop) or Multi-agent systems (coordinating specialized agents via a manager or peer-to-peer handoffs), often starting with a single agent and scaling up as complexity requires. Implementing Guardrails (such as relevance/safety classifiers and tool safeguards) and planning for Human Intervention (for failures or high-risk actions) are critical to ensure agents operate safely, predictably, and reliably.
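The three components named in this episode (Model, Tools, Instructions), together with a tool safeguard and a human-intervention path, can be sketched as a single-agent loop. This is a rough illustration under stated assumptions: the `model` field is a hypothetical scripted function standing in for an LLM call, and every name here (`Step`, `Agent`, `run_agent`, `lookup_order`, `issue_refund`) is invented for the example rather than taken from the guide or any SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

@dataclass
class Step:
    tool: Optional[str] = None          # tool the model wants to call, if any
    argument: str = ""
    final_answer: Optional[str] = None  # set when the model considers the task done

@dataclass
class Agent:
    instructions: str                         # explicit guidelines for behavior
    model: Callable[[str, List[str]], Step]   # hypothetical stand-in for an LLM call
    tools: Dict[str, Callable[[str], str]]    # functions that act on external systems
    high_risk_tools: Set[str] = field(default_factory=set)

def run_agent(agent: Agent, task: str, approve: Callable[[Step], bool], max_turns: int = 5) -> str:
    """Loop until the model gives a final answer, a guardrail trips, or turns run out."""
    history = [f"task: {task}"]
    for _ in range(max_turns):
        step = agent.model(agent.instructions, history)
        if step.final_answer is not None:
            return step.final_answer
        if step.tool not in agent.tools:
            history.append(f"error: unknown tool {step.tool}")
            continue
        # Tool safeguard: high-risk actions need explicit human approval.
        if step.tool in agent.high_risk_tools and not approve(step):
            return "escalated to human: action not approved"
        history.append(f"{step.tool} -> {agent.tools[step.tool](step.argument)}")
    return "escalated to human: turn limit reached"

def scripted_model(instructions: str, history: List[str]) -> Step:
    """Scripted behavior: look the order up once, then answer."""
    if len(history) == 1:
        return Step(tool="lookup_order", argument="42")
    return Step(final_answer=f"Done after: {history[-1]}")

TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"order {order_id} shipped",
    "issue_refund": lambda order_id: f"refund issued for order {order_id}",
}

support_agent = Agent(
    instructions="Answer order questions; never refund without approval.",
    model=scripted_model,
    tools=TOOLS,
    high_risk_tools={"issue_refund"},
)

if __name__ == "__main__":
    print(run_agent(support_agent, "Where is order 42?", approve=lambda step: False))
```

The turn limit and the approval callback are the two exit conditions that hand control back to a person, mirroring the episode's point about planning for failures and high-risk actions.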

  12. 89

    Vanderbilt University: The AI Labor Playbook – How to Build, Lead, and Scale Generative AI and AI Agents in Your Organization

    Summary of https://www.gaiin.org/the-ai-labor-playbook Advocates a fundamental shift in how organizations view and utilize generative AI, proposing it be treated as a new form of labor rather than simply a tool. The author argues that success hinges on a conceptual change: recognizing AI as a workforce to be led and scaled, emphasizing the importance of strategic labor planning over mere technology procurement. A core concept introduced is the "labor-to-token exchange," where prompts represent tasks delegated to AI and tokens are the units of work and cost. The paper stresses the need to train all employees to effectively lead AI labor through natural language chat interfaces, which are presented as the primary marketplace for this new workforce. Finally, it highlights that organizational architecture and strategy should prioritize modular, open systems to ensure access to the best AI labor at competitive costs, ultimately aiming to amplify human capability and drive innovation rather than focusing solely on cost reduction. AI is labor, not software. Organizations should shift from thinking about AI as a tool or product to procure, and instead treat it as a workforce or labor to be led, developed, and scaled. Prompts are tasks assigned to this AI labor market, and AI models are programmable workers that require oversight, guidance, and leadership. Labor-to-token exchanges are fundamental. These exchanges convert traditionally human tasks into interactions with generative AI systems, measured and priced in tokens. This transforms labor into a fluid, scalable, and programmable form, enabling tasks previously not possible for computers, especially cognitive ones, to be delegated through natural language. The cost of an exchange is measured by the input and output tokens. AI labor amplifies human potential, rather than replacing it. 
The primary strategic shift is recognizing that this transformation is about doing more, doing new things, and unlocking latent capacity for innovation, not just cutting costs or headcount. Humans remain essential as orchestrators, supervisors, and integrators of AI labor, providing the creativity, ethical reasoning, and context that AI cannot replicate. The goal is to empower humans to amplify their thinking and enhance the enjoyment of their work. Effective deployment requires strategic architectural and cultural changes. A major barrier is that directing AI labor is a new skill requiring training in communication, problem-solving, and system design. Organizations must avoid vendor lock-in and siloed AI within tools; instead, they should build open, modular systems, decoupling the AI labor interface (enterprise chat), the reasoning engine, the system integration (APIs), and the supervisory layer. Enterprise chat emerges as a crucial interface for accessing and assigning tasks to AI labor using natural language. AI labor strategy must focus on empowering the workforce. The greatest returns come from distributing AI widely and training everyone to lead it effectively. Success requires overcoming fear and misunderstanding, creating champions, building learning into daily work, normalizing exploration, and emphasizing conversation and persistence. Teaching how to collaborate with AI labor, including prompt engineering and problem decomposition, is the new digital literacy essential for unlocking scale, creativity, and agility.
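
The point that an exchange is priced by its input and output tokens can be sketched as a small calculation. This is an illustration under stated assumptions, not something taken from the playbook: the `TokenPrice` class, the `exchange_cost` function, and the prices used are hypothetical and do not reflect any vendor's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical per-million-token prices in USD (not real vendor rates)."""
    input_per_million: float
    output_per_million: float


def exchange_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Cost of one labor-to-token exchange: priced input tokens plus priced output tokens."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


# Illustrative numbers only: a 2,000-token prompt that yields an 800-token draft.
example_price = TokenPrice(input_per_million=2.0, output_per_million=8.0)
example_cost = exchange_cost(2_000, 800, example_price)
```

Keeping the price table separate from the task makes it easy to compare the same delegated task across different models, which fits the summary's emphasis on modular systems and competitive costs.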

  13. 88

    OpenAI: AI in the Enterprise

    Summary of https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf Outlines OpenAI's approach to enterprise AI adoption, focusing on practical lessons learned from working with seven "frontier" companies. It highlights three key areas where AI delivers measurable improvements: enhancing workforce performance, automating routine tasks, and powering products with more relevant customer experiences. The text emphasizes an iterative development process and an experimental mindset for successful AI integration, detailing seven essential strategies such as starting with rigorous evaluations, embedding AI into products, investing early, customizing models, empowering experts, unblocking developers, and setting ambitious automation goals, all while ensuring data security and privacy are paramount. Embrace an iterative and experimental approach: Successful companies treat AI as a new paradigm, adopting an iterative development approach to learn quickly, improve performance and safety, and get to value faster with greater buy-in. An open, experimental mindset is key, supported by rigorous evaluations and safety guardrails. Start early and invest for compounding benefits: Begin AI adoption now and invest early because the value compounds through continuous testing, refinement, and iterative improvements. Encouraging organization-wide familiarity and broad adoption helps companies move faster and launch initiatives more efficiently. Prioritize strategic implementation with evaluations: Instead of broadly injecting AI, start with systematic evaluations to measure how models perform against specific use cases, ensuring quality and safety. Align implementation around high-return opportunities such as improving workforce performance, automating routine operations, or powering products. 
Customize models and empower experts: Investing in customizing and fine-tuning AI models to specific data and needs can dramatically increase value, improve accuracy, relevance, and consistency. Getting AI into the hands of employees who are closest to the processes and problems is often the most powerful way to find AI-driven solutions. Set bold automation goals and unblock developers: Aim high by setting bold automation goals to free people from repetitive tasks so they can focus on high-impact work. Unblock developer resources, which are often a bottleneck, by accelerating AI application builds through platforms or automating aspects of the software development lifecycle.

  14. 87

    Microsoft Research: Shifting Work Patterns with Generative AI

    Summary of https://arxiv.org/pdf/2504.11436 Details a large-scale randomized experiment involving over 7,000 knowledge workers across multiple industries to study the impact of a generative AI tool integrated into their workflow. The researchers measured changes in work patterns over six months by comparing workers who received access to the AI tool with a control group. Key findings indicate that the AI tool primarily influenced individual behaviors, significantly reducing time spent on email and moderately speeding up document completion, while showing no significant effect on collaborative activities like meeting time. The study highlights that while AI adoption can lead to noticeable shifts in personal work habits, broader changes in job responsibilities and coordinated tasks may require more systemic organizational adjustments and widespread tool adoption. A 6-month, cross-industry randomized field experiment involving 7,137 knowledge workers from 66 large firms studied the impact of access to Microsoft 365 Copilot, a generative AI tool integrated into commonly used applications like email, document creation, and meetings. Workers who used the AI tool regularly spent 3.6 fewer hours per week on email, a 31% reduction from their pre-period average. Intent-to-treat estimates showed a 1.3 hour reduction per week. This time saving condensed email work, opening up almost 4 hours per week of concentration time and reducing out-of-hours email activity for regular users. While there was suggestive evidence that users completed documents moderately faster (5-25% faster for regular users), especially collaborative documents, there was no significant change in time spent in meetings or the types of meetings attended. There was also no change in the number of documents authored by the primary editor. The observed changes primarily impacted behaviors workers could change independently, such as managing their own email inbox. 
Behaviors requiring coordination with colleagues or significant organizational changes, like meeting duration or reassigning document responsibilities, did not change significantly. This suggests that in the early adoption phase, individual exploration and time savings on solitary tasks were more common than large-scale workflow transformations. Copilot usage intensity varied widely across workers and firms, but firm-specific differences were the strongest predictor of usage, explaining more variation than industry differences, pre-experiment individual behavior, or the share of coworkers with access to Copilot.
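
The email figures above can be related with a little arithmetic. The sketch below uses only the numbers quoted in the summary; the variable names are illustrative, and the derived values are back-of-the-envelope figures, not results reported by the study.

```python
# Figures quoted in the summary above.
regular_user_reduction_hours = 3.6   # weekly email hours saved by regular users
relative_reduction = 0.31            # that saving as a share of their pre-period average
itt_reduction_hours = 1.3            # intent-to-treat estimate (everyone offered access)

# If 3.6 hours is a 31% reduction, the implied pre-period average is 3.6 / 0.31.
implied_pre_period_hours = regular_user_reduction_hours / relative_reduction

# The intent-to-treat effect averages over all workers offered the tool,
# including those who rarely used it, so it is smaller than the regular-user effect.
itt_share_of_regular_effect = itt_reduction_hours / regular_user_reduction_hours
```

Under these numbers, regular users spent roughly 11.6 hours per week on email before the experiment, and the intent-to-treat estimate is about 36% of the regular-user estimate.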

  15. 86

    Springer: Why AI Will Not Democratize Education – A Critical Pragmatist Perspective

    Summary of https://link.springer.com/article/10.1007/s13347-025-00883-8 This academic paper argues from a Deweyan perspective that artificial intelligence (AI), particularly in its current commercial Intelligent Tutoring System form, is unlikely to democratize education. The author posits that while proponents focus on AI's potential to increase access to quality education, a truly democratic education, as defined by John Dewey, requires cultivating skills for democratic living, providing experience in communication and cooperation, and allowing for student participation in shaping their education. The paper suggests that the emphasis on individualization, mastery of curriculum, and automation of teacher tasks in current educational AI tools hinders the development of these crucial democratic aspects, advocating instead for public development of AI that augments teachers' capabilities and fosters collaborative learning experiences. The paper argues that current commercial AI, especially Intelligent Tutoring Systems (ITS), is likely to negatively impact democratic education based on John Dewey's philosophy. A Deweyan understanding of democratic education involves preparing students for democratic living, incorporating democratic practices, democratic governance, and ensuring equal access. The paper contrasts this with a narrow view often used by AI proponents, which primarily focuses on increasing access to quality education. Current commercial educational AI tools are characterized by an emphasis on the individualization of learning, a narrow focus on the mastery of the curriculum, and the automation of teachers' tasks. 
These characteristics are seen as obstacles to democratic education because they can deprive children of experiences in democratic living, hinder the acquisition of communicative and collaborative skills, habituate them to environments with little control, and reduce opportunities for intersubjective deliberation and experiencing social differences. Increased reliance on AI from private companies also poses a threat by reducing public influence and democratic governance over education and creating environments where students have little say. While current AI poses challenges, the author suggests alternative approaches like using AI to augment teachers or for simulations could better serve democratic goals.

  16. 85

    McKinsey: Open Source Technology in the Age of AI

    Summary of https://www.mckinsey.com/~/media/mckinsey/business%20functions/quantumblack/our%20insights/open%20source%20technology%20in%20the%20age%20of%20ai/open-source-technology-in-the-age-of-ai_final.pdf Based on a survey of technology leaders and senior developers, the document explores the increasing adoption of open source solutions within AI technology stacks across various industries and geographies. It highlights that over half of respondents utilize open source AI in data, models, and tools, driven by benefits like performance, ease of use, and lower costs compared to proprietary alternatives. However, the report also acknowledges perceived risks associated with open source AI, including cybersecurity, regulatory compliance, and intellectual property concerns, and discusses the safeguards organizations are implementing to mitigate these issues. Ultimately, the survey indicates a strong expectation for continued growth in the use of open source AI technologies, often in conjunction with proprietary solutions. Open source AI is widely adopted and its use is expected to grow, with over 50 percent of respondents using it in data, models, and tools areas of the tech stack. Seventy-five percent of respondents anticipate increasing their use of open source AI technologies in the next few years. Key benefits driving the adoption of open source AI include lower implementation costs (60 percent of respondents) and lower maintenance costs (46 percent) compared to proprietary tools. Performance and ease of use are also top reasons for satisfaction. Developers value experience with open source tools for their careers and job satisfaction. Despite the benefits, organizations perceive higher risks with open source AI, particularly regarding cybersecurity (62 percent of respondents), regulatory compliance (54 percent), and intellectual property (50 percent). Organizations are implementing safeguards like guardrails and third-party evaluations to manage these risks. 
Organizations show a preference for partially open models (models with open weights but potentially non-OSI-approved licenses or limited data), which may be influenced by the performance of such models and the ability to self-host them for better data privacy and control. The AI technology landscape is evolving towards a hybrid approach, with most organizations open to using a mixture of open source and proprietary solutions across their tech stack. Popular open source tools are often developed by large technology companies like Meta (Llama) and Google (Gemma).

  17. 84

    BCG: AI Agents, and the Model Context Protocol

    Summary of https://www.scribd.com/document/855023851/BCG-AI-Agent-Report-1745757269 Outlines the evolution of AI Agents from simple applications to increasingly autonomous systems. It highlights the growing adoption of Anthropic's open-source Model Context Protocol (MCP) by major technology companies as a key factor in enhancing AI Agent reliability and safety. The document underscores the need for continued progress in AI's reasoning, integration, and social understanding capabilities to achieve full autonomy. Furthermore, it discusses the emergence of product-market fit for agents in various sectors, while also addressing the critical importance of measuring and improving their effectiveness. Finally, the report examines the role of MCP in enabling agentic workflows and the associated security considerations. The open-source Model Context Protocol (MCP), launched by Anthropic, is rapidly gaining traction among major tech companies like OpenAI, Microsoft, Google, and Amazon, marking a shift in how AI Agents observe, plan, and act with their environments, thereby enhancing reliability and safety. AI Agents are significantly evolving, moving beyond simple workflow systems and chatbots towards autonomous and multi-agent systems capable of planning, reasoning, using tools, observing, and acting. This maturity is driving a shift from predefined workflows to self-directed agents. Agents are demonstrating growing product-market fit, particularly coding agents, and organizations are gaining significant value from agentic workflows through benefits such as reduced time-to-decision, reclaiming developer time, accelerated execution, and increased productivity. 
While AI Agents can currently reliably complete tasks taking human experts up to a few minutes, measuring their reliability and effectiveness is an ongoing focus, with benchmarks evolving to assess tool use and multi-turn tasks, and full autonomy dependent on advancements in areas like reasoning, integration, and social understanding. Building and scaling agents involves implementing Agent Orchestration platforms and leveraging MCP to access data and systems; however, this expanded access introduces new security risks, such as malicious tools and tool poisoning, requiring robust security measures like OAuth + RBAC and isolating trust domains.
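
The role-based access control (RBAC) part of that recommendation can be sketched as a deny-by-default check placed in front of every tool call. This is a minimal illustration, not code from the report or from any MCP library: the role names, tool names, and `is_tool_call_allowed` function are hypothetical, and the role is assumed to come from an OAuth token that has already been validated elsewhere.

```python
# Hypothetical role-to-tool permissions; anything not listed is denied.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "analyst": frozenset({"search_docs"}),
    "admin": frozenset({"search_docs", "delete_record"}),
}


def is_tool_call_allowed(role: str, tool_name: str) -> bool:
    """Return True only if the role is known and explicitly granted the tool."""
    return tool_name in ROLE_PERMISSIONS.get(role, frozenset())


def call_tool(role: str, tool_name: str) -> str:
    """Gate a tool invocation behind the permission check (the tool itself is stubbed)."""
    if not is_tool_call_allowed(role, tool_name):
        raise PermissionError(f"role {role!r} may not call tool {tool_name!r}")
    return f"{tool_name} executed"
```

Denying unknown roles and unknown tools by default means a newly registered or malicious tool gains no access until someone grants it explicitly.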

  18. 83

    Google/AWS: Building A Secure Agent AI Application Leveraging Google's A2A Protocol

    Summary of https://arxiv.org/pdf/2504.16902 Explores the critical need for secure communication protocols as AI systems evolve into complex networks of interacting agents. It focuses on Google's Agent-to-Agent (A2A) protocol, designed to enable secure and structured communication between autonomous agents. The authors analyze A2A's security through the MAESTRO threat modeling framework, identifying potential vulnerabilities like agent card spoofing, task replay, and authentication issues, and propose mitigation strategies and best practices for secure implementation. The paper also discusses how A2A synergizes with the Model Context Protocol (MCP) to create robust agentic systems and emphasizes the importance of continuous security measures in the evolving landscape of multi-agent AI. Agentic AI and A2A Protocol Foundation: The emergence of intelligent, autonomous agents interacting across boundaries necessitates secure and interoperable communication. Google's Agent-to-Agent (A2A) protocol provides a foundational, declarative, identity-aware framework for structured, secure communication between agents, enabling them to discover capabilities via standardized Agent-Cards, authenticate, and exchange tasks. A2A Core Concepts: The A2A protocol defines key elements including the AgentCard (a public JSON metadata file describing agent capabilities), A2A Server and Client (for sending/receiving requests), the Task (the fundamental unit of work with a lifecycle), Message (a communication turn), Part (basic content unit like text or files), and Artifact (generated outputs). Communication flows involve discovery, initiation (using tasks.send or tasks.sendSubscribe), processing, input handling, and completion, potentially with push notifications. MAESTRO Threat Modeling: Traditional threat modeling falls short for agentic AI systems. 
The MAESTRO framework (Multi-Agent Environment, Security, Threat, Risk, and Outcome), a seven-layer approach specifically for agentic AI, identifies threats relevant to A2A, including Agent Card spoofing, A2A Task replay, A2A Server impersonation, Cross-Agent Task Escalation, Artifact Tampering, Authentication & Identity Threats, and Poisoned AgentCard (embedding malicious instructions). Key Mitigation Strategies: Addressing A2A security threats requires specific controls and best practices. Crucial mitigations include using digital signatures and validation for Agent Cards, implementing replay protection (nonce, timestamp, MACs), enforcing strict message schema validation, employing Mutual TLS (mTLS) and DNSSEC for server identity, applying strict authentication/authorization (RBAC, least privilege), securing artifacts (signatures, encryption), implementing audit logging, using dependency scanning, and applying strong JWT validation and secure token storage. A2A and MCP Synergy: A2A and the Model Context Protocol (MCP) are complementary, operating at different layers of the AI stack. A2A enables horizontal agent-to-agent collaboration and task delegation, while MCP facilitates vertical integration by connecting agents to external tools and data sources. Their combined use enables complex hierarchical workflows but introduces security considerations at the integration points, requiring a comprehensive strategy.
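
The replay-protection mitigation (nonce, timestamp, MAC) can be sketched with Python's standard library. This is an illustration under stated assumptions, not the paper's implementation or part of the A2A specification: the `sign_task` and `ReplayGuard` names, the message layout, the shared-key setup, and the 300-second freshness window are all hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

MAX_AGE_SECONDS = 300  # assumed freshness window, not a value from the paper


def sign_task(payload: dict, nonce: str, timestamp: float, key: bytes) -> str:
    """Compute an HMAC-SHA256 over the payload together with its nonce and timestamp."""
    body = json.dumps({"payload": payload, "nonce": nonce, "ts": timestamp}, sort_keys=True)
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


class ReplayGuard:
    """Accept a task only if its MAC is valid, it is fresh, and its nonce is unseen."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._seen_nonces: set = set()  # in-memory only; a real service would expire entries

    def accept(self, payload: dict, nonce: str, timestamp: float, mac: str,
               now: Optional[float] = None) -> bool:
        current = time.time() if now is None else now
        expected = sign_task(payload, nonce, timestamp, self._key)
        if not hmac.compare_digest(expected, mac):  # constant-time comparison
            return False
        if abs(current - timestamp) > MAX_AGE_SECONDS:  # stale or future-dated message
            return False
        if nonce in self._seen_nonces:  # same message sent again
            return False
        self._seen_nonces.add(nonce)
        return True
```

Because the nonce and timestamp are covered by the MAC, an attacker cannot refresh a captured task by editing either field without invalidating the signature.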

  19. 82

    Stanford University: Predicting Long-Term Student Outcomes from Short-Term EdTech Log Data

    Summary of https://arxiv.org/pdf/2412.15473 Investigates whether student log data from educational technology, specifically from the first few hours of use, can predict long-term student outcomes like end-of-year external assessments. Using data from a literacy game in Uganda and two math tutoring systems in the US, the researchers explore if machine learning models trained on this short-term data can effectively predict performance. They examine the accuracy of different machine learning algorithms and identify some common predictive features across the diverse datasets. Additionally, the study analyzes the prediction quality for different student performance levels and the impact of including pre-assessment scores in the models. Short-term log data (2-5 hours) can effectively predict long-term outcomes. The study found that machine learning models using data from a student's first few hours of usage with educational technology provided a useful predictor of end-of-school year external assessments, with performance similar to models using data from the entire usage period (multi-month). This finding was consistent across three diverse datasets from different educational contexts and tools. Interestingly, performance did not always improve monotonically with longer horizon data; in some cases, accuracy estimates were higher using a shorter horizon. Certain log data features are consistently important predictors across different tools. Features like the percentage of success problems and the average number of attempts per problem were frequently selected as important features by the random forest model across all three datasets and both short and full horizons. This suggests that these basic counting features, which are generally obtainable from log data across many educational platforms, are valuable signals for predicting long-term performance. 
While not perfectly accurate for individual students, the models show good precision at predicting performance extremes. The models struggled to accurately predict students in the middle performance quintiles but showed relatively high precision when predicting students in the lowest (likely to struggle) or highest (likely to thrive) performance groups. For instance, the best model for CWTLReading was accurate 77% of the time when predicting someone would be in the lowest performance quintile (Q1) and 72% accurate for predicting the highest (Q5). This suggests potential for using these predictions to identify students who might benefit from additional support or challenges. Using a set of features generally outperforms using a single feature. While single features like percentage success or average attempts per problem still perform better than a baseline, machine learning models trained on the full set of extracted log features generally outperformed models using only a single feature. This indicates that considering multiple aspects of student interaction captured in the log data provides additional predictive power. Pre-assessment scores are powerful indicators and can be combined with log data for enhanced prediction. Pre-test or pre-assessment scores alone were found to be strong predictors for long-term outcomes, often outperforming log data features alone. When available, combining pre-test scores with log data features generally resulted in improved prediction performance (higher R² values) compared to using either source of data alone. However, the study notes that short-horizon log data can be a useful tool for prediction when pre-tests are not available or take time away from instruction.
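
The two counting features named above, percentage of successful problems and average attempts per problem, can be computed from raw attempt logs in a few lines. This is a minimal sketch, not the authors' feature pipeline: the `(problem_id, correct)` record format and the rule that a problem counts as successful once any attempt is correct are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Attempt = Tuple[str, bool]  # (problem_id, whether this attempt was correct); assumed format


def extract_features(attempts: Iterable[Attempt]) -> Dict[str, float]:
    """Compute per-student counting features from a stream of attempt records."""
    attempts_per_problem: Dict[str, int] = defaultdict(int)
    solved = set()
    for problem_id, correct in attempts:
        attempts_per_problem[problem_id] += 1
        if correct:
            solved.add(problem_id)  # assumption: any correct attempt marks success
    n_problems = len(attempts_per_problem)
    if n_problems == 0:
        return {"pct_success": 0.0, "avg_attempts": 0.0}
    return {
        "pct_success": len(solved) / n_problems,
        "avg_attempts": sum(attempts_per_problem.values()) / n_problems,
    }


# Illustrative log: three problems, four attempts, two problems eventually solved.
example_log = [("p1", False), ("p1", True), ("p2", False), ("p3", True)]
example_features = extract_features(example_log)
```

Features like these could then be fed, alongside a pre-test score when one exists, into a regression model such as the random forest the summary mentions.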

  20. 81

    World Bank Group: From Chalkboards to Chatbots – Evaluating the Impact of Generative AI on Learning Outcomes in Nigeria

    Summary of https://documents1.worldbank.org/curated/en/099548105192529324/pdf/IDU-c09f40d8-9ff8-42dc-b315-591157499be7.pdf This is a Policy Research Working Paper from the World Bank's Education Global Department, published in May 2025. Titled "From Chalkboards to Chatbots: Evaluating the Impact of Generative AI on Learning Outcomes in Nigeria," it details a study on the effectiveness of using large language models, specifically Microsoft Copilot powered by GPT-4, as virtual tutors for secondary school students in Nigeria. The research, conducted through a randomized controlled trial over six weeks, found that the intervention led to significant improvements in English, digital, and AI skills among participating students, particularly female students and those with higher initial academic performance. The paper emphasizes the cost-effectiveness and scalability of this AI-powered tutoring approach in low-resource settings, although it also highlights the need to address potential inequities in access and digital literacy for broader implementation. Significant Positive Impact on Learning Outcomes: The program utilizing Microsoft Copilot (powered by GPT-4) as a virtual tutor in secondary education in Nigeria resulted in a significant improvement of 0.31 standard deviation on an assessment covering English language, artificial intelligence (AI), and digital skills for first-year senior secondary students over six weeks. The effect on English skills, which was the main outcome of interest, was 0.23 standard deviations. These effect sizes are notably high when compared to other randomized controlled trials (RCTs) in low- and middle-income countries. High Cost-Effectiveness: The intervention demonstrated substantial learning gains, estimated to be equivalent to 1.5 to 2 years of 'business-as-usual' schooling. 
A cost-effectiveness analysis revealed that the program ranks among some of the most cost-effective interventions for improving learning outcomes, achieving 3.2 equivalent years of schooling (EYOS) per $100 invested per participant. When considering long-term wage effects, the benefit-cost ratio was estimated to be very high, ranging from 161 to 260. Heterogeneous Effects Identified: While the program yielded positive and statistically significant treatment effects across all levels of baseline performance, the effects were found to be stronger among students with better prior academic performance and those from higher socioeconomic backgrounds. Treatment effects were also stronger among female students, which the authors note appeared to compensate for a deficit in their baseline performance. Attendance Linked to Greater Gains: A strong linear association was found between the number of days a student attended the intervention sessions and improved learning outcomes. Based on attendance data, the estimated effect size was approximately 0.031 standard deviation per additional day of attendance. Further analysis predicts substantial gains (1.2 to 2.2 standard deviations) for students participating for a full academic year, depending on attendance rates. Key Policy Implications for Low-Resource Settings: The findings suggest that AI-powered tutoring using LLMs has transformative potential in the education sector in low-resource settings. Such programs can complement traditional teaching, enhance teacher productivity, and deliver personalized learning, particularly when designed and used properly with guided prompts, teacher oversight, and curriculum alignment. The use of free tools and local staff contributes to scalability, but policymakers must address potential inequities stemming from disparities in digital literacy and technology access through investments in infrastructure, teacher training, and inclusive digital education.

  21. 80

    OpenAI: Multi-Agent Portfolio Collaboration with OpenAI Agents SDK

    Summary of https://cookbook.openai.com/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration Introduces a multi-agent system built using the OpenAI Agents SDK for complex investment research. It outlines an "agent as a tool" pattern where a central Portfolio Manager agent orchestrates specialized agents (Fundamental, Macro, Quantitative) and various tools to analyze market data and generate investment reports. The text highlights the modularity, parallelism, and transparency offered by this architecture for building robust and scalable agent workflows. It details the different tool types supported by the SDK and provides an example output of the system in action, emphasizing the importance of structured prompts and tracing for building effective agent systems. Complex tasks can be broken down and delegated to multiple specialist agents for deeper, higher-quality results. Instead of using a single agent for everything, multi-agent collaboration allows different autonomous agents to handle specific subtasks or expertise areas. In the investment research example, specialists like Macro, Fundamental, and Quantitative agents contribute their expertise, leading to a more nuanced and robust answer synthesized by a Portfolio Manager agent. The "Agent as a Tool" pattern is a powerful approach for transparent and scalable multi-agent systems. This model involves a central agent (like the Portfolio Manager) calling other agents as tools for specific subtasks, maintaining a single thread of control and simplifying coordination. This approach is used in the provided example and allows for parallel execution of sub-tasks, making the overall reasoning transparent and auditable. 
The OpenAI Agents SDK supports a variety of tool types, offering flexibility in extending agent capabilities. Agents can leverage built-in managed tools like Code Interpreter and WebSearch, connect to external services via MCP servers (for example, for Yahoo Finance data), and use custom Python functions (for example, for FRED economic data or file operations) defined with the function_tool decorator. This broad tool support allows agents to perform advanced actions and access domain-specific data. Structured prompts and careful orchestration are crucial for building robust and consistent multi-agent workflows. The Head Portfolio Manager agent's system prompt encodes the firm's philosophy, tool usage rules, and a step-by-step workflow, ensuring consistency and auditability across runs. Modularity, parallel execution (enabled by features like parallel_tool_calls=True), and clear tool definitions are highlighted as best practices enabled by the SDK. The system design emphasizes modularity, extensibility, and observability. By wrapping specialist agents as callable tools and structuring the workflow with a central coordinator, it's easier to update, test, or add new agents or tools. OpenAI Traces provide detailed visibility into every agent and tool call, making the workflow fully transparent and easier to debug.
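
The "agent as a tool" pattern with parallel specialist calls can be sketched in plain Python with `asyncio`. This deliberately does not use the OpenAI Agents SDK, so nothing here should be read as its API: `make_specialist`, `portfolio_manager`, and `run_example` are hypothetical names, and each specialist is a stub standing in for a real model call.

```python
import asyncio
from typing import Awaitable, Callable, Dict

SpecialistTool = Callable[[str], Awaitable[str]]


def make_specialist(name: str) -> SpecialistTool:
    """Wrap a stand-in specialist 'agent' as an async callable tool."""
    async def run(question: str) -> str:
        await asyncio.sleep(0)  # placeholder for a model or data-source call
        return f"{name} view on {question!r}"
    return run


async def portfolio_manager(question: str, tools: Dict[str, SpecialistTool]) -> Dict[str, str]:
    """Central coordinator: call every specialist tool concurrently and collect the answers."""
    names = list(tools)
    results = await asyncio.gather(*(tools[name](question) for name in names))
    return dict(zip(names, results))


def run_example() -> Dict[str, str]:
    """Build three specialist tools and run one coordinated query."""
    tools = {name: make_specialist(name) for name in ("fundamental", "macro", "quantitative")}
    return asyncio.run(portfolio_manager("ACME outlook", tools))
```

The coordinator keeps a single thread of control while the specialists run concurrently, which mirrors the transparency and parallelism the summary attributes to the pattern.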

  22. 79

    Mary Meeker: Trends - Artificial Intelligence 2025

    Summary of https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Extensively examines the rapid evolution of Artificial Intelligence, highlighting its unprecedented growth in user adoption, usage, and capital expenditure. It details the competitive landscape, noting the rise of open-source models and the significant presence of China alongside the USA in AI development. The text also explores AI's increasing integration into the physical world, its impact on workforces, and the ongoing investment in infrastructure like data centers and chips necessary to support this technological advancement. The pace of change catalyzed by AI is unprecedented, ramping materially faster than the Internet's early growth. This is demonstrated by record-breaking user and usage growth for AI products like ChatGPT, which reached 800 million weekly active users in just 17 months, and significantly faster user adoption compared to previous technologies. Capital expenditure (CapEx) by major technology companies is also growing rapidly, increasingly directed towards building AI infrastructure like data centers and specialized hardware. A key economic dynamic in AI is the tension between high and rising model training costs and rapidly falling inference costs per token. While training a frontier AI model can cost hundreds of millions or potentially billions of dollars, the cost to run these models (inference) has plummeted, with energy required per token falling drastically due to hardware and algorithmic advancements. This cost reduction is increasing accessibility and driving rising developer usage and new product creation, but also raises questions about the monetization and profitability of general-purpose LLMs. The AI landscape is marked by rising competition among tech incumbents, emerging attackers, and global powers. 
Key threats to monetization include this intense competition, the growing capabilities and accessibility of open-source models which are closing the performance gap with closed models, and the rapid advancement and relevance of China's AI capabilities, which are catching up to USA models, increasingly powered by local semiconductors, and dominating domestic usage. AI adoption and evolution are happening across diverse sectors and applications at a rapid pace. Beyond digital applications, AI is increasingly integrating into the physical world, enabling autonomous systems in areas like transportation, defense, agriculture, and robotics. It is also fundamentally transforming work, driving productivity improvements for employees and leading to significant growth in AI-related job postings and the adoption of AI tools by firms. AI is poised to fundamentally reshape the internet experience for the next wave of global users, who may come online through AI-native interfaces (like conversational agents) powered by expanding satellite connectivity, potentially bypassing traditional app ecosystems. This technological shift is intertwined with increasing geopolitical competition, particularly between the United States and China, where leadership in AI is viewed as a critical component of national resilience and geopolitical influence, creating an AI "space race" with significant international implications.

  23. 78

    Center for AI Policy: AI Agents – Governing Autonomy in the Digital Age

    Summary of https://cdn.prod.website-files.com/65af2088cac9fb1fb621091f/682f96d6b3bd5a3e1852a16a_AI_Agents_Report.pdf Presents an overview of AI agents, defined as autonomous systems capable of complex tasks without constant human supervision, highlighting their rapid progression from research to real-world application. It identifies three major risks: catastrophic misuse through malicious applications, gradual human disempowerment as decision-making shifts to algorithms, and significant workforce displacement due to automation of cognitive tasks. The report proposes four policy recommendations for Congress, including an Autonomy Passport for registration and oversight, mandatory continuous monitoring and recall authority, requiring human oversight for high-consequence decisions, and implementing workforce impact research to address potential job losses. These measures aim to mitigate the risks while allowing the beneficial aspects of AI agent development to continue. AI agents represent a significant shift in AI capabilities, moving from research to widespread deployment. Unlike chatbots, these systems are autonomous and goal-directed, capable of taking a broad objective, planning their own steps, using external tools, and iterating without continuous human prompting. They can operate across multiple digital environments and automate decisions, not just steps. Agent autonomy exists on a spectrum, categorized into five levels ranging from shift-length assistants to frontier super-capable systems. 
The widespread adoption of autonomous AI agents presents three primary risks: catastrophic misuse, where agents could enable dangerous attacks or cyber-intrusions; gradual human disempowerment, as decision-making power shifts to opaque algorithms across economic, cultural, and governmental systems; and workforce displacement, with projections indicating that tasks equivalent to roughly 300 million full-time global positions could be automated, affecting mid-skill and cognitive roles more rapidly than previous automation waves. To mitigate these risks, the report proposes four key policy recommendations for Congress. These include creating a federal Autonomy Passport system for registering high-capability agents before deployment, mandating continuous oversight and recall authority (including containment and provenance tracking) to quickly suspend problematic deployments, requiring human oversight by qualified professionals for high-consequence decisions in domains like healthcare, finance, and critical infrastructure, and directing federal agencies to monitor workforce impacts annually. The proposed policy measures are designed to be proportional to the level of agent autonomy and the domain of deployment, focusing rigorous oversight on where autonomy creates the highest risk while allowing lower-risk innovation to proceed. For instance, the Autonomy Passport requirement and continuous oversight mechanisms target agents classified at Level 2 or higher on the five-level autonomy scale. Early deployments demonstrate significant productivity gains, and experts project agents could tackle projects equivalent to a full human work-month by 2029. However, the pace of AI agent development is accelerating faster than the governance frameworks designed to contain its risks, creating a critical mismatch and highlighting the need for proactive policy intervention before the next generation of agents is widely deployed.

  24. 77

    North-West University: Exploring AI-Driven Conversations as Dynamic OER for Self-Directed Learners

    Summary of https://conference.pixel-online.net/files/foe/ed0015/FP/8250-ESOC7276-FP-FOE15.pdf This conceptual paper explores the potential of AI-driven conversations, such as those from ChatGPT, to function as dynamic Open Educational Resources (OER) that support self-directed learning (SDL). Unlike traditional, static resources, AI-powered dialogues offer personalized, interactive, and adaptive experiences that align with learners' needs. The paper argues that these tools can nurture key SDL competencies while acknowledging ethical, pedagogical, and technical considerations. Ultimately, the authors propose that thoughtfully designed AI-driven OER can empower learners and teachers and contribute to a more inclusive and responsive future for open education. AI-driven conversations can act as dynamic OER to support SDL. AI-driven conversations, such as those facilitated by ChatGPT, have the potential to function as dynamic Open Educational Resources (OER). Unlike traditional static resources, these dialogues offer personalised, interactive, and adaptive experiences that align with learners' unique needs and goals. This dynamic capability contrasts with static OER. AI supports core principles and competencies of Self-Directed Learning (SDL). AI-driven conversations and generative AI tools can nurture key SDL competencies such as goal setting, self-monitoring, and reflective practice. They support learner autonomy, responsibility, and self-motivation, and empower students to take initiative, plan, and manage their learning processes. AI also enhances online collaboration, creativity, problem-solving, and communication skills, which align with SDL characteristics. AI integration can enhance Open Educational Practices (OEP) and improve access and inclusivity. Integrating AI into OEP holds the potential to address long-standing challenges in open education, such as learner engagement, the wider reach and adaptability of resources, and inclusive access.
AI supports the creation of diverse and inclusive learning resources, facilitating multilingual and culturally relevant content generation. This integration aligns with the values of access, equity, and transparency that underpin open education. Significant challenges exist in integrating AI into open education. Key challenges include legal and ethical concerns related to copyright, data privacy, and potential biases in AI outputs. There are also technical limitations due to fragmented OER infrastructure and a critical need for teacher preparedness and AI literacy, as many educators lack the foundational knowledge and confidence to use AI technologies effectively. Successful integration requires thoughtful planning, policy, and professional development. Realising the potential of AI-driven OER for SDL within OEP requires thoughtful design, robust infrastructure, inclusive policies, and sustained professional development for teachers. Recommendations include developing ethical guidelines, investing in compatible OER infrastructure, promoting inclusive AI design, providing professional development focused on both AI literacy and SDL skills for teachers, and encouraging ongoing research.

  25. 76

    Google: Agents Companion

    Summary of https://www.kaggle.com/whitepaper-agent-companion This technical document, the Agents Companion, explores the advancements in generative AI agents, highlighting their architecture composed of models, tools, and an orchestration layer, moving beyond traditional language models. It emphasizes Agent Ops as crucial for operationalizing these agents, drawing parallels with DevOps and MLOps while addressing agent-specific needs like tool management. The paper thoroughly examines agent evaluation methodologies, covering capability assessment, trajectory analysis, final response evaluation, and the importance of human-in-the-loop feedback alongside automated metrics. Furthermore, it discusses the benefits and challenges of multi-agent systems, outlining various design patterns and their application, particularly within automotive AI. Finally, the Companion introduces Agentic RAG as an evolution in knowledge retrieval and presents Google Agentspace as a platform for developing and managing enterprise-level AI agents, even proposing the concept of "Contract adhering agents" for more robust task execution. Agent Ops is Essential: Building successful agents requires more than just a proof-of-concept; it necessitates embracing Agent Ops principles, which integrate best practices from DevOps and MLOps, while also focusing on agent-specific elements such as tool management, orchestration, memory, and task decomposition. Metrics Drive Improvement: To build, monitor, and compare agent revisions, it is critical to start with business-level Key Performance Indicators (KPIs) and then instrument agents to track granular metrics related to critical tasks, user interactions, and agent actions (traces). Human feedback is also invaluable for understanding where agents excel and need improvement. Automated Evaluation is Key: Relying solely on manual testing is insufficient. 
Implementing automated evaluation frameworks is crucial to assess an agent's core capabilities, its trajectory (the steps taken to reach a solution, including tool use), and the quality of its final response. Techniques like exact match, in-order match, and precision/recall are useful for trajectory evaluation, while autoraters (LLMs acting as judges) can assess final response quality. Human-in-the-Loop is Crucial: While automated metrics are powerful, human evaluation provides essential context, particularly for subjective aspects like creativity, common sense, and nuance. Human feedback should be used to calibrate and validate automated evaluation methods, ensuring alignment with desired outcomes and preventing the outsourcing of domain knowledge. Multi-Agent Systems Offer Advantages: For complex tasks, consider leveraging multi-agent architectures. These systems can enhance accuracy through cross-checking, improve efficiency through parallel processing, better handle intricate problems by breaking them down, increase scalability by adding specialized agents, and improve fault tolerance. Understanding different design patterns like sequential, hierarchical, collaborative, and competitive is important for choosing the right architecture for a given application.
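The trajectory metrics named above can be made concrete with a small sketch. This is only an illustration, not the whitepaper's implementation: it assumes a trajectory is a list of tool-name strings, treats precision and recall as simple membership checks against the other trajectory, and uses hypothetical tool names such as `search_flights`.

```python
from typing import Sequence


def exact_match(reference: Sequence[str], predicted: Sequence[str]) -> bool:
    """True only if the agent took exactly the reference steps, in order."""
    return list(reference) == list(predicted)


def in_order_match(reference: Sequence[str], predicted: Sequence[str]) -> bool:
    """True if the reference steps appear in order; extra steps are allowed."""
    remaining = iter(predicted)
    # `step in remaining` advances the iterator, so order is enforced.
    return all(step in remaining for step in reference)


def precision(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of predicted tool calls that also occur in the reference."""
    if not predicted:
        return 0.0
    ref = set(reference)
    return sum(1 for step in predicted if step in ref) / len(predicted)


def recall(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of reference tool calls that the agent actually made."""
    if not reference:
        return 0.0
    pred = set(predicted)
    return sum(1 for step in reference if step in pred) / len(reference)


# Hypothetical trajectories: the agent made one unnecessary tool call.
reference = ["search_flights", "check_calendar", "book_flight"]
predicted = ["search_flights", "get_weather", "check_calendar", "book_flight"]

print(exact_match(reference, predicted))     # False
print(in_order_match(reference, predicted))  # True
print(precision(reference, predicted))       # 0.75
print(recall(reference, predicted))          # 1.0
```

In this example the extra `get_weather` call fails the exact match and lowers precision, while in-order match and recall are unaffected, which is why the metrics are usually read together rather than in isolation.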

  26. 75

    UC San Diego: Large Language Models Pass the Turing Test

    Summary of https://arxiv.org/pdf/2503.23674 Researchers Cameron R. Jones and Benjamin K. Bergen investigated whether advanced large language models (LLMs) can pass the standard three-party Turing test. Their study involved human interrogators conversing with both a human and an AI, then judging which was human. The findings indicate that GPT-4.5, when prompted to adopt a persona, was identified as human significantly more often than the actual human participant, marking the first empirical evidence of an AI passing this rigorous version of the test. While other models like LLaMa-3.1 showed some human-like qualities, only the persona-prompted models consistently fooled human judges. The study also explored the strategies used by interrogators and the implications of these results for our understanding of AI capabilities and their societal impact. This study provides the first empirical evidence that a Large Language Model (LLM), specifically GPT-4.5 when prompted to adopt a humanlike persona, can pass a standard three-party Turing test. In the study, GPT-4.5-PERSONA was judged as human 73% of the time, significantly more often than the actual human participants. Prompting with a humanlike persona was crucial for success. While GPT-4.5-PERSONA and LLAMA-PERSONA were not reliably distinguished from humans, the same models without the persona prompt (GPT-4.5-NO-PERSONA and LLAMA-NO-PERSONA) performed significantly worse, being judged human at rates significantly below chance in the undergraduate study. The study compared the performance of several AI systems, including GPT-4.5, LLaMa-3.1-405B, GPT-4o, and ELIZA. The baseline models, GPT-4o-NO-PERSONA and ELIZA, had significantly lower win rates, indicating that interrogators could generally distinguish them from humans. This suggests the interrogators were not simply guessing randomly.
The research indicates that interrogators often relied on social, emotional, and linguistic cues rather than traditional measures of knowledge and reasoning when trying to distinguish between humans and AI. Interestingly, providing strange prompts or using "jailbreaks" was the most effective strategy for interrogators, while asking about the weather or human experiences was least effective. The findings have significant social and economic implications, suggesting that contemporary LLMs could potentially substitute for humans in short conversations, raising concerns about deception, misinformation, and the potential undermining of real human interaction. The study also found that general knowledge about LLMs and frequent chatbot interaction did not consistently improve participants' ability to distinguish AI from humans.

  27. 74

    Elon University: Being Human in 2035 – How Are We Changing in the Age of AI?

    Summary of https://imaginingthedigitalfuture.org/wp-content/uploads/2025/03/Being-Human-in-2035-ITDF-report.pdf This Elon University Imagining the Digital Future Center report compiles insights from a non-scientific canvassing of technology pioneers, builders, and analysts regarding the potential shifts in human capacities and behaviors by 2035 due to advanced AI. Experts anticipate blurred boundaries between reality and fiction, human and artificial intelligence, and human and synthetic creations, alongside concerns about eroding individual identity, autonomy, and critical thinking skills. The report explores both optimistic visions of AI augmenting human potential and creativity and pessimistic scenarios involving increased dependence, social division, and the erosion of essential human qualities like empathy and moral judgment. Ultimately, it highlights the critical need for ethical development, regulation, and education to navigate the profound societal changes anticipated in the coming decade. A significant majority of experts anticipate deep and meaningful or even fundamental and revolutionary change in people’s native operating systems and operations as humans broadly adapt to and use advanced AI by 2035. Experts predict mostly negative changes in several core human traits and behaviors by 2035, including social and emotional intelligence, the capacity for deep thinking, trust in shared values, empathy, mental well-being, sense of agency, and sense of identity and purpose. Conversely, pluralities of experts expect mostly positive changes in human curiosity and capacity to learn, decision-making and problem-solving abilities, and innovative thinking and creativity due to interactions with AI. Many experts express concern about the potential for AI to be used in ways that de-augment humanity, serving the interests of tool builders and those in power, potentially leading to a global sociotechnical dystopia. 
However, they also see the potential for AI to augment human intelligence and bring about universal enlightenment if the direction of development changes. The experts underscore the critical importance of how humans choose to integrate AI into their lives and societies. They emphasize the need for ethical considerations, human-centered design, the establishment of human values in AI development and policy, and the preservation of human agency to ensure AI serves humanity's flourishing rather than diminishing essential human capacities.

  28. 73

    Bain & Company: Nvidia GTC 2025 – AI Matures into Enterprise Infrastructure

    Summary of https://www.bain.com/globalassets/noindex/2025/bain_article_nvidia_gtc_2025_ai_matures_into_enterprise_infrastructure.pdf Nvidia's GTC 2025 highlighted a significant shift in AI, moving from experimental phases to becoming core enterprise infrastructure. The event showcased how data remains crucial, but AI itself is now a data generator, leading to new insights and efficiencies. Furthermore, smaller, specialized AI models are gaining prominence, offering cost advantages and improved control. While fully autonomous AI agents are still rare, structured semi-autonomous systems with human oversight are becoming standard. Finally, the conference underscored the growing importance of digital twins, video analytics, and accessible off-the-shelf tools in democratizing enterprise AI adoption and fostering cross-functional collaboration through simulation. AI has matured beyond pilot projects and is now being deployed at scale within the core operations of enterprises. Companies are re-architecting how they compete by moving AI from innovation teams into the business core. Data remains both a critical challenge and a significant opportunity for AI success. Successful AI deployments rely on clean, connected, and accessible data. Furthermore, AI is now generating a new layer of data through insights and generative applications. The trend is shifting towards smaller, specialized AI models that are more cost-effective and offer better control, latency, and privacy. Techniques like quantization, pruning, and RAG are facilitating this shift, although deploying and managing these custom models presents new operational complexities. Agentic AI is gaining traction, but its successful implementation hinges on structure, transparency, and human oversight. While fully autonomous agents are rare, semiautonomous systems with built-in safeguards and orchestration platforms are becoming the near-term standard. 
Digital twins and simulation have moved from innovation showcases to everyday enterprise tools, enabling faster rollout cycles, lower risk, and more informed decision-making. Simulation is also evolving into a collaboration platform for cross-functional teams.

  29. 72

    Anthropic: Circuit Tracing – Revealing Computational Graphs in Language Models

    Summary of https://transformer-circuits.pub/2025/attribution-graphs/methods.html Introduces a novel methodology called "circuit tracing" to understand the inner workings of language models. The authors developed a technique using "replacement models" with interpretable components to map the computational steps of a language model as "attribution graphs." These graphs visually represent how different computational units, or "features," interact to process information and generate output for specific prompts. The research details the construction, visualization, and validation of these graphs using an 18-layer model and offers a preview of their application to a more advanced model, Claude 3.5 Haiku. The study explores the interpretability and sufficiency of this method through various evaluations, including case studies on acronym generation and addition. While acknowledging limitations like missing attention circuits and reconstruction errors, the authors propose circuit tracing as a significant step towards achieving mechanistic interpretability in large language models. This paper introduces a methodology for revealing computational graphs in language models using Cross-Layer Transcoders (CLTs) to extract interpretable features and construct attribution graphs that depict how these features interact to produce model outputs for specific prompts. This approach aims to bridge the gap between raw neurons and high-level model behaviors by identifying meaningful building blocks and their interactions. The methodology involves several key steps: training CLTs to reconstruct MLP outputs, building attribution graphs with nodes representing active features, tokens, errors, and logits, and edges representing linear effects between these nodes. A crucial aspect is achieving linearity in feature interactions by freezing attention patterns and normalization denominators. 
Attribution graphs allow for the study of how information flows from the input prompt through intermediate features to the final output token. The paper demonstrates the application of this methodology through several case studies, including acronym generation, factual recall, and small number addition. These case studies illustrate how attribution graphs can reveal the specific features and pathways involved in different cognitive tasks performed by language models. For instance, in the addition case study, the method uncovers a hierarchy of heuristic features that collaboratively solve the task. Despite the advancements, the methodology has several significant limitations. A key limitation is the missing explanation of how attention patterns are formed and how they mediate feature interactions (QK-circuits), as the analysis is conducted with fixed attention patterns. Other limitations include reconstruction errors (unexplained model computation), the role of inactive features and inhibitory circuits, the complexity of the resulting graphs, and the difficulty of understanding global circuits that generalize across many prompts. The paper also explores the concept of global weights between features, which are prompt-independent and aim to capture general algorithms used by the replacement model. However, interpreting these global weights is challenging due to issues like interference (spurious connections) and the lack of accounting for attention-mediated interactions. While attribution graphs provide insights on specific prompts, future work aims to enhance the understanding of global mechanisms and address current limitations, potentially through advancements in dictionary learning and handling of attention mechanisms.

  30. 71

    RAND: Uneven Adoption of AI Tools Among U.S. Teachers and Principals in the 2023-2024 School Year

    Summary of https://www.rand.org/content/dam/rand/pubs/research_reports/RRA100/RRA134-25/RAND_RRA134-25.pdf A RAND Corporation report, utilizing surveys from the 2023-2024 school year, investigates the adoption and use of artificial intelligence tools by K-12 public school teachers and principals. The research highlights that roughly one-quarter of teachers reported using AI for instructional planning or teaching, with higher usage among ELA and science teachers and those in lower-poverty schools. Simultaneously, nearly 60 percent of principals indicated using AI in their jobs, primarily for administrative tasks like drafting communications. The study also found that guidance and support for AI use were less prevalent in higher-poverty schools for both educators, suggesting potential inequities in AI integration. Ultimately, the report underscores the emerging role of AI in education and recommends developing strategies and further research to ensure its effective and equitable implementation. A significant portion of educators are using AI tools, but there's considerable variation. Approximately one-quarter of teachers reported using AI tools for instructional planning or teaching, with higher rates among ELA and science teachers, as well as secondary teachers. Notably, nearly 60 percent of principals reported using AI tools in their jobs. However, usage differed by subject taught and school characteristics, with teachers and principals in higher-poverty schools being less likely to report using AI tools. Teachers primarily use AI for instructional planning, while principals focus on administrative tasks. Teachers most commonly reported using AI to generate lesson materials, assess students, and differentiate instruction. Principals primarily used AI to draft communications, support other school administrative tasks, and assist with teacher hiring, evaluation, or professional learning. Disparities exist in AI adoption and support based on school poverty levels. 
Teachers and principals in lower-poverty schools were more likely to use AI and reported receiving more guidance on its use compared to their counterparts in higher-poverty schools. Furthermore, schools in higher-poverty areas were less likely to be developing AI usage policies. This suggests a widening gap in AI integration and the potential for unequal access to its benefits. Educators have several concerns regarding AI use, including a lack of professional learning and data privacy. Principals identified a lack of professional development, concerns about data privacy, and uncertainty about how to use AI as major influences on their AI adoption. Teachers also expressed mixed perceptions about AI's helpfulness, noting the need to assess the quality of AI output and potential for errors. The report highlights the need for intentional strategies and further research to effectively integrate AI in education. The authors recommend that districts and schools develop strategies to support AI use in ways that improve instruction and learning, focusing on AI's potential for differentiated instruction, practice opportunities, and student engagement. They also emphasize the importance of research to identify effective AI applications and address disparities in access and guidance, particularly for higher-poverty schools.

  31. 70

    Stanford University: Expanding Academia's Role in Public Sector AI

    Summary of https://hai-production.s3.amazonaws.com/files/hai-issue-brief-expanding-academia-role-public-sector.pdf Stanford HAI highlights a growing disparity between academia and industry in frontier AI research. Industry's access to vast resources like data and computing power allows them to outpace universities in developing advanced AI systems. The authors argue that this imbalance risks hindering public-interest AI innovation and weakening the talent pipeline. To address this, the brief proposes increased public investment in academic AI, the adoption of collaborative research models, and the creation of new government-backed academic institutions. Ultimately, the aim is to ensure academia plays a vital role in shaping the future of AI in a way that benefits society. Academia is currently lagging behind industry in frontier AI research because no university possesses the resources to build AI systems comparable to those in the private sector. This is largely due to industry's access to massive datasets and significantly greater computational power. Industry's dominance in AI development is driven by its unprecedented computational resources, vast datasets, and top-tier talent, leading to AI models that are considerably larger than those produced by academia. This resource disparity has become a substantial barrier to entry for academic researchers. For AI to be developed responsibly and in the public interest, it is crucial for governments to increase investment in public sector AI, with academia at the forefront of training future innovators and advancing cutting-edge scientific research. Historically, academia has been the source of foundational AI technologies and prioritizes public benefit over commercial gain. The significant cost of developing advanced AI models has created a major divide between industry and academia. 
The expense of computational resources required for state-of-the-art models has grown exponentially, making it challenging for academics to meaningfully contribute to their development. The growing resource gap in funding, computational power, and talent between academia and industry is concerning because it restricts independent, public-interest AI research, weakens the future talent pipeline by incentivizing students to join industry, and can skew AI policy discussions in favor of well-funded private sector interests.

  32. 69

    University of Texas at Austin: Protecting Human Cognition in the Age of AI

    Summary of https://arxiv.org/pdf/2502.12447 Explores the rapidly evolving influence of Generative AI on human cognition, examining its effects on how we think, learn, reason, and engage with information. Synthesizing existing research, the authors analyze these impacts through the lens of educational frameworks like Bloom's Taxonomy and Dewey's reflective thought theory. The work identifies potential benefits and significant concerns, particularly regarding critical thinking and knowledge retention among novices. Ultimately, it proposes implications for educators and test designers and suggests future research directions to understand the long-term cognitive consequences of AI. Generative AI (GenAI) is rapidly reshaping human cognition, influencing how we engage with information, think, reason, and learn. This adoption is happening at a much faster rate compared to previous technological advancements like the internet. While GenAI offers potential benefits such as increased productivity, enhanced creativity, and improved learning experiences, there are significant concerns about its potential long-term detrimental effects on essential cognitive abilities, particularly critical thinking and reasoning. The paper primarily focuses on these negative impacts, especially on novices like students. GenAI's impact on cognition can be understood through frameworks like Krathwohl’s revised Bloom’s Taxonomy and Dewey’s conceptualization of reflective thought. GenAI can accelerate access to knowledge but may bypass the cognitive processes necessary for deeper understanding and the development of metacognitive skills. It can also disrupt the prerequisites for reflective thought by diminishing cognitive dissonance, reinforcing existing beliefs, and creating an illusion of comprehensive understanding. 
Over-reliance on GenAI can lead to 'cognitive offloading' and 'metacognitive laziness', where individuals delegate cognitive tasks to AI, reducing their own cognitive engagement and hindering the development of critical thinking and self-regulation. This is particularly concerning for novice learners who have less experience with diverse cognitive strategies. To support thinking and learning in the AI era, there is a need to rethink educational experiences and design 'tools for thought' that foster critical and evaluative skills. This includes minimizing AI use in the early stages of learning to encourage productive struggle, emphasizing critical evaluation of AI outputs in curricula and tests, and promoting active engagement with GenAI tools through methods like integrating cognitive schemas and using metacognitive prompts. The paper also highlights the need for long-term research on the sustained cognitive effects of AI use.

  33. 68

    University of Bristol: Alice in Wonderland – Simple Tasks Showing Complete Reasoning Breakdown in State-of-the-Art LLMs

    Summary of https://arxiv.org/pdf/2406.02061 Introduces the "Alice in Wonderland" (AIW) problem, a seemingly simple common-sense reasoning task, to evaluate the capabilities of state-of-the-art Large Language Models (LLMs). The authors demonstrate that even advanced models like GPT-4 and Claude 3 Opus exhibit a dramatic breakdown in generalization and basic reasoning when faced with minor variations of the AIW problem that do not alter its core structure or difficulty. This breakdown is characterized by low average performance and significant fluctuations in accuracy across these variations, alongside overconfident, yet incorrect, explanations. The study further reveals that standardized benchmarks fail to detect these limitations, suggesting a potential overestimation of current LLM reasoning abilities, possibly due to data contamination or insufficient challenge diversity. Ultimately, the AIW problem is presented as a valuable tool for uncovering fundamental weaknesses in LLMs' generalization and reasoning skills that are not apparent in current evaluation methods. Despite achieving high scores on various standardized benchmarks, many state-of-the-art Large Language Models (LLMs) exhibit surprisingly low correct response rates on the seemingly simple "Alice has brothers and sisters" (AIW) problem and its variations. Only a few large-scale closed models like GPT-4o and Claude 3 Opus show relatively better performance, while many others, including models claiming strong function, struggle significantly, sometimes even collapsing to a zero correct response rate. The document highlights a significant discrepancy between the performance of LLMs on standardized reasoning benchmarks and on the AIW problem, suggesting that current benchmarks may not accurately reflect true generalization and basic reasoning skills. 
Models that score highly on benchmarks like MMLU, MATH, ARC-c, GSM8K, and HellaSwag often perform poorly on AIW, indicating a potential issue with the benchmarks' ability to detect fundamental deficits in model function. This suggests that these benchmarks might suffer from issues like test data leakage. A key observation is the lack of robustness in SOTA LLMs, evidenced by strong performance fluctuations across structure and difficulty-preserving variations of the same AIW problem. Even slight changes in the numerical values within the problem statement can lead to drastically different correct response rates for many models. This sensitivity to minor variations points to underlying generalization deficits. The study reveals that LLMs often exhibit overconfidence and provide persuasive, explanation-like confabulations even when their answers to AIW problems are incorrect. This can mislead users into trusting wrong responses, especially in situations where verification is difficult. Furthermore, many models struggle to properly detect mistakes and revise their incorrect solutions, even when encouraged to do so. The AIW problem and its variations are presented as valuable tools for evaluating the robustness and generalization capabilities of LLMs, offering a method to reveal weaknesses that are not captured by standard benchmarks. The ability to create numerous diverse problem instances through variations addresses potential test set leakage issues. The introduction of a unified robustness score (R) is proposed to provide a more accurate model ranking by considering both average correct response rate and the degree of performance fluctuations across problem variations.
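A short sketch may help show what "structure and difficulty-preserving variations" means in practice. It assumes the AIW template is "Alice has N brothers and she also has M sisters. How many sisters does Alice's brother have?", whose correct answer is M + 1 because Alice is herself one of her brother's sisters. The spread measure below is an illustrative stand-in and is not the paper's robustness score R; the model answers are invented for the example.

```python
from statistics import mean


def aiw_prompt(brothers: int, sisters: int) -> str:
    """Build one AIW variation; only the two numbers change (assumed template)."""
    return (
        f"Alice has {brothers} brothers and she also has {sisters} sisters. "
        "How many sisters does Alice's brother have?"
    )


def aiw_answer(brothers: int, sisters: int) -> int:
    """Each brother has Alice's sisters plus Alice herself."""
    return sisters + 1


def correct_rates(
    answers: dict[tuple[int, int], list[int]]
) -> dict[tuple[int, int], float]:
    """Correct response rate per variation, from repeated model answers."""
    return {
        (b, s): sum(a == aiw_answer(b, s) for a in given) / len(given)
        for (b, s), given in answers.items()
    }


# Invented answers from a hypothetical model, four trials per variation.
model_answers = {
    (3, 6): [7, 7, 6, 7],  # correct answer is 7
    (4, 1): [1, 2, 1, 1],  # correct answer is 2
}

rates = correct_rates(model_answers)
average_rate = mean(rates.values())
# Illustrative fluctuation measure only; NOT the paper's score R.
spread = max(rates.values()) - min(rates.values())

print(aiw_prompt(3, 6))
print(rates)         # {(3, 6): 0.75, (4, 1): 0.25}
print(average_rate)  # 0.5
print(spread)        # 0.5
```

A model that reasons robustly should show both a high average rate and a small spread, since the variations differ only in the numbers used.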

  34. 67

    NIST: Adversarial Machine Learning – A Taxonomy and Terminology of Attacks and Mitigations

    Summary of https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-2e2025.pdf This NIST report explores the landscape of adversarial machine learning (AML), categorizing attacks and corresponding defenses for both traditional (predictive) and modern generative AI systems. It establishes a taxonomy and terminology to create a common understanding of threats like data poisoning, evasion, privacy breaches, and prompt injection. The document also highlights key challenges and limitations in current AML research and mitigation strategies, emphasizing the trade-offs between security, accuracy, and other desirable AI characteristics. Ultimately, the report aims to inform standards and practices for managing the security risks associated with the rapidly evolving field of artificial intelligence. This report establishes a taxonomy and defines terminology for the field of Adversarial Machine Learning (AML). The aim is to create a common language within the rapidly evolving AML landscape to inform future standards and practice guides for securing AI systems. The report provides separate taxonomies for attacks targeting Predictive AI (PredAI) systems and Generative AI (GenAI) systems. These taxonomies categorize attacks based on attacker goals and objectives (availability breakdown, integrity violation, privacy compromise, and misuse enablement for GenAI), attacker capabilities, attacker knowledge, and the stages of the machine learning lifecycle. The report describes various AML attack classes relevant to both PredAI and GenAI, including evasion, poisoning (data and model poisoning), privacy attacks (such as data reconstruction, membership inference, and model extraction), and GenAI-specific attacks like direct and indirect prompt injection, and supply chain attacks. For each attack class, the report discusses existing mitigation methods and their limitations. The report identifies key challenges in the field of AML. 
These challenges include the inherent trade-offs between different attributes of trustworthy AI (e.g., accuracy and adversarial robustness), theoretical limitations on achieving perfect adversarial robustness, and the complexities of evaluating the effectiveness of mitigations across the diverse and evolving AML landscape. Factors like the scale of AI models, supply chain vulnerabilities, and multimodal capabilities further complicate these challenges. Managing the security of AI systems requires a comprehensive approach that combines AML-specific mitigations with established cybersecurity best practices. Understanding the relationship between these fields and identifying any unique security considerations for AI that fall outside their scope is crucial for organizations seeking to secure their AI deployments.

  35. 66

    Purdue University: The Emergence of AI Ethics Auditing

    Summary of https://journals.sagepub.com/doi/10.1177/20539517241299732 Explores the emerging field of artificial intelligence ethics auditing, examining its rapid growth and current state through interviews with 34 professionals. It finds that while AI ethics audits often mirror financial auditing processes, they currently lack robust stakeholder involvement, clear success metrics, and external reporting. The study highlights a predominant technical focus on bias, privacy, and explainability, often driven by impending regulations like the EU AI Act. Auditors face challenges including regulatory ambiguity, resource constraints, and organizational complexity, yet they play a vital role in developing frameworks and interpreting standards within this evolving landscape. AI ethics auditing is an emerging field that mirrors financial auditing in its process (planning, performing, and reporting) but currently lacks robust stakeholder involvement, measurement of success, and external reporting. These audits are often hyper-focused on technical AI ethics principles like bias, privacy, and explainability, potentially neglecting broader socio-technical considerations. Regulatory requirements and reputational risk are the primary drivers for organizations to engage in AI ethics audits. The EU AI Act is frequently mentioned as a significant upcoming regulation influencing the field. While reputational concerns can be a motivator, a more sustainable approach involves recognizing the intrinsic value of ethical AI for performance and user trust. Conducting AI ethics audits is fraught with challenges, including ambiguity in interpreting preliminary and piecemeal regulations, a lack of established best practices, organizational complexity, resource constraints, insufficient technical and data infrastructure, and difficulties in interdisciplinary coordination. 
Many organizations are not yet adequately prepared to undergo effective AI audits due to a lack of AI governance frameworks. The AI ethics auditing ecosystem is still in development, characterized by ambiguity between auditing and consulting activities, and a lack of standardized measures for quality and accredited procedures. Despite these limitations, AI ethics auditors play a crucial role as "ecosystem builders and translators" by developing frameworks, interpreting regulations, and curating practices for auditees, regulators, and other stakeholders. Significant gaps exist in the AI ethics audit ecosystem regarding the measurement of audit success, effective and public reporting of findings, and broader stakeholder engagement beyond technical and risk professionals. There is a need for more emphasis on defining success metrics, increasing transparency through external reporting, and actively involving diverse stakeholders, including the public and vulnerable groups, in the auditing process.

  36. 65

    Nature: The Mental Health Implications of AI Adoption – The Crucial Role of Self-Efficacy

    Summary of https://www.nature.com/articles/s41599-024-04018-w Investigates how the increasing use of artificial intelligence in organizations affects employee mental health, specifically job stress and burnout. The study of South Korean professionals revealed that AI adoption indirectly increases burnout by first elevating job stress. Importantly, the research found that employees with higher self-efficacy in learning AI experience less job stress related to AI implementation. The findings underscore the need for organizations to manage job stress and foster AI learning confidence to support employee well-being during technological change. Ultimately, this work highlights the complex relationship between AI integration and its psychological impact on the workforce. AI adoption in organizations does not directly lead to employee burnout. Instead, its impact is indirect, operating through the mediating role of job stress. AI adoption significantly increases job stress, which in turn increases burnout. Self-efficacy in AI learning plays a crucial role in moderating the relationship between AI adoption and job stress. Employees with higher self-efficacy in their ability to learn AI experience a weaker positive relationship between AI adoption and job stress. This means that confidence in learning AI can buffer against the stress induced by AI adoption. The findings emphasize the importance of a human-centric approach to AI adoption in the workplace. Organizations need to proactively address the potential negative impact of AI adoption on employee well-being by implementing strategies to manage job stress and foster self-efficacy in AI learning. Investing in AI training and development programs is essential for enhancing employees' self-efficacy in AI learning. By boosting their confidence in understanding and utilizing AI technologies, organizations can mitigate the negative effects of AI adoption on employee stress and burnout. 
This study contributes to the existing literature by providing empirical evidence for the indirect impact of AI adoption on burnout through job stress and the moderating role of self-efficacy in AI learning, utilizing the Job Demands-Resources (JD-R) model and Social Cognitive Theory (SCT) as theoretical frameworks. This enhances the understanding of the psychological mechanisms involved in the relationship between AI adoption and employee mental health.

  37. 64

    ECIIA: The AI Act – Road to Compliance

    Summary of https://www.eciia.eu/wp-content/uploads/2025/01/The-AI-Act-Road-to-Compliance-Final-1.pdf "The AI Act: Road to Compliance" is a practical guide for internal auditors navigating the European Union's Artificial Intelligence Act, which entered into force in August 2024. It outlines the key aspects of the AI Act, including its risk-based approach that categorizes AI systems and imposes varying obligations based on risk levels, as well as the different roles of entities within the AI value chain, such as providers and deployers. The guide details the implementation timeline of the Act and the corresponding obligations and requirements for organizations. Furthermore, it presents survey results from over 40 companies regarding their AI adoption, compliance preparations, and the internal audit function's understanding and auditing of AI. Ultimately, the document emphasizes the crucial role of internal auditors in ensuring their organizations achieve compliance and responsibly manage AI risks. The EU AI Act is now in force (August 1, 2024) and employs a risk-based approach to regulate AI systems, categorizing them into unacceptable, high, limited, and minimal risk levels, with obligations increasing with the risk level. There is also a specific category for General Purpose AI (GPAI) models, with additional requirements for those deemed to have systemic risk. Organizations involved with AI systems have different roles (provider, deployer, importer, distributor, authorised representative), each with distinct responsibilities and compliance requirements under the AI Act. The provider and deployer are the primary roles, with providers facing more extensive obligations. Compliance with the AI Act follows a phased implementation timeline, with key dates running from February 2025 (prohibited AI systems) through August 2027 (high-risk AI components in products). 
Organizations need to start preparing by creating AI inventories, classifying systems by risk, and establishing appropriate policies. Internal auditors play a vital role in helping organizations achieve compliance with the AI Act by assessing AI risks, auditing AI processes and governance, and making recommendations. They need to ensure the implementation of AI Act requirements within their organizations. A recent survey of over 40 companies revealed widespread AI adoption but a relatively low level of understanding of the AI Act within internal audit departments. Most internal audit departments are not yet leveraging AI, but when they do, it's mainly for risk assessment. Ensuring adequate AI auditing skills through training is highlighted as a need.

  38. 63

    Harvard Business School: The Cybernetic Teammate – A Field Experiment on Generative AI Reshaping Teamwork and Expertise

    Summary of https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 This working paper details a field experiment examining the impact of generative AI on teamwork and expertise within Procter & Gamble. The study involved 776 professionals working on real product innovation challenges, randomly assigned to individual or team settings with or without AI assistance. The research investigated how AI affects performance, expertise sharing across functional silos, and the social and emotional aspects of collaboration. Findings indicate that AI significantly enhances performance, allowing individuals with AI to match the output quality of traditional human teams. Moreover, AI facilitates the creation of more balanced solutions, regardless of professional background, and fosters more positive emotional responses among users. Ultimately, the paper suggests that AI functions as a "cybernetic teammate," prompting organizations to reconsider team structures and the nature of collaborative work in the age of intelligent machines. AI significantly enhances performance in knowledge work, with individuals using AI achieving a level of solution quality comparable to two-person teams without AI. This suggests that AI can effectively replicate certain benefits of human collaboration in terms of output quality. AI breaks down functional silos and broadens expertise. Professionals using AI produced more balanced solutions that spanned both commercial and technical aspects, regardless of their professional background (R&D or Commercial). AI can also help individuals with less experience in product development achieve performance levels similar to teams with experienced members. AI fosters positive emotional responses among users. 
Participants reported more positive emotions (excitement, energy, enthusiasm) and fewer negative emotions (anxiety, frustration) when working with AI compared to working alone without AI, matching or even exceeding the emotional benefits traditionally associated with human teamwork. AI-augmented teams have a higher likelihood of generating exceptional, top-tier solutions. Teams working with AI were significantly more likely to produce solutions ranking in the top 10% of all submissions, indicating that the combination of human collaboration and AI can be particularly powerful for achieving breakthrough innovations. AI is not merely a tool but functions as a "cybernetic teammate" that reshapes collaboration. It dynamically interacts with human problem-solvers, provides real-time feedback, bridges expertise boundaries, and influences emotional states, suggesting a fundamental shift in how knowledge work can be structured and carried out.

  39. 62

    Baruch College: Not all AI is Created Equal – A Meta-Analysis Revealing Drivers of AI Resistance Across Markets, Methods, and Time

    Summary of https://www.sciencedirect.com/science/article/pii/S0167811625000114 Presents a meta-analysis of two decades of studies examining consumer resistance to artificial intelligence (AI). The authors synthesize findings from hundreds of studies with over 76,000 participants, revealing that AI aversion is context-dependent and varies based on the AI's label, application domain, and perceived characteristics. Interestingly, the study finds that negative consumer responses have decreased over time, particularly for cognitive evaluations of AI. Furthermore, the meta-analysis indicates that research design choices influence observed AI resistance, with studies using more ecologically valid methods showing less aversion. Consumers exhibit an overall small but statistically significant aversion to AI (average Cohen’s d = -0.21). This means that, on average, people tend to respond more negatively to outputs or decisions labeled as coming from AI compared to those labeled as coming from humans. Consumer aversion to AI is strongly context-dependent, varying significantly by the AI label and the application domain. Embodied forms of AI, such as robots, elicit the most negative responses (d = -0.83) compared to AI assistants or mere algorithms. Furthermore, domains involving higher stakes and risks, like transportation and public safety, trigger more negative responses than domains focused on productivity and performance, such as business and management. Consumer responses to AI are not static and have evolved over time, generally becoming less negative, particularly for cognitive evaluations (e.g., performance or competence judgements). While initial excitement around generative AI in 2021 led to a near null-effect in cognitive evaluations, affective and behavioral responses still remain significantly negative overall. The characteristics ascribed to AI significantly influence consumer responses. 
Negative responses are stronger when AI is described as having high autonomy (d = -0.28), inferior performance (d = -0.53), lacking human-like cues (anthropomorphism) (d = -0.23), and not recognizing the user's uniqueness (d = -0.24). Conversely, limiting AI autonomy, highlighting superior performance, incorporating anthropomorphic cues, and emphasizing uniqueness recognition can alleviate AI aversion. The methodology used to study AI aversion impacts the findings. Studies with greater ecological validity, such as field studies, those using incentive-compatible designs, perceptually rich stimuli, clear explanations of AI, and behavioral (rather than self-report) measures, document significantly smaller aversion towards AI. This suggests that some documented resistance in purely hypothetical lab settings might be an overestimation of real-world aversion.
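
For readers unfamiliar with the effect-size metric quoted above, Cohen's d is the difference between two group means divided by their pooled standard deviation. The Python sketch below computes it for two small samples. The ratings and the function name are invented for illustration and are not taken from the meta-analysis.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Invented 1-7 ratings of the same output under two labels.
ai_labeled = [4, 5, 6, 5, 4, 6]
human_labeled = [5, 6, 7, 6, 5, 7]

d = cohens_d(ai_labeled, human_labeled)
print(f"Cohen's d = {d:.2f}")  # negative: the AI-labeled group rated lower
```

A negative d, as in the figures reported above, means the AI-labeled condition was evaluated less favorably than the human-labeled one. Magnitudes around 0.2 are conventionally read as small effects.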

  40. 61

    CSET: Putting Explainable AI to the Test – A Critical Look at Evaluation Approaches

    Summary of https://cset.georgetown.edu/publication/putting-explainable-ai-to-the-test-a-critical-look-at-ai-evaluation-approaches/ This Center for Security and Emerging Technology issue brief examines how researchers evaluate explainability and interpretability in AI-enabled recommendation systems. The authors' literature review reveals inconsistencies in defining these terms and a primary focus on assessing system correctness (building systems right) over system effectiveness (building the right systems for users). They identified five common evaluation approaches used by researchers, noting a strong preference for case studies and comparative evaluations. Ultimately, the brief suggests that without clearer standards and expertise in evaluating AI safety, policies promoting explainable AI may fall short of their intended impact. Researchers do not clearly differentiate between explainability and interpretability when describing these concepts in the context of AI-enabled recommendation systems. The descriptions of these principles in research papers often use a combination of similar themes. This lack of consistent definition can lead to confusion and inconsistent application of these principles. The study identified five common evaluation approaches used by researchers for explainability claims: case studies, comparative evaluations, parameter tuning, surveys, and operational evaluations. These approaches can assess either system correctness (whether the system is built according to specifications) or system effectiveness (whether the system works as intended in the real world). Research papers show a strong preference for evaluations of system correctness over evaluations of system effectiveness. Case studies, comparative evaluations, and parameter tuning, which are primarily focused on testing system correctness, were the most common approaches. In contrast, surveys and operational evaluations, which aim to test system effectiveness, were less prevalent. 
Researchers adopt various descriptive approaches for explainability, which can be categorized into descriptions that rely on other principles (like transparency), focus on technical implementation, state the purpose as providing a rationale for recommendations, or articulate the intended outcomes of explainable systems. The findings suggest that policies for implementing or evaluating explainable AI may not be effective without clear standards and expert guidance. Policymakers are advised to invest in standards for AI safety evaluations and develop a workforce capable of assessing the efficacy of these evaluations in different contexts to ensure reported evaluations provide meaningful information.

  41. 60

    Harvard Business School: The Value of Open Source Software

    Summary of https://www.hbs.edu/ris/Publication%20Files/24-038_51f8444f-502c-4139-8bf2-56eb4b65c58a.pdf Investigates the economic value of open source software (OSS) by estimating both the supply-side (creation cost) and the significantly larger demand-side (usage value). Utilizing unique global data on OSS usage by firms, the authors calculate the cost to recreate widely used OSS and the replacement value for firms if OSS did not exist. Their findings reveal a substantial multi-trillion dollar demand-side value, far exceeding the billions needed for recreation, highlighting OSS's critical, often unmeasured, role in the modern economy. The study also examines the concentration of value creation among a small percentage of developers and the distribution of OSS value across different programming languages and industries. This study estimates that the demand-side value of widely-used open source software (OSS) is significantly larger than its supply-side value. The researchers estimate the supply-side value (the cost to recreate the most widely used OSS once) to be $4.15 billion, while the demand-side value (the replacement value for each firm that uses the software and would need to build it internally if OSS did not exist) is estimated to be much larger at $8.8 trillion. This highlights the substantial economic benefit derived from the reuse of OSS by numerous firms. The research reveals substantial heterogeneity in the value of OSS across different programming languages. For example, in terms of demand-side value, Go is estimated to be more than four times the value of the next language, JavaScript, while Python has a considerably lower value among the top languages analyzed. This indicates that the economic impact of OSS is not evenly distributed across the programming language landscape. The study finds a high concentration in the creation of OSS value, with only a small fraction of developers contributing the vast majority of the value. 
Specifically, it's estimated that 96% of the demand-side value is created by only 5% of OSS developers. These top contributors also tend to contribute to a substantial number of repositories, suggesting their impact is broad across the OSS ecosystem. Measuring the value of OSS is inherently difficult due to its non-pecuniary (free) nature and the lack of centralized usage tracking. This study addresses this challenge by leveraging unique global data from two complementary sources: the Census II of Free and Open Source Software – Application Libraries and the BuiltWith dataset, which together capture OSS usage by millions of global firms. By focusing on widely-used OSS, the study aims to provide a more precise understanding of its value compared to studies that estimate the replacement cost of all existing OSS. The estimated demand-side value of OSS suggests that if it did not exist, firms would need to spend approximately 3.5 times more on software than they currently do. This underscores the massive cost savings and productivity enhancement that the existence of OSS provides to the economy. The study argues that recognizing this value is crucial for the future health of the digital economy and for informing policymakers about the importance of supporting the OSS ecosystem.
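
To make the scale of the gap concrete, the short Python calculation below divides the two headline estimates quoted above. Only the two dollar figures come from the summary. The variable names are illustrative.

```python
# Headline estimates quoted in the summary above.
supply_side_usd = 4.15e9   # cost to recreate the most widely used OSS once
demand_side_usd = 8.8e12   # replacement value summed across firms using it

# How many times larger the demand-side estimate is than the supply-side one.
ratio = demand_side_usd / supply_side_usd
print(f"Demand-side value is roughly {ratio:,.0f} times the supply-side value")
```

The ratio comes out at a little over two thousand. That reflects the fact that the supply-side figure counts each package once, while the demand-side figure counts it again for every firm that reuses it.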

  42. 59

    Hoover Institution: The Artificially Intelligent Boardroom

    Summary of https://www.hoover.org/sites/default/files/research/docs/cgri-closer-look-110-ai.pdf Examines the potential impact of artificial intelligence on corporate boardrooms and governance. It argues that while AI's influence on areas like decision-making is acknowledged, its capacity to reshape the operations and practices of the board itself warrants greater attention. The authors explore how AI could alter board functions, information processing, interactions with management, and the role of advisors, while also considering the challenges of maintaining board-management boundaries and managing information access. Ultimately, the piece discusses how AI could transform various governance obligations and presents both the benefits and risks associated with its adoption in the boardroom. AI has the potential to significantly transform corporate governance by reshaping how boards function, process information, interact with management and advisors, and fulfill specific governance obligations. Boards are already aware of AI's potential, ranking its increased use across the organization as a top priority. AI can reduce the information asymmetry between the board and management by increasing the volume, type, and quality of information available to directors. This allows boards to be more proactive and less reliant on management-provided information, potentially leading to better oversight. AI tools can enable directors to search and synthesize public and private information more easily. The adoption of AI will significantly increase the expectations and responsibilities of board members. Directors will be expected to spend more time preparing for meetings by reviewing and analyzing a greater quantity of information. They will also be expected to ask higher-quality questions and provide deeper insights, leveraging AI tools for analysis and benchmarking. 
AI can enhance various governance functions, including strategy, compensation, human capital management, audit, legal matters, and board evaluations. For example, AI can facilitate richer scenario planning, provide real-time compensation benchmarking, identify skills gaps in human capital, detect potential fraud, monitor legal developments, and analyze board effectiveness. This may also lead to a supplementation or replacement of work currently done by paid advisors. The integration of AI into the boardroom also presents several risks and challenges, including maintaining the separation of board and management responsibilities, managing information access, ensuring data security, addressing the potential for errors and biases in AI models, and avoiding "analysis paralysis". Boards will need to develop new protocols and skills to effectively utilize AI while mitigating these risks.

  43. 58

    Harvard Business School: Why Most Resist AI Companions

    Summary of https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5097445 This working paper by De Freitas et al. investigates why people resist forming relationships with AI companions, despite their potential to alleviate loneliness. The authors reveal that while individuals acknowledge AI's superior availability and non-judgmental nature compared to humans, they do not consider AI relationships to be "true" due to a perceived lack of essential qualities like mutual caring and emotional understanding. Through several studies, the research demonstrates that this resistance stems from a belief that AI cannot truly understand or feel emotions, leading to the perception of one-sided relationships. Even direct interaction with AI companions only marginally increases acceptance by improving perceptions of superficial features, failing to alter deeply held beliefs about AI's inability to fulfill core relational values. Ultimately, the paper highlights significant psychological barriers hindering the widespread adoption of AI companions for social connection. People exhibit resistance to adopting AI companions despite acknowledging their superior capabilities in certain relationship-relevant aspects like availability and being non-judgmental. This resistance stems from the belief that AI companions are incapable of realizing the essential values of relationships, such as mutual caring and emotional understanding. This resistance is rooted in a dual character concept of relationships, where people differentiate between superficial features and essential values. Even if AI companions possess the superficial features (e.g., constant availability), they are perceived as lacking the essential values (e.g., mutual caring), leading to the judgment that relationships with them are not "true" relationships. 
The belief that AI companions cannot realize essential relationship values is linked to perceptions of AI's deficiencies in mental capabilities, specifically the ability to understand and feel emotions, which are seen as crucial for mutual caring and thus for a relationship to be considered mutual and "true". Physical intimacy was not found to be a significant mediator in this belief. Interacting with an AI companion can increase willingness to engage with it for friendship and romance, primarily by improving perceptions of its advertised, more superficial capabilities (like being non-judgmental and available). However, such interaction does not significantly alter the fundamental belief that AI is incapable of realizing the essential values of relationships. The mere belief that one is interacting with a human (even when it's an AI) enhances the effectiveness of the interaction in increasing acceptance. The strong, persistent belief about AI's inability to fulfill the essential values of relationships represents a significant psychological barrier to the widespread adoption of AI companions for reducing loneliness. This suggests that the potential loneliness-reducing benefits of AI companions may be difficult to achieve in practice unless these fundamental beliefs can be addressed. The resistance observed in the relationship domain, where values are considered essential, might be stronger than in task-based domains where performance is the primary concern.

  44. 57

    Center for AI Policy: US Open-Source AI Governance – Balancing Ideological and Geopolitical Considerations with China Competition

    Summary of https://cdn.prod.website-files.com/65af2088cac9fb1fb621091f/67aaca031ed677c879434284_Final_US%20Open-Source%20AI%20Governance.pdf This document from the Center for AI Policy and Yale Digital Ethics Center examines the contentious debate surrounding the governance of open-source artificial intelligence in the United States. It highlights the tension between the ideological values promoting open access and geopolitical considerations, particularly competition with China. The authors analyze various policy proposals for open-source AI, creating a rubric that combines ideological factors like transparency and innovation with geopolitical risks such as misuse and global power dynamics. Ultimately, the paper suggests targeted policy interventions over broad restrictions to balance the benefits of open-source AI with national security concerns, emphasizing ongoing monitoring of technological advancements and geopolitical landscapes. The debate surrounding open-source AI regulation involves a tension between ideological values (innovation, transparency, power distribution) and geopolitical considerations, particularly US-China competition (Chinese misuse, backdoor risks, global power dynamics). Policymakers are grappling with how to reconcile these two perspectives, especially in light of advancements in Chinese open-source AI. Heavy-handed regulation like blanket export controls on all open-source AI models is likely sub-optimal and counterproductive. Such controls would significantly disrupt the development of specific-use applications, have limited efficacy against Chinese misuse, and could undermine US global power by discouraging international use of American technology. More targeted interventions are suggested as preferable to broad restrictions. The paper analyzes policies such as industry-led risk assessments for model release and government funding for an open-source repository of security audits. 
These approaches aim to balance the benefits of open-source AI with the need to address specific security risks more effectively and with less disruption to innovation. The nature of open-source AI, being globally accessible information, makes it inherently difficult to decouple the US and Chinese ecosystems. Attempts to do so through export controls may have unintended consequences and could be circumvented due to the ease of information transfer. Further research and monitoring are crucial to inform future policy decisions. Key areas for ongoing attention include tracking the performance gap between open and closed models, understanding the origins of algorithmic innovations, developing objective benchmarks for comparing models from different countries, and advancing technical safety mitigations for open models.

  45. 56

    National Security: Superintelligence Strategy

    Summary of https://arxiv.org/pdf/2503.05628 This expert strategy document from Dan Hendrycks, Eric Schmidt and Alexander Wang addresses the national security implications of rapidly advancing AI, particularly the anticipated emergence of superintelligence. The authors propose a three-pronged framework drawing parallels with Cold War strategies: deterrence through the concept of Mutual Assured AI Malfunction (MAIM), nonproliferation to restrict access for rogue actors, and competitiveness to bolster national strength. The text examines threats from rival states, terrorists, and uncontrolled AI, arguing for proactive measures like cyber espionage and sabotage for deterrence, export controls and information security for nonproliferation, and domestic AI chip manufacturing and legal frameworks for competitiveness. Ultimately, the document advocates for a risk-conscious, multipolar strategy to navigate the transformative and potentially perilous landscape of advanced artificial intelligence. Rapid advances in AI, especially the anticipation of superintelligence, present significant national security challenges akin to those posed by nuclear weapons. The dual-use nature of AI means it can be leveraged for both economic and military dominance by states, while also enabling rogue actors to develop bioweapons and launch cyberattacks. The potential for loss of control over advanced AI systems further amplifies these risks. The concept of Mutual Assured AI Malfunction (MAIM) is introduced as a likely default deterrence regime. This is similar to nuclear Mutual Assured Destruction (MAD), where any aggressive pursuit of unilateral AI dominance by a state would likely be met with preventive sabotage by its rivals, ranging from cyberattacks to potential kinetic strikes on AI infrastructure. A critical component of a superintelligence strategy is nonproliferation. 
Drawing from precedents in restricting weapons of mass destruction, this involves three key levers: compute security to track and control the distribution of high-end AI chips, information security to protect sensitive AI research and model weights from falling into the wrong hands, and AI security to implement safeguards that prevent the malicious use and loss of control of AI systems. Beyond mitigating risks, states must also focus on competitiveness in the age of AI to ensure their national strength. This includes strategically integrating AI into military command and control and securing drone supply chains, guaranteeing access to AI chips through domestic manufacturing and strategic export controls, establishing legal frameworks to govern AI agents, and maintaining political stability in the face of rapid automation and the spread of misinformation. Existing strategies for dealing with advanced AI, such as a completely hands-off approach, voluntary moratoria, or a unilateral pursuit of a strategic monopoly, are flawed and insufficient to address the multifaceted risks and opportunities presented by AI. The authors propose a multipolar strategy based on the interconnected pillars of deterrence (MAIM), nonproliferation, and competitiveness, drawing lessons from the Cold War framework adapted to the unique challenges of superintelligence.

  46. 55

    Monash University: Gen AI in Higher Ed – A Global Perspective of Institutional Adoption Policies and Guidelines

    Summary of https://www.sciencedirect.com/science/article/pii/S2666920X24001516 This paper examines how higher education institutions globally are addressing the integration of generative AI by analyzing the adoption policies of 40 universities across six regions through the lens of the Diffusion of Innovations Theory. The study identifies key themes related to compatibility, trialability, and observability of AI, the communication channels being used, and the defined roles and responsibilities for faculty, students, and administrators. Findings reveal a widespread emphasis on academic integrity and enhancing learning, but also highlight gaps in comprehensive policies and equitable access, offering insights for policymakers to develop inclusive AI integration strategies. Universities globally are proactively addressing the integration of generative AI (GAI) in higher education, primarily focusing on academic integrity, enhancing teaching and learning, and promoting AI literacy. This is evidenced by the emphasis on these themes in the analysis of policies across 40 universities from six global regions. The study highlights that institutions recognize the transformative potential of GAI while also being concerned about its ethical implications and impact on traditional educational values. The study, utilizing the Diffusion of Innovations Theory (DIT), reveals that while universities are exploring GAI's compatibility, trialability, and observability, significant gaps exist in comprehensive policy frameworks, particularly concerning data privacy and equitable access. The research specifically investigated these innovation characteristics in university policies. Although many universities address academic integrity and the potential for enhancing education (compatibility), and are encouraging experimentation (trialability), fewer have robust strategies for evaluating GAI's impact (observability) and clear guidelines for data privacy and equal access. 
Communication about GAI adoption is varied, with digital platforms being the most common channel, but fewer than half of the studied universities demonstrate a comprehensive approach to disseminating information and fostering dialogue among stakeholders. The analysis identified five main communication channels: digital platforms, interactive learning and engagement channels, direct and personalized communication channels, collaborative and social networks, and advisory, monitoring, and feedback channels. The finding that not all universities actively use a range of these channels suggests a need for more focused efforts in this area. Higher education institutions are establishing clear roles and responsibilities for faculty, students, and administrators in the context of GAI adoption. Faculty are largely tasked with integrating GAI into curricula and ensuring ethical use, students are responsible for ethical use and maintaining academic integrity, and administrators are primarily involved in policy development, implementation, and providing support. This highlights a structured approach to managing the integration of GAI within the educational ecosystem. Cultural backgrounds may influence the emphasis of GAI adoption policies, with institutions in North America and Europe often prioritizing innovation and critical thinking, while those in Asia emphasize ethical use and compliance, and universities in Africa and Latin America focus on equity and accessibility. This regional variation suggests that while there are common values, the specific challenges and priorities related to GAI adoption can differ based on cultural and socio-economic contexts.

  47. 54

    UNESCO: AI Competency Framework for Students

    Summary of https://unesdoc.unesco.org/ark:/48223/pf0000391105 This UNESCO publication presents a global framework for AI competency in students. Recognizing the increasing role of AI, it argues for proactive education to prepare responsible users and co-creators. The framework outlines twelve competencies across four dimensions: human-centered mindset, ethics of AI, AI techniques and applications, and AI system design, each with three progression levels. It aims to guide educators in integrating AI learning objectives into curricula, emphasizing critical judgment, ethical awareness, foundational knowledge, and inclusive design. The document also discusses implementation strategies, teacher professionalization, pedagogical approaches, and competency-based assessments for AI education. The UNESCO AI competency framework for students aims to equip students with the values, knowledge, and skills necessary to thrive in the AI era, becoming responsible and creative citizens. It is the first global framework of its kind, intended to support the development of core competencies for students to critically examine and understand AI from holistic perspectives, including ethical, social, and technical dimensions. The framework is structured around 12 competencies spanning four dimensions: Human-centred mindset, Ethics of AI, AI techniques and applications, and AI system design, across three progression levels: Understand, Apply, and Create. This structure is designed to provide a spiral learning sequence across grade levels, helping students progressively build a systematic and transferable understanding of AI competencies. The framework is grounded in key principles that include fostering a critical approach to AI, prioritizing human-centred interaction with AI, encouraging environmentally sustainable AI, promoting inclusivity in AI competency development, and building core AI competencies for lifelong learning. 
It embodies UNESCO's mandate by anchoring its vision of AI and education in principles of human rights, inclusion, and equity. The primary target audience for the AI CFS includes policy-makers, curriculum developers, providers of education programmes on AI for students, school leaders, teachers, and educational experts. The framework is intended to serve as a guide for public education systems to build the competencies required for the effective implementation of national AI strategies and the creation of inclusive, just, and sustainable futures. It is designed as a global reference that needs to be tailored to the diverse readiness levels of local education systems. The framework envisions students as active co-creators of AI and responsible citizens. It emphasizes the importance of critical judgment of AI solutions, awareness of citizenship responsibilities in the era of AI, foundational AI knowledge for lifelong learning, and inclusive, sustainable AI design. Ultimately, the AI CFS aims to prepare students to not only use AI effectively and ethically but also to contribute to shaping its future development and relationship with society.

  48. 53

    PWC: Agentic AI – An Executive Playbook

    Summary of https://media.licdn.com/dms/document/media/v2/D561FAQHEys4iGQj7CA/feedshare-document-pdf-analyzed/B56ZUN7jLFHQAY-/0/1739695481660?e=1743033600&v=beta&t=nLUoVEs06lwzFgHpx8DbIfd6nMyvXem1ZrpqPSChhiA "Agentic AI – the new frontier in GenAI," explores the transformative potential of agentic artificial intelligence, particularly within the realm of generative AI. It highlights how autonomous AI systems, capable of making decisions and acting with limited human input, are evolving through machine learning and multimodal data processing to automate complex tasks and optimize workflows. The text emphasizes the strategic imperative for organizations to adopt this technology early to gain competitive advantages, improve efficiency, enhance customer experiences, and drive revenue growth, providing numerous real-world examples across various industries and business functions. It also discusses key considerations for implementing agentic AI, including strategic planning, technological infrastructure, data readiness, talent acquisition, and ethical implications, alongside a comparison of commercial and open-source tools. Ultimately, the document positions agentic AI as a crucial element for future business success, requiring a strategic vision and commitment to realize its full potential in an increasingly AI-driven world. Agentic AI, with its advanced human-like reasoning and interaction capabilities, is transforming various sectors including manufacturing, healthcare, finance, retail, transportation, and energy. Organisations' AI strategies should leverage multimodal GenAI capabilities while ensuring ethical AI safeguards to drive autonomous process re-engineering and enhanced decision-making across all business areas. When integrated effectively, agentic AI can enhance efficiency, lower costs, improve customer experience, and drive revenue growth. 
Agentic AI systems possess the capacity to make autonomous decisions and take actions to achieve specific goals with limited or no direct human intervention, exhibiting key aspects like autonomy, goal-oriented behaviour, environment interaction, learning capability, workflow optimisation, and multi-agent and system conversation. The evolution of agentic AI has progressed through the integration of machine learning for data learning and NLP-enabled user interactions, the introduction of multimodality combining various data types for enhanced interactions, and the development of advanced autonomy and real-time interactions enabling human-like reasoning and independent decision-making.

  49. 52

    Harvard Business School: Global Evidence on Gender Gaps and Generative AI

    Summary of https://www.hbs.edu/ris/Publication%20Files/25-023_8ee1f38f-d949-4b49-80c8-c7a736f2c27b.pdf This paper examines the gender gap in the adoption and usage of generative AI tools across the globe. Synthesizing data from 18 studies involving over 140,000 individuals, the authors reveal a consistent pattern: women are less likely than men to use generative AI. This gap persists even when access to these technologies is equalized, suggesting deeper underlying causes. Analysis of internet traffic data and mobile app downloads further supports these findings, indicating a skewed gender distribution among users of popular AI platforms. The research explores potential mechanisms behind this disparity, such as differences in knowledge, confidence, and perceptions of AI's ethical implications. The authors caution that this gender gap could lead to biased AI systems and exacerbate existing inequalities, emphasizing the need for targeted interventions. The most prominent explanations behind the gender gap in generative AI adoption are the following. Lower familiarity and knowledge: Women consistently report less familiarity with generative AI tools. They are also more likely to report not knowing how to use AI tools. Lower confidence and persistence: Women show less confidence in their ability to use AI tools effectively. They are also less persistent when using generative AI, being less likely to attempt prompting multiple times for desired results. Perception of unethical use: Women are more likely to perceive the use of AI in coursework or assignments as unethical or as cheating. Mixed perceptions of benefits: Studies show mixed results regarding whether men and women equally perceive the benefits and usefulness of generative AI. Some studies indicate women perceive lower productivity benefits and are less likely to see generative AI as useful in job searches or educational settings. 
No significant differences in trust or risk perception: The study indicates that gender differences in generative AI adoption are likely driven by disparities in knowledge, familiarity, and confidence, rather than differences in trust or risk perceptions. There are no statistically significant differences between men and women in trusting the accuracy of generative AI, or in expressing concerns about risks such as data breaches or job redundancy.

  50. 51

    UC Berkeley: Responsible Use of Generative AI – A Playbook for Product Managers and Business Leaders

    Summary of https://re-ai.berkeley.edu/sites/default/files/responsible_use_of_generative_ai_uc_berkeley_2025.pdf A playbook for product managers and business leaders seeking to responsibly use generative AI (genAI) in their work and products. It emphasizes proactively addressing risks like data privacy, inaccuracy, and bias to build trust and maintain accountability. The playbook outlines ten actionable plays for organizational leaders and product managers to integrate responsible AI practices, improve transparency, and mitigate potential harms. It underscores the business benefits of responsible AI, including enhanced brand reputation and regulatory compliance. Ultimately, the playbook aims to help organizations and individuals capitalize on genAI's potential while ensuring its ethical and sustainable implementation. GenAI has diverse applications and is used for automating work, generating content, transcribing voice, and powering new products and features. Organizations can use different genAI models. These include off-the-shelf tools, enterprise solutions, or open models, which can be customized for specific needs and products. Adoption of genAI can lead to increased productivity and efficiency. Organizations that address the risks associated with genAI are best positioned to capitalize on the benefits. Responsible AI practices can foster a positive brand image and customer loyalty. There are key risks product managers need to consider when using genAI, especially regarding data privacy, transparency, inaccuracy, bias, safety, and security. There are several challenges to using genAI responsibly, including a lack of organizational policies and individual education, the immaturity of the industry, and the replication of inequitable patterns that exist in society.

HOSTED BY

ibl.ai
