AI 'N Stuff Podcast - All Episodes

14

Geospatial Data Demystified: Satellites, AI, and Earth’s Hidden Data

This week, my guest was Yohan Iddawela. Yohan is a geospatial data scientist at the Asian Development Bank and previously worked for the World Bank. He has a PhD in economic geography from the London School of Economics. In this episode, we talked about all things related to geospatial analysis, including fascinating use cases for geospatial data, the integral role of satellites, how AI and machine learning are helping improve geospatial data quality, and a grab bag of other geospatial topics. Be sure to check out Yohan’s newsletter! It's called Spatial Edge, and you can find it on Substack. It covers all the latest innovations in geospatial analysis. You can view a condensed transcript of this conversation along with relevant links here.

Sep 23, 2024

44m

13

The weird, wonderful AI art of Niceaunties

My guest this episode was Niceaunties, the pseudonym of a Singapore-based AI artist who uses her cultural heritage and childhood experiences growing up with 11 aunties, plus parents and grandparents, as inspiration for an imagined reality she created called the Auntieverse, short for Auntie Universe. I spoke with Niceaunties while she was exhibiting her work at the Zona Maco festival in Mexico City in partnership with the gallery Patricia Conde. This was part of a group show sponsored by Fellowship AI, a collective that helps support AI artists. She also recently completed an online solo show with the Fellowship that included more than 1,000 still images of her own work that she curated, many of them selling through Fellowship’s online platform Daily.xyz. We spoke about her inspiration, the AI tools she uses and how her artistic process has changed over time, and about the criticism of AI art from traditional artists. I had a great time speaking with her and I think you’ll enjoy our conversation.
- Aunties Nail Spa video discussed in the intro
- Niceaunties on Twitter and Instagram
- Full transcript with links to everything discussed is available here.

Sep 15, 2024

33m

12

Mitigating Catastrophic AI Risk Through Tort Law

Earlier this spring, I had the chance to sit down in person with Professor Gabriel Weil here in New York to discuss his proposal for mitigating catastrophic risk from artificial intelligence. Professor Weil's proposal involves instituting a new punitive damages framework, which would increase liability for AI companies in near-miss scenarios where an AI-generated harm was limited in its impact but could have been catastrophic. Much of our discussion comes from Professor Weil's paper, “Tort Law as a Tool for Mitigating Catastrophic Risk from Artificial Intelligence.” Professor Weil is a Professor of Law at Touro University, and his work is now partially funded by Open Philanthropy. We start by discussing the definition of harmful AI activity before walking through a case study to demonstrate how the proposal would work in practice. We also contrast Professor Weil's proposal with the current state of the law and talk about some criticisms he's received and his responses. I thought it was a fascinating conversation, and I think you will, too.

Jul 1, 2024

45m

11

AI-inflicted harms: Can insurance fill the gaps?

Full show notes are available here. If you follow AI, you’ve probably heard about the growing volume of proposed AI legislation in the U.S. and beyond, as well as the increasing number of AI-related cases being brought before the courts. Today’s guest argues there is another industry that is key to handling AI-inflicted harms: everyone’s favorite, the insurance industry. Anat Lior is a professor at Drexel University’s Kline School of Law and has written broadly about the intersection of insurance and emerging technologies. In our conversation today we’ll be focusing largely on her paper, which appeared in the Harvard Journal of Law and Technology, called “Insuring AI: The role of insurance in artificial intelligence regulation.” We discuss insurance’s role in society and its intersection with emerging technology, how insurance can supplement the courts and government regulation, and end with a discussion about specific insurance proposals related to autonomous vehicles. I thought it was a fascinating conversation and I think you’ll enjoy it.

Jun 5, 2024

44m

10

AI's impact on artist creativity and productivity

This week, my guest was Eric Zhou, a PhD student at Boston University researching the impact of generative AI on art and artists. We discussed one of Eric's recent research projects, where he acquired access to a vast dataset of activity on a major online art platform. Eric used this data to assess how adopting generative AI tools impacted both the productivity and creativity of thousands of artists across 18 months, totaling about 4 million artworks. This is an important topic, and there were some pretty interesting findings. I think you'll enjoy the conversation. Full show notes available here.

May 20, 2024

49m

9

Transitioning from scale to efficiency in AI model training

If you follow AI, you might have heard the phrase “scale is all you need”: the idea that to keep improving the performance of AI systems, all you need is bigger models and more data. But as AI has continued its rapid advancement, the tide is starting to shift on that paradigm. Many of the new AI language and image models released in 2024 have been a fraction of the size of the models we saw in early 2023. But even these smaller models are data hungry. That’s where today’s guest comes in. In a widely circulated paper from April of this year, Vishaal Udandarao and his coauthors showed that when it comes to AI image models, while more data is better, it takes an exponential increase in data volume to achieve a linear improvement in model performance. With concerns that AI models have already exhausted much of the easily scrapable data from the web, Vishaal’s paper has added fuel to the conversation around how AI progress can continue. Vishaal is a second-year PhD student at the Max Planck Institute at the University of Tuebingen. He’s also affiliated with the European Laboratory for Learning and Intelligent Systems. Vishaal and I talk in detail about his paper’s results and about what solutions might be available to help continue the progress of AI model development by leveraging existing data more efficiently. Full show notes available here.

May 13, 2024

40m

8

Translating endangered languages with off-the-shelf large language models

There are currently 7,000 languages actively spoken in the world and about 40% are endangered, at risk of disappearing forever. Can Generative AI systems help us with preservation and education about these languages via translation into English or other high-resource languages? Not today. Current state-of-the-art, off-the-shelf large language models like OpenAI’s GPT-4, Anthropic’s Claude Opus, or Google’s Gemini are able to translate easily between high-resource languages, say translating Spanish to English. But training data for low-resource and endangered languages is sparse or absent from the pre-training datasets used by language models, like the Common Crawl, discussed in last week’s episode. A team of researchers at Carnegie Mellon University and UC Santa Barbara is trying to solve this problem. They’ve developed LingoLLM, a workflow and pipeline for improving the translation capabilities of large language models for low-resource and endangered languages that don't have much digitized content. Importantly, the workflow doesn’t require any additional training of the language model or special fine-tuning. This week I spoke to Kexun Zhang, a PhD student in computer science at Carnegie Mellon University, who helped lead the first phase of LingoLLM’s development. The LingoLLM workflow automates the creation of a package of linguistic artifacts, like grammar books and a gloss, both of which we talk about during our conversation. This package can then be passed to off-the-shelf language models as part of a structured prompt along with the passage in the low-resource language that needs to be translated. LingoLLM upgrades off-the-shelf language models from essentially useless in translating low-resource languages to a translation tool that, while not perfect, is still pretty good.
Kexun and I talked about how he got interested in linguistics, covered some background about low-resource and endangered languages, and discussed in detail the workflow behind LingoLLM and what challenges remain. I had a great time talking to Kexun, and I think you'll enjoy the conversation.

Apr 10, 2024

38m

7

The 100-billion webpage dataset that powers AI

Full show notes and a transcript are available on 96layers.ai. This week I spoke to Stefan Baack from the Mozilla Foundation about a recent research article he authored on the Common Crawl. The Common Crawl is the name of both a non-profit open-data company founded in 2008 by Gil Elbaz and its associated dataset. The Common Crawl is one of the most important datasets in the Generative AI ecosystem and has been used to train dozens of large language models. To give a sense of just how large Common Crawl is: every month it collects 3 to 5 billion webpages, 500 times more webpages than all of the articles on Wikipedia. The associated size of these monthly datasets is around 90 terabytes, 4,000 times as large as all of the text on Wikipedia. Over its 17-year history Common Crawl has collected more than 250 billion webpages. Stefan is a researcher and data analyst at the Mozilla Foundation’s Insights Team. He completed his PhD at the Research Center for Media and Journalism Studies at the University of Groningen, where he wrote a dissertation about the relationship between data journalism and civic tech. Stefan and I spoke about how Common Crawl decides what webpages to collect, about its founder Gil Elbaz and his philosophy of building neutral data companies, about how AI builders utilize and filter Common Crawl, and about how pre-training influences large language model behavior and biases.

Apr 2, 2024

42m

6

Can ChatGPT be CEO?

Can ChatGPT be CEO? Can a robot buy a house? Could an AI produce a Hollywood blockbuster? Professor Shawn Bayern thinks the answer to these questions is yes. Professor Bayern is a legal scholar and professor of law at Florida State University who has written a book called "Autonomous Organizations." In his book, Professor Bayern outlines his argument that under today’s legal regime an AI could be set up to govern a Limited Liability Company, or LLC, the most popular type of business arrangement in the U.S. In today’s discussion we focus on AI, but Professor Bayern’s proposal also covers other kinds of non-traditional arrangements like Decentralized Autonomous Organizations, or DAOs, favored by many crypto enthusiasts. Professor Bayern’s path to create an autonomous organization, also called an autonomous business entity, works like this. A human sets up a single-member LLC. The human then creates an operating agreement that dictates that the decisions of the LLC are to be made by a software program, like an AI. The human then dissociates from the LLC, leaving the AI in charge without internal human governance. The AI is then free to engage in any activities an LLC can legally undertake, such as buying property or being party to a contract. In our conversation we cover the basics of LLCs, some examples of what autonomous organizations might do in practice, corporate personhood, the details and limitations of Professor Bayern’s plan, and how regulation and legislation still provide an opportunity for oversight. Along the way Shawn touches on some objections to his proposal and his responses. I thought it was a fascinating conversation and I think you will too. A full transcript of this episode is available on 96layers.ai.

Mar 17, 2024

42m

5

A chatbot defamed you. Now what?

To learn more about AI-generated defamation, I spoke with Professor Nina Brown from Syracuse University. Nina graduated from Cornell Law School and spent several years as a practicing attorney before joining Syracuse's Newhouse School of Public Communications. She now focuses on teaching communications law. Last year, Nina wrote an article with a delightful title, “Bots Behaving Badly: A Products Liability Approach to Chatbot-Generated Defamation.” Her article appeared in an edition of the Journal of Free Speech Law, which focused on speech law surrounding new generative AI technologies. Our conversation starts with a brief introduction to defamation, before we spend 30 minutes walking through a case study to explore how current defamation laws might apply to new generative AI technologies. I learned a lot, and I think you will too.

Jan 22, 2024

38m

4

Will AI ever become a "person?"

Full show notes for this episode can be found here: https://www.96layers.ai/p/will-ai-ever-become-a-person Have you ever considered what it truly means to be a person? I don't mean biologically, but from a philosophical standpoint: what really defines personhood? Is a person someone who has common sense and can think and reason at a high level? Could a person be defined by having a distinct, consistent personality, or is it rooted in social interactions, like being accountable to others? As ChatGPT and other large language models have continued to advance, some have asked whether these new AI systems might be considered persons. Earlier this year, the Los Angeles Times published an article titled “Is it time to start considering personhood rights for AI chatbots?” And even if the answer is no for current AI systems, might we reach a point where we're forced to recognize an AI as a person in its own right? To help answer these questions, I spoke with Jake Browning, a visiting scientist at New York University's computer science department. Jake received his PhD in philosophy from The New School and has written extensively on the philosophy of artificial intelligence and large language models. I found Jake's ideas on AI personhood thought-provoking, and I think you will too.

Dec 18, 2023

49m

3

Tracing AI Data Origins

Let's say you're on the verge of developing an awesome new AI language model. But here's a critical question: how do you ensure that your use of training data aligns with its licensing terms? How do you even find out what the licensing terms of that data are? Here’s another question: how do you find out where the dataset came from and what's inside? And how do you prevent the dataset from introducing bias and toxicity into your model? These are some of the key questions we're discussing in this week’s episode. I spoke with Robert Mahari and Shayne Longpre from the Data Provenance Initiative, a research project and online tool that helps researchers, startups, legal scholars, and other interested parties track the lineage of AI fine-tuning datasets. Shayne and Robert are both PhD candidates at MIT’s Media Lab, and Robert is also a J.D. candidate at Harvard Law School.

Dec 12, 2023

37m

2

AI and "Artificial Humanities"

This week I talked to AI researcher Nina Beguš. Nina completed her PhD in Comparative Literature at Harvard University, where she began creating a new practice called “Artificial Humanities,” the idea that history, literature, film, myth, and other humanities can help add depth to AI development, including in the design and engineering process. Nina is currently a postdoctoral researcher at Cal Berkeley’s Center for Science, Technology, Medicine, & Society. We had a wide-ranging conversation including Nina’s early experiences with art and literature while growing up in Slovenia, AI and chess, large language models’ impact on writing, AI and human interpretations of the Pygmalion myth (an area Nina has researched in depth), and more about Nina’s goal of an Artificial Humanities research agenda. For those who enjoyed this conversation, you may be interested to know that Nina has a book coming out in 2024 called “Artificial Humanities: A Fictional Perspective on Language in AI,” so be on the lookout for that. A transcript and show notes are available here. I have augmented the transcript with an extensive set of notes, links, videos, pictures, and maps.

Nov 17, 2023

41m

1

Responsible AI in Africa

This week I stopped by De Montfort University in Leicester, England to speak with Dr. Kutoma Wakunuma, an expert on Responsible AI in Africa. We discussed opportunities and challenges, the importance of gender equality, and the Ubuntu and Ujamaa philosophies. Full show notes are available here.

Oct 24, 2023

45m