
All Episodes - Open||Source||Data
What can we learn from AI-native development through stimulating conversations with developers, regulators, academics and people like you that drive forward development, seek to understand impact, and are working to mitigate risk in this new world? Join Charna Parkey and the community shaping the future of open source data, open source software, data in AI, and much more.
101 Episodes

How Open Data and AI Are Transforming Environmental Monitoring | Gracie Ermi
Machine learning scientist Gracie Ermi joins Charna Parkey to explore how AI and open-source satellite data are changing the way we understand land use, climate impact, and environmental risk. At Impact Observatory, she helps create high-resolution, publicly available maps used by educators, researchers, and global organizations alike. A conversation about the technical challenges behind these tools, what open access really looks like in practice, and the role AI plays in making environmental data faster and more useful.
Quotes
Charna Parkey
“One of the most exciting things about where AI is headed is that we’re finally expanding its use beyond language. Gracie’s work is a prime example of how machine learning can interpret physical space, detect environmental change, and deliver insights that matter. It’s a reminder that AI isn't just a chatbot; it’s a tool to see, sense, and protect the planet.”
Gracie Ermi
“The biggest innovation we need right now isn’t necessarily a new AI model. It’s better, cheaper satellite imagery, especially higher-resolution data that’s still open access. Right now, we’re working mostly with Sentinel imagery, which has a 10-meter resolution. That’s great for a lot of things, but it limits what you can detect. Individual buildings, small changes: they get lost at that scale. If higher-res data became more affordable or openly available, it would change everything.”
Timestamps
00:00:00 – Introduction to Gracie Ermi and Impact Observatory’s mission using AI and open data for environmental monitoring.
00:02:00 – Gracie shares how she discovered computer science and open source, and how that shaped her interest in using tech for impact.
00:04:00 – Why Gracie chose to work at a mission-driven organization that prioritizes open access and environmental good.
00:06:00 – Real-world uses of Impact Observatory’s open-source maps.
00:08:00 – Challenges around tracking open-source usage and the tension between openness and attribution in the ecosystem.
00:10:00 – How AI speeds up the creation of land-use maps.
00:12:00 – Discussion on classical computer vision versus GenAI in geospatial work.
00:14:00 – The technical limitations of current satellite imagery, particularly resolution and frequency, and how they affect output.
00:16:00 – Ethical considerations of increasing image resolution and what it might mean for privacy and surveillance.
00:18:00 – Reflections on unexpected risks and consequences that come with technological advancement in mapping.
00:24:00 – Advice for people with nontraditional backgrounds who want to enter AI or conservation tech.
00:26:00 – How Gracie uses GenAI tools like ChatGPT to overcome creative friction and emotional resistance to complex tasks.
00:28:00 – How large language models might help make geospatial tools more accessible, and what’s next for the field.

Multi-Agent Systems and Human-Agent Collaboration | Rodrigo Nader
In this episode, Charna Parkey welcomes Rodrigo Nader, the founder of Langflow, an open-source, low-code app builder for multi-agent AI systems. Rodrigo and Charna cover everything from his beginnings in a small Brazilian town to the future of AI and the emergence of multi-agent systems. Discover how these systems will enable human-agent collaboration, increase productivity, and solve complex problems across various industries.
TIMESTAMPS
00:01:00 - Introduction to Rodrigo Nader, CEO and founder of Langflow, and an overview of Langflow's mission and recent developments.
00:03:00 - Rodrigo Nader's background and journey into open-source, data science, and machine learning, including his early experiences with MIT OpenCourseWare and Kaggle.
00:06:00 - Rodrigo's work at Bitvore Corp, focusing on structuring financial data using machine learning, and his introduction to the open-source AI ecosystem.
00:10:00 - The inspiration behind Langflow, including the idea of connecting multiple AI models to create a more powerful, trainable system.
00:15:00 - Discussion on the evolution of AI agents, their decision-making capabilities, and the future of multi-agent systems.
00:18:00 - The role of agents in AI development, the democratization of AI tools, and the potential for community-driven innovation.
00:22:00 - The importance of multi-agent collaboration and the future of human-AI interaction in productivity and task management.
00:26:00 - Common use cases for Langflow, including language model pipelines, RAG (Retrieval-Augmented Generation), and agentic systems.
00:30:00 - Challenges in AI development, particularly debugging and prompt engineering, and the need for better tools to visualize and monitor AI systems.
00:34:00 - Predictions for the future of AI in 2025, including the rise of specialized agents and the importance of human feedback in AI training.
00:38:00 - Rodrigo's personal interests outside of AI, particularly his fascination with physics, quantum mechanics, and the concept of time.
00:42:00 - Final thoughts on the democratization of AI tools, the importance of community contributions, and advice for aspiring developers and AI enthusiasts.
00:46:00 - Reflections with executive producer Leo Godoy, discussing the impact of Langflow, the differences between traditional and AI development, and the rapid pace of AI evolution.
Quotes
Charna Parkey
"For any developer who has sort of avoided the soft skills, the managerial skills, et cetera, you should go listen to some of those courses. You are now going to be managing this AI workforce that you really do need to treat like a team of interns that you're delegating work to, that you're giving feedback on, and all of those skills of sort of like more senior-level engineering of design reviews, code reviews, feedback, like that's gonna be more central than actually writing a line of code yourself."
Rodrigo Nader
"We're going to see millions and millions more agents than humans very soon, right? So we don't think that these agents are going to emerge from, one, only developers, meaning like hard-code developers, neither from big companies creating solutions that will suddenly solve all the problems."

Why AI Can’t Scale Without Infrastructure Fixes | Darrick Horton
From energy bottlenecks to proprietary GPU ecosystems, TensorWave CEO Darrick Horton explains why today’s AI scale is unsustainable, and how open-source hardware, smarter networking, and nuclear power could be the fix.
QUOTES
Darrick Horton
“The energy crisis is getting worse every day. It’s very hard to find data center capacity, especially capacity that can scale. Five years ago, 10 or 20 megawatts was considered state-of-the-art. Now, 20 is nothing. The real hyperscale AI players are looking at 100 megawatts minimum, going into the gigawatt territory. That’s more than many cities combined just to power one cluster.”
Charna Parkey
“We’re still training models in a very brute-force way, throwing the biggest datasets possible at the problem and hoping something useful emerges. That’s not sustainable. At some point, we have to shift toward smarter, more intentional training methods. We can’t afford to be wasteful at this scale.”
TIMESTAMPS
[00:00:00] Introduction
[00:01:00] Founding TensorWave
[00:04:00] AMD as a Viable Alternative
[00:08:00] Open Source as a Startup Enabler
[00:09:30] Launching ScalarLM
[00:12:00] ScalarLM Impact and Reception
[00:14:30] Roadmap for 2025
[00:16:00] Technical Advantages of AMD
[00:18:00] Emerging Open Source Infrastructure
[00:20:00] Broader Societal Issues AI Must Address
[00:22:00] AI’s Impact on Global Energy
[00:26:00] Fundamental Hardware vs. Human Efficiency
[00:30:00] Data Center Density Evolution
[00:34:00] Advice to Founders and Tech Trends
[00:38:00] AI Energy Challenges
[00:44:00] AI’s Rapid Impact vs. Internet
[00:46:00] Monopoly vs. Democratization in AI
[00:50:00] Close to Season Wrap Discussion and Predictions

Building Open-Source LLMs with Philosophy | Anastasia Stasenko
Join Charna Parkey as she welcomes Anastasia Stasenko, CEO and co-founder of pleias, to trace her unique journey from philosophy to building open-source, energy-efficient LLMs. Discover how pleias is revolutionizing the AI landscape by training models exclusively on open data and establishing a precedent for ethical and socially acceptable AI. Learn about the challenges and opportunities in creating multilingual models and contributing back to the open-source community.
TIMESTAMPS
[00:00:00] Introducing Anastasia and pleias
[00:02:00] From Philosophy to AI
[00:06:00] The Problem of Generic Models
[00:10:00] Open Weights vs. Open Source vs. Open Science
[00:14:00] Why Open Data Matters
[00:18:00] High-Quality, Specialized Models
[00:22:00] Multilingual Challenges
[00:26:00] Global Inclusion Requires Small Models
[00:30:00] Using and Contributing to Wikidata
[00:38:00] The Future: Specialized Models
[00:48:00] Advice for Newcomers
[00:54:00] Cultural Sensitivity and Data Representation
[00:50:00] Leo’s Takeaways
[00:52:00] Charna on Ethical, Verifiable AI
[00:54:00] Representation vs. Exclusion
[00:56:00] Letting People Be More Human
[00:57:30] Applied, Transformative AI
QUOTES
Charna:
"If you didn’t make it represented in the data, then we’re leaving another culture behind... So which one are you wanting to do, misrepresent them or just completely leave them behind from this technical revolution?"
Anastasia:
"The real issue now is that the lack of diversity in the current AI labs leads to the situation where all LLMs look alike."
Anastasia:
"Being able to design, to find, and also to create the appropriate data mix for large language models is something that we shouldn't really forget about when we talk about the success of what large language models are."

Democratizing Cloud Infrastructure | Kevin Carter
Discover how Rackspace Spot is democratizing cloud infrastructure with an open-market, transparent option for cloud servers. Kevin Carter, Product Director at Rackspace Technology, discusses Rackspace Spot's hypothesis and the impact of an open marketplace for cloud resources. Discover how this novel approach is transforming the industry.
TIMESTAMPS
[00:00:00] – Introduction & Kevin Carter’s Background
[00:02:00] – Journey to Rackspace and Open Source
[00:04:00] – Engineering Culture and Pushing Boundaries
[00:06:00] – Rackspace Spot and Market-Based Compute
[00:08:00] – Cognitive vs. Technical Barriers in Cloud Adoption
[00:10:00] – Tying Spot to OpenStack and Resource Scheduling
[00:12:00] – Product Roadmap and Expansion of Spot
[00:16:00] – Hardware Constraints and Power Consumption
[00:18:00] – Scrappy Startups and Emerging Hardware Solutions
[00:20:00] – Programming Languages for Accelerators (e.g., Mojo)
[00:22:00] – Evolving Role of Software Engineers
[00:24:00] – Importance of Collaboration and Communication
[00:28:00] – Building Personal Networks Through Open Source
[00:30:00] – The Power of Asking and Offering Help
[00:34:00] – A Question No One Asks: Mentors
[00:38:00] – The Power of Educators and Mentorship
[00:40:00] – Rackspace’s OpenStack and Spot Ecosystem Strategy
[00:42:00] – Open Source Communities to Join
[00:44:00] – Simplifying Complex Systems
[00:46:00] – Getting Started with Rackspace Spot and GitHub
[00:48:00] – Human Skills in the Age of GenAI - Post Interview Conversation
[00:54:00] – Processing Feedback with Emotional Intelligence
[00:56:00] – Encouraging Inclusive and Clear Collaboration
QUOTES
CHARNA PARKEY
“If you can’t engage with this infrastructure in a way that’s going to help you, then I guarantee you it’s not up to par for the direction that we’re going. [...] This democratization, if you don’t know how to use it, it’s not doing its job.”
KEVIN CARTER
“Those scrappy startups are going to be the ones that solve it. They’re going to figure out new and interesting ways to leverage instructions. [...] You’re going to see a push from them into the hardware manufacturers to enhance workloads on FPGAs, leveraging AVX 512 instruction sets that are historically on CPU silicon, not on a GPU.”

AI and the Future of Media Consumption | Pete Pachal
In this episode of Open Source Data, Charna Parkey interviews Pete Pachal, founder of The Media Copilot. With over two decades of experience covering technology, Pete shares his insights on how AI is transforming media and journalism, and discusses how journalists can embrace AI as a tool to enhance their work, adapt, and thrive in this new environment.
QUOTES
PETE PACHAL: AI is something that you control. I know, it feels like it's a wave that's coming over that it's unstoppable, inevitable. And that's true to a large extent. But at the same time, it's not, there's no there, right? There's no spark, there's no intent. (...) Never relinquish your role as the ultimate creator and person responsible for what's coming out of this thing.
CHARNA PARKEY: I think that there was a point where I found myself shifting more away from media and towards individual curated newsletters because like subject matter experts in that area, I could be like maybe they're going to summarize it incorrectly, et cetera. But at least I know my theory of mind of that individual. And then when I expand that to media, I don't know who's writing what and who's shadow writing what for who.
TIMESTAMPS
00:00:00 - Introduction of Pete Pachal and his background in journalism and AI.
00:02:00 - Pete’s career journey, including his work at CoinDesk and founding The Media Copilot.
00:04:00 - AI training for media professionals (journalists, PR, marketers).
00:06:00 - Evolution of AI in journalism: From skepticism to ethical frameworks.
00:08:00 - AI in content pipelines: Idea generation vs. post-production tasks.
00:10:00 - Open-source builders needing to cater to domain experts (e.g., journalists).
00:12:00 - Meta’s removal of fact-checking and its implications.
00:16:00 - Public tolerance for AI errors (e.g., Apple’s AI summaries).
00:18:00 - Consumer trust shifts away from platforms like Facebook/X.
00:22:00 - Ghostwriting vs. authenticity in AI-generated content.
00:24:00 - Preference for human-curated newsletters over AI summaries.
00:26:00 - AI in news digests (e.g., Perplexity, Alexa).
00:28:00 - Publisher AI experiments (Washington Post chatbot, TIME summaries).
00:32:00 - AI’s impact on click-through rates and publisher economics.
00:34:00 - AI-written articles (e.g., ESPN’s use case) and copyright issues.
00:36:00 - Legal battles over AI training data (NYT vs. OpenAI).
00:38:00 - Copyright concerns with AI-generated outputs.
00:40:00 - AI search tools (Perplexity, ChatGPT) and publisher licensing deals.
00:46:00 - The unhealthy impact of social media trends on journalism.
00:48:00 - Post-interview discussion: Accountability in AI and media.
00:56:00 - Leo’s perspective as a journalist on AI adoption.
00:58:00 - Closing thoughts on balancing AI innovation with industry needs.

Your AI Roadmap: Building a Career, Revenue and a Future in AI | Dr. Joan Bajorek
In this episode, Dr. Joan Bajorek, AI entrepreneur, author of Your AI Roadmap, and founder of Clarity AI, joins Charna Parkey to talk about what it really takes to build a future in AI. From career pivots and layoff anxiety to financial transparency and finding joy in your work, Joan shares practical advice and personal stories navigating fear, burnout, and career uncertainty in tech, while staying grounded in purpose, community, and long-term resilience.
TIMESTAMPS
[00:00:00] - Introduction to Joan Bajorek & Her Work
[00:02:00] - Transparency About Finances and Career
[00:04:00] - The Taboo Around Talking About Money
[00:06:00] - Resilience During Tech Layoffs
[00:08:00] - How to Get Credit for Your Work
[00:12:00] - Should You Chase an AI Job?
[00:14:00] - Career Goals vs. Financial Security
[00:16:00] - Translating Academic and Life Skills into Tech
[00:18:00] - Defining and Finding Joy in Work
[00:20:00] - Multiple Income Streams and Personal Freedom
[00:24:00] - AI’s Near-Future Impact on Jobs and Industries
[00:26:00] - Data and AI Opportunities in Underexplored Domains
[00:34:00] - Creating Scalable, Alternative Income Models
[00:36:00] - How Joan Maintains Long-Term Motivation
[00:42:00] - Post-Interview Discussion
QUOTES
Joan Bajorek
"Networking is how I've gotten the best opportunities and jobs of my life... LinkedIn has this research about how after COVID layoffs, 70% of people landed their next job based on an intro."
Charna Parkey
"I always try to strive for transparency, and I get such mixed results where at work with coworkers, it's absolutely valued. And then there seems to always be some sort of consequences in my personal life."

Cooperative Systems, Data Transparency & Quality and the Year of Small AI | Dr. Jason Corso
Dr. Jason Corso joins Charna Parkey to debate the critical role of data quality, how its transparency shapes AI development, and the rise of smaller, domain-specific AI models, making 2025 the year of small, specialized AI.
QUOTES
Charna Parkey
"Knowing the right data is incredibly important, because it'll save you money, but predicting the impact of that data means that you don't have to do the training at all to even directionally know if it's going to work out, right?"
Jason Corso
"You can't understand and analyze an AI system in the way you can analyze open source software if you don't have access to the data."
Timestamps
[00:00:00] - Introduction
[00:02:00] - Jason Corso’s journey in open source
[00:08:00] - The importance of data in AI
[00:10:00] - Voxel51's mission
[00:14:00] - The value of open source and the importance of data in AI systems
[00:20:00] - Recent discoveries in AI
[00:28:00] - The cost of training AI models
[00:36:00] - Cooperative AI in healthcare
[00:40:00] - Charna Parkey on the impact of AI in education
[00:56:00] - The year of small AI

Building the Future of Streaming Data | Alex Gallego
In this episode of Open Source Data, Charna Parkey talks with Alex Gallego, CEO and founder of Redpanda Data, about his journey as a builder, the evolution of Redpanda, and the company's new agent framework for the enterprise. Alex shares insights on low-latency storage, distributed stream processing, and the importance of developer experience to the growth of AI and the Open Source space.
Timestamps
[00:00:00] Introduction
[00:02:00] Alex Gallego talks about his background
[00:04:00] Charna Parkey discusses the importance of hands-on experience in learning.
[00:06:00] Alex explains the origins of Redpanda and how it emerged from challenges in the streaming space.
[00:08:00] Alex details the evolution of Redpanda, its use of Seastar and FlatBuffers, and its low-latency design.
[00:11:00] Alex discusses the positioning of Kafka versus Redpanda in the market.
[00:20:00] Alex introduces Redpanda's new agent framework and multi-agent orchestration.
[00:24:00] Alex explains how Redpanda fits into the evolving landscape of AI-powered applications.
[00:30:00] The future of multi-agent orchestration.
[00:44:00] Thoughts on AI model training and data retention.
[00:46:00] Alex encourages future founders and shares his perspective on risk-taking.
[00:50:00] Charna Parkey and Leo Godoy discuss the key takeaways from the conversation with Alex Gallego.
[00:52:00] Charna reflects on open source trends and the role of developer experience in adoption.
[00:54:00] Charna and Leo talk about the different types of founder journeys and the importance of team dynamics.
Quotes
Charna Parkey
"For AI, unifying historical and real-time data is critical. If you're just using nightly or monthly data, it doesn’t match the context in which your prediction is being made. So it becomes very important in the future of applying AI because you need to align those things."
Alex Gallego
"Every app is going to span three layers. The first layer is going to be your operational layer, just like you have to do business right now. Then there always has to be an analytical layer, and the third layer is this layer of autonomy."

What is Neuro-Symbolic AI? | Emin Can Turan
In this episode, we dive deep into the world of neuro-symbolic AI with Emin Can Turan, CEO of Pebbles AI. Learn how this technology combines neuroscience, behavioral economics, and AI to revolutionize B2B go-to-market strategies. Emin explains how neuro-symbolic AI bridges the gap between human logic and machine learning, enabling smarter, context-aware systems that democratize complex workflows for startups and enterprises alike.
Timestamps
[00:00:00] - Introduction by Charna Parkey and introduction of Emin Can Turan.
[00:02:00] - Emin’s journey to AI and his background in go-to-market strategies.
[00:06:00] - Emin explains his deep R&D phase and the development of neuro-symbolic AI.
[00:08:00] - Emin describes the architecture of their AI system, including neuro-symbolic AI, generative AI, and agentic frameworks.
[00:10:00] - Explanation of neuro-symbolic AI and its relevance to domain-specific problems.
[00:12:00] - Discussion on the components of go-to-market strategies and the role of psychology and communication.
[00:16:00] - The limitations of generative AI and how they applied strict communication tactics.
[00:22:00] - Discussion on the importance of contextual science and data insights.
[00:24:00] - The three agentic frameworks they use in their system.
[00:26:00] - Explanation of how users control the product and the two co-pilots (strategy and execution).
[00:36:00] - The ethical implications of AI and the potential for misuse.
[00:38:00] - Discussion on the future of AI and the balance between dystopian and hopeful outcomes.
[00:40:00] - Emin emphasizes the importance of truth and transparency in AI development.
[00:42:00] - Emin shares his personal motivation for building his AI startup.
[00:48:00] - Closing remarks and discussion on the user experience of their platform.
[00:50:00] - Charna and Leo discuss the connection between Emin's work and the open-source community.
Quotes
Emin Can Turan
"I felt that this was the future and that AI was the only technology that can digitalize this level of complexity for everyone to use. Nothing else could, you know, you can't use normal neural networks to do this. Even generative AI is not sufficient enough."
Charna Parkey
"I would love to be able to use Gen AI for more personal things. I love technology. I have the Oura Ring. I've got the Apple Watch. I want to feed that data into something that can somehow tell me and others, here's your state of mind. Here's what you're going to be affected by."

How to Empower Non-Technical Teams with Data Insights | Suzanne El-Moursi
Learn how BrightHive's AI-powered platform is democratizing data insights, making them accessible to non-technical teams across organizations. Suzanne El-Moursi discusses the importance of data fluency and how BrightHive is helping businesses harness the power of their data.
Timestamps
00:00:00 - Introduction and Background
00:02:30 - Journey to BrightHive and open source
00:06:00 - The evolution of AI and BrightHive's approach
00:14:00 - The data problem and the role of AI agents
00:22:00 - Building BrightBot with open source frameworks
00:26:00 - The future of AI agents and open source
00:30:00 - People’s reaction to DeepSeek
00:34:00 - The future of work and AI
00:40:00 - AI in education and personal growth
00:42:00 - Suzanne’s legacy
00:48:00 - Recap and takeaways with producer Leo Godoy
Quotes
Charna Parkey
"Every single innovation comes out of some form of restriction or need. (...) Don't come and say, “oh, what is this? This is terrible”. I heard all kinds of responses to my excitement and to my belief."
Suzanne El-Moursi
"So if 97% of an organization is data consumers, there are strategists, the marketing analysts, the customer success associates, the managers all across the enterprise, who need to understand the insights in the company's data, in their functions, in their units, so that they can make the next right step for the customer and for their plan."

Open Source AI and Copyright: Building Ethical Models | Kent Keirsey
Kicking off Open Source Data Season 7, Charna Parkey welcomes the CEO and Founder of Invoke, Kent Keirsey, to discuss his thoughts on licensing, copyright in generative AI, and the role of communities in building ethical, free-to-use technologies that can democratize technology and inspire global innovation.
Quotes
Kent Keirsey
"When we look at open source models, if you just release the weights, and you don't really release information on how the data set was captioned, for example, or how you construct the data set, if you don't really know how it got to the artifact that was released, as a user, you do not understand how it works."
Charna Parkey
"But there's still a lot of claims by big tech right now about how anything on the internet should be fair use for training, even if, you know, it might have its own kind of copyright."
Timestamps
[00:02:00] - Kent Keirsey on his journey to open source
[00:06:00] - Kent Keirsey on the Open Model Initiative (OMI)
[00:08:00] - What makes a model truly open source
[00:12:00] - The legal landscape of AI and copyright
[00:14:00] - Kent Keirsey on the ethical implications of AI training data, fair use, and AI development
[00:26:00] - Creativity, AI tools, personal AI models and recommendation algorithms
[00:32:00] - Kent Keirsey on TikTok and cultural clash
[00:38:00] - AI, self-reflection and a decision-making tool
[00:42:00] - The Bria AI partnership
[00:52:00] - The future of creativity, AI and Robotics
[01:00:00] - Final thoughts with producer Leo Godoy

Building Trust in AI: From Open Source to Global Impact with host, Charna Parkey
Join Charna Parkey as she recaps a transformative year in AI, exploring the delicate balance between innovation and ethics. From open source communities to global regulations, discover how trust, diversity, and collaboration are shaping the future of technology.

AI Regulations in Financial Services with Vinay Kumar
Vinay Kumar discusses the transformation of AI in banking and financial services, covering the challenges and solutions around regulatory compliance and model explainability under the financial industry's stringent requirements.
Episode Quotes
Vinay Kumar
"I always believe in this: you don't need to solve a very large problem. Maybe it will take a lot of time to do that. A lot of resources to do that but something small, which you can have an opportunity to solve that could be very big or a fundamental for quite a bit is fantastic. Think of a scenario where your small fundamental idea is a base for another small fundamental idea for someone else."
Charna Parkey
"We also want to ground it a little bit in impact we've been seeing. And I think in the financial, banking, insurance industries it's not, I would say, an even distribution of advancement. Different countries have different regulations and different appetites for risk."
Timestamps
- [00:00:00] Introduction by Charna Parkey.
- [00:01:57] Vinay Kumar begins talking about his journey.
- [00:05:27] Discussion on building a search engine for STEM researchers.
- [00:07:06] Challenges with early deep learning.
- [00:09:55] Conversation shifts to ML observability.
- [00:17:06] Discussion on simplifying verticalized AI.
- [00:22:30] Impact of large language models (LLMs) on AI.
- [00:30:58] Comparison of autonomous cars with AI regulation.
- [00:37:58] Vinay mentions his science fiction novels.
- [00:42:19] Conversation summary with Producer Leo Godoy.

The importance and the Challenges & Solutions of AI Literacy with Brian Magerko
Quotes
Brian Magerko
“We're really trying to show that we could co-create experiences with AI technology that augmented our experience rather than served as something to replace us in creative act.”
“For every project like [LuminAI], there's a thousand companies out there just trying to do their best to get our money... That's an uncomfortable place to be in for someone who has worked in AI for decades.”
“I had no idea what was going to happen kind of in the future. When we started EarSketch... we were advised by a couple of colleagues to not do it. And here we are, having engaged over a million and a half learners globally.”
Charna Parkey
"I remember the first robot that I built. It was part of the first robotic systems... and watching these machines work with each other was just crazy."
“If you're building a product and your goal is to engage underrepresented groups, it is on you to make sure that you're educating the folks in a way that you're trying to reach.”
Episode timestamps
(01:11) Brian Magerko's Journey into AI and Robotics
(05:00) LuminAI and Human-Machine Collaboration in Dance
(09:00) Challenges of AI Literacy and Public Perception
(17:32) Explainable AI and Accountability
(20:00) The Future of AI and Its Impact on Human Interaction
(22:10) EarSketch and learning: computing as a meaningful concept
(27:18) The need for interdisciplinary collaboration to ensure AI developments are beneficial for society as a whole.
(30:02) Brian Magerko's next reshape of the future, better understanding models of collaboration and improvisation between people and computers
(35:51) Brian Magerko's advice to researchers based on his own identity and experiences
(44:20) Projects and updates related to EarSketch and LuminAI’s improvisation model.
(46:24) Backstage with Executive Producer Leo Godoy

Demystifying AI Governance: A Practical Guide for Organizations with Heather Domin
As AI becomes increasingly integrated into business operations, having robust governance structures in place is no longer optional. But what does effective AI governance look like in practice? In this episode, Dr. Heather Domin, a leading expert in AI ethics and governance, breaks down the key components of a successful AI governance framework. Heather guides us through the opportunities and challenges presented by this transformative technology. Learn about the importance of responsible adoption practices, the role of governance structures, the need for ongoing feedback loops, and how to align AI initiatives with organizational values, establish clear accountability, and create a culture of responsible innovation.
Timestamps
00:00:00 - 00:01:23 - Introduction
00:01:23 - 00:04:30 - Heather Domin's Journey
00:09:50 - 00:12:48 - Open Source and AI Ethics
00:12:48 - 00:15:25 - Generative AI and Governance
00:23:40 - 00:26:22 - Future of Responsible AI Practices
00:35:37 - 00:37:31 - Advice for the Audience
00:37:31 - 00:46:04 - Reflection on Risk and Hope in AI
Quotes
Heather Domin
"I think that each of us individually can scan our environment and understand, you know, where can I make an impact? What problem can I help solve? What is the next thing that I can really contribute to?"
"There are absolutely ways to automate, you know, the prompt testing and many of the routine tasks that you want to leverage automation in that way so that you can actually have the humans focus on other, other things so they can focus on the critical thinking and outside the box sort of thinking that we want the humans to be focused on."
Charna Parkey
"I think that it's hard for people getting into it for the first time to jump to hope if they've experienced something that they should fear in the past. By that, I mean, groups that have been marginalized by other forms of technology are not going to start hopeful with this new one that is using their data without their permission."
"If for some reason I came to understand in a month what that meant, I should be able to go back and revoke and be like, nope, I actually don't want you to have that anymore. So I think that that would help people feel better."
Check Heather's paper: On the ROI of AI Ethics and Governance Investments

Transforming Food Systems with Regenerative AI with Ethan Soloviev
Ethan Soloviev, Chief Innovation Officer at HowGood, reveals how generative AI can revolutionize the food and agriculture industry. Discover the potential of AI to create a regenerative, sustainable, and net-positive food system that benefits the planet and all living beings.
Timestamps
1. Introduction and Background (00:00:00 - 00:01:16)
2. Ethan's Journey (00:01:16 - 00:05:12)
3. The Role of Food and Agriculture (00:05:12 - 00:06:52)
4. Investment in Regenerative Agriculture and Generative AI (00:06:52 - 00:07:44)
5. Levels of AI Impact (00:07:44 - 00:12:42)
6. HowGood's Use of AI (00:12:42 - 00:13:20)
7. Consumer Impact and Corporate Responsibility (00:13:20 - 00:15:44)
8. Future of AI in Food Systems (00:15:44 - 00:20:30)
9. Innovative Perspectives on AI Training (00:20:30 - 00:21:10)
10. Action models in agriculture, optimizing water and soil use on a larger scale. (00:24:14 - 00:25:28)
11. Discussion on integrating human cultural geography into AI models. (00:27:37 - 00:30:00)
12. Charna and Ethan discuss procurement decisions and their impact on sustainability. (00:30:20 - 00:40:15)
13. The ethical implications of AI in corporate and government decision-making. (00:42:01 - 00:54:31)
14. Leo brings up the impact of AI on consumers, discussing how AI can change purchasing decisions by highlighting product sustainability. (00:54:40 - 00:55:30)
15. Charna elaborates on using AI to understand different business models and how generational changes affect consumer choices. (00:55:47 - 00:57:32)
Quotes
Ethan Soloviev
"What if we're using ecological data? What if we're training on trees and insects and animals and whale song? What kind of questions would a gen AI trained on whale song and hummingbird language ask us?"
Charna Parkey
"If we have this great translator that is Gen AI, we already have text and language to code. We can do code generation. We can already interpret this code and tell me what it's going to do. Take that code to language. Why can't we do that with some of these other senses and these other measurements?"
Connect with Ethan
Connect with Charna

Redefining AI Ethics: The Key Role of Explainability with Beth Rudden
Beth Rudden, recognized as one of the 100 most brilliant leaders in AI ethics, discusses the crucial role of explainability and traceability in building trustworthy AI systems. She shares how Bast AI is using ontologies and knowledge graphs to provide contextual relevance and understanding, enabling humans to fully trust artificial intelligence, and how this approach can transform fields like education and healthcare.
Timestamps
00:00:00 - Intro
00:02:00 - Beth’s Journey
00:19:33 - Ontologies in AI
00:21:44 - Data Lineage and Provenance
00:32:52 - Open Source Tools
00:38:38 - Explainable AI
00:44:58 - Inspiration from Nature
Quotes
Beth Rudden: "The best thing that I could tell you that I see is that it's going to shift from more pure mathematical and statistical to much more semantic, more qualitative. Instead of quantity, we're going to have quality."
Charna Parkey: "I love that because I've been so mathematical for most of my life. I didn't have a lot of words for the feelings or expressions, right? And so I had sort of this lack of data and the Brené Brown reference you make, like I have many of her books on my shelf and I often pull, I don't even know where it is right now, but the Atlas of the Heart because I am having this feeling and I don't know what it is."
Links
Connect with Beth
Connect with Charna

Eliminating AI Bias Through Inclusive Data Annotation with Andrea Brown
Learn how Andrea Brown, CEO of Reliabl, is revolutionizing AI by ensuring diverse communities are represented in data annotation. Discover how this approach not only reduces bias but also improves algorithmic performance. Andrea shares insights from her journey as an entrepreneur and AI researcher.
Episode timestamps
(02:22) Andrea's Career Journey and Experience with Open Source (Adobe, Macromedia, and Alteryx)
(11:59) Origins of Alteryx's AI and ML Capabilities / Challenges of Data Annotation and Bias in AI
(19:00) Data Transparency & Agency
(26:05) Ethical Data Practices
(31:00) Open Source Inclusion Algorithms
(38:20) Translating AI Governance Policies into Technical Controls
(39:00) Future Outlook for AI and ML
(42:34) Impact of Diversity Data and Inclusion in Open Source
Quotes
Andrea Brown
"If we get more of this with data transparency, if we're able to include more inputs from marginalized communities into open source data sets, into open source algorithms, then these smaller platforms that maybe can't pay for a custom algorithm can use an algorithm without having to sacrifice inclusion."
Charna Parkey
"I think if we lift every single platform up, then we'll advance all of the state of the art and I'm excited for that to happen."
Connect with Andrea
Connect with Charna

Regulation's Role in Driving Responsible AI with Asa Whillock
In this week’s episode, Charna welcomes Asa Whillock, the VP & GM Machine Learning and Artificial Intelligence at Alteryx. Asa shares a surprising perspective on AI regulation, explaining how it sets a baseline for responsible practices. Discover why he believes regulation is crucial in guiding the ethical development and deployment of AI and learn the importance of continuous learning and what the past can teach us about navigating the challenges and opportunities of AI today.
Episode timestamps
(01:47) Asa Whillock's career journey at market-leading companies and the role of open source in each (Adobe, Macromedia, Alteryx)
(04:56) Feature Labs acquisition by Alteryx and its open source roots in democratizing machine learning capabilities
(11:00) Survey findings on enterprise board members' perspectives on AI and the need to move beyond policy creation to implementation and governance
(27:00) Applying AI capabilities and decision-making related to AI
(30:00) The future of AI predominance, including cost reduction, open source model advancements, and the push for demonstrating business value
(43:33) Advice for navigating AI expertise and decision-making, including continuous learning, self-awareness of decision-making models, and acknowledging knowledge limits
Quotes
Asa Whillock
"I love regulation. I think it's great. And people are like, what? Why would you say that? And the reason why I say that is because I think it puts a floor underneath all of us of what do we think good looks like?"
Charna Parkey
"I think we need to, as a community, focus on meeting them where they are if we really want the democratization that is promised. Yeah, I don't know any other way to do it."

Transforming Client Experience with AI with Robbi Armstrong
Join Charna Parkey as she interviews Robbi Armstrong, AI Products and Strategy Director at KeyBank. Discover how this $190 billion bank is navigating the rapidly evolving landscape of generative AI, balancing the need for innovation with the challenges of managing risk in a heavily regulated industry. Explore the impact of KeyBank's virtual assistant, MyKey, on client experience. With nearly 70% repeat usage, MyKey seamlessly transfers clients to contact center agents, providing a warm handoff that includes authentication and chat context.
Episode Timestamps
(02:11): Robbi Armstrong's role at KeyBank and intersection with open source and AI initiatives in the financial industry
(04:06): Compliance and regulatory trends in AI for banking
(12:10): Organizational Change Management with AI
(28:00): Responsible and Ethical AI
(37:00): Financial Literacy and AI
Quotes
Robbi Armstrong
“I truly believe that if you are an organization and you are sitting back and you're not organizing a team and you're not organizing a program and you're not learning, you're not looking at education, you're not looking at change management around Gen AI, I don't think you'll be here in two years. I really truly believe that. Because you won't be able to compete.”
Charna Parkey
“I think the democratization is real and I think it's incredibly important because that step in between the domain expert and the technology is very lossy. You know, oftentimes we say, well, if only I had the data to answer your question let me give you a different answer or let me answer it completely and now we can actually put it in the hands of the experts and say, well, oh, then let's go collect that data.”
Links
Connect with Robbi
Connect with Charna

Navigating Open Source Talent, AI & Policy Challenges with Amanda Brock
Amanda Brock's path began with picking potatoes at 8 years old. Now, she's the CEO of OpenUK, advocating for open source across the UK. In this insightful interview, Brock shares her journey into open source law and policy. She dives into OpenUK's latest research on the state of open technology in Britain, talent challenges, and the economic impact of open source contributions. Brock also unpacks key discussions from State of OpenCon 2024 on open data, generative AI, and balanced regulation.
Episode timestamps
(05:06): State of open source in the UK
(07:22): Importance of open source community
(15:19): Balancing openness and regulation in AI
(21:19): Pace of technological development and regulation
(28:21): Reliability and discernment with AI outputs
(35:24): Universal advice
Quotes
Amanda Brock
“I think the governments that are going to win, the governments that are going to have the best regulation that promotes most innovation are going to be the ones which are able to make their regulatory environment flow in the same way as the technology evolution and innovation flows.”
Charna Parkey
“I think the expectation needs to change. Part of what has happened with, you know, literal text search or keyword search and just Google and things like that, is that the average person expects what comes back to be relatively factual. That it's been referenced and, you know, backlinked, etc. That's a deterministic system. These are not. These are based upon statistical likelihoods of what word should come next.”
Links
Connect with Charna
Connect with Amanda

Using AI to Impact Performance Feedback Equity with Tacita Morway
Dive into the world of purposeful AI with Tacita Morway, CTO of Textio. Learn how Textio ensures their AI is built responsibly and ethically to transform the way teams communicate, hire, and measure their health. Discover their rigorous testing processes and the importance of having a diverse team to catch potential risks and how that helps the company develop strategies for avoiding bias and maintaining data privacy.
Episode timestamps
(02:15): Tacita's unconventional career path to becoming a CTO
(07:00): Textio's practices for building AI responsibly and ethically
(14:00): The impact of Textio's AI on performance feedback
(17:00): The importance of purpose-built vs generic AI models
(28:00): Balancing open source and proprietary data/models
(42:00): Advice for the AI industry moving forward
Quotes
Tacita Morway
“When you've got a team with different backgrounds, educational, lived experiences, identity, careers, all of those things, we have those different perspectives in the room. And we're all working off of the same expectations. We can catch each other's gaps.”
Charna Parkey
“There's an interesting conversation happening, I think, in the community right now about these purpose-built LLMs. Are they as good as generic LLMs? Sure, certainly if you're not going to apply something purpose-built to something generic or outside of its domain, it is not as good. But I think some of this shows us that unless you have something purpose-built and unless you're leveraging the data in the right way, you may just be feeding noise back into the system.”
Links
Connect with Tacita
Connect with Charna

The Ethical Path to High-Quality AI Data with Fabiana Clemente
How can we accelerate AI while protecting privacy? Fabiana Clemente discusses founding YData to enable high-quality synthetic data for machine learning. She covers open sourcing data profiling tools, the impact of generative AI on synthetic data, and maintaining work-life balance as an introverted leader.
Timestamps
(00:02:29) Fabiana's journey starting YData and becoming a public speaker
(00:20:19) Misconceptions and hype around generative AI and AGI
(00:32:46) Potential real-world impact and use cases of LLMs today
(00:34:55) The role of synthetic data in making AI models more robust and fair
(00:43:55) Advice for founders: value your time and learn to say no
(00:48:24) The importance of technical leaders being able to communicate well
Quotes
Charna Parkey: "It's a balance. I think that's also what led us to some of the demographic based data science. Essentially, folks were making like event data into pre-aggregated data. And then they were trying to obscure it so much that you couldn't get back to the person. And so you're like, okay, what's their age and what's their gender? And you're like, that's not actually the most useful part of data science that can't predict behavior or intent or any of that. It throws out time as a component of the entire process, seasonality, everything. And so there just, there has to be a better way."
Fabiana Clemente: "I have to say, that's a very beautiful way to put it. Hallucinations, I have to say. I never thought about that. And it makes a lot of sense. I do think, though, that in terms of LLMs, it's so language, it's so definitely, it sounds like we are getting very, very intelligent system, exactly, because language is very complex. And we know that was needed for the leap of humanity. I do think there are other, the sense of combining. Well, and here we enter in the multimodal kind of space. It's what's missing."
Links
Connect with Charna
Connect with Fabiana

Disrupting Data Analysis with Avi Press
Join host Charna Parkey as she sits down with Scarf’s CEO and Founder Avi Press in a riveting exchange about his pioneering journey into the world of open source with Scarf. Learn how Avi challenges conventional data analytics and collection, aiming to reshape industry standards through the power of open source. A conversation that delves into altering analytics norms, innovative monetization strategies, and the exploration of alternative licenses like BSL. Avi’s insights offer a unique perspective on the transformative role of open source in driving data analytics forward, fostering community engagement, and encouraging transparent development.
Episode timestamps
(02:15): Challenges of collecting open source usage data
(22:06): Driving impact with open source usage data
(28:27): Avi's entrepreneurial journey
(39:42): Persistence and vision in startups
(44:03): Tracking outcomes to stay motivated
Quotes
Avi Press
“I mean, one thing is, for any project that you might be thinking about doing or any initiative that you want to work on or goal that you have, I think there's a lot of power in just trying the thing. You may not have all the details figured out, but just try it anyway and see where it takes you. And I think a lot of projects that I've ever worked on that led anywhere, I didn't know all these details, but I just start trying and seeing what works anyway and being very open to it not working out, but attempting it anyway. And then the other thing, which is I think admittedly fitting into our agenda at Scarf, but it is something that I really believe, which is that for any of these things you're doing, tracking the outcomes of that thing is very, very important and will both be tactically helpful, but also I think, like you said, give you these inspirational moments that keep you going, whether that's awe or inspiration or fulfillment or whatever that feeling is that helps you keep going. I think that tracking the outputs of your work such that you can understand the impact that you have is both very strategic and the most rewarding way to do anything, I think.”
Charna Parkey
“Given the venture-backed nature of a lot of these startups, there's going to have to be some sort of monetization at some point. You're not gonna have 1 million, 10 million, 40 million dollars dumped into just giving software away for free. So sort of these misaligned motivations are certainly what raised my hackles where I'm like, oh, you're claiming forever or you're claiming that you're like a values-driven organization, but you're venture-backed and you need to make money. And so show me how those motivations align or misalign. Tell me what your monetization strategy is gonna be. I know you need one. That way I'm not wondering, should I use this? Should I not?”
Links
Connect with Charna
Connect with Avi

Tech, Trust, and Transformation with Paula Paul
On today’s episode of Open Source Data, Charna Parkey chats with tech veteran Paula Paul, exploring her remarkable 40-year journey in the technology sector. Starting at 16, Paula navigated through pivotal tech revolutions and embraced the essence of open source and community. Delve into Paula's world of coding on tape, the evolution of technology, and how communities foster growth, innovation, and trust. Discover the impact of open source in shaping technology and professional paths. Paula also sheds light on personal growth, community's pivotal role in professional mobility, and offers invaluable advice to aspiring tech professionals. A captivating look at the intersections of technology, community, and open source through the lens of an industry pioneer.
Timestamps
00:00 - Intro
05:10 - Paula’s Professional Journey
10:30 - What Inspired Paula to Go Through the Open Source Path
14:50 - What are some of the biggest challenges and impacts that Paula sees in companies trying to derive value?
23:30 - Is the Tech World a Meritocracy?
25:35 - A Shift Of What is a Tech Company?
27:30 - Kids Interacting with New Technologies
31:30 - What Does Open Source Data Mean to Paula?
42:50 - What is a Question that Paula has never been asked before?
47:00 - What Advice would you give to the audience?
51:50 - Backstage with Executive Producer Leo Godoy
Quotes
Charna Parkey
“I think from my side, as the applications we build change, then some of those backing technologies have to. Where databases used to be used by expert-like database administrators and you needed to have like data architects to your data model and you had to do all of these very, very specific things. And now we have this Gen AI moment and all of a sudden all of these specialized vector databases, NoSQL databases, etc., need to be used by an average developer. So they just want an API and it has to work and it has to be fast. And so, over these different moments, different technologies came about or were evolved, but I think it might be the application that's actually driving the change instead of the technology itself opening.”
Paula Paul
“It still surprises people to hear that 90% of any given modern application is open source and then there's 10% custom code that, depending on your company, you own or not. And it just still amazes me that we have these open source projects like jQuery is a project of the OpenJS Foundation and it's in a tremendous amount of our ecommerce infrastructure. But it's a project that's maintained by a very small team of contributors. And, you know, if this were a commercial product, it would be like a $1,000,000,000 company. (...) The piece of work being done by the new foundation to help make sure that we have the healthy web and that it's secure is really important, because people, if I say Log4j, people that remember those days know how important it is to keep security vulnerabilities addressed. And that's a concern for me, that people don't pay more attention to this. I mean, if you had a commercial software product, you typically would pay 20% a year in maintenance fees. But as many of us know, sometimes you find a bug and you would just report the bug, but it might take years for that bug to get fixed in a commercial release. Whereas if it's open source, there are people out there who can jump on it. But it's really crazy that there's no funding for that or no public works through the government, given all the dependence and dependencies that we have on these open source assets.”
Links
LinkedIn - Connect with Charna
LinkedIn - Connect with Paula

An Innovative Approach to AI & NLP with Milos Rusic
Starting the new season of Open Source Data, our new host Charna Parkey welcomes the CEO and Co-founder of deepset, Milos Rusic. With an impressive journey around NLP and AI, pioneering several areas in the Open Source field, Milos has revolutionized data search processes and brought about a new era of user-friendly and efficient enterprise search systems. Charna also shares some common ground with Milos when talking about joining an NLP startup in 2015-16, predictive maintenance and more. Don’t miss it!
New Beginnings: Open||Source||Data in Transition
This episode features an interview with Charna Parkey, Real-Time AI Product and Strategy Leader at DataStax. Charna has been developing AI and ML products over the last 17 years and has worked with 90 of the Fortune 100 in her various roles. She is also a co-author and inventor on several patents. In this episode, Sam and Charna discuss handing over the role as host, Sam’s new startup journey, and how their thinking has evolved during the explosion of LLMs.
-------------------
“Now, it seems like we have this opportunity where the conversation and the place that society is at is different. Where we want to contribute to the right set of data when we talk open source data. We want to make sure that we have the right data to train this model in order to get the right outcome. We want to provide a lens of, ‘All right, you are this persona. How would you say this thing?’ I do think that from a lot of what the LLMs have today, the outcome of those words are still missing. And we need to solve that. Like, ‘Is this piece of writing actually going to achieve the outcome I want versus am I following legal's guidelines? Am I technically correct? Is my CEO going to like it?’ That doesn't mean you're achieving impact in the world. There's an aspect there where we've given feedback loops, it seems, to be like, ‘Did I like the answer or not?’ But not, ‘Did I take an action?’ As we get to autonomousness, we're going to have to have an outcome or multiple outcomes associated with the reward of the system.” – Charna Parkey
“I personally believe that all cognition is bias. My degree is in cognitive science. One of the things that we trained on is attention. And to pay attention, literally means to selectively choose what data is coming in from the world that you're going to pay attention to and what you're going to discard. Which is also, to me, the definition of bias. All cognition is bias, but what do we care about? Do you trust this thing? What does that mean? Well, do you trust it to do these particular actions to a level of consistency in this particular domain? It doesn't mean that you're going to trust it in all environments. There's a lot more nuance that hopefully will evolve in this strange age of nuanced destruction machines.” – Sam Ramji
-------------------
Episode Timestamps:
(01:04): Sam and Charna catch up
(06:05): Sam explains his new company, Sailplane
(14:21): How Charna’s thinking has evolved during the LLM explosion
(25:45): Sam’s thoughts after 5 seasons of Open||Source||Data
(38:52): What Charna is looking forward to in the next season of the podcast
(40:44): A question Sam wishes to be asked
(45:45): Backstage takeaways with executive producer, Audra Montenegro
-------------------
Links:
LinkedIn - Connect with Charna
LinkedIn - Connect with Sam
Learn more about Sailplane
The Intersection of Open Source and AI with Stefano Maffulli & Stephen O’Grady
This episode features a panel discussion with Stefano Maffulli, Executive Director of the Open Source Initiative (OSI); and Stephen O’Grady, Co-founder of RedMonk. Stefano has decades of experience in open source advocacy. He co-founded the Italian chapter of Free Software Foundation Europe, built the developer community of the OpenStack Foundation, and led open source marketing teams at several international companies. Stephen has been an industry analyst for several decades and is author of the developer playbook, The New Kingmakers: How Developers Conquered the World. In this episode, Sam, Stefano, and Stephen discuss the intersection of open source and AI, good data for everyone, and open data foundations.
-------------------
“Internet Archive, Wikipedia, they have that mission to accumulate data. The OpenStreetMap is another big one with a lot of interesting data. It's a fascinating space, though. There are so many facets of the word ‘data.’ One of the reasons why open data is so hard to manage and hasn't had that same impact of open source is because, like Stephen, the stories that he was telling about the startups having a hard time assembling the mixing and matching, or modifying of data has a different connotation. It's completely different from being able to do the same with software.” – Stefano Maffulli
“It's also not clear how said foundation would get buy-in. Because, as far as a lot of the model holders themselves, they've been able to do most of what they want already. What's the foundation really going to offer them? They've done what they wanted. Not having any inside information here, but just judging by the fact that they are willing to indemnify their users, they feel very confident legally in their stance. Therefore, it at least takes one of the major cards off the table for them.” – Stephen O’Grady
-------------------
Episode Timestamps:
(01:44): What open source in the context of AI means to each guest
(16:21): Stefano explains OSI’s opportunity to shine a light on models and teams
(21:22): The next step of open source AI according to Stephen
(25:38): Creating better definitions in order to modify software
(33:09): The case of funding an open data foundation
(42:31): The future of open source data
(51:54): Executive producer, Audra Montenegro's backstage takeaways
-------------------
Links:
LinkedIn - Connect with Stefano
Visit Open Source Initiative
LinkedIn - Connect with Stephen
Visit RedMonk
Throwback: The AI-Native Stack with Mikiko Bazeley, Zain Hasan, and Tuana Celik
This episode features a panel discussion with Mikiko Bazeley, Head of MLOps at Featureform; Zain Hasan, Senior Developer Advocate at Weaviate; and Tuana Celik, Developer Advocate at deepset. In this episode, Mikiko, Zain, and Tuana discuss what open source data means to them, how their companies fit into the AI-first ecosystem, and how jobs will need to evolve with the AI-native stack.
-------------------
“We're almost part of a fancy new AI robot kitchen that you'd find in Tokyo, in some ways. I see a virtual feature store as, yes, you can have a bunch of your ingredients tossed into a closet. Or, what you can do is you can essentially have a nice way to organize them. You can have a way to label them, to capture information.” – Mikiko Bazeley
“I really like that analogy as well. I like how Mikiko put it where a vector search engine is really extracting value from what you've already got. [...] So where I see vector search engines, really, is if we think of these embedding providers as the translators to take all of our unstructured data and bring it into vector space into a common machine language, vector search engines are essentially the workhorses that allow us to compute and search over these objects in vectorized format. They're essentially the calculators of the AI stack.” – Zain Hasan
“Haystack, I would really position as the kitchen. I need Mikiko to bring the apples. I need Zain to bring the pears. I need Hugging Face or OpenAI to bring the oranges to make a good fruit salad. But, Haystack will provide the spoons and the pans and the knives to make that into something that works together.” – Tuana Celik
-------------------
Episode Timestamps:
(02:58): What open source data means to the panelists
(09:11): What interested the panelists about AI/ML
(24:10): Mikiko explains Featureform
(27:00): Zain explains Weaviate
(30:23): Tuana explains deepset
(36:00): The panelists discuss how their companies fit into the AI-first ecosystem
(44:58): How jobs need to evolve with the AI-native stack
(54:35): Executive producer, Audra Montenegro's backstage takeaways
-------------------
Links:
LinkedIn - Connect with Mikiko
Visit Featureform
LinkedIn - Connect with Zain
Visit Weaviate
LinkedIn - Connect with Tuana
Visit deepset
Visit Data-centric AI
How We Should Think About Data Reliability for Our LLMs with Mona Rakibe
This episode features an interview with Mona Rakibe, CEO and Co-founder of Telmai, an AI-based data observability platform built for open architecture. Mona is a veteran in the data infrastructure space and has held engineering and product leadership positions that drove product innovation and growth strategies for startups and enterprises. She has served companies like Reltio, EMC, Oracle, and BEA where AI-driven solutions have played a pivotal role. In this episode, Sam sits down with Mona to discuss the application of LLMs, cleaning up data pipelines, and how we should think about data reliability.
-------------------
“When this push of large language model generative AI came in, the discussions shifted a little bit. People are more keen on, ‘How do I control the noise level in my data, in-stream, so that my model training is proper or is not very expensive, we have better precision?’ We had to shift a little bit that, ‘Can we separate this data in-stream for our users?’ Like good data, suspicious data, so they train it on little bit pre-processed data and they can optimize their costs. There's a lot that has changed from even people, their education level, but use cases also just within the last three years. Can we, as a tool, let users have some control and what they define as quality data reliability, and then monitor on those metrics was some of the things that we have done. That's how we think of data reliability. Full pipeline from ingestion to consumption, ability to have some human’s input in the system.” – Mona Rakibe
-------------------
Episode Timestamps:
(01:04): The journey of Telmai
(05:30): How we should think about data reliability, quality, and observability
(13:37): What open source data means to Mona
(15:34): How Mona guides people on cleaning up their data pipelines
(26:08): LLMs in real life
(30:37): A question Mona wishes to be asked
(33:22): Mona’s advice for the audience
(36:02): Backstage takeaways with executive producer, Audra Montenegro
-------------------
Links:
LinkedIn - Connect with Mona
Learn more about Telmai
Throwback: Open Source Innovation, The GPL for Data, and The Data In to Data Out Ratio with Larry Augustin
This episode features an interview with Larry Augustin, angel investor and advisor to early-stage technology companies. Larry previously served as the Vice President for Applications at AWS, where he was responsible for application services like Pinpoint, Chime, and WorkSpaces. Before joining AWS, Larry was the CEO of SugarCRM, an open source CRM vendor. He also was the founder and CEO of VA Linux, where he launched SourceForge. Among the group who coined the term “open source”, Larry has sat on the boards of several open source and Linux organizations. In this episode, Sam and Larry discuss who owns the rights to data, the data in to data out ratio, and why Larry is an open source titan.
-------------------
“People are willing to give up so much of their personal information because they get an awful lot back. And privacy experts come along and say, ‘Well, you're taking all this personal information’. But then most people look at that and say, ‘But I get a lot of value back out of that.’ And it's this data ratio value question, which is: for a little in, I get a lot back. That becomes a key element in this. And I think there has to be some kind of similar thought process around open source data in general, which is if I contribute some data into this, I'm going to get a lot of value back. So this data in to data out ratio, I think it's an incredibly important one. And it gets everyone in the mindset of, ‘How do I provide more and more and take less and less?’ It's a principle of application development that I like a lot. And I think there's a similar concept here around open source data. Are there models or structures that we can come up with where people can contribute small amounts of data and as a result of that, they get back a lot of value?” – Larry Augustin
-------------------
Episode Timestamps:
(02:52): How Larry is spending his time now after AWS
(06:25): What drove Larry to open source
(18:41): What is the GPL for data?
(24:28): Areas of progress in open source data
(28:57): The data in to data out ratio
(36:39): Larry’s advice for folks in open source
-------------------
Links:
LinkedIn - Connect with Larry
Twitter - Follow Larry
Reframing Machine Learning and AI-Assisted Development with Jorge Torres
This episode features an interview with Jorge Torres, Co-founder and CEO of MindsDB. MindsDB is a virtual AI database that works with existing data to help developers build AI-centered apps. In 2008, Jorge began his work on scaling solutions using machine learning as the first full-time engineer at Couchsurfing, growing the company from a few thousand users to a few million. He has also served a number of data-intensive start-ups and was a visiting scholar at UC Berkeley researching machine learning automation and explainability.In this episode, Sam and Jorge discuss the inspiration and challenges behind MindsDB, classic data science AI versus applied AI, and time series transformers.-------------------“So much data in the world is time series data, so much data. Even data that people don't know is time series, it's time series. So long as it’s moving over time, it is time series data. Whether you store it or not, that's a different thing. For having a pre-trained model on time series data, it even enabled the fact that you don't have to store all the historical data. You can just take the model and start passing data as it comes through, and then you get out the forecast. So you don't even have to have the historical data. All you need to have is the data at that given instance, and you can pass it to the model and you get an output. It's mind blowing.” – Jorge Torres-------------------Episode Timestamps:(05:20): The inspiration behind MindsDB(10:20): Classic data science AI approach vs. applied AI(22:09): What open source data means to Jorge(28:51): What excites Jorge about Nixtla and time series transformers(37:07): A question Jorge wishes to be asked(40:20): Jorge’s advice for the audience(41:38): Backstage takeaways with executive producer, Audra Montenegro-------------------Links:LinkedIn - Connect with JorgeLearn more about MindsDB open source codeLearn more about MindsDB
A Sam Ramji Feature: The Evolution of Open Source, Kubernetes, and AI's Forward Journey
On this episode, we’ve partnered with the Future Rodeo podcast for a discussion between Sam and Matt Wallace. Matt is the Chief Technology Officer and EVP at Faction, a pioneer of multi-cloud data services, and host of Future Rodeo.In this episode, Sam and Matt discuss Microsoft’s transformation, the impact of Kubernetes on container orchestration, and the rapid acceleration of AI research and development.-------------------Episode Timestamps:(01:38): Microsoft’s open source transformation(13:19): The impact of Kubernetes and how it defragmented the industry(22:06): The transformative power of AI and how it’s changing the value of reasoning(54:58): The concept of cognitive economy and its potential impact on AI and software development(01:03:25): Potential implications of advancements in robotics, AI, and clean energy(01:04:17): Sam’s advice for those entering the industry or choosing a career path-------------------Links:LinkedIn - Connect with MattListen to the Future Rodeo podcast
The Importance of Open Source Data for Generative AI, Now and in the Future with Abby Kearns
This episode features an interview with Abby Kearns, technology executive, board director, and angel investor. Her career has spanned executive leadership, product marketing, product management, and consulting across Fortune 500 companies and startups, including Puppet, Cloud Foundry Foundation, and Verizon. Abby currently serves as a board director for Lightbend, Stackpath, and Invoke. In this episode, Sam sits down with Abby to discuss the betrayal source license, the role open source plays in AI, and empowering trust.-------------------“There's so much happening so quickly that I think open source has the power to help harness a lot of that innovative conversation. In a way that I think it's going to be really, really hard to match in a proprietary way. I think open source and the ability, given the fact that we're talking about AI and data, the two are very interrelated at this point. AI is not super interesting without data. I think the power of open source right now and what's happening, I think it has to happen in open source and I think it really has to have that level of transparency and visibility. But, always the ability for everyone to step up and understand what's happening at this moment in time and shape it.” – Abby Kearns-------------------Episode Timestamps:(00:50): Sam and Abby discuss the betrayal source license(14:12): What open source data means to Abby(23:30): Abby dives into the companies she’s investing in(34:30): How nonprofits can empower trust(38:32): A question Abby wishes to be asked(40:21): Abby’s advice for the audience(43:53): Backstage takeaways with executive producer, Audra Montenegro-------------------Links:LinkedIn - Connect with AbbyTwitter - Follow AbbyRead Design the Life You Love
The Value of Reproducibility and Ease of AI Deployment with Daniel Lenton
This episode features an interview with Daniel Lenton, Founder and CEO of Ivy, where the team is on a mission to unify the fragmented AI stack. Prior to Ivy, Daniel was a Robotics Research Engineer at Dyson and a Deep Learning Research Scientist for Amazon Prime Air. During his PhD, Daniel explored the intersection between learning-based geometric representations, ego-centric perception, spatial memory, and visuomotor control for robotics.In this episode, Sam and Daniel discuss the inspiration behind Ivy, open source reproducibility, and democratizing AI.-------------------"There's too much amazing stuff going on, from too many different parties. We just want to be the objective source of truth to show you the data and show you where your model will be doing best, and continue to do this as a service or something like this. This is high-level, some of the areas we see and going into, we really want to be a useful tool for anybody that wants to just kind of understand this fragmented complex space quickly and intuitively, and we are trying to be the tool that does that." – Daniel Lenton-------------------Episode Timestamps:(01:00): What open source data means to Daniel(05:37): The challenges of building Ivy(15:37): The future of Ivy(25:19): Who should know about Ivy(28:46): Daniel’s advice for the audience(32:00): Backstage takeaways with executive producer, Audra Montenegro-------------------Links:LinkedIn - Connect with DanielLearn more about Ivy
ML Engineering Teams and Niche Chat Bot Experiences with Demetrios Brinkmann
This episode features an interview with Demetrios Brinkmann, Founder of the MLOps Community, an organization for people to share best practices around MLOps. Demetrios fell into the Machine Learning Operations world and has since interviewed leading names around MLOps, data science, and machine learning. In this episode, Sam sits down with Demetrios to discuss LLM in production use cases, ML engineering teams, and the LLM Survey Report from the MLOps Community.-------------------"I think the most novel ones that I saw from the survey were when a chat bot would prompt a human as opposed to the human prompting the chat bot. It's almost like you have this LLM coach. And in that way, it's not necessarily like this isn't LLM in production that an end user is getting that's not outside the business or that is outside the business. It's more like internally, you can think about maybe it's an accountant and the accountant is filing my taxes for the year. As they're filing them, the LLM is prompting them on different tax laws that maybe they weren't thinking about or different ways that they could file things." – Demetrios Brinkmann-------------------Episode Timestamps:(04:30): LLMs as the new standard(19:26): Key LLM in production use cases(31:18): What open source data means to Demetrios(34:36): What Demetrios is seeing in open source AI models(42:44): One question Demetrios wishes to be asked(44:41): Demetrios’s advice for the audience(47:19): Backstage takeaways with executive producer, Audra Montenegro-------------------Links:LinkedIn - Connect with DemetriosRead the LLM Survey ReportListen to The MLOps Podcast
Building With Trust, Inspiration, and Reputation with Jaya Gupta, Yuliia Tkachova, and Omoju Miller
This bonus episode features conversations from season 5 of the Open||Source||Data podcast. In this episode, you’ll hear from Jaya Gupta, Partner at Foundation Capital; Yuliia Tkachova, Co-founder and CEO of Masthead Data; and Omoju Miller, Founder and CEO of Fimio.Sam sat down with each guest to discuss how they are building foundations for trust, inspiration, and reputation as we all race into the AI-centric future.You can listen to the full episodes from Jaya Gupta, Yuliia Tkachova, and Omoju Miller by clicking the links below.-------------------Episode Timestamps:(00:49): Jaya Gupta(01:48): Yuliia Tkachova(03:03): Omoju Miller-------------------Links:Listen to Jaya’s episodeListen to Yuliia’s episodeListen to Omoju’s episode
FMOps and a Founders Automated Future with Jaya Gupta
This episode features an interview with Jaya Gupta, Partner at Foundation Capital, where she leads early-stage investments across the enterprise software stack. Previously, Jaya was a Senior Business Analyst at McKinsey & Company focusing on software diligence and helping startups expand their go-to-market strategies.In this episode, Sam and Jaya discuss her journey to Foundation Model Ops, how software is becoming more accessible, and the democratization of AI tools.-------------------"At the end of the day, FMOps isn't just about the new tools. It's actually more about the new builders, the new workflows, and a completely new market of customers. I was on the other day, looking at LangChain's page of integrations, I don't know if you've seen it, but it's like Anyscale, Databricks, all these other huge legendary companies are integrating with LangChain, and I think it's clear that there's a huge community that is building something real and valuable." – Jaya Gupta-------------------Episode Timestamps:(01:05): What open source data means to Jaya(08:51): Jaya’s journey to Foundation Model Ops(15:58): How software is becoming more accessible(23:04): The democratization of AI tools(27:01): One question Jaya wishes to be asked(29:32): Jaya’s advice for the audience(31:51): Backstage takeaways with executive producer, Audra Montenegro-------------------Links:LinkedIn - Connect with JayaFollow Jaya on TwitterLearn more about FMOps
Web3 and Putting Reputation on Code with ML with Omoju Miller
This episode features an interview with Omoju Miller, Founder and CEO of Fimio, a web3 reputation company. Originally from Lagos, Nigeria, Omoju holds a doctoral degree in Computer Science Education from UC Berkeley. Her expertise in machine learning and computational intelligence led her to companies such as Google and GitHub. Omoju also served as a volunteer advisor to the Obama administration’s White House Presidential Innovation Fellows.In this episode, Sam sits down with Omoju to discuss how machine learning can make applications more secure, what the future of the internet looks like, and the fascinating story behind Fimio.-------------------“So my first view is, in this future internet we have people, we also have bots, we have machines, we have code doing things. And bots sounds like such a horrible word now. [...] You need to have a level of trust on what that bot is. Everything from the humans to the machines collaborating in this decentralized world, we need to have some kind of reputation attached to each of those nodes. And the reason why we need that reputation is, as the thing scales, it becomes overwhelming to get value from it. You need something to help you filter, to find what you're looking for. Otherwise, you get stuck in that environment where you're just completely overwhelmed and you don't even know what to do. So I think of what I'm doing as just reputation to make this decentralized future slightly more attainable.” – Omoju Miller-------------------Episode Timestamps:(00:59): Omoju’s inspiration for starting Fimio(10:27): The future of smart contracts(28:47): Using mathematics to guarantee the safety of algorithms(34:34): What led Omoju to building a mathematical product(51:27): What open source data means to Omoju(55:38): One question Omoju wishes to be asked(57:47): Omoju’s advice for the audience(01:00:08): Backstage takeaways with executive producer, Audra Montenegro-------------------Links:LinkedIn - Connect with OmojuVisit Fimio
The Human Right to Privacy and Caring About UX Design with Yuliia Tkachova
This episode features an interview with Yuliia Tkachova, Co-founder and CEO of Masthead Data, an observability platform that catches anomalies in Google BigQuery in real time. She holds degrees in Management Information Systems, Math, Statistics, and Marketing. Prior to Masthead, Yuliia designed complex BI products and solutions powered by ML and utilized by Fortune 500 companies. In this episode, Sam and Yuliia discuss how ML is shaping the future of data analytics, caring about users, and the fundamental human right to privacy.-------------------“We map those errors and anomalies on lineage, helping to understand what upstreams and downstreams are affected, what business users are affected. And that actually speeds up all the troubleshooting from hours to minutes. And this is the ultimate goal where we deliver. Because again, my belief that if you don't have this lineage piece was mapped anomalous in errors, it's not observability. It's monitoring. [...] What is also very unique to us, because Masthead operates on logs, it's triggered by logs. So, we do support streaming data. Unlike SQL-first solutions, as you can guess. We don't have to run SQL queries to see if they're anomalous, we’re triggered by logs. And this is also what sets us apart.” – Yuliia Tkachova-------------------Episode Timestamps: (01:14): What got Yuliia excited about math and statistics (11:31): The basic human right to privacy (18:21): What open source data means to Yuliia (28:00): Yuliia’s reason for building a solution focused on privacy and security (38:09): One question Yuliia wishes to be asked (42:21): Yuliia’s advice for the audience (44:46): Backstage takeaways with executive producer, Audra Montenegro-------------------Links: LinkedIn - Connect with Yuliia, Visit Masthead Data
Determinism in Complex Environments and Workflow Services with Maxim Fateev
This episode features an interview with Maxim Fateev, Co-founder and CEO of Temporal, an open source, distributed, and scalable workflow orchestration engine capable of running millions of workflows. He has 20 years of experience architecting mission-critical systems at Uber, Google, Amazon, and Microsoft. In this episode, Sam sits down with Maxim to discuss workflow services, the power behind Temporal, and bringing determinism to highly complex environments.-------------------“[Temporal] has this notion of workflows, which can run for a very long time and handle external events, you can treat them as a durable actor. And they're very good at implementing a lifecycle. For example, you can have an object per model and let this object handle all the events. Like, new data came in, notify this object, this object will go and retrain it. Or, it'll run an activity to periodically check the status. So you can have end-to-end lifecycle implemented fully in Temporal.” – Maxim Fateev-------------------Episode Timestamps: (01:03): What’s top of mind for Maxim in workflow services (04:09): What open source data means to Maxim (11:07): Maxim explains his time at AWS and building Cadence at Uber (23:09): Use cases and the community of Temporal (28:26): How Temporal is being used for ML workloads (32:28): One question Maxim wishes to be asked (36:38): Maxim’s advice for those working with complex distributed systems (39:11): Backstage takeaways with executive producer, Audra Montenegro-------------------Links: LinkedIn - Connect with Maxim, Temporal.io, Watch Maxim’s talk “Designing a Workflow Engine from First Principles”, Replay Conference 2023
The AI-Native Stack in Practice with Charna Parkey and Sam Bean
This episode features a panel discussion with Charna Parkey, a Real-Time AI Product and Strategy leader at DataStax; and Sam Bean, Staff Engineer at You.com. Charna is a co-author and inventor on several patents, including patent-pending work on ML/coordinated feature engine at the edge. Sam helped create the Spark connector to Weaviate, and is passionate about Big Data, Spark, NLP, Hugging Face, and large language models. In this episode, Charna and Sam discuss adapting to user expectations, what’s missing in the AI stack, and how to become an advanced citizen in open source.-------------------"We've seen these companies start to better understand that these streaming technologies have a place, whether it's Kafka or Flink or Pulsar, but it's still incredibly difficult to use and we need a different level of abstraction. [...] We're starting to see the stack change so that it becomes more interchangeable of the components and try to sort of raise that layer of abstraction so that we can get these types of models and these types of capabilities to more people." – Charna Parkey "I think that a lot of what you need to adjust to are these, what you were discussing as I call interaction data, you were calling it event data. But these interactions that people have with the internet and trying to find ways to model that in a way that even if your models aren't real-time, having ways to featurize real-time data in a way that's interpretable by a model. [...] I think Spark and Kafka and Delta and all of those things, give you a lot more flexibility now to move in different directions and readjust and I think, pivot what you want to do with the system." – Sam Bean-------------------Episode Timestamps: (01:29): Sam explains his background (03:36): Charna explains her background (18:13): Sam explains the problems You.com is solving for (28:21): Changes in user expectations in the AI-native stack (39:09): Advice for becoming an advanced citizen in open source (47:25): What’s missing in the AI stack (54:51): What open source data means to the panelists (58:22): How technologists should prepare for the future (01:03:10): Executive producer, Audra Montenegro's backstage takeaways-------------------Links: LinkedIn - Connect with Charna, Visit DataStax, LinkedIn - Connect with Sam, Visit You.com
The AI-Native Stack with Mikiko Bazeley, Zain Hasan, and Tuana Celik
This episode features a panel discussion with Mikiko Bazeley, Head of MLOps at Featureform; Zain Hasan, Senior Developer Advocate at Weaviate; and Tuana Celik, Developer Advocate at deepset.In this episode, Mikiko, Zain, and Tuana discuss what open source data means to them, how their companies fit into the AI-first ecosystem, and how jobs will need to evolve with the AI-native stack.-------------------“We're almost part of a fancy new AI robot kitchen that you'd find in Tokyo, in some ways. I see a virtual feature store as, yes, you can have a bunch of your ingredients tossed into a closet. Or, what you can do is you can essentially have a nice way to organize them. You can have a way to label them, to capture information.” – Mikiko Bazeley“I really like that analogy as well. I like how Mikiko put it where a vector search engine is really extracting value from what you've already got. [...] So where I see vector search engines, really, is if we think of these embedding providers as the translators to take all of our unstructured data and bring it into vector space into a common machine language, vector search engines are essentially the workhorses that allow us to compute and search over these objects in vectorized format. They're essentially the calculators of the AI stack.” – Zain Hasan“Haystack, I would really position as the kitchen. I need Mikiko to bring the apples. I need Zain to bring the pears. I need Hugging Face or OpenAI to bring the oranges to make a good fruit salad. 
But, Haystack will provide the spoons and the pans and the knives to make that into something that works together.” – Tuana Celik-------------------Episode Timestamps:(02:08): What open source data means to the panelists(08:22): What interested the panelists about AI/ML(23:20): Mikiko explains Featureform(26:11): Zain explains Weaviate(29:34): Tuana explains deepset(35:11): The panelists discuss how their companies fit into the AI-first ecosystem(44:12): How jobs need to evolve with the AI-native stack(53:45): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with MikikoVisit FeatureformLinkedIn - Connect with ZainVisit WeaviateLinkedIn - Connect with TuanaVisit deepsetVisit Data-centric AI
Special Episode: Data on Kubernetes and Cassandra Forward with Patrick McFadin
This special episode of Open||Source||Data features an interview with Patrick McFadin. Patrick has been a distributed systems hacker since he first plugged a modem into his Atari computer. Looking for adventure, he joined the US Navy, working on the Naval Tactical Data System (NTDS), which cemented his love of distributed systems. He is now an Apache Cassandra Committer, and is the Vice President of Developer Relations at DataStax. Sam catches up with Patrick at Data Day Texas to discuss his book Managing Cloud Native Data on Kubernetes, Cassandra Forward, and the future of Apache Cassandra.-------------------“I can now use my Parquet file in Iceberg or DuckDB, and this is data that I created with Cassandra. And we're not getting to the point where we have to reinvent an entire database. We can just connect the Lego parts together and if they're open, then I don't have these encumbrances. I'm not like, ‘Well, I can connect that if I call a salesperson and get a license.’ [...] That's what's exciting to me about Cassandra, the way that the ecosystem is evolving around Cassandra. It's not, ‘Cassandra's at the center, it's just a player.’ It's at the party." – Patrick McFadin-------------------Episode Timestamps:(01:06): What open source data means to Patrick(02:11): Patrick discusses his book Managing Cloud Native Data on Kubernetes(10:02): Patrick discusses Cassandra Forward(11:09): The future of Apache Cassandra-------------------Links:LinkedIn - Connect with PatrickCassandra Forward
Making Graph Data Easier with Open Initiatives with Denise Gosnell
This episode features an interview with Denise Gosnell, Principal Product Manager at Amazon Web Services. At AWS, Denise leads product and strategy for Amazon Neptune, a fully managed graph database service. Her career centers on her passion for examining, applying, and advocating for the applications of graph data. Denise has also authored, patented, and spoken on graph theory, algorithms, databases, and applications across all industry verticals.In this episode, Sam sits down with Denise to discuss graph initiatives, the future of developer models, and what Denise learned from hiking the Appalachian Trail.-------------------“We just open sourced something called graph-explorer, which is something for the community by the community, Apache 2.0 license. graph-explorer is a low-code visualization tool. But, the best part about it is that it works for JanusGraph, it works for Blazegraph, it works for all of these graph models that we've talked about, because we've got this divided graph community, but it was written to work with all graphs. [...] Today it's all, ‘Here's your Lego blocks and build one on your own. If you want to go ahead and fork Jupyter Notebook and figure out a way to get that D3 force-directed graph way out to pop up, have fun.’ It's the first time that we've had a unified way across graph vendors and graph implementations to have a way to visualize your graph data in one tool that's open source.” – Denise Gosnell-------------------Episode Timestamps:(01:17): What open source data means to Denise(04:27): How Denise got interested in computer science(08:39): Denise’s work on graph initiatives(14:30): How Denise’s work at LDBC relates to SQL standards(23:43): The future of developer models(29:43): One question Denise wishes to be asked(34:05): Denise’s advice for graph practitioners(37:37): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with DeniseThe Practitioner’s Guide to Graph Data
Advising Big Data and The Future of AI/ML with Ben Lorica
This episode features an interview with Ben Lorica, Co-founder and Principal of Gradient Flow, a company that provides a wide range of content on data and technology. Ben is an industry expert on data, machine learning, and AI. He is a Technical Advisor for Databricks, a program chair for several data conferences, and he hosts The Data Exchange Podcast.In this episode, Sam and Ben discuss Big Data and the improvements and future opportunities of AI and machine learning.-------------------“The reason I use the word decentralize is because when you try to explain it to someone, let's say you want to train a different model for each user, or region, or sensor, or device. So you can't use necessarily just personalized because recommenders can be personalized, but they're still centralized models.” – Ben Lorica-------------------Episode Timestamps:(01:17): What open source data means to Ben(05:54): What intrigued Ben about Big Data(12:07): What brought Ben to working on Ray(16:15): Ben’s opinion on how far AI and ML have come in the last 5 years(26:38): What Ben sees happening in this space in the next 5 years(39:06): What challenges Ben sees in the next 5 years (43:51): One question Ben’s always wanted to be asked(44:55): Ben’s advice for those starting their open source data adventure(46:34): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with BenGradient Flow’s NewsletterGradient Flow’s 2023 Trends ReportVisit Sky Labs
Functional Programming and an Ideal Data Stack Building Experience with Holden Karau
This episode features an interview with Holden Karau, an Open Source Engineer at Netflix. Holden is best known for her work on Apache Spark, her advocacy in the open source software movement, and her creation of a variety of related projects including spark-testing-base. Previously, Holden worked at Big Tech companies like Apple, IBM, and Google as a software engineer and developer advocate.In this episode, Sam sits down with Holden to discuss the data analysis stack, functional programming, and the future of open source software data tooling.-------------------“These things are not one off. We may think that they're one off and they don't need testing, but that's not the reality. When you write something, it needs to be maintainable and as software people, the only real way that I think we know to make something vaguely maintainable is to at least have tests. And these tests need to cover common failure cases that we've experienced. And certainly, there's different approaches to this. There's property based testing, there's golden sets, all kinds of different options. I don't think necessarily any one approach is right or better here, but I think we need something. We need less untitled 5.IPython Notebook running in production, scheduled every hour. 
That is not a way to run a company.” – Holden Karau-------------------Episode Timestamps:(02:27): What open source data means to Holden(04:37): What interested Holden in mathematical computer science (09:51): What drew Holden to Spark(12:49): What Holden has learned about cognitive systems(20:02): What we need to learn as developers and data specialists(25:28): The future of the data analysis stack(31:21): Improvements in data tooling over the next 5 years(34:25): A question Holden wishes to be asked(40:51): Holden’s advice for open source data project committers(43:18): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with HoldenBuy Holden’s booksVisit Holden’s website
Workflow Engines and Building a Domain Specific Language for Data Quality with Tom Baeyens
This episode features an interview with Tom Baeyens, Co-founder and CTO of Soda, where he oversees the company's product development, software architecture, and technology strategy. He is passionate about open source and committed to building a community where data engineers can succeed using the Soda Data Monitoring Platform. Tom is the inventor of the widely used open source projects jBPM and Activiti. He also co-founded Effektif, a cloud process automation company. In this episode, Sam and Tom discuss the evolution of open source workflow engines, data contracts, and why data quality needs a language approach.-------------------“Where we're heading is what I think is exactly the same as with software engineering in the testing. Test-driven development was a radical new thing back then. But then it turns out, you can much more reliably release software. And this is exactly the same here. If you don't inject data testing, data observability throughout your data stack, then how are you going to trust the data that you put into your machine learning model? This is something that people are realizing, but we're still figuring out the best practices, the dos, the don'ts. We've come a long way, but there's still a way to go before this is as common and as normal as in the test-driven development software engineering space.” – Tom Baeyens-------------------Episode Timestamps: (01:23): What open source data means to Tom (04:34): Tom’s motivations for creating jBPM (09:39): What led Tom to building Soda (13:57): Why data quality needs a language approach (19:24): The community of Soda (22:47): The future of Soda as a technology (24:59): A question Tom wishes to be asked (30:24): Tom’s advice for engineers who want to leverage data observability tools-------------------Links: LinkedIn - Connect with Tom, Twitter - Follow Tom, Visit SodaCL
Enabling Edge Workers, AI & ML, and The Future of Data Science with Matthew Rocklin
This episode features an interview with Matthew Rocklin, CEO of Coiled, the scalable Dask-based cloud platform. Prior to founding Coiled, Matthew worked on Dask at Anaconda and then NVIDIA where his teams focused on accelerating Dask through parallel computing and GPUs. Matthew is an industry speaker, author, and founding member of Pangeo, whose mission is to develop open source analysis tools for ocean, atmosphere, and climate science.In this episode, Sam sits down with Matthew to discuss enabling edge workers, the future of data science, and the revolution of AI and ML.-------------------“There's all sorts of fun people using these tools and that's the most fun part of this job. You get to learn so much about so many different applications that are all so different and all so fascinating. You were thinking about all these different tools and technologies and I was talking to someone once, it's like, ‘Oh, it's like you're standing on the shoulders of giants.’ That's not quite right. There's lots of sort of normal size people all standing on each other's shoulders in like a massive pyramid. [...] Dask was designed to scale up an existing ecosystem. There's a legacy Python ecosystem that’ll provide a layer of parallel computing on top of it. You can do that either by rewriting the whole thing, which is not feasible, or you can do it by talking to lots of people and getting them to integrate in interesting, fun ways. That's actually been the fun parts of Dask. I think I've probably talked to every major maintainer group ever. I have worked with them to find out the ways to get everything to work smoothly together. And that's super fun. There's an interesting sort of technical and social hacking that occurs, which I think Python has done pretty well at, historically. 
Which is why it has success.” – Matthew Rocklin-------------------Episode Timestamps:(00:58): What open source data means to Matthew(03:29): Matthew’s motivations behind Python(18:58): How Matthew is enabling edge workers(34:46): What the future of the Python data space looks like(39:29): Matthew’s advice for the technical data audience(41:36): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with MatthewTwitter - Follow MatthewVisit Matthew’s WebsiteVisit DaskDask ExamplesVisit CoiledSciPy Mission
OSPOs, Measuring Community Success, and Self Knowledge with Nithya Ruff
This episode features an interview with Nithya Ruff, Head of Open Source Program Office at Amazon. At Amazon, she drives open source culture and coordination and engagement with external communities. Prior to Amazon, Nithya spearheaded and grew Open Source Program Offices (OSPOs) for Comcast and Western Digital. She has also served as the Director-At-Large on the Linux Foundation Board since 2016, where she works to advance the mission of building sustainable ecosystems that are built on open collaboration. In this episode, Sam and Nithya discuss OSPOs, how to measure success, and the evolution of the data ecosystem.-------------------“I think if we look at what matters to customers, which is innovation, trust, and being a force for change with open source, then we can really deliver on the metrics that the company cares about.” – Nithya Ruff-------------------Episode Timestamps:(04:02): What open source data means to Nithya(06:29): What interested Nithya about open source software(12:34): What Nithya learned at Western Digital and Comcast that she uses now at Amazon(18:23): What Nithya teaches people in OSPO curriculum(22:06): How the open source data ecosystem has evolved in the last decade(27:44): One question Nithya wishes to be asked(30:37): Nithya’s advice for folks who want to create an OSPO-------------------Links:LinkedIn - Connect with NithyaTwitter - Follow NithyaOpen Source Law, Policy and PracticeLinkedIn - Connect with AmazonTwitter - Follow AmazonVisit Amazon
IoT Databases, Digital Twins, and Real Holodecks with Jonathan Beri
This episode features an interview with Jonathan Beri, Founder & CEO of Golioth, a commercial IoT development platform built for scale. Previously, Jonathan was a Product Manager at Particle, Google/Nest, Magneto, and Myspace where he spent his time building IoT solutions. In this episode, Sam sits down with Jonathan to discuss the concept of digital twins, the future of IoT databases, and how to build a real holodeck.-------------------“I think about IoT when I started at Nest, we had some of the best engineers I've ever worked with. Starting from first principles, defining networking protocols, and introducing new specifications that became parts of the fabric of the internet. And fast forward 10 years later, a lot of that exists now as building blocks. Someone who's not a PhD with a lifetime achievement award from the IETF can go actually design systems that are highly productive, integrated, and enabling. And that's where I get excited. And the through line I think is enabling teams of developers to really create more with their own bare hands. And the technology around it, that is that enabler.” – Jonathan Beri-------------------Episode Timestamps:(01:33): Jonathan’s motivation for starting Golioth(08:59): The role of data in IoT(11:01): What is a digital twin and why does it matter?(17:12): The classes of problems Jonathan is trying to solve(20:35): The future of IoT databases in the next five years(31:04): What open source data means to Jonathan(32:24): Jonathan explains how to build a real holodeck(33:42): Jonathan’s advice for those excited about industrial data-------------------Links:LinkedIn - Connect with JonathanTwitter - Follow JonathanVisit Jonathan’s WebsiteLinkedIn - Connect with GoliothTwitter - Follow GoliothVisit Golioth
Healthcare Infrastructure, ALS Research and Reliable Data with Indu Navar
This episode features an interview with Indu Navar, CEO and Founder of EverythingALS, a patient-driven non-profit, bringing technological innovations and data science to support efforts from care to cure, for people with ALS. Indu’s impressive career includes being an original member of the WebMD engineering team, where she was instrumental in using emerging technologies to achieve application scalability and performance. In this episode, Sam sits down with Indu to discuss healthcare infrastructure applications, her strategies for providing reliable patient data, and the future of ALS research.-------------------“We said, ‘Okay, we're going to make this a citizen-driven research.’ That means patients are going to come and enroll because it's their project and it's patient-driven. So, it's a patient-driven, open innovation. So, once you do open patient-driven, open innovation, now we are the custodians of the data. Patients own the data, so all the data is shared with the patient. That was not done before in any of the research. And so, we give all the data back to the patients. And of course, we give them metrics as well. What was the rate of their speed of their speech? And if they don't want to see it, it's fine, at least they have it. And that data, we are the custodians and as custodians we share the data. So, once we did this model, we got almost close to one thousand people enrolled, consented, within 16 months.
As opposed to about 25 people in one year or 50 people in one to two years.” – Indu Navar-------------------Episode Timestamps:(01:19): What’s changed for Indu in the last year(05:46): What data infrastructure was like 25 years ago to solve for health outcomes(13:00): Indu’s personal experience with healthcare data(16:47): What Indu is looking forward to in ALS research(20:43): How regulatory establishments have shifted in healthcare(30:31): Where Indu wants to see EverythingALS go in the next year(36:28): One question Indu wishes to be asked(38:28): Indu’s advice for people inspired by EverythingALS-------------------Links:LinkedIn - Connect with InduTwitter - Follow InduTwitter - Follow EverythingALSVisit EverythingALS
Shifting Left on Data with DeVaris Brown, Tomer Shiran, and Erica Brescia
This bonus episode features conversations from season 3 of the Open||Source||Data podcast. In this episode, you’ll hear from DeVaris Brown, CEO & Co-founder of Meroxa; Tomer Shiran, Founder & CPO of Dremio; and Erica Brescia, Managing Director at Redpoint Ventures. Sam sat down with each guest to discuss how they’re making data more programmable by shifting left. You can listen to the full episodes from DeVaris Brown, Tomer Shiran, and Erica Brescia by clicking the links below.-------------------Episode Timestamps:(00:12): DeVaris Brown(00:42): Tomer Shiran(01:32): Erica Brescia-------------------Links:Listen to DeVaris’ episodeListen to Tomer’s episodeListen to Erica’s episode
Serial Entrepreneurship, Metadata Capture Systems, and Osquery with Tony Gauda
This episode features an interview with Tony Gauda, Head of Customer Engineering at Fleet Device Management, an open core company powered by Osquery. Tony is a serial entrepreneur and inventor with a profound history in fraud, security, and SaaS business. He holds several issued patents and his companies have raised over $40 million in venture funding. Tony is also the founder of ThinAir, a Y Combinator-backed SaaS service that tackles the insider threat problem for enterprises and government agencies. In this episode, Sam and Tony discuss calculating data usage at scale, the creativity of attackers, and how to evolve as threats increase.-------------------“The great thing about Osquery is that since it is a sensor-based system that is queryable, it literally gives you the ability to discover new indicators of compromise and then use those when doing security investigations. And Osquery allows you to create these extremely interesting queries that would find things that you would never be able to find with a traditionally static functionality agent. And, that to me, is extremely exciting. The fact that you have this agent that is extendable and it's configurable and it's deployable across multiple different platforms, at the end of the day, it feels like it's almost a superpower for visibility.” – Tony Gauda-------------------Episode Timestamps:(01:17): What Tony is curious about these days(04:39): What problems Tony is trying to solve(05:47): How Tony got into the tech world(11:09): Tony’s inspiration behind ThinAir(15:25): What open source data means to Tony(17:06): What led Tony to being an early adopter of Osquery(20:31): What’s ahead for building next level applications with open and secure data(25:37): One question Tony’s always wanted to be asked(29:24): Tony’s advice for inventors-------------------Links:LinkedIn - Connect with TonyTwitter - Follow TonyTwitter - Follow FleetdmFleetdmFleetdm GitHub Platform
Code Intelligence, GraphQL, and Closing the Remediation Gap with Beyang Liu
This episode features an interview with Beyang Liu, CTO and Co-founder of Sourcegraph, a code intelligence platform. Prior to Sourcegraph, Beyang was a software engineer at Palantir Technologies, where he developed new data analysis software on a customer-facing team working with Fortune 500 companies. Beyang studied Computer Science at Stanford, where he published research in probabilistic graphical models and computer vision at the Stanford AI Lab. In this episode, Sam sits down with Beyang to discuss the power of intelligence and visualization, GraphQL versus REST API, and how Sourcegraph is drawing inspiration from Google.-------------------“When I think about the future of Sourcegraph, it's really the future of this global human knowledge base that we're constructing. Similar to the worldwide web, the internet, where that was an amazing thing that came along. We're starting to see something like that emerge in the world of code. The open source ecosystem is this amazing, decentralized, distributed store of human knowledge that encapsulates all these algorithms and data structures and systems that are then pulled into all these systems that we rely on in our lives. And, so far, no one has really tried to map that web of knowledge in the same way that Google has mapped the internet and we want to do that. [...] You just open up a web browser, open up Google, type a query and you're good to go.
We want to make exploring code as easy as that experience.” – Beyang Liu-------------------Episode Timestamps:(01:21): What open source data means to Beyang(02:59): Beyang’s inspiration to create Sourcegraph(09:13): What Beyang sees in the future of power of intelligence and visualization(14:37): How Sourcegraph works(24:11): GraphQL versus REST API(27:10): What Sourcegraph’s open source community looks like(30:29): Beyang’s advice for people wanting to build new companies-------------------Links:LinkedIn - Connect with BeyangTwitter - Follow BeyangTwitter - Follow SourcegraphSourcegraphSourcegraph Discord Channel
Stream Processing, Observability, and the User Experience with Eric Sammer
This episode features an interview with Eric Sammer, CEO of Decodable. Eric has been in the tech industry for over 20 years, holding various roles as an early Cloudera employee. He also was the co-founder and CTO of Rocana, which was acquired by Splunk in 2017. During his time at Splunk, Eric served as the VP and Senior Distinguished Engineer responsible for cloud platform services. In this episode, Sam and Eric discuss the gap between operating infrastructure and the analytical world, stream processing innovations, and why it’s important to work with people who are smarter than you.-------------------"The thing about Decodable was just like let's connect systems, let's process the data between them. Apache Flink is the right engine and SQL is the language for programming the engine. It doesn't need to be any more complicated. The trick is getting it right, so that people can think about that part of the data infrastructure, the way they think about the network. They don't question whether the packet makes it to the other side because that infrastructure is so burned in and it scales reasonably well these days. You don't even think about it, especially in the cloud." – Eric Sammer-------------------Episode Timestamps:(01:09): What open source data means to Eric(06:57): What led Eric to Cloudera and Hadoop(12:48): What inspired Eric to create Rocana(20:29): The problem Eric is trying to solve at Flink(29:54): What problems in stream processing we’ll have to solve in the next 5 years(36:58): Eric’s advice for advancing your career-------------------Links:LinkedIn - Connect with EricTwitter - Follow EricTwitter - Follow DecodableDecodable
Season 3 Compressed Edition with Sam and Audra
Join Open||Source||Data executive producer Audra Montenegro as she and Sam discuss his learnings and takeaways from this season and what the future of open source data looks like.-------------------“There's such an open conversation about, ‘Yeah, open source,’ we usually think about open source software. How can we cross apply more of what we think about in software in general into data, and then what is it that's totally new about this domain? So, the answers cluster into three groups. It's either about the source of the data itself is open, meaning this is government data or data that's been made public and it's openly accessible. Or it could be that open source data is how the data is actually produced. Is it using open source tooling? Is it on an open source architecture? And finally, how do you trust that open source data? If it's just a whole bunch of data but it hasn't been labeled, if it hasn't been managed and produced, turned into a product. How do you understand its heritage? How do you understand the lineage of the data so that you can produce trustworthy models and trustworthy results based on it? So it's a big open field, but those are the general responses that people have when we explore that topic.” – Sam Ramji-------------------Episode Timestamps:(01:29): What open source data means to our guests(02:57): Sam discusses the themes of season 3(10:38): What Sam is looking forward to in the future of open source data-------------------Links:LinkedIn - Connect with SamLinkedIn - Connect with AudraTwitter - Follow SamTwitter - Follow Audra
Accelerating Computation, Machine Learning, and Data Mesh with Sophie Watson
This episode features an interview with Sophie Watson, Technical Product Marketing Manager at NVIDIA. Previously, Sophie served as a software engineer and principal data scientist at Red Hat where she used machine learning to solve business problems in the hybrid cloud. Sophie has a PhD in Bayesian statistics and frequently speaks about machine learning workflows on Kubernetes, recommendation engines, and machine learning for search. In this episode, Sam and Sophie discuss Principal Component Analysis, computational acceleration, and MLOps.-------------------“We all start when we get hold of a data set by visualizing it to try to understand it. So that usually for me involves starting with a simple technique, something like PCA, Principal Component Analysis. It's been around since the eighties, probably longer, maybe the sixties. Don't quote me on that. With Principal Component Analysis, we can map our high dimensional data down to a smaller number of dimensions. Let's map it down to two so that we can visualize it. So we can go ahead and visualize it. But Principal Component Analysis is quite a simple technique in what it's doing and it's just mapping onto key components of our data. We might not be able to see, perhaps, separation of classes if we're working with data that's from a set of classes. Maybe we're looking at transactions, are they fraudulent or are they legitimate? And we might not be able to see that distinction. So that makes us think, "Is there something interesting in my data? Am I going to be able to train a machine learning model?" I don't know. Back in the day, I think the next step would've been, “Oh, let's train a model in C”, but now with accelerated compute within a really reasonable amount of time, we can go ahead and use a more sophisticated technique so we can use something like UMAP that's leaning on differential manifolds to do that projection to lower dimensions.
And because this technique is slightly more sophisticated, what we find in general is that within the same amount of time, we're able to get more insight into the data. We're able to see the distinction in classes between our data sets. It keeps you in that loop. It keeps you in that productivity state.” – Sophie Watson-------------------Episode Timestamps:(01:22): What open source data means to Sophie(02:47): How Sophie is spending her time(07:52): What excites Sophie about the data science community(10:13): What Sophie is most excited about in data visibility(16:29): Data on servers versus data in the cloud(18:09): Accelerated computation on machine learning(22:27): Sophie breaks down probabilistic programming(24:21): What problem was Sophie trying to solve in her career(32:12): Sophie’s dream job of working for Taylor Swift(34:48): Sophie’s advice for those interested in open source-------------------Links:LinkedIn - Connect with SophieTwitter - Follow SophieTwitter - Follow NVIDIANVIDIA
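For listeners who want to try the first step Sophie describes, mapping high-dimensional data down to two dimensions with PCA so it can be plotted, here is a minimal sketch. It is not code from the episode: the `pca_2d` helper and the random "transactions" array are hypothetical stand-ins, and it uses NumPy's SVD rather than any particular accelerated library.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # PCA works on zero-mean features, so center each column first.
    centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered from most to least variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the top two directions and project the data onto them.
    return centered @ vt[:2].T


# Made-up data: 200 "transactions" with 10 numeric features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

points = pca_2d(X)
print(points.shape)  # (200, 2)
```

The two columns of `points` can be passed to any scatter-plot tool. As the quote notes, classes may not separate cleanly in a linear projection like this, which is where a nonlinear method such as UMAP comes in.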
Democratization and Cognition with Margot Gerritsen, Rachel Chalmers, and Patricia Boswell
This bonus episode features conversations from season 1 of the Open||Source||Data podcast. In this episode, you’ll hear from Margot Gerritsen, Stanford Professor and Co-Founder/Director of WiDS; Rachel Chalmers, Partner at Alchemist Accelerator; and Patricia Boswell, Staff Technical Writer at Google. Sam sat down with each guest to discuss cognition and democratization in data. You can listen to the full episodes from Margot Gerritsen, Rachel Chalmers, and Patricia Boswell by clicking the links below.-------------------Episode Timestamps:(00:18): Margot Gerritsen(02:07): Rachel Chalmers(03:46): Patricia Boswell-------------------Links:Listen to Margot’s episodeListen to Rachel’s episodeListen to Patricia's episode
Vector Search, the AI Stack and more with Bob van Luijt
This episode features an interview with Bob van Luijt, CEO and Co-Founder of SeMI Technologies and co-creator of Weaviate, an open source vector search engine. At just 15 years of age, Bob started his own software company in the Netherlands. He went on to study music at ArtEZ University of the Arts and Berklee College of Music, and completed the Harvard Business School Program of Management Excellence. Bob is also a TEDx speaker, discussing the relationship between software and language. In this episode, Sam sits down with Bob to break down vector search, the AI-first ecosystem, and how music and software relate to one another.-------------------“I dare to argue that from the two big waves in database technology that we've seen, so first, in the seventies and eighties with SQL. And then the whole NoSQL wave that we have seen and the big winners that are in there, I dare to argue that we see a third wave coming up. And the third wave, I simply call it AI-first. And what I mean with that is that these models play an important role. So we do it from the perspective of the models first. And in that new segment, you see four niches. So the first niche that we see are what I like to call the embedding providers. The Hugging Faces of this world, the OpenAIs of this world, etc. Those who bring us the embeddings that we need to do the vectorization. Then secondly, we have so-called neural search frameworks. So we see frameworks like Haystack and Jina. Then third, we have the feature stores. So the feature stores take care of storing large chunks of features that we later can use to do vectorization on those kinds of things. And then we have the search engines.
And Weaviate is an example of such a search engine that takes care of searching through data on a large scale that is vectorized. It might be a bold statement, but I really believe that we see this third wave of database technology happening.” – Bob van Luijt-------------------Episode Timestamps:(01:45): How Bob defines open source data(04:09): What is a vector database and why do we need them?(07:55): How data is different before and after vectorization(13:58): Orders of magnitude faster or personal(16:09): How music and software relate to each other for Bob(19:33): Bob’s inspiration behind Weaviate(25:02): The AI-first ecosystem(27:38): The distinction between vector search engines, feature stores, neural search frameworks, and embedding providers(32:28): Bob’s advice for folks on the OSS startup journey-------------------Links:LinkedIn - Connect with BobTwitter - Follow BobTwitter - Follow WeaviateWeaviateSeMI TechnologiesBob’s TEDx TalkBob's Forbes Article on the AI-First Database Ecosystem
Open Source Innovation, The GPL for Data, and The Data In to Data Out Ratio with Larry Augustin
This episode features an interview with Larry Augustin, angel investor and advisor to early-stage technology companies. Larry previously served as the Vice President for Applications at AWS, where he was responsible for application services like Pinpoint, Chime, and WorkSpaces. Before joining AWS, Larry was the CEO of SugarCRM, an open source CRM vendor. He also was the founder and CEO of VA Linux, where he launched SourceForge. Among the group who coined the term “open source”, Larry has sat on the boards of several open source and Linux organizations. In this episode, Sam and Larry discuss who owns the rights to data, the data in to data out ratio, and why Larry is an open source titan.-------------------"People are willing to give up so much of their personal information because they get an awful lot back. And privacy experts come along and say, ‘Well, you're taking all this personal information’. But then most people look at that and say, ‘But I get a lot of value back out of that.’ And it's this data ratio value question, which is: for a little in, I get a lot back. That becomes a key element in this. And I think there has to be some kind of similar thought process around open source data in general, which is if I contribute some data into this, I'm going to get a lot of value back. So this data in to data out ratio, I think it's an incredibly important one. It's a principle that I drive into application development. If you put a user in front of an app and they start using the app, you're going to ask them for things. And my principle is always, ‘How do you figure out how to never ask them and only give them?’ And you can't get 100% of the way there, but every time it's like, ‘Why did you ask them for that? Couldn't you figure it out?’ And it gets everyone in the mindset of, ‘How do I provide more and more and take less and less?’ It's a principle of application development that I like a lot. And I think there's a similar concept here around open-source data.
Are there models or structures that we can come up with where people can contribute small amounts of data and as a result of that, they get back a lot of value.” – Larry Augustin-------------------Episode Timestamps:(02:14): How Larry is spending his time after AWS(06:01): What drove Larry to open source(18:04): What is the GPL for data?(23:51): Areas of progress in open source data(28:37): The data in to data out ratio(36:02): Larry’s advice for folks in open source-------------------Links:LinkedIn - Connect with LarryTwitter - Follow Larry
Data Observability with Barr Moses, Einat Orr, and Shinji Kim
This bonus episode features conversations from season 2 of the Open||Source||Data podcast. In this episode, you’ll hear from Barr Moses, Co-founder and CEO at Monte Carlo; Einat Orr, Co-founder and CEO at Treeverse; and Shinji Kim, Founder and CEO at Select Star. Sam sat down with each guest to discuss data observability. You can listen to the full episodes from Barr Moses, Einat Orr, and Shinji Kim by clicking the links below.-------------------Episode Timestamps:(00:35): Barr Moses(01:21): Einat Orr(02:07): Shinji Kim-------------------Links:Listen to Barr’s episodeListen to Einat’s episodeListen to Shinji’s episode
Apache Pinot and Real-Time Analytics with Neha Pawar
This episode features an interview with Neha Pawar, a Founding Engineer at StarTree. StarTree is a software development company that focuses on democratizing data for all users by providing real-time, user-facing analytics. Prior to her time at StarTree, Neha was a Senior Software Engineer on LinkedIn’s Data Analytics team where she spent five years working on Apache Pinot. Neha has provided countless contributions to Pinot over the years, focusing on real-time streaming integrations, ingestion, and storage. In this episode, Sam sits down with Neha to discuss Apache Pinot’s impact on the data community and how LinkedIn popularized real-time analytics.-------------------"Many people do think that a batch is good enough, real-time infra is expensive anyway. And what difference is it going to make if the data shown in this application is a day ago or an hour ago, and it's not real-time to the nearest second? And while that is true, in some cases, but in many other cases, not having real-time data can be super expensive and can affect the business badly and also make them irrelevant. You need the real-time data and then you also need to be able to analyze that data at the speed of your thought. For example, if you are having fraudulent activity somewhere, you can't wait for, ‘Hey, my model is going to learn about this.’ And then the next time, be able to tell me that that was a fraudulent activity. You need to be able to analyze all that data right now.
So, it's not just a nice-to-have, it's a must-have.” – Neha Pawar-------------------Episode Timestamps:(01:58): What open source data means to Neha(06:04): Neha’s learnings from the LinkedIn Data Analytics Team(07:07): What piqued Neha’s interest in real-time data analytics(08:30): Neha’s first experiences working on Apache Pinot(11:40): How the work of real-time data spread from LinkedIn to other companies(17:30): How the Apache community has grown(24:04): Neha’s focus at StarTree(30:41): Neha’s motivation for tiered storage at StarTree(37:07): Neha’s advice for open source data folks-------------------Links:LinkedIn - Connect with NehaLinkedIn - Connect with StarTreeTwitter - Follow NehaTwitter - Follow StarTreeVisit StarTree
Real-Time Data, Enabling Developers, and User Experience with DeVaris Brown
This episode features an interview with DeVaris Brown, CEO and Co-Founder of Meroxa. Meroxa was founded in 2020 and enables teams of any size and any expertise to build real-time data pipelines in minutes. Previously, DeVaris was a product leader at Twitter, Heroku, and Zendesk. Sam and DeVaris even crossed paths at Microsoft in the aughts. In this episode, Sam and DeVaris discuss enabling developers, real-time data, and providing the ultimate user experience.-------------------"From the beginning we wanted to be system engineer first, software engineer second, and we were happy to stand on the shoulders of giants that built foundational pieces of technology to help us get our job done more efficiently. [...] The one thing I love about my co-founder and he's super humble, Ali, we did billions of events a minute at Heroku on the data platform for tens of thousands of Kafka clusters for thousands of customers. But the team was six and he was a lead on that team. And we had five nines for years. Why? Because automation. And that's really what we built. [...] And so what we said was the experience will be our differentiator, but the components and the architecture which we run on, that can be standard. And that was a real big lesson that I learned at Heroku." – DeVaris Brown-------------------Episode Timestamps:(05:47): What open source data means to DeVaris(09:08): DeVaris’ inspiration for building a Heroku for data(14:09): The open source underneath Meroxa(20:06): What the Meroxa open source community looks like(25:13): How will data engineering evolve over time?(28:41): DeVaris breaks down real-time data(33:40): Where does the name Meroxa come from?(35:01): DeVaris’ advice for open source data folks-------------------Links:LinkedIn - Connect with DeVarisLinkedIn - Connect with MeroxaTwitter - Follow DeVarisTwitter - Follow MeroxaVisit MeroxaVisit Orbit
Data Meshes, Fabrics, and Discovery with Zhamak Dehghani, David Thomas, and Shirshanka Das
This bonus episode features conversations from seasons 1 and 2 of the Open||Source||Data podcast. In this episode, you’ll hear from Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks North America; David Thomas, Principal at Deloitte; and Shirshanka Das, Founder of LinkedIn DataHub and Acryl Data. Sam sat down with each guest to discuss data meshes, fabrics, and discovery. You can listen to the full episodes from Zhamak Dehghani, David Thomas, and Shirshanka Das by clicking the links below.-------------------Episode Timestamps:(00:36): Zhamak Dehghani(01:41): David Thomas(02:43): Shirshanka Das-------------------Links:Listen to Zhamak’s episodeListen to David’s episodeListen to Shirshanka’s episode
Investing in Communities, Differentiating, and Trusting Your Gut with Erica Brescia
This episode features an interview with Erica Brescia, Managing Director of Redpoint Ventures. At Redpoint, Erica focuses her investing on infrastructure, DevOps, and security. Erica has over 15 years of experience in the open source community and currently serves on the board of directors of the Linux Foundation. Prior to joining Redpoint, Erica was also an angel investor and advisor to companies such as Netlify, Coda, and Xata. In this episode, Sam and Erica discuss the evolution of open source data, what’s changed for practitioners, and why you should always listen to your gut.-------------------“I think there is just so much good motivation to make the world a better place, especially during my time at GitHub. When you can see what kinds of opportunity open source can bring to people in developing countries, that’s really exciting. You see people whose lives and livelihoods have literally been changed because they were able to participate in a global open source project. And then you can see the way that open source projects, even back when we were packaging things at Bitnami, we’d hear from non-profits in Africa that were never able to use open source until we made it easy to consume. When you feel like you’re really making that kind of a difference and you’re doing it in a community of great people, it’s a really great way to spend your time.” – Erica Brescia-------------------Episode Timestamps:(03:18): What open source data means to Erica(11:31): What’s changed in open source data in recent years(18:01): How the journey has evolved for innovators and practitioners(24:11): What stands out as a venture capitalist to Erica(30:03): Don’t discount junior investors(31:17): Erica’s advice: get quiet and listen to your gut-------------------Links:LinkedIn - Connect with EricaLinkedIn - Connect with RedpointTwitter - Follow EricaTwitter - Follow RedpointVisit RedpointXataDagger
Data on Kubernetes with Kelsey Hightower, Lachlan Evenson, and Patrick McFadin
This bonus episode features conversations from season 1 of the Open||Source||Data podcast. In this episode, you’ll hear from Kelsey Hightower, Principal Engineer at Google Cloud; Lachlan Evenson, Principal Program Manager at Microsoft Azure; and Patrick McFadin, Head of Developer Relations at DataStax. Sam sat down with each guest to discuss Data on Kubernetes and how they’re making progress on a stateless infrastructure. You can listen to the full episodes from Kelsey Hightower, Lachlan Evenson, and Patrick McFadin by clicking the links below.-------------------Timestamps:(00:39): Kelsey Hightower(01:33): Lachlan Evenson(02:06): Patrick McFadin-------------------Links:Listen to Kelsey’s episodeListen to Lachlan’s episodeListen to Patrick’s episode
Deep Fakes, Responsible Data Science, and Trust with David Danks
This episode features an interview with David Danks, Professor of Data Science and Philosophy and affiliate faculty in Computer Science and Engineering at University of California, San Diego. Prior to UCSD, David was the L.L. Thurstone Professor of Philosophy and Psychology at Carnegie Mellon University. David’s research interests are at the intersection of philosophy, cognitive science, and machine learning. He has also examined the ethics surrounding artificial intelligence in the fields of healthcare, privacy, and security. In this episode, David and Sam dive into responsible data science, deep fakes, and whether data is to blame for the lack of trust among consumers.
-------------------
"There's a, almost, glorification of the technology that's happening at the moment. And the technology is obviously crucial, but what I really care about in a lot of ways is what are the human beings who build and use that technology doing with it? Because the exact same ones and zeros, the exact same code can lead to enormous social benefit or social harm, depending on what we humans do with it. And so, I think we need to recognize that technology is not this hurricane bearing down on us, it's a thing that people build and use. And how do we influence the people, and the companies is maybe an easier thing to do than trying to focus just on the data and algorithms." – David Danks
-------------------
Episode Timestamps:
(01:41): What open source data means to David
(05:58): David’s transition from philosophy to AI
(09:03): Is data to blame for lack of trust in AI?
(13:40): How to be “future aware”
(16:32): Data science vs responsible data science
(20:20): Deep Fakes
(40:17): Advice for Ethical AI newcomers
-------------------
Links:
Connect with David
Cloud Innovation, Analytics, and Data Transformation with Monica Kumar
This episode features an interview with Monica Kumar, Senior Vice President of Marketing and Cloud-Go-To Market at Nutanix. Nutanix is a data platform that is redefining workloads in cloud environments. Prior to Nutanix, Monica spent two decades at Oracle where she launched several market solutions. Monica is passionate about positioning and supporting women in leadership roles. She is a founding limited partner of Neythri Futures Fund, a venture fund dedicated to bringing South Asian women into the investment community. Monica also serves on the board of Directors at Watermark, an organization dedicated to women in leadership. In this episode, Monica and Sam discuss the evolving world of marketing analytics, tech’s biggest innovation to date, and how the data industry can change for the better.
-------------------
“I believe that cloud has now become more of an operating model. It started out in the public cloud, but now organizations have adopted the same philosophy of self-service, metering, chargeback, quick deployment, on-demand deployment, on-premises as well. So, my assertion is that cloud has become more of an operating model than a location. And what we’re going to see going forward more and more is this notion of multi-cloud and hybrid multi-cloud data platforms that would be able to access data from multiple locations and be able to provide on top of that, the analysis that the user is looking for.” – Monica Kumar
-------------------
Episode Timestamps:
(02:23): What open source data means to Monica
(12:37): The evolving world of marketing insight analytics
(16:42): How remote work is changing the industry for the better
(20:25): How Monica supports diverse entrepreneurs
(24:11): The transformation of a database to a data platform
(26:37): Why the cloud is tech’s biggest innovation
(29:23): What’s next for data storage in 5 years?
(29:41): Monica’s analogy for data storage
(33:34): Monica’s advice for newcomers
-------------------
Links:
LinkedIn - Connect with Monica
LinkedIn - Connect with Nutanix
Twitter - Follow Monica
Twitter - Follow Nutanix
Visit Nutanix
The Neythri Futures Fund
Data Lakehouses, Interoperability, and Accessibility with Tomer Shiran
This episode features an interview with Tomer Shiran, Founder and Chief Product Officer at Dremio. Dremio is a high-performance SQL lakehouse platform that helps companies get more from their data in the fastest way possible. Prior to Dremio, Tomer served as VP of Product at MapR and also held product management and engineering roles at Microsoft and IBM Research. He also has a master’s degree from Carnegie Mellon University as well as a bachelor’s from Technion - Israel Institute of Technology. In this episode, Tomer and Sam dive into the economics of storing data, how to build an open architecture, and what exactly a data lakehouse is.
-------------------
“I think in the world of data lakes and lakehouses, the model has shifted upside down. Now, instead of bringing the data into the engines, you’re actually bringing the engines to the data. So you have this open data tier built on open source technology. The data is represented in open source formats and stored in the company’s S3 account or Azure storage account. And then you can use a variety of engines. We at Dremio, we take pride in building the best SQL engine to use on the data. There are different streaming engines, like Spark and Flink. There are different batch processing and machine learning engines. Spark is an example of that as well that companies can use on that same data. And I think that’s one of the really important things from a cost standpoint, too, is that this really lowers your overall costs, both today and also in the future as you scale.” – Tomer Shiran
-------------------
Episode Timestamps:
(02:04): What open source data means to Tomer
(03:14): Tomer’s motivation behind Apache Arrow
(06:42): How Tomer solved data accessibility
(08:43): The unit economics of storing data
(14:31): Tomer’s motivations for Iceberg and how it relates to Project Nessie
(17:06): What is a data lakehouse?
(18:31): What gives Dremio its magic?
(23:39): What cloud data architecture will look like in 5 years
(27:19): Advice for building an open data architecture
-------------------
Links:
LinkedIn - Connect with Tomer
LinkedIn - Connect with Dremio
Twitter - Follow Tomer
Twitter - Follow Dremio
Visit Dremio
Get started with Dremio
Interoperability, Governance, and Divergent Teams with Prukalpa Sankar
This episode features an interview with Prukalpa Sankar, Co-Founder of Atlan. Atlan is a venture-backed startup building a modern data workspace. Prukalpa also co-founded SocialCops, a data for good company behind landmark projects such as India’s National Data Platform. Prukalpa is a recognized industry leader, landing on the Forbes 30 Under 30 list and Fortune’s 40 Under 40. In this episode, Prukalpa and Sam discuss how diversity is a data team’s biggest strength, why governance isn’t always a bad thing, and what they hope the modern data stack will look like in 5 years.
-------------------
“Diversity is our biggest strength but our biggest weakness, because it's really hard to make that team collaborate. Because most of the teams in the world are very uniform. So when every single person in the room is a subject matter expert on something, nobody else actually can have oversight on each other's work because they've never done it before. Then how do you create true trust? How do you create trust when things are breaking? If you're able to create a way for these diverse people to collaborate really effectively, to be a dream team, a dream data team where they trust each other and they can collaborate effectively, then magic can happen.” – Prukalpa Sankar
-------------------
Episode Timestamps:
[01:55]: What open source data means to Prukalpa
[05:38]: Prukalpa’s journey to data for good movement
[04:51]: How Prukalpa and her team provided gas to 80 million Indian women
[06:33]: How diversity can help a data team succeed
[15:10]: What gives Atlan its magic
[18:58]: How being open by default influenced Atlan’s architecture choices
[22:45]: The reality of the modern data stack in 5 years
[27:36]: Advice for people getting started with DataOps
-------------------
Links:
LinkedIn - Connect with Prukalpa
LinkedIn - Connect with Atlan
Twitter - Follow Prukalpa
Twitter - Follow Atlan
Visit Atlan

Trust, Automation, and Trade-Offs with Joseph Jacks
This episode features an interview with Joseph Jacks, Founder and General Partner of OSS Capital. OSS Capital is the first and only COSS (Commercial Open Source Software) company investor that focuses on supporting early-stage COSS founders. Joseph, also known as JJ, has worked at Mesosphere, TIBCO Software, and Talend in various sales, engineering, and strategy roles. In this episode, JJ and Sam weigh the trade-offs of open and closed core companies and discuss how each can go public. JJ also dives into the misconception that trust equates to privacy within tech.
Guest Quote [25:14]: “There’s a societal recognition that if you use technology to automate some part of your life and you use that regularly, you have to be able to trust it. And I think gradually, consumers are becoming more and more aware that one of the most effective ways of checking the trust box is answering the question, ‘Is the technology I'm using open source at the core, yes or no?’ And if the answer is no, I think it's very difficult and a lot harder to achieve the levels of trust that you can if the answer is yes.” – Joseph Jacks
Time Stamps
[12:59]: The difference between open and closed core companies
[17:23]: Understanding the trade-off between open and closed source
[18:23]: Trends within open source data companies
[20:21]: Is it possible to go public as a closed source database?
[22:35]: Leveraging the automation opportunity of open source systems
[23:47]: How can consumers trust the technology they’re using?
[34:01]: Advice for those starting open source projects
Links
LinkedIn - Connect with JJ
LinkedIn - Connect with OSS Capital
Twitter - Follow OSS Capital
Visit OSS Capital
See omnystudio.com/listener for privacy information.

Open Source, Adoptability, and Name Changes with Martin Traverso
This episode features an interview with Martin Traverso, CTO at Starburst Data and Co-founder of Trino, a lightning fast distributed SQL query engine. Martin was previously a software engineer at Facebook where he led the Presto (now Trino) development team. Trino has gained worldwide adoption from companies like Netflix, Amazon, and LinkedIn. In this episode, Martin sits down with Sam to discuss the barriers, advantages, and complications of going open-source.
Guest Quote [33:55]: “What makes Trino powerful is the ecosystem around it. You have integrations with all sorts of data sources and that’s part of the power and magic of Trino. You can pull data from all these data sources using a single interface. On the other end is the integrations with all the tools that everyone uses. Once you put all those pieces together, that’s what gives Trino the power.”
Time Stamps
[8:38]: How Martin solved Facebook’s analytics problem
[13:00]: How the team adapted to customers’ needs
[17:07]: What makes Trino stand out among other query engines
[19:42]: Going open-source changes the game
[30:14]: Presto becomes Trino
[33:24]: What gives Trino its magic
[35:19]: What Trino’s community looks like today
[38:34]: Advice for those starting open-source projects
Links
Blog - Intro to Trino for the Trinewbie
Trino Community Broadcast - Subscribe
GitHub Trino repository - Give Trino a star
LinkedIn - Connect with Martin
Trino Meetup - Join
Play with Trino
Rebrand from Presto to Trino - Learn More
Slack - Join Trino
Trino: The Definitive Guide (Download a free copy)
Twitter - Follow Martin
Twitter - Follow Trino

Season Two Finale and Recap with Open||Source||Data Producer Audra Montenegro
Join Open||Source||Data producer Audra Montenegro as she and Sam cover highlights and takeaways from the ten episodes of season two. And get a sneak peek of what's in store for season three!

Embeddings, Feature stores, and MLOps with Simba Khadder
Join Featureform CEO Simba Khadder as he talks with Sam about how versioning, immutability, and sharing will accelerate ML workflows. Tune in for state-of-the-art collaboration in data teams, and the power of focusing on your north star.

Abundance, Metadata, and Automation with Mark Grover
How can we make data 10X more accessible for data-driven people within data-driven companies? Tune in as Mark and Sam discuss probabilistic product management and the emerging metadata ecosystem.

Metadata, Communities, and Architecture with Shirshanka Das
How can we evolve an expanding ecosystem of data technologies while making sense of the whole? Tune in to LinkedIn DataHub and Acryl Data founder Shirshanka Das as he and Sam discuss metadata at the center and specialization at the edge to sustainably scale data governance.

Data Management Pain Points and Future Solutions for Data Discovery
Data discovery is one of the hardest problems to solve in data management and comes up as a major pain point in most data mesh discussions. Tune in to this all-star expert panel recorded in collaboration with the Data Mesh community, and hosted by a previous Open||Source||Data podcast guest, Paco Nathan of Derwen.ai. Paco engages panelists Shinji Kim (Select Star), Sophie Watson (Red Hat), Mark Grover (Stemma), and Shirshanka Das (Acryl Data) in a 60-minute discussion covering not only Data Mesh but also other data strategies and process needs for the future of data discovery.

ModelOps, ML Monitoring, and Busy Humans with Elena Samuylova
It’s 2 AM - do you know what your models are doing? Listen to Elena Samuylova as she talks to us about how to bridge the critical gaps between data scientists, engineers, and business managers using tooling and empathy.

Cloud-Native, Open-Source, and Collaborative with Eric Brewer and Melody Meckfessel
Google Fellow & VP of Infrastructure Eric Brewer, Observable CEO Melody Meckfessel, and DataStax Chief Strategy Officer Sam Ramji explore the state of the art, the near future, and grand challenges for the next decade in cloud-native data.

MLOps, AIOps, and Data Startups with Jocelyn Goldfein
Dealing with data hyperabundance, solving economic problems for businesses, and changing lives for the better. Tune in to Jocelyn Goldfein, Managing Director at Zetta Venture Partners, as she and Sam discuss engineering leadership, organizational graph structures, and productization of AI.

Git-Like Branch and Merge for Data with Einat Orr
What if you could version object storage just like code? Tune in to Einat Orr as she explains how CI/CD and data lineage are being transformed through versioning data, enabling sandboxes, safe rollbacks, and coherent history.

Data Discoverability, Products, and User Diversity with Shinji Kim
Learn how an accelerating abundance of data can be harnessed through telemetry. Tune in while Shinji Kim and Sam explore opening data to more users, PageRank for tables, and pragmatic use of data lineage to find value.

Data Observability, Customer-Led Growth, and Confidence with Barr Moses
Barr Moses talks with Sam about bringing DevOps into Data Engineering, building a data startup, and letting joy guide your way to creating impact. Learn how being data-driven depends on systems of people and trust.

Open Source Data & Its Role in the Future of Technology: Season 1 Recap
Wrapping up Season 1, Open||Source||Data producer Audra Montenegro Carter joins Sam Ramji in a conversation about the inspiration and behind-the-scenes production of the podcast, touching upon the top takeaways and lessons learned with Season 1 guests from AWS, Microsoft, ThoughtWorks, Deloitte, Observable, and many more.

Data Visualization, Democratization, and Javascript with Melody Meckfessel
Observable Co-Founder and CEO Melody Meckfessel joins Sam in a conversation on how millions of developers are changing how we experience data. Listen in as Melody explains the importance of data literacy and the shift in data collaboration.

DataOps, MLOps, and Self Service: How Data Teams are Changing
Join Data Institute's Managing Director, Jesse Anderson, to learn how data teams are changing in response to overwhelming demand for data products. Tune in as he and Sam discuss bringing software engineering into the domain of data - and why he wrote Data Teams.

Fabrics, Meshes, and Graphs with Deloitte Principal Dave Thomas
Join Dave and Sam as they discuss data sets evolving from finite to infinite, and finding the needle in the haystack with math. Listen to Dave talk about cutting-edge data problems and the essential need for curious people.

Metadata, Graphs, and Responsible AI with Paco Nathan
Data Science player and coach, Author, and Venture Amplifier Paco Nathan talks with Sam Ramji about Hybrid AI, mathematical reversibility, and using AI to solve knowledge problems that the exponential growth of data will create for years to come. Join these two as they discuss how you can bring multiple data disciplines together using empathy and math.

Data Analytics: Hard Skills vs Soft Skills and the Gift of Thinking Different
Analytics manager and Women in Data podcast producer and host Karen Jean-Francois walks us through the differences between Data Science and Analytics. Join her and Sam as they discuss valuable skills you’ll need when transitioning to a career in Data Analytics. Hear Karen’s perspective on the benefits of thinking differently and having a mentor to guide you through transitions.

Global Connectivity: Share and Democratize Through Open Data
Co-Founder and Director of WiDS and Stanford Professor Margot Gerritsen joins Sam Ramji in a conversation about how a data community provides global connectivity, and how learning is all about seeking discomfort with uncertainty and ambiguity. Learn how data is the new gold, but rather than sitting on the mine - share the wealth through a career in Data Science.

From DBA to SRE: 2021 Predictions for Data on Kubernetes
With data comes DBAs and with Kubernetes comes SREs. Listen in as Patrick McFadin and Sam discuss what’s in store in 2021 for Data on Kubernetes, how experienced DBA roles can evolve into very effective SREs, and why today is THE day to learn Kubernetes.

Open Source’s Impact in Academia with Open@RIT's Stephen Jacobs
From a 10-page white paper to creating one of the first University OSPOs - Stephen Jacobs will take us through the 12 years of work it took him to launch a program like Open@RIT. Join Stephen and Sam as they discuss the impact an OSPO has on students' futures, as well as a University's surrounding communities.

Data Meshes: Big Data Architecture Becoming Distributed, Declarative and Domain Oriented
Beyond The Data Lake, the 2017 paper by Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks, was a guiding light for Sam Ramji at another point in his career. Listen to how a Data Mesh allows composition of multi-model data across an organization and beyond.

Data on Kubernetes: Platform, Resource, and Ecosystem tooling with Microsoft Azure’s Lachlan Evenson
How do we create free and open data sets that are trustworthy? Microsoft Azure’s Principal Program Manager Lachlan Evenson and Sam Ramji discuss standards for accessing data, and the magic that can happen with data on Kubernetes.

Data, Kubernetes, and Our Best Selves with Google’s Kelsey Hightower
Inspire, collaborate, and solve together. Google Cloud Principal Engineer Kelsey Hightower joins Sam Ramji to discuss the future of Data and Kubernetes, and what it means to participate in a welcoming developer community while igniting positive growth.

Culture and Cognition in DevOps with Alchemist Accelerator’s Rachel Chalmers
Sam invites Rachel Chalmers, an investor, advisor, and technology industry analyst for over 20 years, for a candid conversation about the DevOps culture of shared purpose and blamelessness. Sam and Rachel explore how process, trust, and care for each other create more innovation and give us the opportunity to change and grow as human beings.

Open Source Sustainability with AWS Exec + Tech Columnist Matt Asay
Matt Asay shares his journey through open source and behind-the-scenes stories on what gives these communities their strength: its people and their voices.

Storytelling in Product Development with Google’s Patricia Boswell
Behind every great product is a great story. Sam invites Google Staff Technical Writer Patricia Boswell to discuss the role of technical writing in software and the importance of using narrative as a North Star when designing a product.

Introducing Open||Source||Data with Sam Ramji
What can we learn from cloud-native development and how can we share that with developers, engineers, product owners, and product managers of the new world? Join DataStax Chief Strategy Officer and 25-year open source veteran, Sam Ramji, as he interviews innovators who are shaping the future of open source data, open source software, data on Kubernetes, data in DevOps, data in AI, and much more.