EPISODE · May 20, 2024 · 37 MIN

E134: Making Complex Data RAG-Ready with Unstructured

from Open Source Startup Podcast

Brian Raymond is Founder & CEO of Unstructured, the platform to extract and transform complex data for use with every major vector database and LLM framework. Their open source project has 7K stars on GitHub and includes libraries and APIs that let users build custom preprocessing pipelines for labeling, training, and production machine learning. Today, they have over 6M downloads and 50K companies using their tools. Unstructured has raised $65M from investors including Bain, Essence VC, and Menlo Ventures.

In this episode, we dig into Brian's process of talking to 100 data scientists before launching Unstructured, why the long tail of data matters for LLMs, competing with their own open source, why being a "boring company" is valuable for today's LLM stack, why they liked having government design partners, why world-class design & marketing are huge differentiators for open source companies & more!

Frequently Asked Questions

How long is this episode of Open Source Startup Podcast?

This episode is 37 minutes long.

When was this Open Source Startup Podcast episode published?

This episode was published on May 20, 2024.

Can I download this Open Source Startup Podcast episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.