EPISODE · Feb 20, 2026 · 26 MIN

Even GenAI uses Wikipedia as a source

from The Stack Overflow Podcast

Ryan is joined by Philippe Saade, the AI project lead at Wikimedia Deutschland, to dive into the Wikidata Embedding Project and how their team vectorized 30 million of Wikidata’s 119 million entries for semantic search. They discuss how this project helped offload the burden that scraping was creating for their sites, what Wikimedia.DE is doing to maintain data integrity for their entries, and the importance of user feedback even as they work to bring Wikipedia’s vast knowledge to people building open-source AI projects.

Episode notes: Wikimedia.DE announced the Wikidata Embedding Project with MCP support in October of last year. Check out their vector database and codebase for the project. Connect with Philippe on LinkedIn and his Wiki page. Today’s shoutout goes to an Unsung Hero on Stack Overflow: someone who has more than 10 accepted answers with a zero score, making up 25% of their total. Thank you to user MWB for bringing your knowledge to the community!

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Frequently Asked Questions

How long is this episode of The Stack Overflow Podcast?

This episode is 26 minutes long.

When was this episode of The Stack Overflow Podcast published?

This episode was published on February 20, 2026.
