EPISODE · Oct 17, 2025 · 33 MIN

Docling: Get your documents ready for generative AI (sps25)

from Chaos Computer Club - recent events feed · host Peter Staar

Docling is an open-source Python package that simplifies document processing. It parses diverse formats, including PDFs with advanced layout understanding, and integrates with the generative AI ecosystem. Supported inputs include PDF, DOCX, XLSX, HTML, and images, and parsing recovers reading order, table structure, code, and formulas. Docling represents every document in a unified, expressive DoclingDocument format that can be exported to Markdown, HTML, or lossless JSON. It offers plug-and-play integrations with frameworks such as LangChain, LlamaIndex, CrewAI, and Haystack, and it runs fully locally, which suits sensitive data and air-gapped environments. Docling is installable with pip and exposes both a Python API and a command-line interface, so it can be embedded in an existing data pipeline or AI stack. Its modular design also allows extension and customization for enterprise use cases.

We also introduce SmolDocling, an ultra-compact vision-language model with 256M parameters for end-to-end document conversion. SmolDocling generates DocTags, a new markup format that captures the full content, structure, and spatial layout of a page. It accurately reproduces document features such as tables, equations, charts, and code across a wide variety of formats, while matching the performance of models up to 27 times larger.

About this event: https://talks.python-summit.ch/sps25/talk/QJLGCZ/
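The programmatic and CLI workflows described in the abstract can be sketched as follows. This is a minimal sketch, assuming `pip install docling` has been run: `DocumentConverter`, `convert`, `export_to_markdown`, and `export_to_dict` are Docling calls, and the CLI flags follow the Docling README at the time of writing (check `docling --help` for your version). The helper names `convert_to_markdown_and_json` and `docling_cli_args` and their parameters are hypothetical, chosen only for illustration.

```python
"""Sketch: converting a document with Docling (assumes `pip install docling`).

The first conversion downloads Docling's layout and table-structure models,
so it needs network access once; later runs can work offline.
"""
import json
from pathlib import Path


def convert_to_markdown_and_json(source: str, out_dir: str, name: str) -> Path:
    """Convert a file path or URL with Docling's Python API.

    Writes <name>.md and <name>.json into out_dir and returns the Markdown path.
    Hypothetical helper: the function and parameter names are not part of Docling.
    """
    # Lazy import: this module stays importable even where docling is absent.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)  # input format is detected automatically
    doc = result.document  # the unified DoclingDocument

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    md_path = out / f"{name}.md"
    md_path.write_text(doc.export_to_markdown(), encoding="utf-8")
    # export_to_dict() returns the JSON-serialisable form of the full document.
    (out / f"{name}.json").write_text(
        json.dumps(doc.export_to_dict(), indent=2), encoding="utf-8"
    )
    return md_path


def docling_cli_args(source: str, out_dir: str, to: str = "md",
                     use_smoldocling: bool = False) -> list[str]:
    """Build an argument list for the `docling` command-line tool.

    Hypothetical helper. The flags (--to, --output, --pipeline vlm,
    --vlm-model smoldocling) follow the Docling README at the time of writing.
    """
    args = ["docling", "--to", to, "--output", out_dir]
    if use_smoldocling:
        # Route pages through the SmolDocling vision-language model
        # instead of the default layout-analysis pipeline.
        args += ["--pipeline", "vlm", "--vlm-model", "smoldocling"]
    args.append(source)
    return args
```

For example, `subprocess.run(docling_cli_args("report.pdf", "out", use_smoldocling=True), check=True)` would run the CLI with SmolDocling; the CLI helper is kept as a pure function so the command can be inspected or logged before anything is downloaded.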
