EPISODE · Jul 25, 2025 · 15 MIN

De-Identification in Multimodal Medical Data (Text, PDF, DICOM) to stay HIPAA & GDPR Compliant

from The Healthcare AI Podcast · host John Snow Labs

Explore regulatory-grade multimodal data de-identification and tokenisation with Youssef Mellah, PhD, Senior Data Scientist at John Snow Labs, and Srikanth Kumar Rana, Solutions Architect at Databricks. Learn how to remove, mask or transform PHI across clinical notes, tables, PDFs and DICOMs at scale while meeting HIPAA, GDPR and CCPA standards, all without sacrificing analytical value.

Timestamps
00:00 – Welcome & Episode Overview
02:43 – How Databricks supports secure De-identification workflows
03:50 – Built-in techniques: masking, encryption, hashing
05:26 – Introduction to Multimodal Data De-Identification
07:15 – OCR + NLP pipeline for visual & text data – PHI Extraction
08:35 – Notebook demo: PHI identification in clinical notes
12:00 – PDF de-identification
12:56 – DICOM file de-identification
14:18 – Output: consistent masking across all modalities

Listen on your favourite platform:
• YouTube: https://www.youtube.com/playlist?list=PL5zieHHAlvApZKkwtu746ivthRc5zyTiU
• Apple Podcast: https://podcasts.apple.com/us/podcast/the-healthcare-ai-podcast/id1827098175
• Spotify: https://open.spotify.com/show/2XNrQBeCY7OGql2jVhcP7t
• Amazon Music: https://music.amazon.com/podcasts/5b1f49a6-dba8-479e-acdf-9deac2f8f60e/the-healthcare-ai-podcast

Resources:
• John Snow Labs Models Hub: https://nlp.johnsnowlabs.com/models
• Spark NLP Workshop Repo: https://github.com/JohnSnowLabs/spark-nlp-workshop
• Visual NLP Workshop Repo: https://github.com/JohnSnowLabs/visual-nlp-workshop
• JSL Docs: https://nlp.johnsnowlabs.com/docs
• JSL Live Demos: https://nlp.johnsnowlabs.com/demos
• JSL Learning Hub: https://nlp.johnsnowlabs.com/learn

Connect with us:
• Our website: https://www.johnsnowlabs.com/
• LinkedIn: https://www.linkedin.com/company/johnsnowlabs/
• Facebook: https://www.facebook.com/JohnSnowLabsInc/
• X: https://x.com/JohnSnowLabs

#HealthcareAI #DataPrivacy #HIPAA #PHI #DeIdentification #MedicalAI #GDPR #HealthTech #MultimodalAI

Frequently Asked Questions

How long is this episode of The Healthcare AI Podcast?

This episode is 15 minutes long.

When was this episode of The Healthcare AI Podcast published?

This episode was published on July 25, 2025.

What is this episode about?

Explore regulatory‑grade multimodal data de‑identification and tokenisation with Youssef Mellah, PhD, Senior Data Scientist at John Snow Labs and Srikanth Kumar Rana, Solutions Architect at Databricks. Learn how to remove, mask or transform PHI...

Can I download this episode of The Healthcare AI Podcast?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.