EPISODE · Jul 25, 2025 · 15 MIN
De-Identification in Multimodal Medical Data (Text, PDF, DICOM) to stay HIPAA & GDPR Compliant
from The Healthcare AI Podcast · host John Snow Labs
Explore regulatory-grade multimodal data de-identification and tokenisation with Youssef Mellah, PhD, Senior Data Scientist at John Snow Labs, and Srikanth Kumar Rana, Solutions Architect at Databricks. Learn how to remove, mask or transform PHI across clinical notes, tables, PDFs and DICOMs at scale while meeting HIPAA, GDPR and CCPA standards, all without sacrificing analytical value.

Timestamps
00:00 – Welcome & Episode Overview
02:43 – How Databricks supports secure De-identification workflows
03:50 – Built-in techniques: masking, encryption, hashing
05:26 – Introduction to Multimodal Data De-Identification
07:15 – OCR + NLP pipeline for visual & text data – PHI Extraction
08:35 – Notebook demo: PHI identification in clinical notes
12:00 – PDF de-identification
12:56 – DICOM file de-identification
14:18 – Output: consistent masking across all modalities

Listen on your favourite platform:
• YouTube: https://www.youtube.com/playlist?list=PL5zieHHAlvApZKkwtu746ivthRc5zyTiU
• Apple Podcast: https://podcasts.apple.com/us/podcast/the-healthcare-ai-podcast/id1827098175
• Spotify: https://open.spotify.com/show/2XNrQBeCY7OGql2jVhcP7t
• Amazon Music: https://music.amazon.com/podcasts/5b1f49a6-dba8-479e-acdf-9deac2f8f60e/the-healthcare-ai-podcast

Resources:
• John Snow Labs Models Hub: https://nlp.johnsnowlabs.com/models
• Spark NLP Workshop Repo: https://github.com/JohnSnowLabs/spark-nlp-workshop
• Visual NLP Workshop Repo: https://github.com/JohnSnowLabs/visual-nlp-workshop
• JSL Docs: https://nlp.johnsnowlabs.com/docs
• JSL Live Demos: https://nlp.johnsnowlabs.com/demos
• JSL Learning Hub: https://nlp.johnsnowlabs.com/learn

Connect with us:
• Our website: https://www.johnsnowlabs.com/
• LinkedIn: https://www.linkedin.com/company/johnsnowlabs/
• Facebook: https://www.facebook.com/JohnSnowLabsInc/
• X: https://x.com/JohnSnowLabs

#HealthcareAI #DataPrivacy #HIPAA #PHI #DeIdentification #MedicalAI #GDPR #HealthTech #MultimodalAI