EPISODE · Aug 9, 2025 · 52 MIN

Stealth Web Scraping Techniques for OSINT (WHY2025)

from Chaos Computer Club - recent events feed (high quality) · host Soukaina Cherrabi, François GRANIER

Web scraping continues to be a cornerstone of OSINT operations, particularly during Red Team engagements and external attack surface reconnaissance. Yet, as anti-bot technologies grow more sophisticated, traditional scraping methods based on direct HTTP requests are increasingly ineffective. This talk takes a technical dive into browser-based scraping techniques that closely mimic real user behavior to evade detection, inspired by real-world mechanisms observed across major web platforms.

In Red Team operations and external attack surface assessments, open-source intelligence (OSINT) is a critical step for identifying internet-exposed assets and assessing the associated risks. One of the most common techniques in this phase is web scraping, which automates the collection of publicly available data, often without relying on official APIs that are frequently rate-limited, monitored, or entirely unavailable.

In previous conferences, such as Fabien Vauchelles’s talk "Cracking the Code: Decoding Anti-Bot Systems", the focus was on detecting scraping activities at the network layer using TCP/IP fingerprinting and IP intelligence. This presentation builds on that work by shifting the focus to client-side techniques, specifically browser-based approaches that mimic legitimate user behavior to evade detection.

The objective of this session is to explore modern strategies for conducting stealthy web scraping by avoiding API usage and minimizing anomalies detectable at both the network and application layers. Based on real-world use cases, the talk aims to provide actionable insights for security professionals involved in scraping, whether performing it or defending against it.

The talk will present concrete methods for data collection, including:

- Making direct HTTP/HTTPS requests to web servers (such as websites or HTTP-based services) using libraries that handle protocol-level communication. This method allows efficient data retrieval by bypassing the need to render the page or load additional resources like images, videos, stylesheets, or scripts. It’s fast and lightweight, especially suited for static or partially dynamic content.
- Leveraging headless browsers to simulate real browser behavior without a graphical interface. These tools embed full HTML, CSS, and JavaScript engines, enabling interaction with modern, dynamic web applications. This technique is essential when scraping content that relies on client-side rendering or asynchronous JavaScript operations.
- Using browser-side scripting tools, such as Tampermonkey, within standard browsers. These tools allow custom JavaScript code to be injected and executed directly on the page, offering a practical and discreet way to automate data collection from within the browsing environment itself. This technique has been successfully applied in large-scale scraping operations, including on major social networks where traditional approaches are often ineffective due to advanced client-side defenses.

Beyond the scraping techniques themselves, the presentation will also cover the current detection methods employed by websites to identify automated behavior and how these can be bypassed, including:

- Detection of automation environments via specific JavaScript variables (e.g., navigator.webdriver) or discrepancies in the DOM.
- Behavioral detection mechanisms such as mouse movements, keyboard activity, or interaction timing.
- Identification of scraping-specific browser extensions or content injection tools.
- Detection of headless execution environments using debugging interfaces or timing-based heuristics.

This talk will provide a technically grounded exploration of the current capabilities and limitations of stealth web scraping from both offensive and defensive perspectives.
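
As one illustration of the "interaction timing" item in the detection list above, here is a minimal defender-side sketch in Python. It is not taken from the talk: the function names `inter_event_gaps` and `looks_scripted`, the threshold `min_cv`, and the sample timestamps are all assumptions made for illustration. The idea is that scripted input often fires at near-constant intervals, so a very low coefficient of variation in the gaps between events can serve as one weak signal of automation.

```python
from statistics import mean, pstdev
from typing import Sequence


def inter_event_gaps(timestamps_ms: Sequence[float]) -> list[float]:
    """Return the gaps between consecutive event timestamps (milliseconds)."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def looks_scripted(timestamps_ms: Sequence[float], min_cv: float = 0.1) -> bool:
    """Flag event streams whose timing is suspiciously regular.

    min_cv is a hypothetical threshold on the coefficient of variation
    (standard deviation / mean) of the gaps; it is not a value from the talk.
    """
    gaps = inter_event_gaps(timestamps_ms)
    if len(gaps) < 3:
        return False  # too few events to judge either way
    avg = mean(gaps)
    if avg <= 0:
        return True  # simultaneous or out-of-order events are not human-like
    return pstdev(gaps) / avg < min_cv


# Made-up samples: a fixed 100 ms loop versus irregular, human-like clicks.
bot_clicks = [0, 100, 200, 300, 400, 500]
human_clicks = [0, 180, 950, 1210, 2400, 2675]

if __name__ == "__main__":
    print("bot flagged:", looks_scripted(bot_clicks))
    print("human flagged:", looks_scripted(human_clicks))
```

A production detector would likely combine many such signals rather than rely on one, and the threshold would need tuning against real traffic. The same heuristic also hints at why scraping tools tend to randomize their delays.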
Licensed to the public under https://creativecommons.org/licenses/by/4.0/
About this event: https://program.why2025.org/why2025/talk/7DMBVR/
