From WebSites to Datasets: Unleashing the Power of Data Harvesting with Python
09-09, 14:30–16:30 (Europe/Lisbon), Workshop I

Learn how to develop multiparadigm web scrapers and crawlers leveraging an async model. Discover how to extract valuable information from websites using Python's ecosystem and its powerful tools. Master the art of collecting data at scale. Join this exhilarating journey of web exploration.


We will embark on a captivating journey of extracting and parsing data from the vast online universe. I'll talk briefly about reverse engineering web apps to understand how the web is modelled and how we think about it when building scrapers, and present some open source libraries that will help us get to our solution. I'll introduce some OOP design concepts for building our Scraper and Crawler, and show how we can publish our first open source Python library on PyPI. We'll finish with some insights about customisation and efficiency in terms of execution speed and memory.

See also: Slides (33.9 MB)

A developer with a strong interest in how the web works and in the process of developing utility tools around it, be it extensions or the backends for them.