2025-07-25, Auditorium
Discover how Ibis provides a single DataFrame API across pandas, DuckDB, and SQL databases, bridging the gap between data science prototypes and production code.
The data ecosystem in Python is fragmented: data scientists typically use pandas or polars, engineers favor SQL or PySpark, and web developers depend on ORMs. This fragmentation leads to silos where code cannot be shared, prototypes have to be rewritten for production, and knowledge transfer is hindered, even though everyone is working with the same underlying data.
As a contributor to Ibis for the past two years, I'll demonstrate how this portable DataFrame library provides a unified API across pandas, DuckDB, PostgreSQL, and more, enabling true collaboration between teams without sacrificing performance.
The presentation will begin with a practical demonstration of the same data query written three ways (in pandas, in SQL, and with an ORM), highlighting the cost of this fragmentation. Using a simple demo task, attendees will then get an overview of the tools that different roles typically use and of their limitations.
Next, the talk will dive into Ibis's core concepts: an architecture that separates the user-facing API from the execution engine, a deferred execution model, and support for 20+ backends. This section includes a short demo of building an analytics query step by step.
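To make the idea of separating the interface from the execution engine concrete, here is a deliberately tiny model of a deferred DataFrame API in plain Python. It is a sketch under stated assumptions, not Ibis's implementation or API: the `Table` class and the `to_sql`, `execute_python`, and `execute_sqlite` functions are hypothetical names invented for illustration. Calling `filter`, `group_by`, and `aggregate` only builds an immutable expression; nothing runs until that expression is handed to an engine, here either a small in-memory evaluator or SQLite through the standard-library `sqlite3` module.

```python
import operator
import sqlite3
from dataclasses import dataclass
from typing import Any, Optional

# Toy model only: these classes and functions are hypothetical and are NOT Ibis's API.
# They illustrate building an expression first and choosing an engine later.

_OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}
_AGGS = {"sum": sum, "min": min, "max": max}


@dataclass(frozen=True)
class Table:
    """An immutable, unevaluated query description (an expression, not data)."""
    name: str
    filters: tuple = ()              # (column, op, value) triples, ANDed together
    group_key: Optional[str] = None
    agg: Optional[tuple] = None      # (output name, function name, input column)

    def filter(self, column: str, op: str, value: Any) -> "Table":
        # Each method returns a new expression; no data is touched here.
        return Table(self.name, self.filters + ((column, op, value),), self.group_key, self.agg)

    def group_by(self, key: str) -> "Table":
        return Table(self.name, self.filters, key, self.agg)

    def aggregate(self, out: str, func: str, column: str) -> "Table":
        return Table(self.name, self.filters, self.group_key, (out, func, column))


def _literal(value: Any) -> str:
    # Render a Python value as a SQL literal (strings quoted, embedded quotes doubled).
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return repr(value)


def to_sql(expr: Table) -> str:
    """Compile the expression to a SQL string."""
    select = "*"
    if expr.agg is not None:
        out, func, col = expr.agg
        agg_sql = f"{func.upper()}({col}) AS {out}"
        select = f"{expr.group_key}, {agg_sql}" if expr.group_key else agg_sql
    sql = f"SELECT {select} FROM {expr.name}"
    if expr.filters:
        sql += " WHERE " + " AND ".join(f"{c} {op} {_literal(v)}" for c, op, v in expr.filters)
    if expr.group_key and expr.agg is not None:
        sql += f" GROUP BY {expr.group_key}"
    return sql


def execute_python(expr: Table, tables: dict) -> list:
    """Engine 1: evaluate the expression directly on lists of dicts."""
    rows = [r for r in tables[expr.name]
            if all(_OPS[op](r[c], v) for c, op, v in expr.filters)]
    if expr.agg is None:
        return rows
    out, func, col = expr.agg
    if expr.group_key is None:
        return [{out: _AGGS[func](r[col] for r in rows)}]
    groups: dict = {}
    for r in rows:
        groups.setdefault(r[expr.group_key], []).append(r[col])
    return [{expr.group_key: k, out: _AGGS[func](v)} for k, v in groups.items()]


def execute_sqlite(expr: Table, tables: dict) -> list:
    """Engine 2: run the compiled SQL on in-memory SQLite (identifiers are not escaped)."""
    con = sqlite3.connect(":memory:")
    for name, rows in tables.items():
        cols = list(rows[0])
        con.execute(f"CREATE TABLE {name} ({', '.join(cols)})")
        con.executemany(f"INSERT INTO {name} VALUES ({', '.join('?' for _ in cols)})",
                        [tuple(r[c] for c in cols) for r in rows])
    cur = con.execute(to_sql(expr))
    names = [d[0] for d in cur.description]
    result = [dict(zip(names, row)) for row in cur.fetchall()]
    con.close()
    return result


# Build the query step by step; nothing is executed yet.
expr = (Table("orders")
        .filter("amount", ">", 10)
        .group_by("country")
        .aggregate("revenue", "sum", "amount"))

data = {"orders": [
    {"country": "ES", "amount": 5},
    {"country": "ES", "amount": 20},
    {"country": "DE", "amount": 15},
    {"country": "ES", "amount": 30},
]}

print(to_sql(expr))
# SELECT country, SUM(amount) AS revenue FROM orders WHERE amount > 10 GROUP BY country
py_result = execute_python(expr, data)
sql_result = execute_sqlite(expr, data)
```

Both engines return the same rows for the same expression (SQLite does not guarantee group order, so compare after sorting). In Ibis itself the corresponding steps look roughly like `ibis.memtable(...)`, `Table.filter(...)`, `Table.group_by(...).aggregate(...)`, and `ibis.to_sql(expr)` or `expr.execute()`, with compilation handled by Ibis's backends rather than by hand-written string formatting.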
The heart of the presentation will be two real-world examples: first, taking a data exploration workflow to production using the same Ibis code across pandas, DuckDB, and PostgreSQL with performance comparisons; second, integrating analytics into web applications, showing how data science and web development teams can share code.
The talk will conclude with strategic considerations on when to leverage native SQL capabilities and performance optimization tips.
Attendees will learn to write Pythonic data code that works across multiple engines, strategically use SQL while keeping most code in Python, share code between prototyping and production, and process data significantly faster than with pandas alone.
This talk is for data scientists, engineers, web developers, and anyone working with tabular data. No specific SQL knowledge is required, though familiarity with pandas or other DataFrame libraries will be helpful.
Beginner
Main topics: Data Science and Data Engineering
Hi, I'm Daniel, a software engineer based in Barcelona with a knack for Data Science and Machine Learning.
I've worked in startups and scale-ups in Brazil, Germany, and the US, mostly building software with Python and Java.
I'm an open-source enthusiast and have contributed to projects such as Dask, xarray, GeoPandas, Ibis, and the DataFusion Python bindings (datafusion-python).