06-05, 14:10–14:40 (Europe/Madrid), Auditorium
Use of regular expressions for searching and parsing text is very common, but it can be dangerous. Innocent-looking searches may turn out to be very slow on specially-crafted inputs, and if such inputs can be provided by users, that is called a REDoS vulnerability. This talk is about the causes of such slowness, possible fixes and prevention.
Regular expressions provide a powerful search mechanism. The theory behind them promises the search can also be efficient, but practice (and specifically the stdlib re
library) deviates from theory.
We'll start from a vulnerability and its fix (with the titular 20,000x improvement).
To explain it:
- We will show the regex/state-machine equivalence, and the promise of linear time
- We will explore the features of Python regex's and how they break the promise
- We will discuss features which seem like they don't have to break the promise, but do anyway
- Then we'll analyze the fix and the speedup
Finally, we'll introduce re2
which can completely prevent the problem, at some cost.
Video: https://youtu.be/jYIw4JD7Nko
Django Internals, General Python, Security
Audience Level –Advanced
35 years in software development, over 25 years using Python. A member of the Django Security Team. Co-founder of PyCon Israel, and celebrating a decade of attending DjangoCon Europe.
Working as a consultant.
Github: shaib
Fediverse: @shaib@tooot.im
Mail: shai@platonix.com