Careful what you search for! - or, how to make a computation 20,000 times faster
06-05, 14:10–14:40 (Europe/Madrid), Auditorium

Use of regular expressions for searching and parsing text is very common, but it can be dangerous. Innocent-looking searches may turn out to be very slow on specially-crafted inputs, and if such inputs can be provided by users, that is called a REDoS vulnerability. This talk is about the causes of such slowness, possible fixes and prevention.


Regular expressions provide a powerful search mechanism. The theory behind them promises the search can also be efficient, but practice (and specifically the stdlib re library) deviates from theory.
We'll start from a vulnerability and its fix (with the titular 20,000x improvement).
To explain it:
- We will show the regex/state-machine equivalence, and the promise of linear time
- We will explore the features of Python regex's and how they break the promise
- We will discuss features which seem like they don't have to break the promise, but do anyway
- Then we'll analyze the fix and the speedup

Finally, we'll introduce re2 which can completely prevent the problem, at some cost.

Video: https://youtu.be/jYIw4JD7Nko


Topics

Django Internals, General Python, Security

Audience Level

Advanced

35 years in software development, over 25 years using Python. A member of the Django Security Team. Co-founder of PyCon Israel, and celebrating a decade of attending DjangoCon Europe.
Working as a consultant.

Github: shaib

Fediverse: @shaib@tooot.im

Mail: shai@platonix.com