Text to … vectors? How feature engineering works in natural language processing
09-24, 15:45–16:20 (Europe/Lisbon), Auditorium

Have you ever looked at a text and wondered how on earth it could be used in a machine learning model? How do we get models to understand what we’re reading? In this talk, we’ll examine different ways we can extract meaning from text for use in modelling.


Do you have an interest in starting your own natural language processing project, but feel overwhelmed by all the talk of attention-based models and text embeddings? Would you like to understand how you can transform a set of texts into features for a model? In this talk, I’ll give you a practical demonstration of how meaningful features are created from text data, starting with the simplest approaches and working up to cutting-edge techniques such as BERT. I’ll demonstrate how to do this using some of the most popular Python packages for NLP, including scikit-learn, nltk and gensim. At each step, we’ll discuss why the technique works, what meaning it extracts from the text and what it leaves behind, and its advantages and disadvantages.
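To give a flavour of what "the simplest approaches" can look like, here is a minimal bag-of-words sketch in plain Python. It is an illustration only, not code from the talk: the `tokenize` and `bag_of_words` helpers and the two example sentences are made up for this purpose, and in practice a library class such as scikit-learn's `CountVectorizer` would normally do this job.

```python
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase the text, split on whitespace and strip surrounding punctuation."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [tok for tok in tokens if tok]


def bag_of_words(texts):
    """Turn a list of texts into (vocabulary, count vectors).

    Each vector has one entry per vocabulary word, holding how often
    that word occurs in the corresponding text.
    """
    tokenized = [tokenize(text) for text in texts]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocabulary)}

    vectors = []
    for doc in tokenized:
        row = [0] * len(vocabulary)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        vectors.append(row)
    return vocabulary, vectors


# Hypothetical example texts, chosen only for illustration.
texts = ["The cat sat on the mat.", "The dog chased the cat!"]
vocabulary, vectors = bag_of_words(texts)

print(vocabulary)
for row in vectors:
    print(row)
```

Each text becomes a row of word counts over a shared vocabulary. This keeps which words appear and how often, but leaves behind word order and context, which is the kind of trade-off the talk discusses for each technique.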

Video: https://youtu.be/nxLY0GGw5U0

Dr. Jodie Burchell is the Developer Advocate in Data Science at JetBrains, and was previously the Lead Data Scientist in audiences generation at Verve Group Europe. After finishing a PhD in Psychology and a postdoc in biostatistics, she has worked in a range of data science and machine learning roles across search improvement, recommendation systems, NLP and programmatic advertising. She has a particular interest in topics such as applying behavioural science techniques to ML projects and the relationship between engineering and data science. She is also the author of two books, "The Hitchhiker's Guide to Ggplot2" and "The Hitchhiker's Guide to Plotnine", and writes a data science blog.