Sweaty Horse Blanket - Processing the Natural Language of Beer

Natural language processing (NLP) is among the oldest fields of computer science, dating back at least to the 1950s. In this talk I'll present a crash course in simple NLP, focusing on tools for document-level summarization and understanding. Specifically, I'll go through TF•IDF and topic modelling. We'll use these techniques to make sense of the language people use on the web when describing beer. I'll introduce a dataset containing some 3 million paragraph-length reviews of 120,000 beers.
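To give a flavour of TF•IDF before the talk, here is a minimal sketch in plain Python. The three reviews are invented for illustration and are not taken from the dataset. The weighting shown (relative term frequency multiplied by the log of inverse document frequency) is one common variant and is an assumption, not necessarily the exact formula used in the talk. The helper names `tf_idf` and `top_terms` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy reviews, invented for illustration (not from the talk's dataset).
reviews = {
    "saison": "funky barnyard aroma with dry peppery finish",
    "stout": "roasted coffee aroma with chocolate finish",
    "ipa": "citrus hop aroma with bitter piney finish",
}


def tf_idf(docs):
    """Return {doc_name: {term: weight}} using tf * log(N / df)."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        scores[name] = {
            # Relative term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, name, k=3):
    """Return the k highest-weighted terms for one document (ties broken alphabetically)."""
    ranked = sorted(scores[name].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


scores = tf_idf(reviews)
print(top_terms(scores, "stout"))
```

Words that appear in every review, such as "aroma" and "finish", get a weight of zero, so the top terms for each beer are the ones that set it apart from the others.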

We'll use this data to create a concise description for any commercially available beer. These descriptions will draw out the differences between the techniques at an intuitive level. We will then look at ways to quantify the distance between documents and use those distances to show how similar different beers are. By the end of this talk, the audience should understand enough to use document-level NLP and know what the sweaty horse blanket thing is all about.
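One way to quantify the distance between two documents is cosine distance over their term vectors. The abstract does not say which measure the talk uses, so treat the choice of cosine distance here as an assumption. The beer descriptions below are made up for illustration, and `cosine_similarity` and `cosine_distance` are hypothetical helper names.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Distance is 0 for identical directions and 1 for no shared terms."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical term-count vectors, invented for illustration.
stout = Counter("roasted coffee chocolate roasted".split())
porter = Counter("roasted chocolate caramel".split())
lambic = Counter("sour funky barnyard".split())

print(cosine_distance(stout, porter))
print(cosine_distance(stout, lambic))
```

In this toy example the stout sits closer to the porter than to the lambic, because the first two share "roasted" and "chocolate" while the lambic shares no terms with either. The same functions work on TF•IDF weights in place of raw counts.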

  • Ben is obsessed with data, beer, and music, not necessarily in that order. He has a PhD from the Intelligent Sound and Music Systems group in the Computing Department at Goldsmiths, University of London. His work there focused on merging social and acoustic similarity spaces to drive playlist creation and related user-facing systems. He is an expert on metadata, structured data, the semantic web and recommendation systems. In his spare time, he is a co-chair of the annual international Workshop On Music Recommendation And Discovery, has given an Ignite London talk about beer styles, occasionally DJs, is an accredited beer judge and homebrews beer. He thinks bios in the third person are weird but figures that’s how they’re meant to be written.