
Our Sentimental Galaxy

More than 25,000 comments have been made on Milky Way Project Talk since the project began in 2010. That’s a lot of content in itself – beyond the main classification data from the MWP’s main interface.

I’ve been using the Python-based Natural Language Toolkit (NLTK) to perform what’s called sentiment analysis on Zooniverse Talk data. Some of the most stunning results come from the Milky Way Project’s rich dataset.

The process is surprisingly simple, thanks mostly to NLTK’s great documentation. You train an algorithm to recognise positive and negative words and phrases in text, then go through all the MWP subjects in Talk, look at what people say about them, and record whether the comments are positive or negative. If a comment is really positive (e.g. people say ‘stunning’, ‘wonderful’, ‘brilliant’) then it gets a score close to 1. If it’s negative (e.g. people say ‘horrible’, ‘stupid’, ‘disgusting’) then it gets a score close to 0. Of course, most subjects come in somewhere in between.
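To make the idea concrete, here is a minimal sketch of that train-then-score loop. It is not my actual code and it doesn't use NLTK itself: it's a tiny standard-library naive Bayes classifier standing in for the kind of classifier NLTK provides, so it runs without installing anything. The training comments, the subject IDs and the function names (train, positivity, rank_subjects) are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical hand-labelled training comments (not real MWP Talk data).
TRAINING = [
    ("stunning wonderful image", "pos"),
    ("brilliant and beautiful bubble", "pos"),
    ("wonderful colours", "pos"),
    ("horrible blown out image", "neg"),
    ("stupid noisy picture", "neg"),
    ("disgusting artefacts", "neg"),
]


def train(examples):
    """Count word occurrences per label for a simple naive Bayes model."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, label_counts, vocab


def positivity(text, model):
    """Return P(positive | text), a score between 0 and 1."""
    word_counts, label_counts, vocab = model
    n_examples = sum(label_counts.values())
    log_scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        log_p = math.log(label_counts[label] / n_examples)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing so unseen word/label pairs are not zero.
                count = word_counts[label][word] + 1
                log_p += math.log(count / (total_words + len(vocab)))
        log_scores[label] = log_p
    # Normalise the two log scores into a probability for the "pos" label.
    biggest = max(log_scores.values())
    weights = {k: math.exp(v - biggest) for k, v in log_scores.items()}
    return weights["pos"] / (weights["pos"] + weights["neg"])


def rank_subjects(comments_by_subject, model):
    """Average the comment scores for each subject, most positive first."""
    means = {
        subject: sum(positivity(c, model) for c in comments) / len(comments)
        for subject, comments in comments_by_subject.items()
        if comments
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    model = train(TRAINING)
    # Hypothetical subject IDs and comments.
    talk = {
        "subject-A": ["Stunning bubble", "wonderful colours"],
        "subject-B": ["horrible image", "stupid blown out picture"],
    }
    for subject, score in rank_subjects(talk, model):
        print(subject, round(score, 2))
```

A comment made up entirely of words the model has never seen comes out at 0.5 here, because the two training classes are the same size. That's one reason most subjects land somewhere in the middle. With a real training set you would want far more labelled text than six toy comments.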

So here are the results: the 20 most-positively commented on images from the MWP (click to embiggen). It’s a lovely set, and you can see why people were so positive about these images.

On the flip side, here are the 20 most-negatively commented on images. You see a mix of difficult-to-classify and blown-out images.

I’m now looking at ways to use this sort of sentiment analysis to extract interesting images from Talk and highlight them to moderators and science teams. It’s something I’ve been toying with on-and-off for several projects – not just the MWP. The Zooniverse Advent Calendar seems like a great time to share and see what people think of this idea.

You can find my code on GitHub along with other examples. As well as the MWP, there are galleries for Galaxy Zoo and Snapshot Serengeti.
