A New Paper All About #yellowballs


There is a new Milky Way Project paper in the news today, concerning the #yellowballs that were found by Milky Way Project volunteers.

The yellowballs appeared on the very first day of the Milky Way Project, when user kirbyfood asked ‘what is this?’. I wasn’t sure, so I jokingly called it a ‘#yellowball’, since that’s what it looked like. We use hashtags on talk.milkywayproject.org, and that user, and many others, went off and tagged hundreds of the things over the next few months. Before we knew it there was a catalogue of nearly 1,000 of them. However, we still didn’t know what they really were, and so Grace Wolf-Chase, Charles Kerton, and other MWP collaborators have put a lot of effort into figuring it out. From the JPL press release:

So far, the volunteers have identified more than 900 of these compact yellow features. The next step for the researchers is to look at their distribution. Many appear to be lining the rims of the bubbles, a clue that perhaps the massive stars are triggering the birth of new stars as they blow the bubbles, a phenomenon known as triggered star formation. If the effect is real, the researchers should find that the yellow balls statistically appear more often with bubble walls.

This new paper is the fourth from the Milky Way Project, and adds to the Zooniverse’s growing list of 80+ publications made possible thanks to our amazing volunteers. You can see the complete set at zooniverse.org/publications.

The full list of volunteers who helped tag the yellowballs is shown below. Each and every one of you made a valuable contribution to this paper. Thank you to everyone who helped in this search!

KhalilaRedBird, lpspieler, greginak, LarryW, chelseanr, broomrider1970, Dealylama, Cruuux, Mirsathia, suelaine, sdewitt, stukii, kmasterdo, PattyD, HeadAroundU, Fezman92, Jakobswede, Jk478B27Ds395, Kerry_Wallis, iacomo, Ken Koester, ttfnrob, jules, Falconet, Caidoz13, Starsheriff, ascil, simonron, tyna_anna, gwolfchase, Greendragon00, Ranchi, kirbyjp, githensd, katieofoz, harbinjer, ycaruth1, embo, echong, Feylin, stock_footage, zookeeper, joke slayer, karvidsson, Furiat, Tyler Reynolds, Manjingos, cathcollins, legoeeyore, GabyB, eshafto, mtparrish, 59Vespa, amatire, TheScribblery, pschmal, Helice, norfolkharryuk, WilB, jamesw40k, koenvisser, dragonjools, Nocterror, nunyaB, hansbe, meheler, Cahethel, Alice, stellar190, mabbenson, Embyrr922, gnome_king, jumpjet2k, tchan, yoman93, and Loulouuse.

Combining Your Clicks with Milkman

I’ve been building a new app for the Milky Way Project called Milkman. It goes alongside Talk and allows you to see where everyone’s clicks go, and what the results of crowdsourcing look like. It’s open source, and a good step toward open science. I’d love feedback from citizen scientists and science users alike.


Milkman is so called because it delivers data for the Milky Way Project, and maybe eventually some other Zooniverse projects too. You can access Milkman directly at explore.milkywayproject.org (where you can input a Zooniverse subject ID or search using galactic coordinates), or more usefully, you can get to Milkman via Talk – using the new ‘Explore’ button that now appears for logged-in users.

Clicking ‘Explore’ will show you the core view of Milkman: a display of all the clicks from all the volunteers who have seen that image and the current, combined results.


Milkman is a live, near-realtime view of the state of the science output from the current Milky Way Project. It might help people discussing items on Talk to understand what other objects are in the MWP images, and it hopefully shows how volunteers’ clicks are used.

Milkman uses a day-old clone of the main Zooniverse database, which means the clicks are at most 24 hours old. The clustering is performed using a technique called DBSCAN, which takes the vast array of clicks on each image and tries to automatically group them. The resultant, averaged bubbles, EGOs, clusters, and galaxies are often better than any individual drawing, showing the power of crowdsourcing in action.
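As a rough illustration of the approach (a minimal sketch, not the actual Milkman code, with made-up eps and min_samples values), here is how DBSCAN can group volunteer clicks and average each group into a single combined position using scikit-learn:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical click positions (x, y in pixels) from many volunteers
clicks = np.array([
    [101.0, 250.5], [103.2, 248.9], [99.8, 251.7],  # three clicks on one object
    [400.1, 120.0], [402.5, 118.3],                 # two clicks on another
    [650.0, 300.0],                                 # a stray click
])

# eps is the neighbourhood radius; min_samples is the minimum cluster size
db = DBSCAN(eps=10.0, min_samples=2).fit(clicks)
labels = db.labels_  # -1 marks noise; 0, 1, ... mark clusters

# Average the clicks in each cluster to get one combined position
centres = {int(label): clicks[labels == label].mean(axis=0)
           for label in set(labels) if label != -1}
print(centres)
```

The stray click ends up labelled as noise, while each genuine group is averaged into a single position, which is essentially what the combined results view shows.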

Milkman is open source on GitHub and I’m happy to accept issues and feedback through the repo’s issues.

Immediate plans for Milkman include a navigable map on the homepage (to let you explore the whole galaxy), better links to other public astronomical data, and access to the current state of the reduced MWP2 catalogue as a whole. If you have ideas or requests either contact me or create an issue on GitHub.

New MWP paper outlines the powerful synergy between citizen scientists, professional scientists, and machine learning


A new Milky Way Project paper was published to the arXiv last week. The paper presents Brut, an algorithm trained to identify bubbles in infrared images of the Galaxy.

Brut uses the catalogue of bubbles identified by more than 35,000 citizen scientists from the original Milky Way Project. These bubbles are used as a training set to allow Brut to discover the characteristics of bubbles in images from the Spitzer Space Telescope. This training data gives Brut the ability to identify bubbles just as well as expert astronomers!
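The general idea can be sketched with a toy supervised-learning example (synthetic features and labels standing in for real image data; this is not the actual Brut code, which works on Spitzer image cut-outs):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical feature vectors extracted from image cut-outs
n = 200
features = rng.normal(size=(n, 5))
# Citizen-science labels: 1 = bubble, 0 = not a bubble. Here the label
# depends on the features so there is something for the model to learn.
labels = (features[:, 0] + features[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features[:150], labels[:150])               # train on labelled examples
accuracy = clf.score(features[150:], labels[150:])  # check on held-out examples
print(f"held-out accuracy: {accuracy:.2f}")
```

The point is the workflow: volunteer classifications become the labelled training set, and the trained model can then classify far more data than the volunteers could ever get through by hand.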

The paper then shows how Brut can be used to re-assess the bubbles in the Milky Way Project catalog itself, and it finds that more than 10% of the objects in this catalog are really non-bubble interlopers. Furthermore, Brut is able to discover bubbles missed by previous searches, usually ones that were hard to see because they are near bright sources.

At first it might seem that Brut removes the need for the Milky Way Project – but the truth is exactly the opposite. This new paper demonstrates a wonderful synergy that can exist between citizen scientists, professional scientists, and machine learning. The Milky Way Project example shows that citizens can identify patterns that machines cannot detect without training, and that machine learning algorithms can use citizen science projects as input training sets, creating amazing new opportunities to speed up the pace of discovery. A hybrid model of machine learning combined with crowdsourced training data from citizen scientists can not only classify large quantities of data, but also address the weaknesses of each approach if deployed alone.

We’re really happy with this paper, and extremely grateful to Chris Beaumont (the study’s lead author) for his insights into machine learning and the way it can be successfully applied to the Milky Way Project. We will be using a version of Brut for our upcoming analysis of the new Milky Way Project classifications. It may also have implications for other Zooniverse projects.

If you’d like to read the full paper, it is freely available online at the arXiv – and Brut can be found on GitHub.

A New Batch of Milky Way Project Data Has Arrived

After a busy December and January, we ran out of data a few weeks ago after 600,000+ classifications of the new images – but the wait is over! Last night a whole new, bigger batch of data was added to the Milky Way Project. Here are a few examples of what you might see in the data:

These new data come from the GLIMPSE 2 survey – a comprehensive survey of the middle part of our galaxy in the infrared. We’re also going to be adding some of the GLIMPSE 1 data (from the old version of the Milky Way Project) back into the site, but with the new colour stretch. We’re doing that to check the system works, but also because new features and structures will be visible with the change in data and colour palette.

We’re still crunching the data from the new classifications, but we’ve been able to extract lists of galaxies, EGOs and star clusters that you have found. We hope to share those with you soon.

So hop on over to milkywayproject.org and let’s add another 600,000 classifications and continue mapping the galaxy.

Milky Way Project on German TV

Some months ago I was contacted by the producers of a well known German science programme called Nano, which is broadcast on channel ZDF. They were recording a segment for the show on citizen science, and were keen to talk to me about the Milky Way Project. I was happy to help, they visited, we chatted, I walked up and down corridors and through doors, they filmed, and went on their way. The item was finally shown on Nano last week, on 7 September, and they did a great job showcasing our amazing images. You can watch the video for a couple more days here, and an accompanying article can be found on this webpage – these are all in German. And yes, that’s me, at my desk in Heidelberg.

The Milky Way Project is just one of the projects featured on the programme. I particularly like Artigo, one of the others featured. The aim of Artigo is to tag images of artworks, to make catalogues of artwork more searchable. Artigo is set up like a game: two users are simultaneously shown the same image, and they’re asked to type in words that describe an aspect of the work they’re looking at. The users then score points based on the tags they enter: 0 points for a tag that’s never been entered for this image, 25 points if the other player has entered the same word in that session, and 5 points for a word that has previously been entered by another user.
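That scoring rule is simple enough to write down as a tiny function (a hypothetical sketch based on the description above, not Artigo’s actual code):

```python
def score_tag(tag, partner_tags, previous_tags):
    """Score one tag: 25 if the other player typed it this session,
    5 if an earlier user had entered it, 0 if it is brand new."""
    if tag in partner_tags:
        return 25
    if tag in previous_tags:
        return 5
    return 0

# Example round for one image (made-up tags)
partner = {"portrait", "woman"}
earlier = {"portrait", "renaissance", "oil"}
assert score_tag("portrait", partner, earlier) == 25  # matched this session
assert score_tag("oil", partner, earlier) == 5        # matched an earlier user
assert score_tag("hat", partner, earlier) == 0        # never seen before
```

Note how the rule rewards agreement between independent players, which is what makes the tags trustworthy without any expert ever checking them.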

It’s a really neat idea and quite a different approach to classifying images from the one used by the Zooniverse projects. The attractive thing about a game approach is that the user gets immediate feedback on how they’re doing. I know that many MWP users regularly ask for feedback on their classifications. The problem with giving feedback, however, is that we don’t want to bias the users towards any particular kind of bubble drawing – we want you to tell us what a bubble looks like. Artigo gets round this very nicely by giving feedback based on what other users think, rather than what the art historians think.

This post is part of Citizen Science September at the Zooniverse.

Reducing the Data

I’ve spent much of the past two weeks messing about with different ways to reduce over 200,000 bubbles (now almost 220,000) down into a sensible catalogue. This gets very messy, so I will try to explain what I’ve been up to in stages. This process is called data reduction, and for a citizen science, crowd-sourced project like the MWP it can get complicated. I thought it might interest some of you to see where we currently are in the process of turning your clicks into results.

The key part of the data reduction problem is that we have a very large set of data – the massive number of bubbles that have been drawn – and need to decide which among them are ‘similar’ to each other. We need to keep some flexibility of our definition of similarity because right now, I’m not sure what ‘similar’ means.

Essentially, bubbles are ‘similar’ when two people draw a similarly sized bubble in a similar location. This is something that sounds remarkably easy to say but was hard to do well in code. Comparing 200,000 bubbles to each other is obviously computationally intensive.


In the end I decided that since the size of bubbles was a consideration, I would move across the galaxy looking on ever-decreasing orders of size. To do this I split the galaxy into 2×2 degree boxes and take each box in turn. In each box I see if there are bubbles that are of the order of the size of the box (meaning they have a maximum diameter between a half- and a whole-box). If there are bubbles on that scale, I run a clustering algorithm and pick out groups of these bubbles with central positions clustered to within one quarter of the box size. If a cluster is found, those bubbles are saved and removed from the whole list. I then divide the box into four and repeat until no bubbles are found.
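A simplified sketch of that multi-scale sweep (illustrative only, not the real reduction code; the clustering step here is reduced to simple centre-proximity grouping):

```python
def cluster_by_position(bubbles, tol):
    """Greedy grouping of bubbles whose centres lie within tol of each other."""
    groups = []
    for b in bubbles:
        for g in groups:
            if abs(b[0] - g[0][0]) < tol and abs(b[1] - g[0][1]) < tol:
                g.append(b)
                break
        else:
            groups.append([b])
    return [g for g in groups if len(g) > 1]  # need at least two drawings

def sweep_box(bubbles, x0, y0, size, min_size=0.05):
    """bubbles: list of (x, y, diameter). Cluster at this scale, then
    remove the clustered bubbles and recurse into the four child boxes."""
    inside = [b for b in bubbles
              if x0 <= b[0] < x0 + size and y0 <= b[1] < y0 + size]
    if not inside or size < min_size:
        return []
    # Bubbles "at this scale": diameter between a half- and a whole-box
    at_scale = [b for b in inside if size / 2 <= b[2] <= size]
    groups = cluster_by_position(at_scale, size / 4)
    clustered = {id(b) for g in groups for b in g}
    remaining = [b for b in inside if id(b) not in clustered]
    half = size / 2
    for dx in (0, half):
        for dy in (0, half):
            groups += sweep_box(remaining, x0 + dx, y0 + dy, half, min_size)
    return groups

# Toy example: three users drew the same big bubble, two drew a small one
bubbles = [(1.0, 1.0, 1.5), (1.05, 0.95, 1.5), (0.98, 1.02, 1.5),
           (0.3, 0.3, 0.6), (0.32, 0.31, 0.6)]
groups = sweep_box(bubbles, 0.0, 0.0, 2.0)
print([len(g) for g in groups])
```

The big bubble is caught on the first (whole-box) pass, while the small one is only picked up after the box has been subdivided, which is the point of sweeping down through the size scales.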


This method means that when a box contains no bubbles, we need not continue down in size scale, but when it does contain bubbles we always split and inspect the four child boxes. In this way we move through the galaxy, in ever-decreasing boxes, but in a fairly efficient manner.

We also have to perform the same analysis with an offset grid. This is exactly the same but making sure we catch bubbles that had fallen on the borders of boxes.

Once we have passed across the galaxy on all size scales, we need to make sure we’ve cleaned up the duplicates created by the offset grid. We do this by taking our newly created list of ‘clean’ bubbles and running through them in order of size. When we find bubbles of a similar size and location, they are combined, weighted by the number of users that drew each bubble. This can be done more easily now that there are far fewer bubbles (in my tests we have dropped to around 5% of the initial number by this stage).
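That clean-up pass might look something like this sketch (field names and tolerances are illustrative, not those of the real code):

```python
def merge_duplicates(bubbles, pos_tol=0.1, size_tol=0.2):
    """bubbles: list of dicts with 'x', 'y', 'diameter', 'hits', where
    'hits' counts how many users drew that bubble. Walk the list in
    size order and fold near-duplicates into a weighted average."""
    merged = []
    for b in sorted(bubbles, key=lambda b: b["diameter"], reverse=True):
        for m in merged:
            close = (abs(b["x"] - m["x"]) < pos_tol and
                     abs(b["y"] - m["y"]) < pos_tol)
            similar = abs(b["diameter"] - m["diameter"]) < size_tol
            if close and similar:
                # Weighted average by the number of users who drew each
                total = m["hits"] + b["hits"]
                for key in ("x", "y", "diameter"):
                    m[key] = (m[key] * m["hits"] + b[key] * b["hits"]) / total
                m["hits"] = total
                break
        else:
            merged.append(dict(b))
    return merged

# Toy example: two overlapping drawings of one bubble, plus a distinct one
catalogue = [
    {"x": 1.0, "y": 1.0, "diameter": 0.5, "hits": 3},
    {"x": 1.05, "y": 0.98, "diameter": 0.55, "hits": 1},
    {"x": 5.0, "y": 5.0, "diameter": 0.5, "hits": 2},
]
cleaned = merge_duplicates(catalogue)
print([m["hits"] for m in cleaned])
```

Weighting by hit count means a bubble drawn by many users pulls the merged position towards itself, rather than being dragged around by a single stray drawing.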


My initial run only looked at bubbles in the longitude range 0–30 degrees. Below are three images, showing one image from the MWP set (one of my favourites, as lots of people see it differently). You can see the image as it is shown to MWP users. Below that you see, overlaid in blue, the original bubbles as drawn by the users. In the third image you can see the same, but this time displaying the ‘cleaned’ results. In the original set the bubbles all have the same opacity, such that when they pile up you can see the similarities. The cleaned set gives the bubbles opacities according to their scores (more opaque bubbles mean more users drew them).




It should be noted that the cleaned image does not yet display arcs, but rather always shows an entire ellipse. This is because I am not yet including the bubble cut-outs (which you can make out in the middle image) in the data reduction. These will be included at a later time.

You can see that I’m still getting some duplication at the end of the process – I may need to sweep across the final catalogue looking for similar bubbles until I reach a convergence when all bubbles are ‘unique’. I have been experimenting with this with mixed results but will continue my efforts.

If you’re still reading, I look forward to reading your comments. As I continue to make adjustments and progress with this reduction, I shall blog the results again. Many members of the science team are also having a go at this problem and so the final result may be quite different in the end as we improve things. I hope that this is an interesting insight into some of what goes on behind the scenes of the MWP.