
New Milky Way Project Poster

I’ve been diving into the bubbles database recently and ended up creating cutouts of all 3,744 large bubbles from the DR1 data release. From there it was an easy enough job to create this new Milky Way Project poster. It uses all 3,744 bubbles at least once (several are used more than once).

MWP Logo Mosaic of Bubbles

I’m currently working on three new Milky Way Project papers and will be blogging about them in the next weeks and months.

Creating a Bubble Catalogue

In recent weeks, I’ve spent much of my time figuring out how to use all of your drawings to determine where the bubbles are in the Spitzer data. About a month ago we had a breakthrough. Thanks to a lengthy conversation with MWP science guru Matthew Povich, I realised that one of the reasons it is so hard to determine where a bubble should be drawn is that sometimes there is no right answer! People disagree on how to draw many of the bubbles in the MWP because the question “where is the bubble?” often has no single right answer.

An example of just such a bubble is shown below, with all user drawings shown next to it. You can see that this bubble just isn’t that easy to draw and that there are even two or three structures within the image that one could call a bubble. Instead of trying to make this fit a rigid one-bubble definition, we realised that we should be using the human ability to recognise patterns. After all – this is exactly what you are all so good at, and computers are sometimes not.

Matthew and I decided that what we should do in these instances is simply allow two (or even three) bubbles to be deemed ‘real’. The inner, red structure is a kind of bubble, and so is the open-ended green bubble just outside of it. One could also perceive a third bubble just below and to the left of these, and many people appear to have drawn just that. (This is in addition to the multitude of smaller bubbles around the edge, of course.) Whatever catalogue is produced by our data reduction, it probably should include at least the first two structures if enough people have drawn them.

This decision has made creating a cleaned bubble catalogue much easier. The data reduction process described in my February blog post is still the process I’m using, although it has been greatly refined. More importantly, since February an enormous number of new bubbles have been drawn and this means the averaging process produces better results. Below you can see some results of the latest efforts and hopefully you’ll agree that what is being produced is a good catalogue, based on what you have all drawn. For the sake of testing, I am using one 3-by-2 degree section of the data. This is the region +12 degrees from the galactic centre and contains several interesting and complex features – which makes it a good testing ground.

Below you can see the 3×2 degree tile on its own, with all of your 7,000+ bubbles drawn on top and with the resultant ‘cleaned’ bubbles as well. You can click on any of the images to see the full version.

I have also been looking into other techniques for extracting the bubbles as the crowd sees them. Below you can see just the raw bubble data, drawn by users for this tile. With the background removed, we can use a simple contrast ratio to create a threshold, which we use to cut out the bubbles from the original image.
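For readers who like to see ideas as code, here is a minimal sketch of that thresholding step. It is not the code used for the MWP. It assumes each drawing can be treated as a filled, axis-aligned ellipse `(cx, cy, rx, ry)` in pixel units, and the function names and the `ratio` parameter are made up for illustration.

```python
def coverage_map(ellipses, width, height):
    """Count, for every pixel, how many drawn ellipses cover it."""
    counts = [[0] * width for _ in range(height)]
    for cx, cy, rx, ry in ellipses:
        for y in range(height):
            for x in range(width):
                if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0:
                    counts[y][x] += 1
    return counts


def threshold_mask(counts, ratio):
    """Keep pixels whose count is at least `ratio` times the peak count."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[False] * len(row) for row in counts]
    return [[c >= ratio * peak for c in row] for row in counts]


def cut_out(image, mask, fill=0):
    """Copy the image, replacing every masked-out pixel with `fill`."""
    return [[pix if keep else fill for pix, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


# Three hypothetical drawings on a 5x5 pixel image of uniform brightness 7.
drawings = [(2, 2, 1.5, 1.5), (2, 2, 1.5, 1.5), (2, 2, 0.5, 0.5)]
counts = coverage_map(drawings, 5, 5)
mask = threshold_mask(counts, 0.5)
image = [[7] * 5 for _ in range(5)]
result = cut_out(image, mask)
```

Raising `ratio` keeps only the regions that most people agreed on, while lowering it lets fainter, more disputed structures through.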

This is another method for extracting data, and although it is harder to define a rigid catalogue of bubbles using this method, it may still have use in mapping regions of star formation in our galaxy.

Reducing the Data

I’ve spent much of the past two weeks messing about with different ways to reduce over 200,000 bubbles (now almost 220,000) into a sensible catalogue. This gets very messy so I will try to explain what I’ve been up to in stages. This is a process called data reduction and for a citizen science, crowd-sourced project like the MWP, it can get complicated. I thought it might interest some of you to see where we currently are in the process of turning your clicks into results.

The key part of the data reduction problem is that we have a very large set of data – the massive number of bubbles that have been drawn – and need to decide which among them are ‘similar’ to each other. We need to keep some flexibility in our definition of similarity because right now, I’m not sure what ‘similar’ means.

Essentially, bubbles are ‘similar’ when two people draw a similarly sized bubble in a similar location. This is something that sounds remarkably easy to say but was hard to do well in code. Comparing 200,000 bubbles to each other is obviously computationally intensive.
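To make this concrete, here is a minimal sketch of one possible similarity test, along with the cost of comparing everything to everything. The tolerances `pos_tol` and `size_tol` are hypothetical values, not the ones used for the MWP, and the sky is treated as flat over such small separations.

```python
import math


def are_similar(a, b, pos_tol=0.25, size_tol=0.5):
    """a and b are (lon, lat, diameter) tuples in degrees.

    Two bubbles count as similar when their centres and their diameters
    both agree to within a fraction of their mean diameter.
    """
    mean_size = 0.5 * (a[2] + b[2])
    separation = math.hypot(a[0] - b[0], a[1] - b[1])
    return (separation <= pos_tol * mean_size
            and abs(a[2] - b[2]) <= size_tol * mean_size)


def pair_count(n):
    """Number of distinct pairs among n bubbles."""
    return n * (n - 1) // 2


same = are_similar((10.0, 0.0, 0.20), (10.01, 0.0, 0.22))
different = are_similar((10.0, 0.0, 0.20), (10.5, 0.0, 0.20))
pairs = pair_count(200_000)  # roughly 2e10 comparisons
```

A brute-force pass over 200,000 bubbles needs about twenty billion of these checks, which is why some way of narrowing down the candidates first is so valuable.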


In the end I decided that since the size of bubbles was a consideration, I would move across the galaxy, looking at ever-decreasing orders of size. To do this I split the galaxy into 2×2 degree boxes and take each box in turn. In each box I see if there are bubbles that are of the order of the size of the box (meaning they have a maximum diameter that is between half a box and a whole box). If there are bubbles on that scale I run a clustering algorithm and pick out groups of these bubbles with central positions clustered to within one quarter of the box size. If a cluster is found, those bubbles are then saved and removed from the whole list. I then divide the box into four and repeat until no bubbles are found.


This method means that when a box contains no bubbles, we need not continue down in size scale, but when it does contain bubbles we always split and inspect the four child boxes. In this way we move through the galaxy, in ever-decreasing boxes, but in a fairly efficient manner.
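The box-splitting search can be sketched in a few lines. This is an illustration under stated assumptions rather than the actual reduction code: the greedy grouping below is a stand-in for whatever clustering algorithm is really used, `MIN_VOTES` is a hypothetical cut-off, a ‘found’ bubble is recorded as a simple average, and the recursion stops when no bubble left in a box is small enough to match at that scale or below.

```python
import math

MIN_VOTES = 2  # hypothetical: drawings needed before a cluster counts


def greedy_clusters(bubbles, radius):
    """Stand-in clustering: join a bubble to the first group whose seed
    centre lies within `radius`, otherwise start a new group."""
    clusters = []
    for b in bubbles:
        for group in clusters:
            seed = group[0]
            if math.hypot(b[0] - seed[0], b[1] - seed[1]) <= radius:
                group.append(b)
                break
        else:
            clusters.append([b])
    return clusters


def reduce_box(bubbles, x0, y0, size, found):
    """Search one box, then its four children. `bubbles` is a list of
    (lon, lat, diameter) tuples and is mutated: clustered drawings are
    removed. Results are appended to `found` as (lon, lat, diameter, votes)."""
    inside = [b for b in bubbles
              if x0 <= b[0] < x0 + size and y0 <= b[1] < y0 + size]
    # Stop when nothing here could match at this scale or any smaller one.
    if not any(0 < b[2] <= size for b in inside):
        return
    # Bubbles "of the order of" the box: between half a box and a whole box.
    on_scale = [b for b in inside if size / 2 <= b[2] <= size]
    for group in greedy_clusters(on_scale, size / 4):
        if len(group) >= MIN_VOTES:
            n = len(group)
            found.append((sum(b[0] for b in group) / n,
                          sum(b[1] for b in group) / n,
                          sum(b[2] for b in group) / n,
                          n))
            for b in group:
                bubbles.remove(b)
    half = size / 2
    for dx in (0, half):
        for dy in (0, half):
            reduce_box(bubbles, x0 + dx, y0 + dy, half, found)


# Three people draw roughly the same bubble; one stray drawing stands alone.
drawings = [(0.30, 0.30, 0.30), (0.31, 0.29, 0.32), (0.29, 0.31, 0.28),
            (1.50, 1.50, 0.05)]
catalogue = []
reduce_box(drawings, 0.0, 0.0, 2.0, catalogue)
```

After the call, `catalogue` holds one averaged bubble with three votes and `drawings` holds only the stray drawing. Empty boxes return immediately, which is what keeps the search cheap.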

We also have to perform the same analysis with an offset grid. The procedure is exactly the same, but shifting the grid makes sure we catch bubbles that fell on the borders of boxes the first time around.
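A small sketch shows why the offset grid helps. The amount of the shift is an assumption here (half a box), and `box_origin` is a hypothetical helper, not part of any MWP code.

```python
import math


def box_origin(lon, lat, box=2.0, offset=0.0):
    """Return the lower-left corner of the grid box containing (lon, lat)."""
    x0 = math.floor((lon - offset) / box) * box + offset
    y0 = math.floor((lat - offset) / box) * box + offset
    return (x0, y0)


# Two drawings of the same bubble, either side of the box border at lon = 2.
left = (1.99, 0.5)
right = (2.01, 0.5)

split = box_origin(*left) != box_origin(*right)                 # regular grid
together = (box_origin(*left, offset=1.0)
            == box_origin(*right, offset=1.0))                  # shifted grid
```

On the regular grid the two drawings land in different boxes and never get compared. On the shifted grid they share a box, so the clustering can see them together.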

Once we have passed across the galaxy on all size scales, we need to make sure we’ve cleaned up the duplicates created by the offset grid. We do this by considering our newly created list of ‘clean’ bubbles and running through them in order of size. When we find bubbles of a similar size and location they are combined, according to the number of users that drew that bubble. This can be done more easily now that there are far fewer bubbles (in my tests we have dropped to around 5% of the initial number by this stage).
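As a rough illustration of this clean-up pass, the sketch below walks through the ‘clean’ list from largest to smallest and folds each bubble into an earlier one when they are similar, weighting the average by the number of users. The tolerances and the weighted mean are assumptions made for the example, not a description of the exact MWP procedure.

```python
import math


def merge_pass(clean, pos_tol=0.25, size_tol=0.5):
    """One sweep over (lon, lat, diameter, votes) tuples, largest first."""
    merged = []
    for b in sorted(clean, key=lambda c: c[2], reverse=True):
        for i, m in enumerate(merged):
            mean_size = 0.5 * (b[2] + m[2])
            close = math.hypot(b[0] - m[0], b[1] - m[1]) <= pos_tol * mean_size
            alike = abs(b[2] - m[2]) <= size_tol * mean_size
            if close and alike:
                votes = b[3] + m[3]
                # Vote-weighted mean of position and diameter.
                merged[i] = tuple((b[k] * b[3] + m[k] * m[3]) / votes
                                  for k in range(3)) + (votes,)
                break
        else:
            merged.append(b)
    return merged


# The first two entries are the same bubble found by the two grids.
clean = [(10.00, 0.0, 0.20, 6), (10.01, 0.0, 0.22, 2), (12.00, 0.5, 0.10, 4)]
final = merge_pass(clean)
```

Here the two near-identical entries collapse into a single bubble with eight votes, positioned closer to the version that more people drew.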


My initial run only looked at bubbles in the longitude range 0-30 degrees. Below are three versions of one image from the MWP set (one of my favourites, as lots of people see it differently). First you can see the image as it is shown to MWP users. Below that you see, overlaid in blue, the original bubbles as drawn by the users. In the third image you can see the same, but this time displaying the ‘cleaned’ results. In the original set the bubbles all have the same opacity, such that when they pile up you can see the similarities. The cleaned set gives the bubbles opacities according to their scores (more opaque bubbles mean more users drew them).




It should be noted that the cleaned image does not yet display arcs, but rather always shows an entire ellipse. This is because I am not yet including the bubble cut-outs (which you can make out in the middle image) in the data reduction. These will be included at a later time.

You can see that I’m still getting some duplication at the end of the process – I may need to sweep across the final catalogue looking for similar bubbles until I reach a convergence when all bubbles are ‘unique’. I have been experimenting with this with mixed results but will continue my efforts.
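The ‘sweep until convergence’ idea can be sketched generically. Here `merge_pass` stands for any function that takes a catalogue and returns a possibly shorter one, and `max_sweeps` is a hypothetical safety limit. The toy pass used in the example only drops one duplicate per sweep, purely to show the loop at work.

```python
def sweep_until_stable(catalogue, merge_pass, max_sweeps=50):
    """Repeat `merge_pass` until a sweep no longer shrinks the catalogue.

    Returns the final catalogue and the number of sweeps performed.
    """
    for sweep in range(1, max_sweeps + 1):
        reduced = merge_pass(catalogue)
        if len(reduced) == len(catalogue):
            return reduced, sweep
        catalogue = reduced
    return catalogue, max_sweeps


def drop_one_duplicate(items):
    """Toy merge pass: remove at most one repeated item per sweep."""
    seen = []
    dropped = False
    for item in items:
        if item in seen and not dropped:
            dropped = True
            continue
        seen.append(item)
    return seen


unique, sweeps = sweep_until_stable([1, 1, 1, 2], drop_one_duplicate)
```

The loop stops on the first sweep that changes nothing, so the number of sweeps is always one more than the number of sweeps that actually merged something.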

If you’re still reading, I look forward to reading your comments. As I continue to make adjustments and progress with this reduction, I shall blog the results again. Many members of the science team are also having a go at this problem and so the final result may be quite different in the end as we improve things. I hope that this is an interesting insight into some of what goes on behind the scenes of the MWP.

The Bubbling Galactic Disk

Some of the most beautiful structures in Spitzer GLIMPSE data are the bubbles. Bubbles are regions of gas, usually found around newly formed stars, often with shells of material surrounding them (the green 8 μm emission above). These appear as rings in the GLIMPSE images and can vary in appearance from strikingly prominent to intriguingly faint. They can be anything from complete circles and ellipses, to fractured, fragmented remains.

As part of Project IX we’re going to ask you to find and measure these bubbles. Researchers can use this information to learn a lot about how these objects form and how they trigger star formation.


Above is an image of RCW 120, the titular “perfect bubble” from a 2009 paper by Deharveng, Zavagno, Schuller, Caplan, Pomarès and De Breuck. This colour-composite image shows Hα emission in blue, 8 μm emission in green and the 24 μm emission of small dust grains in red. The image is approximately 24′ wide.

The green material has been swept up as the region expanded, after the formation of a massive star in the centre. There are about 2000 Solar masses of neutral material here, and this has fragmented into lumps. This is where star formation is occurring. The authors of the study found 138 potential star-forming objects in the ring around RCW 120.

In 2006, a group of astronomers visually inspected the GLIMPSE data for bubbles and catalogued their results in a paper titled ‘The Bubbling Galactic Disk‘. I’m a sucker for a great paper title. The team behind this study has been looking at the GLIMPSE data ever since. As mentioned above, bubbles are important features in the study of star formation. Here’s how they are described in ‘The Bubbling Galactic Disk‘:

The study of bubbles gives information about the stellar winds that produce them and the structure and physical properties of the ambient ISM – interstellar medium – into which they are expanding. Additional physical insights include the hydrodynamics of gas and dust in expanding bubbles, the impact of expanding bubbles on magnetic fields in the diffuse ISM, and mass-loss rates during the evolution of stars.

In 2006, 322 bubbles were visually identified by just a handful of people. Since that time two things have changed. Firstly, there is now a lot more data, and therefore more bubbles. Spitzer has also continued to map more of the galactic plane for the GLIMPSE360 project – more on that in a later post. Secondly, the Zooniverse now exists!

Every time the Zooniverse and bubbles have been mentioned together, someone has been there saying that we should get the public to find and measure them, whether in conversations between Chris, myself and others at Zooniverse HQ or between Grace Wolf-Chase at Adler Planetarium and various members of ‘The Bubbling Galactic Disk’ study. Bubbles and the Zooniverse should be a match made in the heavens.

Why are bubbles such a good target? For many reasons. They are not only amazing to look at, but also are numerous in the GLIMPSE data. They are tricky to measure but not impossibly hard. They are scientifically valuable objects to catalogue and measure the properties of, and they require more than one independent, human measurement to get a good handle on – this is key of course.

Many of the folks behind ‘The Bubbling Galactic Disk’ are part of the Project IX science team. We hope that everyone out there in the Zooniverse community can help refine and expand the existing bubble catalogue as part of Project IX. With the addition of new data, we also hope to find many new bubbles.

We are currently developing the bubble tool – the first user interface portion of the Project IX site – which will have similarities to the Moon Zoo crater tool. We hope to be able to share it with you all soon so that you can help us to test and refine it. It is exciting to be able to involve everyone at this early stage.

If you’re interested in following this project and its take on bubbles, I’d suggest reading ‘The Bubbling Galactic Disk’ and looking at the GLIMPSE website. If you have any questions – let us know. Bubbles are just one thing we can see in the GLIMPSE data. More posts will follow about other scientifically useful objects lurking in this amazing infrared archive.

We’re Not Cloud Painters

A current hot topic in star-formation science is the study of Infrared Dark Clouds (IRDCs). These are really dense molecular clouds that appear dark even in infrared surveys.

For a while, we hoped that Project IX would be our avenue to exploring IRDCs. However, as you’ll see in this blog post, not every idea becomes a reality at the Zooniverse – for a good reason. Here’s the story of the IRDC project that never was.


A Bit About IRDCs

IRDCs appear throughout images from far-infrared surveys – there’s a lovely example in the image above. It was not initially known what lay within them. Closer study revealed that they have sizes and masses similar to high-mass star forming molecular clouds. Similarly, the dense cores within IRDCs appear to match the sizes and masses of high-mass prestellar cores – the direct progenitors of stars. IRDCs also seem to be located along the spiral arms of our Galaxy, which is where star formation mostly occurs (Jackson et al, 2008, see image below). A few IRDCs even show evidence that they contain young proto-clusters of stars. In short, evidence seems to suggest that early phase high-mass star formation is occurring within IRDCs.

High-mass star formation is another hot topic in astrophysics. It is not fully understood how stars several, or even hundreds of, times bigger than the Sun form. We know that they do and that they can get really big – but compared to the process of low-mass star formation their origins are a bit of a mystery. IRDCs may hold some answers.


It has been established that there are upward of 10,000 IRDCs visible to us here on Earth (Simon et al. 2006a). For each of these it would be useful to know their size, shape and location. The Spitzer Space Telescope’s Galactic Legacy Midplane Infrared Survey Extraordinaire (GLIMPSE) highlights many of these often beautiful dark clouds and the data is available for use. The question is how do you visually classify this many dark clouds in such a large dataset?

Project IX

This started to sound to me like a potential citizen science problem. Lots of objects to be found, a task that computers find difficult, and a large dataset. We got really excited here at Zooniverse HQ, and began to concoct an idea that would see tens of thousands of volunteers literally painting clouds in space. I developed a prototype HTML5 interface that allowed someone to draw around IRDCs in GLIMPSE data (see image below). Sadly, our enthusiasm wasn’t to last.


We had a prototype interface and we had lots of GLIMPSE data – but Chris was worried. I kept bringing him new developments and he’d be interested, excited but also cautious. So we sat down to really get to the bottom of this idea. Imagine 1,000 people were to draw around the same IRDC in an image. Each person looks at the image and decides in their own mind where they would say the border between the dark cloud and the surrounding bright emission is. They then draw around the cloud, fairly imperfectly, following roughly that ratio of dark-to-light.

Translating this into a different vocabulary: each person decides on a contrast ratio and tries to follow a fixed-contrast contour around the dense region in the image. We then average those contours to get the group’s decision on the best contrast ratio to use.

Have you spotted the problem? The average contrast ratio achieved by this method is no more right or wrong than any other value. You may as well have taken a computer and told it to draw a contour at a specified contrast ratio across the whole Galaxy. You can derive that ratio via some meaningful number that can be calculated from the data – maybe the extinction or the signal-to-noise. The 2006 paper that identified the Galaxy’s 10,000+ IRDCs already did this – instead of improving upon the existing study, we would have ended up replicating it, only in a slightly different way. We compared our own visually identified clouds to those drawn out by the Simon et al., 2006 algorithm and found the results were very similar. As such the cloud painting project was more-or-less concluded there and then.
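A toy version of the argument, for the code-minded: the sketch below is not the Simon et al. algorithm, just an illustration under simple assumptions. It takes the background to be the median brightness of the image and flags every pixel darker than a chosen fraction of it. The function name and the sample numbers are invented for the example.

```python
import statistics


def dark_cloud_mask(image, contrast):
    """Flag pixels darker than `contrast` times the median brightness."""
    background = statistics.median(p for row in image for p in row)
    return [[p < contrast * background for p in row] for row in image]


# A bright field with one dark pixel in the middle.
image = [[10, 10, 10],
         [10, 2, 10],
         [10, 10, 10]]

# Three hypothetical volunteers each follow their own preferred contrast.
volunteer_contrasts = [0.4, 0.5, 0.6]
crowd_contrast = statistics.mean(volunteer_contrasts)

# The "crowd" answer is just one more contrast value handed to the computer.
mask = dark_cloud_mask(image, crowd_contrast)
```

Averaging the volunteers only produces another number to plug into `dark_cloud_mask`, and the computer can apply any such number across the whole Galaxy without anyone having to draw a thing.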

Moving On

The Zooniverse has policies on what makes a good citizen science project. These guidelines have been produced following the lessons of Galaxy Zoo and other projects. Chris wrote up a blog post about this the other day. Our one unbreakable rule is that if we ask the public to collaborate on a project, their efforts must produce a meaningful result. We must never waste people’s time.

I’ll be honest, I was a bit gutted. Cloud painting would be possible and it would yield a reasonable result. It would even be fun! However it wouldn’t add anything scientifically useful to what we know about the Galaxy’s IRDCs. Just because a problem can be crowd-sourced doesn’t mean it should be.

Luckily for Project IX, there is a lot more to see in the GLIMPSE data than just IRDCs. So we leave the dark clouds to the machines and in our next post we’ll finally talk about the great science that we can achieve.

If you want to learn more about IRDCs there are lots of papers and talks on the subject out there on the web. If you’d like to read some of the papers, here’s a potted history of IRDCs: Simon et al., 2006a; Rathborne et al., 2006 and 2007; Simon et al., 2006b; Jackson et al., 2008; and Chambers et al., 2009.