Monday, March 19, 2012

Crowdsourcing the classification of galaxies

When I was in junior high school I belonged to a group of 'amateur variable star observers'. Our charge was to help astronomers determine if the magnitude (brightness) of certain stars varied over time. I thought it was pretty cool that non-professionals could actually make meaningful contributions to science. Of course, quite a few amateur stargazers have actually discovered astronomical objects like meteors and comets.

That tradition continues thanks to Galaxy Zoo. (GW)

Galaxy Zoo and the new dawn of citizen science

Galaxy Zoo has enabled hundreds of thousands of amateur astronomers to map the obscure corners of the universe since 2007. Tim Adams meets some of them and discovers that Charles Darwin once relied on similar crowd-sourcing methods

By Tim Adams

The Observer

March 17, 2012

For the past few nights, while the whole wide world has been fast asleep, I have been examining corners of the universe that perhaps no human eye has ever seen, and reporting the shape of unknown galaxies. The images I've been studying have been supplied by Nasa's Hubble telescope, and the galaxies at the centre of each picture range from chaotic-looking swirls to glorious Catherine wheel spirals and unlikely cigar-shaped flashes of light. Each galaxy takes a few minutes to get the sense of, and briefly describe; then I am on to the next.

The task is compulsive, and surreal; once you have classified your first 20 or 30 impossibly distant star swirls, you have to remind yourself that the images on your computer exist, or once existed, beyond the square of your screen at a scale you cannot comprehend. It makes you want to reach for your Hitchhiker's Guide (as Douglas Adams trenchantly observed: "Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space"). Since space is an idea that your tiny mind can't quite hold on to, or at least mine can't, you reassure yourself with the attempt to get your classification straight: elliptical or spiral? Smooth or fuzzy? And, my favourite prompt: "Is there anything odd in this image?" (You mean, odder than a rotating mass of a trillion stars billions of light years away?)

In gazing at these unnerving and beautiful photographs there is curious comfort in the fact you are not alone. You are, in fact, one of thousands of similarly sleepless souls, engaged in just the same task of compare and contrast in a hundred or so countries across the world. The Hubble images are the latest to be posted on a website called Galaxy Zoo, which in the five years of its existence has become perhaps the greatest mass participation science project ever conceived. There are more than 250,000 active "zooites" of all ages, and between them these volunteer "citizen scientists" have classified images from the world's most powerful telescopes numbering in the hundreds of millions – in doing so creating a more detailed map of the known universe than once thought possible. Their work has given rise to more than 30 peer-reviewed science papers, at least one game-changing discovery, countless online friendships and perhaps even a few star-crossed lovers.

Galaxy Zoo, of course, comes complete with a creation myth of its own. Like all the best ideas, it sprang to life late one evening in the back room of a pub. The pub was the Royal Oak on Woodstock Road in Oxford, local to the habitués of the Radcliffe Observatory across the way. One Friday night in July 2007 a young researcher in astrophysics called Kevin Schawinski met his friend and colleague Chris Lintott in the Royal Oak for a beer. Speaking on the phone from his current home at Yale University last week, Schawinski recalled how he had been moaning to Lintott that evening about his grim week. In the course of his research into star formation, Schawinski had formed a theory that, contrary to conventional wisdom, stars could be formed in more ancient (10 billion-year-old) elliptical galaxies as well as younger spiral ones. To prove the theory he was faced with the task of examining a million galaxies using images from the Alfred P Sloan digital sky survey. Since no computer could accurately recognise the patterns he was looking for, there had seemed only one thing for it: he would have to sift through the images himself. Working 12 hours a day non-stop for a week, Schawinski had managed the not inconsiderable task of detailing the characteristics of 50,000 galaxies. He needed a pint.

"The amount of data was just huge," he tells me. "It had been rough work." Between them Lintott and Schawinski wondered aloud if they might recruit some volunteers to help them in the project, and as the evening wore on an idea formed that maybe they could create a website where the images would be posted and interested amateurs might enjoy the challenge of classification. If they had a model it was Nasa's Stardust@Home project. Beginning the previous year, 2006, Nasa had begun posting grainy pictures from its "stardust interstellar dust collector", asking people to look for tiny solid particles. "We figured if you could get lots of people looking for dust," Schawinski recalls, "it might not be too hard to find people to classify galaxies."

With the help of a couple of programmers, Lintott and Schawinski wrote the site in the next day or two, and on 14 July opened up for business. When I meet Lintott in London he smiles when he tells me what they had envisaged. "With so many galaxies to look at, we had thought it would probably take Kevin a couple of years to get through them all," he says. As it turned out, it took a few weeks. Within 24 hours of it being announced on Lintott's website, Galaxy Zoo was receiving 70,000 classifications an hour. They still measure their hit-rate in "Kevin weeks" – a unit of 50,000. "Soon after that we were doing many Kevin weeks per hour," Schawinski says. And what was more surprising was that the quality of the interpretation was extremely accurate. By using their new modelling army, the Zookeepers (as Lintott and Schawinski quickly became known on the message boards) could duplicate observations enough times to effectively eliminate error. "We thought about giving people tutorials and so on," Lintott says, "and monitoring responses, but we quickly saw it would be more effective – and fun – to have people just get going straight away, and use the sheer volume of observers to ensure accuracy."
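The idea of using sheer volume to ensure accuracy can be illustrated with a simple majority vote over repeated classifications of the same image. What follows is only a minimal sketch under stated assumptions: the article does not describe how Galaxy Zoo actually combines answers, and the function names, the labels and the five sample votes are all hypothetical. The only figures taken from the text are the 50,000-classification "Kevin week" and the 70,000 classifications an hour.

```python
from collections import Counter

# One "Kevin week" is 50,000 classifications, the figure given in the article.
KEVIN_WEEK = 50_000


def kevin_weeks(classifications: int) -> float:
    """Convert a raw classification count into Kevin weeks."""
    return classifications / KEVIN_WEEK


def consensus(votes: list[str]) -> tuple[str, float]:
    """Return the most common label and the share of volunteers who chose it.

    Hypothetical aggregation rule (plain majority vote); the real project
    may weight or filter volunteers differently. Expects a non-empty list.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Five made-up volunteer answers for a single galaxy image.
votes = ["spiral", "spiral", "elliptical", "spiral", "spiral"]
label, agreement = consensus(votes)
print(label, agreement)     # spiral 0.8
print(kevin_weeks(70_000))  # 1.4
```

On these invented votes a single dissenting answer is outvoted four to one, which is the sense in which duplication can wash out individual mistakes. The second figure shows that 70,000 classifications an hour works out at 1.4 Kevin weeks per hour.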

The website quickly took on a life and character of its own. "Quite early on a strange thought dawned on us," Schawinski recalls. "We had succeeded in creating the world's most powerful pattern-recognising super-computer, and it existed in the linked intelligence of all the people who had logged on to our website: and this global brain was processing this stuff incredibly fast and incredibly accurately. It was extraordinary."

The other surprise was that the zooites spontaneously formed themselves into complex communities (much in the way that groups are established around Wikipedia pages). From very early on the Zoo was almost entirely self-organising. Interest groups were established, moderators emerged to oversee message boards, and the networks developed games and jokes and obsessions of their own around what they were seeing on screen. The Zoo has a reputation, Schawinski suggests, for being the most polite place on the internet. "It is the only website where people say please and thank you."

According to the zooites I spoke to, and Lintott and Schawinski themselves, a good deal of that character was formed by Alice Sheppard, who was the first volunteer to put her stake in the heavens. Sheppard recalls the moment very well. "In July 2007 I was a very bored graduate of environmental studies," she says. "I hadn't been able to get a job after university and I was messing about on the internet a lot and I heard of a book called Bang! that Chris Lintott had published with Patrick Moore. It revived an old childhood interest of mine, in astronomy, that had begun when my mother bought me a book about stars, aged six."

She wrote to Lintott with a couple of questions about his book and, after he replied, stayed in touch. When Lintott mentioned the idea of Galaxy Zoo on his website, Sheppard was ready. "And of course when it went live, I jumped straight in." She has never looked back. "When he saw I was involved I got a message from him saying, 'Would you mind having a look at the message boards for us? Just keep an eye out for swearing and spam.'" Sheppard hasn't missed a day since. A novice astronomer at the outset, she is currently working on a master's degree in astrophysics ("The astronomy is revoltingly simple," she says. "The maths is more of a challenge…"). And she still maintains the good humour of the ever-expanding zooite universe. "We had no troublemakers on the site for a couple of weeks," she says. "And when we did we had no real flame wars [deliberate wrecking], we just had a few who weren't prepared to work with other people."

She spent a bit of time moderating and blocking but mostly it was a voyage of discovery. The other zooites I speak to concur. Some are retired, some use the site as a way of unwinding after work, some are students, some are teachers. Lintott and Schawinski recently commissioned some research to see what motivated their citizen scientists. Nearly half suggested their primary motivation was a desire to be involved in useful research. Others cited "wonder" and "beauty" and "community" in about equal measure. Hanny van Arkel, a 28-year-old Dutch schoolteacher, became a zooite for a different reason. She was a keen guitarist and she joined up because her hero, Brian May, of Queen, suggested she should in a blog. (May, a long-time stargazer, is something of a guiding force among the Zoo community. Sheppard later sends me a version of the Queen song "'39" with its lyrics rewritten in tribute to the rock star's influence over the community, one of several: "Yes, we heard your call, though we're many miles away/Through websites and news from you,/All the stars across the sky, mysteries to satisfy/In the land of our Galaxy Zoo…")

Having heard May's call herself, Van Arkel quickly became hooked on the site, and just a week after she had started looking at galaxies, she noted a startling green cluster on one image. A few other zooites had looked at the same picture, presumably not known what to make of the cluster and moved on. Van Arkel did not do that. "Some friends have suggested to me that the curiosity or the need for the answer is very much part of my personality," she tells me on the phone, with a laugh. She put a note on the message boards, and wondered if anyone knew what the bright green cluster might be. It became "Object of the Day", an ongoing series. But in this case no one, not Chris, not Kevin, not anyone had ever seen anything like it before. "Hanny's Voorwerp" (object) quickly became a cause célèbre in the wider astronomical universe. And after much debate it was suggested that the voorwerp was indeed unique – it captured the moment that a quasar, a light beacon powered by a black hole, illuminated a gas cloud. The "quasar mirror", the size of the Milky Way, could yet provide a biography of one of the more mysterious processes in the universe.

Van Arkel still spends a lot of time on the site – she was also partly responsible for identifying other remarkable and unexplained patterns, the so-called "green pea" galaxies – but these days, she says, she has become more of a roving ambassador for the possibilities of armchair discovery, talking to children and students across Europe. She is generally met with the same question: what is it like to have a vast area of time and space, 650 million light years distant, named after you? She thinks it is "pretty cool".

If zooites such as Hanny worry about anything it is that the source of their obsession will dry up, that they will run out of visible galaxies to classify. "In the beginning," Alice Sheppard said, "we all were enjoying it so much that we didn't like the idea of getting to the end." As it has worked out, despite the insatiability of the zooites, more data sets have kept becoming available just as one tranche of images has been classified; now Sheppard believes that the work will continue to expand like the objects of its attention, "though no one seems quite sure how many galaxies are in the Hubble database…"

If they ever run out of galaxies, citizen scientists will certainly have plenty of other projects to satisfy their after-hours cravings. Galaxy Zoo itself has created a little solar system of related projects – Zooniverse – based on the original model, that include a moon mapping site (with photographic definition that would allow you to see an astronaut's footprint); an extraordinary project called Old Weather that reproduces millions of pages of scanned ship's logbooks, written in the cursive script of navy captains, from which thousands more volunteers have been painstakingly extracting weather records, to extend our understanding of climate change to the era before meteorological stations were established (along the way, Old Weather is also revealing the spread of infectious diseases, port to port); there are also growing armies of more specialist enthusiasts translating papyri, and cataloguing whale noises. The crowdsourcing of amateur enthusiasts now extends to the search for extraterrestrial life, in the SETI project, to the imaginative classification of all oil paintings in public ownership in Britain (the Public Catalogue Foundation), to all kinds of botanical mapping opportunities.

In some senses, this openness to mass collaboration, particularly in biology, is nothing new; the internet has simply given it an exponentially greater scale and reach. Professor Jim Secord is in charge of the Darwin Project at Cambridge, which is labouring to produce the definitive scholarly edition of the great man's letters. The project has been collating and footnoting for 37 years so far and has completed 16 volumes of a proposed 30. Secord agrees that citizen science, the huge network of amateur botanists and ornithologists and rural vicars and pigeon fanciers, on whom Darwin relied for a good deal of his observational data, was a model for the dispersed pattern-makers of Galaxy Zoo.

"Darwin would have loved the internet," Secord says, "but even as it was, a great deal of what he achieved was made possible by the arrival of the international penny post. He was corresponding with people all over the world who were experts in particular species." Very few of these correspondents were professional scientists or academics, most were amateurs who knew something of Darwin's work and felt they had something to contribute. Darwin encouraged them by replying meticulously to letters (hence the 30-volume lifetime's work).

One thing Secord has been struck by, in poring over these letters, is Darwin's encouragement of women in this scientific endeavour: in 1872, for example, he wrote to Mary Treat, an amateur naturalist from New Jersey: "Your observations and experiments on the sexes of butterflies are by far the best, as far as known to me, which have ever been made. They seem to me so important, that I earnestly hope you will repeat them & record the exact numbers of the larvæ which you tempt to continue feeding & deprive of food, & record the sexes of the mature insects. Assuredly you ought then to publish the result in some well-known scientific journal…"

Another thing that seems to characterise this network is the range of backgrounds of the people to whom Darwin wrote. Amateur science was deep-rooted in working men from the 18th century, when pubs in northern cities would host informal meetings of Linnean societies, and there would be informed debate about the taxonomy of local flora and fauna. The spirit of these part-time data-gatherers persisted into the last century in ornithology in particular, even as science itself became professionalised within universities, and increasingly specialised. The latter trend put up barriers between the amateur and the professional, and career scientists have protected their data, and in turn their funding and promotion prospects, until published and peer-reviewed. As Secord, a historian of science, points out: "We are in the strange position where you are currently probably more likely to find a paywall around an academic science website than almost anywhere else."

In this respect the great and growing enthusiasm for citizen science is allied to the open science movement, which calls for a spirit of collaboration and data-sharing and free web publication of papers among professional scientists, to the benefit of wider humanity. The model was established by the successful collaborative approach of the Large Hadron Collider, and the Human Genome Project, both of which serve as telling examples of what can be achieved. It is tempting to see this call for openness as a generational shift in attitudes, but as Chris Lintott of Zooniverse points out: "It is unfortunately generally the younger research scientists who have most to lose by not being published in the traditional way."

The most persuasive voice in the call for openness, for "information wanting to be free" in Stewart Brand's famous phrase, is Michael Nielsen. Nielsen is a physics prodigy and great supporter of Galaxy Zoo; his acclaimed book Reinventing Discovery: The New Era of Networked Science was sponsored by the financier George Soros to advance his knowledge-sharing philosophy.

Nielsen makes an unarguable ethical case for the abandonment of the various barriers to sharing knowledge: economic, egotistical and expedient, without ever quite proposing alternative funding models. He makes the point that from the Enlightenment forwards science was aligned with common good. That purpose shifted in the last century; it is possible that the internet will reverse that trend. "We have an opportunity to change the way knowledge is constructed," Nielsen suggests. "But the scientific community, which ought to be in the vanguard, is instead bringing up the rear."

While its crowdsourcing methods are clearly not appropriate to all specialised fields, the advance of citizen science across the globe will no doubt accelerate this revolution. Government funding for scientific research in both Britain and the US now makes mandatory the inclusion of provision for engagement with the wider public, the necessity to let us know how our money is being spent. And there is, as Zooniverse reveals, no surer way to engage the public than to involve people in the research itself. As Alice Sheppard, moderator-in-chief of the known universe, points out: "There are always going to be shoulder shruggers who say this can't or won't happen, but look, it's happened already!"

