TED Radio Hour
Fri July 12, 2013
Can You Crowdsource Without Even Knowing It?
Originally published on Tue December 17, 2013 8:59 am
Part 2 of the TED Radio Hour episode Why We Collaborate.
About Luis Von Ahn's TEDTalk
Computer programmer Luis von Ahn wondered how else to use the small contributions made by millions of people on the Internet for the greater good. He put CAPTCHAs, those online puzzles that verify you're not a robot, to work by digitizing books and teaching foreign languages.
About Luis Von Ahn
Luis von Ahn builds systems that combine humans and computers to solve large-scale problems that neither can solve alone. Von Ahn is an associate professor of computer science at Carnegie Mellon University, and he's at the forefront of the crowdsourcing craze. His work takes advantage of the ever-growing Web-connected population to achieve collaboration in unprecedented numbers. His projects aim to leverage the crowd for human good. His company reCAPTCHA, sold to Google in 2009, digitizes books with the help of CAPTCHAs, the online word puzzles used to verify that a user is human. His new project is Duolingo, which aims to get 100 million people translating the Web into every major language.
GUY RAZ, HOST:
It's the TED Radio Hour from NPR. I'm Guy Raz. And on the show today, the chaos and the power of collaboration. What happens when hundreds or millions of people contribute toward a singular goal? Well, there's a good chance, a very good chance that you've actually been part of a huge online project unknowingly.
But more on that later. First, if you've ever bought a concert ticket online or signed up for Gmail, there's a box that pops up with words that look kind of wavy, and you have to type in what you think those words are.
LUIS VON AHN: And the reason it's there is to make sure that you, the entity filling out the form, are actually a human and not a computer program that was written to submit the form millions of times.
RAZ: It's like the most annoying thing ever.
VON AHN: I'm to blame. I'm sorry.
RAZ: The man to blame is Luis von Ahn, inventor of CAPTCHAs. I couldn't get tickets to Beyonce this summer 'cause I couldn't figure out the CAPTCHAs.
VON AHN: Did you really want to see her?
RAZ: Yeah, and she put on another show - two other shows in Washington; they were still sold out.
VON AHN: (Chuckling) Yeah, sorry.
RAZ: So CAPTCHAs are actually designed to prevent automated computer programs from doing things like buying up all those Beyonce tickets because those programs cannot read those squiggly lines, believe it or not, as well as humans can. And so today, every single day, almost 200 million CAPTCHAs are typed into computers around the world.
VON AHN: When I first heard this - well, first, I was proud. I thought, look at the impact that my work has had. But then I started feeling bad because not only are they annoying, they're also - each time you type one, you waste about 10 seconds of your time. And if you multiply 200 million by 10 seconds, you get that humanity, as a whole, is wasting, like, 500,000 hours every day.
RAZ: Man, imagine what you could do with those 500,000 hours.
VON AHN: Yeah, so that's exactly what I was thinking about.
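The arithmetic behind that figure can be checked with a short sketch. It uses only the two numbers quoted in the conversation above (200 million CAPTCHAs a day, about 10 seconds each); the variable names are illustrative and not from the source. The exact product is a little higher than the rounded "500,000 hours" von Ahn cites.

```python
# Figures quoted in the conversation; both are approximations from the source.
captchas_per_day = 200_000_000
seconds_per_captcha = 10

# Total time spent typing CAPTCHAs each day, in seconds and then in hours.
total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600  # 3,600 seconds in an hour

print(f"{total_hours:,.0f} hours per day")  # prints "555,556 hours per day"
```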
RAZ: Here is Luis's TED Talk.
(SOUNDBITE OF TED TALK)
VON AHN: But I started thinking, is there any way in which we could use this effort for something that is good for humanity. So see, here's the thing, when you're typing a CAPTCHA, during those 10 seconds, your brain is doing something amazing. Your brain is doing something that computers cannot yet do. So can we get you to do useful work for those 10 seconds?
Another way of putting it, is there some humongous problem that we cannot yet get computers to solve, but that somehow we can split into tiny 10-second chunks, such that each time somebody solves a CAPTCHA, they solve a little bit of this problem. And the answer to that is, yes, and this is what we're doing now. So what you may not know is that nowadays, when you're typing a CAPTCHA, not only are you authenticating yourself as a human, but in addition you're actually helping us to digitize books.
And the basic idea is you start with a book. You've seen those things, right? Like a book.
(SOUNDBITE OF LAUGHTER)
VON AHN: OK. So you start with a book, and then you scan it.
SYNTHESIZED VOICE: It was the best of times, it was the worst of times. It was the age...
VON AHN: The next step in the process is that the computer needs to be able to decipher all of the words in these pictures.
SYNTHESIZED VOICE: It was the season of light. It was the best of turns. It was the best of trends. It was the best of transfers.
VON AHN: That, unfortunately, is not very accurate. For older books, the ink has faded, and so the pictures look as if you have taken a photocopy of a photocopy of a photocopy of some book. So it looks pretty distorted. So computers can't read them very well, but humans can.
So what we're doing now is we're taking all of the words that the computer cannot recognize and we're getting people to read them for us while they type a CAPTCHA on the Internet. So next time you type a CAPTCHA, those words that you're typing are words that are coming directly from books that the computer could not recognize and we're using what you're entering to help digitize the books.
RAZ: And then that word, what, gets sent back to whoever is digitizing that book and it sort of automatically fills in that part of the puzzle?
VON AHN: That's exactly right. It puts it back in there and then that's it. It goes on to the next word.
RAZ: So, like, more than a billion people have helped digitize books and they have no idea?
VON AHN: Yeah. That's basically what I work on. That's my work. It's taking things that get done by millions of people and try to reuse or recycle the human mental energy towards something else.
(SOUNDBITE OF TED TALK)
VON AHN: The question that motivates my research is the following: If you look at humanity's large-scale achievements - these really big things that humanity has gotten together and done, like, historically, like, for example, building the pyramids of Egypt or the Panama Canal, or putting a man on the moon - there's a curious fact about them and it is that they were all done with about the same number of people. It's weird. They were all done with about 100,000 people. And the reason for that is because, before the Internet, coordinating more than 100,000 people, let alone paying them, was essentially impossible.
So the question that motivates my research is, if we can put a man on the moon with a 100,000, what can we do with a hundred million?
RAZ: The answer? It's Luis's latest project. It started about three years ago. He had just sold his second company to Google.
VON AHN: You know, I retired. I retired for, like, a day.
RAZ: I would.
VON AHN: I was going to watch a lot of TV is what I was going to do, but then I got bored.
RAZ: Which got him thinking about this question.
(SOUNDBITE OF TED TALK)
VON AHN: How can we get 100 million people translating the web into every major language for free? So I would like to translate all of the web, or at least most of the web, into every major language. OK, so that's...
RAZ: Why couldn't you just use, you know, language software to do it?
VON AHN: It turns out they're just not very good. You would never see a book translated by Google Translate, for example. Even when it's accurate, you don't even know whether to trust it or not because it's so inaccurate the rest of the time. You know, computers can't yet do this. This is why language translation is such a big business is because computers can't do it very well.
RAZ: And there are other hurdles.
(SOUNDBITE OF TED TALK)
VON AHN: The other problem that you're going to run into is a lack of motivation. How are we going to motivate people to actually translate the web for free? And this is normally - you have to pay people to do this, so how are we going to motivate them to do it for free? Now, when we were starting to think about this...
RAZ: And so what Luis came up with is a project and website called Duolingo, and here's how it works. You start by learning a language by typing in basic words, like girl...
SYNTHESIZED VOICE: La niña.
RAZ: ...Or boy.
SYNTHESIZED VOICE: El niño.
RAZ: But now, remember, Luis wanted to harness all that potential human energy into one huge project.
VON AHN: So you learn a language, but while you're learning, you're also helping to translate the web.
(SOUNDBITE OF TED TALK)
VON AHN: OK, so the way this works is whenever you're just a beginner, we give you very, very simple sentences. There is, of course, a lot of very simple sentences on the web.
SYNTHESIZED VOICE: Ella es una niña.
VON AHN: We give you very, very simple sentences, along with what each word means.
SYNTHESIZED VOICE: Ella - she - es - is - una - a - niña - girl.
VON AHN: OK, and as you translate them enough and as you see how other people translate them, you start learning the language. And as you get more and more advanced, we give you more and more complex sentences to translate.
SYNTHESIZED VOICE: (Speaking Spanish)
VON AHN: But at all times, you're learning by doing.
RAZ: OK, so you get better and better at it and then you basically give the user, like, a news article or something and you ask them to translate it.
VON AHN: Yep. That's the basic idea.
RAZ: And then, how do you know it's accurate?
VON AHN: It's basically multiple people translate the same thing and it turns out it's really accurate. We've measured and it's as accurate as a single professional translator.
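One way to picture "multiple people translate the same thing" is a simple agreement check. The sketch below is an assumption for illustration only: the conversation does not say how Duolingo actually combines translations, and the functions `normalize` and `consensus`, the `min_share` threshold, and the sample sentences are all hypothetical. It accepts a translation only when more than half of the submissions agree after trivial cleanup.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(text.lower().split())


def consensus(submissions, min_share=0.5):
    """Return the most common translation if its share of submissions
    exceeds min_share; otherwise return None (no agreement yet).

    Hypothetical plurality vote, not Duolingo's documented method.
    """
    if not submissions:
        return None
    counts = Counter(normalize(s) for s in submissions)
    best, votes = counts.most_common(1)[0]
    return best if votes / len(submissions) > min_share else None


# Four hypothetical learners translate "Ella es una niña."
submissions = [
    "She is a girl",
    "she is a girl",
    "She is a  girl",
    "She is one girl",
]
print(consensus(submissions))  # prints "she is a girl"
```

With only two conflicting answers, `consensus(["a", "b"])` returns `None`, which in this sketch would mean the sentence needs more translators before it is accepted.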
RAZ: How fast would it take to translate all of Wikipedia into, say, five or six or 10 major languages?
VON AHN: It could be done in really a matter of weeks if we were entirely doing just Wikipedia and had enough people. So, for example, if there were a million people learning English from Spanish, for example, we could translate it in something like 80 hours.
RAZ: All of Wikipedia into Spanish?
VON AHN: Well, if there are a million people going at it, they can do it quickly.
RAZ: Do you think that the key to making this work, to getting millions, tens of millions of people to collaborate, to participate, is you have to give them something back?
VON AHN: Yeah, I think so. It's very hard to mobilize 10 million people, 100 million people. I actually think, even for nice causes, you won't get 100 million people helping to do it if you just have a good cause. I think in most cases you just - you have to give something back.
(SOUNDBITE OF MUSIC)
RAZ: Luis von Ahn. You can find out more about Duolingo and watch Luis's full talk at TED.NPR.org.
RAZ: I've put in la femme, l'homme many many times...
VON AHN: OK.
RAZ: ...so I'm pretty sure that I've translated the woman and the man in many French websites.
VON AHN: Yeah, so that actually - you probably weren't doing any useful translation there.
RAZ: Oh, sorry. I'm just trying to, you know, pitch in a little bit.
VON AHN: Oh, sorry.
(SOUNDBITE OF SONG)
UNKNOWN VOCALIST: I'll fill your CAPTCHA in; it's not case-sensitive...
Transcript provided by NPR, Copyright NPR.