Drowning in spam
January 2, 2011
Arnold Toynbee warned in a 1963 book that one of the two severest challenges America needed to face was getting control of commercial speech. We seem to be farther from achieving that than ever. Instead, private speech is being limited. In that context, I was interested in this article that was sent to me by my friend Rich Spees, who maintains this website and designs and maintains others.
Why We Desperately Need a New (and Better) Google
This semester, my students at the School of Information at UC-Berkeley researched the VC system from the perspective of company founders. We prepared a detailed survey; randomly selected 500 companies from a venture database; and set out to contact the founders. Thanks to Reid Hoffman, we were able to get premium access to LinkedIn—which was very helpful and provided a wealth of information. But some of the founders didn’t have LinkedIn accounts, and others didn’t respond to our LinkedIn “inmails”. So I instructed my students to use Google searches to research each founder’s work history, by year, and to track him or her down in that way.
But it turns out that you can’t easily do such searches in Google any more. Google has become a jungle: a tropical paradise for spammers and marketers. Almost every search takes you to websites that want you to click on links that make them money, or to sponsored sites that make Google money. There’s no way to do a meaningful chronological search.
We ended up using instead a web-search tool called Blekko. It’s a new technology and is far from perfect; but it is innovative and fills the vacuum of competition with Google (and Bing).
Blekko was founded in 2007 by Rich Skrenta, Tom Annau, Mike Markson, and a bunch of former Google and Yahoo engineers. Previously, Skrenta had built Topix and what has become Netscape’s Open Directory Project. For Blekko, his team has created a new distributed computing platform to crawl the web and create search indices. Blekko is backed by notable angels, including Ron Conway, Marc Andreessen, Jeff Clavier, and Mike Maples. It has received a total of $24 million in venture funding, including $14 million from U.S. Venture Partners and CMEA Capital.
In addition to providing regular search capabilities like Google’s, Blekko allows you to define what it calls “slashtags” and filter the information you retrieve according to your own criteria. Slashtags are mostly human-curated sets of websites built around a specific topic, such as health, finance, sports, tech, and colleges. So if you are looking for information about swine flu, you can add “/health” to your query and search only the top 70 or so relevant health sites rather than tens of thousands of spam sites. Blekko crowdsources the editorial judgment for what should and should not be in a slashtag, as Wikipedia does. One Blekko user created a slashtag for 2100 college websites. So anyone can do a targeted search for all the schools offering courses in molecular biology, for example. Most searches are like this: they can be restricted to a few thousand relevant sites. The results become much more relevant and trustworthy when you can filter out all the garbage.
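For readers who think in code, the slashtag idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Blekko's implementation: the `SLASHTAGS` table, the function names, and the three example domains are all assumptions made up for the sketch, and a real slashtag would hold a far longer, human-curated list.

```python
from urllib.parse import urlparse

# Hypothetical slashtag table: tag name -> curated set of site domains.
# The domains are placeholders, not Blekko's actual /health list.
SLASHTAGS = {
    "health": {"cdc.gov", "who.int", "nih.gov"},
}


def parse_query(query):
    """Split a query such as 'swine flu /health' into search terms and slashtags."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:])
        else:
            terms.append(token)
    return " ".join(terms), tags


def in_slashtag(url, tag):
    """True if the URL's host is one of the tag's curated domains or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in SLASHTAGS.get(tag, ()))


def filter_results(urls, tags):
    """Keep only the URLs that belong to every requested slashtag."""
    return [u for u in urls if all(in_slashtag(u, t) for t in tags)]
```

In this sketch, a query of “swine flu /health” is split into the search terms and the tag, and any result whose host is not on the curated list is dropped before the results are shown.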
The feature that I’ve found most useful is the ability to order search results. If you are doing searches by date, as my students were, Blekko allows you to add the slashtag “/date” to the end of your query and retrieve information in a chronological fashion. Google does provide an option to search within a date range, but these are the dates when a website was indexed rather than created, which means the results are practically useless. Blekko makes an effort to index the page by the date on which it was actually created (by analyzing other information embedded in its HTML). So if I want to search for articles that mention my name, I can do a regular search; sort the results chronologically; limit them to tech blog sites or to any blog sites for a particular year; and perhaps find any references related to the subject of economics. Try doing any of this in Google or Bing.
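The difference between the two kinds of date is easy to show in a short Python sketch. It assumes each result already carries both a creation date and an indexing date; the `Result` class and `by_creation_date` function are hypothetical names for illustration, and the hard part that Blekko attempts (recovering the creation date from a page's HTML) is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    url: str
    created: date   # when the page was written (assumed already known)
    indexed: date   # when the crawler last recorded the page


def by_creation_date(results, year=None):
    """Return results oldest-first by creation date, optionally limited to one year."""
    picked = [r for r in results if year is None or r.created.year == year]
    return sorted(picked, key=lambda r: r.created)
```

A page written in 2005 but recrawled last week would fall inside a “past month” range if the filter uses `indexed`, which is why a date range based on indexing tells you little about a founder's history by year.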
The problem is that content on the internet is growing exponentially and the vast majority of this content is spam. This is created by unscrupulous companies that know how to manipulate Google’s page-ranking systems to get their websites listed at the top of your search results. When you visit these sites, they take you to the websites of other companies that want to sell you their goods. (The spammers get paid for every click.) This is exactly what blogger Paul Kedrosky found when trying to buy a dishwasher. He wrote about how he began Googling for information…and Googling…and Googling. He couldn’t make head or tail of the results. Paul concluded that “the entire web is spam when it comes to major appliance reviews”.
Unfortunately, it isn’t just appliance reviews that are the problem. Almost any popular search term will take you into seedy neighborhoods.
Content creation is big business, and there are big players involved. For example, Associated Content, which produces 10,000 new articles per month, was purchased by Yahoo! for $100 million in 2010. Demand Media has 8,000 writers who produce 180,000 new articles each month. It generated more than $200 million in revenue in 2009 and is planning an initial public offering valued at about $1.5 billion. This content is what ends up as the landfill in the garbage websites that you find all over the web. And these are the first links that show up in your Google search results.
The bottom line is that we’re fighting a losing battle for the web and need alternative ways of finding the information that we need. I hope that Blekko and a new breed of startups fill this void: that they do to Google what Google did to the web in the late 90s, which was to clean up the spam and clutter.
Editor’s note: Vivek Wadhwa is an entrepreneur turned academic. He is a Visiting Scholar at UC-Berkeley, Senior Research Associate at Harvard Law School and Director of Research at the Center for Entrepreneurship and Research Commercialization at Duke University. You can follow him on Twitter at @vwadhwa and find his research at http://www.wadhwa.com.