February 15, 2003

Google's Moment of Truth

Maybe this would surprise some folks who know me. The above article is about Google taking flak over controversial links to anti-Scientology sites, or hate sites, or sites some government doesn't like. I would normally say these issues are moral or ethical, but the article leaves me with the urge to take a totally operational and functional view of Google's ethical quandary.

In other words, I don't believe Google has an ethical quandary, for the same reason the ACLU defends Nazis and Amnesty International looks after the rights of prisoners, some of whom may not be nice people. If I were Google, I would model my stance on the logic these two groups use.

Uh, am I saying it is a basic human right to have a functional, machine-based search? Yes I am, for the same reasons Ben Franklin once advocated public libraries. We shouldn't be building walls that information can't cross. Doing so is Orwellian--it risks rewriting history or erasing things that are actually part of the discourse that socially constructs our world.

Google's functional credibility is on the line. And let me ask this from a proprietary software point of view: Would anyone ask the Dialog database to compromise its functionality by using a less-than-complete dataset? How about Lexis Nexis?

If an organization corrupts the data in its dataset, its search functions lose credibility, plain and simple. How did Google win the search engine wars?

I can tell you what made the difference for me. I was with Lycos and Hotbot and Alta Vista all the way, and long before Google showed up, I would run searches for things I KNEW were out there but couldn't remember where they were. The search engines often could not find these things. Frustrated, I longed for the day I'd have a perfect search engine, one that looked at full texts, not just titles, or not just metatags, or not just titles, metatags, abstracts and keywords. Maybe people don't care that search engines work from an incomplete dataset. Maybe all they care about is pretty banners cluttering up search engine functionality with bullshit. Frankly, THOSE PEOPLE ARE FUCKED.

Search engines are databases that sell searching and parsing functionality, PERIOD. The other stuff is noise. I don't care about entry-level newbie marketing, cuz newbies benefit from higher quality products as much as researchers do. If this game were about portals, Google wouldn't have won the war in the first place. This axiom should be part of Stupid Marketing Mistakes 101.

Reliable searching is about far more than site ratings or quick displays. Reliable searching is about credible searching, and that is about TRUST. Trust in the dataset. You can't do datamining, you can't do credible research, if the dataset has been fucked with from an editorial point of view.

In Google's universe, to keep from being trumped by the next hot parsing AI search engine that is in development right now, GOOGLE MUST INDEX EVERY LOCATION THAT LAUNCHES A PAGE ON A BROWSER. It can rank and parse all day long, so long as the dataset is complete. The dataset is dynamic, in constant flux. That might argue against any credibility, in an old universe where software can't adapt to a changing dataset on the fly. But these days it can be credibly done, and not just by Google, but by Dialog and Lexis Nexis and other expensive systems.

If Google knows its business, it knows the high-end proprietary databases are its primary competition. That said, I feel certain that, being so widely used in a distributed system, Google's technology is likely far more sophisticated than that of Dialog and Lexis Nexis, which both operate on a very expensive, scarcity model, as do many other elitist products. Would these databases choose not to include data items on topics they didn't like or found odious? Given their high-end cost and demanding clients, they would have a lot to lose by corrupting their datasets.

Just because Google's reputation isn't built on sky-high fees and high-end clients doesn't mean Google shouldn't have the same concerns for credibility.

Or to put it another way: If Google corrupts its dataset (heaven forbid!), Google leaves the door wide open for the next level of search engine, one with an exhaustive, Hoovering dataset and advanced parsing, to blow it away as surely as Google decimated Alta Vista.


