I've suspected this for a while, but I had no documentation. Now I've found a hiccup either in Google's indexing or in the way Typepad's content management system builds its blog pages.
This may have started back when Typepad's servers got overwhelmed and the company altered a few key features that are essential to the way blog menus have traditionally been built: posts more than 30 days old started falling off the Recent Posts and Recently Commented lists in the sidebars.
Generally, blog software lets you set how many items appear in Recent Posts or Recently Commented as a fixed count, not as a window of time reaching backward. The change weakened the archive system and the blog ecosystem in Typepad considerably, because the point of Recently Commented is to show where the community action is among the older, archived posts. It is also a key feature of self-organizing sites, much as the rating system is in Scoop. Instead of moving these features forward, Typepad took a major step backward.
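To make the difference concrete, here is a minimal sketch of the two behaviors, selecting recently commented posts by a fixed count versus cutting the list off at a 30-day window. The post titles, dates, and function names are made up for illustration; this is not Typepad's actual template code, which I have no access to.

```python
from datetime import datetime, timedelta

# Hypothetical post records: (title, date of last comment)
posts = [
    ("Emperor of Ice Cream notes", datetime(2005, 3, 2)),
    ("On public intellectuals",    datetime(2005, 9, 14)),
    ("Deep structure interfaces",  datetime(2005, 11, 30)),
    ("Quick pointer post",         datetime(2005, 12, 28)),
]

def recently_commented_by_count(posts, n=10):
    """The traditional sidebar: the N most recently commented posts,
    no matter how old they are. Old essays stay reachable as long as
    people keep talking about them."""
    return sorted(posts, key=lambda p: p[1], reverse=True)[:n]

def recently_commented_by_window(posts, days=30, today=datetime(2005, 12, 31)):
    """The changed behavior: anything whose last comment is older than
    the window simply falls off the sidebar, however lively it was."""
    cutoff = today - timedelta(days=days)
    return [p for p in posts if p[1] >= cutoff]

print(recently_commented_by_count(posts))   # all four posts survive
print(recently_commented_by_window(posts))  # only the newest post survives
```

Under the count-based rule the archive stays woven into the front page; under the window rule, everything older than a month quietly disappears from navigation.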
I've published e-books in blog software and used it as a database and as an RSS-based content management system. In short, my use of blog software is not driven by popularity or the now-focused element, that new-new-new blip and burble that dominates much of the blogosphere. I'm more interested in creating deeper discussions, in posts that are not dependent on timeliness but become, by blog standards anyway, timeless. Many academic and independent research blogs have this goal, and this is a growing and influential sector of the blogosphere, one I tag as "Public Intellectuals."
I have several posts deep in the archives of some of my older blogs, essays and think-pieces, that get more traffic than the post of the moment.
That was easier to do in 2002 than now. Lots more bloggers now. Or maybe it was because I was in Radio Userland then. Funny thing, though. Those old 2002 and 2003 posts are still pulling hits, same as my 1998 dissertation. This fact alone, I believe, proves my point.
Ultimately, the current, pointer-focused now-now-now attitude of the blogosphere WILL FADE. In a blog-glut, as you sit scanning your news feed reader, you might crawl out of your squirrel wheel and seek out words that have value, value you want to hold on to, value that isn't based on the fact that it was posted in the last five minutes. Analysis does that. Essays do that. Good writing does that. And people out in the blogosphere are making spaces to post these sorts of things, just as much as they are making spaces for the quick hits and pointers.
Mainstream media focuses exclusively on the quick hits and pointer blogs, perhaps because it is easier to marginalize the blogosphere that way, to dismiss it as insubstantial. Or perhaps because that is the extent of mainstream media's attention span, and it is only capable of looking in a mirror at something that, like Narcissus, looks like itself. In love with its own reflection, it fails to recognize significant differences in form, presuming its own ultimate dominance as it does.
But we must not fall into that trap ourselves and let mainstream media and quick hit blogs define us. Juan Cole is not a quick hitter. Riverbend is not a quick hitter. Neither is Glenn Greenwald nor Jay Rosen.
For the first time since being a beta tester for Typepad in 2003, I'm beginning to have second thoughts about the software's apostasy from the blog faith. By downplaying, disempowering, and unhinging key navigation points, Typepad is playing the game, helping to marginalize bloggers as insignificant, rather than developing a tool that works consistently across all types of content, empowering bloggers in this media revolution.
(Yes, I do know that this primarily happened because Typepad grew so fast and its servers were simply overwhelmed with all that rampant indexing. Typepad doesn't have Google's massive server resources, yet. I'm still not happy about it.)
And here's where the Google hiccup comes in.
I have Google ads and customized Google search on all my Typepad blogs. It's just a simple domain search, which is unfortunate, but I'd rather have Google's power, even if I can't at this time limit the searches to subdirectories.
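For anyone who hasn't set one of these up: the customized search amounts to a site:-restricted query against the whole domain. As far as I know, a hand-typed site: query in regular Google web search can also take a subdirectory, which is exactly what my AdSense search box won't do for me. The sketch below just builds the query URLs; the domain is a hypothetical stand-in, not one of my real blogs, and this is my own illustration, not anything from Google's AdSense code.

```python
from urllib.parse import urlencode

DOMAIN = "example-blog.typepad.com"  # hypothetical stand-in for one of my blogs

def google_query_url(terms, site=DOMAIN):
    """Build an ordinary Google web-search URL restricted with the site: operator."""
    q = f"site:{site} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": q})

# Whole-domain search, like my customized AdSense search box:
print(google_query_url("Wallace Stevens"))

# Hand-typed subdirectory restriction, which the search box won't do for me:
print(google_query_url("Wallace Stevens", site=DOMAIN + "/poetry"))
```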
My steady numbers of archive hits have been falling off. Poor me, I thought I was just fading in the ever-expanding blogosphere, an insignificant microbe, according to Truth Laid Bear. It was odd, though. It followed a pattern. Peculiar.
But tonight, on my poetry site, Headpiece Filled With Straw, I wanted to add the Wallace Stevens poem below, "The Emperor of Ice-Cream." I knew I had another Wallace Stevens poem in the archive of that site, and I wanted to find it using my own customized Google search (which is also supposed to earn me revenue through Google AdSense). When I searched "Wallace Stevens," Google said it had NO MATCHES.
Hey, that usually only happens on Old Media newspaper or TV station sites, with their weird, unsearchable, obtuse content management systems, where you search for something you know is there and get no results. I can't tell you how many times that has happened to me on the New York Times site, for instance. Old Media CMSes are some of the most poorly designed things out there, but at least they work with RSS now.
Blogs were always superior to those sites in their simple RSS construction, categories, permalinks, etc. PERMALINKS were an important key to that POLITICAL MOVE to stop links from going dead. That's why blogs gained such an edge over Old Media in times of big world news events. It was simply easier to find links that worked to get the information you were looking for, rather than wasting time wrestling with those piss-poor Old Media site search engines and CMS design with no permalinks.
So I tried a few more keywords in my custom site Google search, words that I KNEW were used in topics deep in my archives across all my public, pinging sites. AGAIN, NO MATCHES.
WTF?!
Is it Typepad's fault that my archives have disappeared from the Google crawlers and bots, Google, which intends to INDEX EVERYTHING IN THE WORLD?
Or is Google overwhelmed and deliberately ditching past archives that were already scanned once, perhaps to make room for all the data coming in from its massive Library of Everything book-scanning project?
Is Google imposing subtle hierarchies where a keyword isn't a keyword isn't a keyword, even if it is fully linked and pinged, unless the poster is "important enough" by some other secret, commercial, Old Media measure? Google's strength is the egalitarian nature of its data parsing. If its parsing is corrupted, search data becomes useless, and results delivered based on Google's algorithms aren't accurate.
The hierarchy of Google's world is the link. I accept that hierarchy. I'm willing to live by that hierarchy, with links functioning as dollar votes, so to speak, the coin of this realm. I know link farms and such are working day and night trying to game Google, and I know they will fail, not only because of Google's vigilance, but because the sites AREN'T REAL. Yes, fake link-farm blogs are bleeding into Technorati results, but I think that's Technorati's problem, unless the corruption crept into the info coming from Google's APIs.
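By "links as the coin of the realm" I mean the basic PageRank idea. Google's real formula is secret and far more elaborate, but a toy power-iteration sketch of my own (the pages and link structure are invented) shows why a page nothing real links to, or a page that has dropped out of the crawl entirely, simply stops counting.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.
    `links` maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, outlinks in links.items():
            if not outlinks:          # dangling page: spread its rank evenly
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
            else:
                share = damping * rank[p] / len(outlinks)
                for q in outlinks:
                    new_rank[q] += share
        rank = new_rank
    return rank

# A tiny web: the archive essay is well linked; the link farm links out
# frantically, but nothing real links back to it.
toy_web = {
    "archive_essay": ["front_page"],
    "front_page":    ["archive_essay", "new_post"],
    "new_post":      ["archive_essay"],
    "link_farm":     ["front_page", "new_post", "archive_essay"],
}
print(pagerank(toy_web))  # the essay outranks the farm: links in, not out, are the vote
```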
There is a difference between having something to say and just stacking up links. I thought Google's parsing was sophisticated enough for that sort of AI distinction, perhaps a bit of natural language processing, secret sauce, what-have-you.
I DON'T accept other hierarchies of influence corrupting that data set. I want to trust Google's code crunch, if not its people (it's not personal; humans are more fallible than algorithms).
So why are my own archives disappearing from my own custom Google AdSense search?
Has someone judged my words insignificant outside the realm of algorithm-link-crunching formulas? Has the link-crunching formula returned to aspects of the old 1990s HTML meta-tag days, where whole pages are no longer scanned for the big crunch, but whole books are?
OR (and I suspect this is the more likely culprit), has the shift in Typepad's server processing made archives older than 30 days fall off the Google universe on subsequent crawls?
I may not be technically adept enough to understand exactly how that might happen, except to guess that under dynamic publishing in the CMS (what Six Apart added to Movable Type with 3.0), PERMALINKS are no longer treated as INDIVIDUAL PAGE ARCHIVES. If those archive pages are created only on the fly when someone requests them, does that mean the PERMALINK is virtual? And if archives are not "real" permalinks, would that be why Google thinks they're no longer present on the Internet?
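I can at least poke at one piece of that guess from the outside: whether the archive permalinks still answer as ordinary pages a crawler could fetch, and whether anything in them tells robots to stay away. A rough sketch, using made-up archive URLs in place of my real ones; it only checks the response code and an in-page robots noindex directive, nothing more:

```python
import urllib.request
import re

# Hypothetical archive permalinks standing in for my real ones
archive_urls = [
    "http://example-blog.typepad.com/poetry/2004/03/wallace_stevens.html",
    "http://example-blog.typepad.com/essays/2003/11/deep_structure.html",
]

for url in archive_urls:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            html = resp.read(200_000).decode("utf-8", errors="replace")
            status = resp.status
    except Exception as exc:
        print(f"{url}: FAILED ({exc})")
        continue
    # A permalink a crawler can use should answer 200 and not carry a
    # robots noindex directive in its head.
    noindex = bool(re.search(r'name=["\']robots["\'][^>]*noindex', html, re.I))
    print(f"{url}: HTTP {status}, noindex={noindex}")
```

If the pages answer fine and carry no noindex flag, the problem is somewhere deeper than anything I can see from my side of the browser.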
Thing is, my backlist, my archive, is part of my revenue model here. If I'm a Google AdSense user, directly invoking the LONG TAIL by creating an archive meant to have long-term value, and my archive is being removed from the Internet record, I'm screwed.
Or I have to look for a new blog content management system.
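The arithmetic behind that long-tail revenue model is simple, which is exactly why losing the archive hurts. With made-up numbers (mine are smaller, but the shape is the same):

```python
# Hypothetical traffic figures, just to show the shape of the long tail
recent_posts  = 5     # this month's posts
recent_hits   = 100   # hits per recent post per month, from feeds and pointers
archive_posts = 600   # several years of essays and think-pieces
archive_hits  = 3     # a trickle per archived post per month, mostly via search

head = recent_posts * recent_hits      # 500 hits from the new stuff
tail = archive_posts * archive_hits    # 1,800 hits from the backlist

print(head, tail)  # the backlist out-draws the post of the moment
# If the archive drops out of the index, `tail` goes to zero,
# and most of the AdSense impressions go with it.
```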
If this problem extends beyond Typepad's borders, then the development is even more ominous for the social phenomenon of the blogosphere. To de-rank older blog archives, to ERASE them, across the board, from the Google crawl of record, is a POLITICAL DECISION TO ATTACK THE LONG TERM INFLUENCE OF SUCH SITES.
In my dissertation and beyond, I started calling this "the politics of deep structure interfaces," where seemingly benign interface design issues have far-reaching political ramifications, just as the overpasses Robert Moses designed on the roads to Long Island were deliberately built a few inches lower than the clearance NYC buses needed, effecting a kind of social segregation by keeping lower-income people, who traveled by bus, out of those areas.
If Long Tail blog archives disappear from Google, it is effectively the same as if our Blog City Buses are prevented from getting to Media Parity Long Island because of the "benign" technology of Robert Moses's overpass design.
Somebody's lowering the height of the overpasses. Is it Typepad, or Google?
Perhaps someone from Typepad or Google will see this and set me straight. I sure do hope they do it before this post vanishes into search engine oblivion.
Jeff Said,
May 16, 2006 @ 6:36 pm
Matt,
I know Google is not giving us webmasters a full picture with the link: command. I ran the link: command on Yahoo and MSN, and I noticed some scraper sites have copied my content and added links to a few of my websites. I have a feeling Google is treating these links as questionable. I am in the process of emailing these scraper sites' webmasters to get the links removed, because I did not ask for them to be put there, and they violated copyright by taking our content.
Since Google crawls better than MSN and Yahoo, will there be a way in the future for us webmasters to see these links? Honestly, right now, if a competitor wants to silently tank a website's rankings in Google, all they need to do is drop a bunch of bad links on it. Without Google giving us webmasters the ability to see those links, we may never even know it happened.