Digital Cruft 2006
Brooklyn, NY 2006-12-16T18:12:54Z
I have been doing some end-of-year digital cleanup. This was prompted by discovering my 40GB laptop had 1GB of free space remaining. Some of that space was Windows Update files, but I was curious how I’d filled up what I thought was 15GB of free space.
Some years ago my boss at IBM and I killed an hour of an afternoon figuring out how much storage was “under management” by the MVS system we were using. I recall it being something like 1.5TB and that we were both a bit floored (this wasn’t just DASD, er, disk, but also some tape storage, but mostly disk).
I’m sitting here now at my desk and am staring at…lessee, maybe 1.2TB of disks attached to three systems. And while a lot of that disk space is filled with media files (something like 15,000 mp3s, and maybe 100 videos offloaded from the Tivoim), a lot of the space is filled with backups and digital cruft.
The cruft has built up over the years, and this is likely not my first post about it. I have email dating back to the early 1990s (but sadly lost my earliest email archives from CMU due to poor magnetic media planning). I have digital photos dating back to 1996 (with a single weird outlier from maybe 1993). I have multiple copies of programs, articles I’ve written, backups of websites I’ve developed or had a hand in.
But back to the laptop, I started looking into what the current crop of cruft contains and found a bookmarks.html file I’d saved (for some reason) from 2004.
That does not seem that long ago; however, it represents the end of an era for me. From maybe 1995-1996 I maintained some sort of bookmark file, independent of the browser I was using, which I’d sort of categorized and would zip through on a near daily basis to “keep up” with the web. This bookmark file seems to have the remnants of that daily bookmark file, as it even has links to IBM internal sites which I’m sure had fallen silent by 2004 (I’d left IBM in 2001 so can’t verify).
By 2004 I was using Bloglines to follow weblogs, but for other sites I still had to manually go and look at the damn site to see what was new. Some sites offered RSS headlines but no excerpts or copy.
Looking at this file I see that JoDI has fallen off my radar. It’s a peer-reviewed electronic journal for digital information studies. In 2004 it was located at http://jodi.ecs.soton.ac.uk/. Since then it has moved twice, first to http://jodi.tamu.edu/ and then to its current home at http://journals.tdl.org/jodi. Sadly it still does not provide a web feed (I’m ecumenical, I’ll take either Atom or RSS), which is why I lost track of it.
A problem with using hosted services for your blog becomes evident: they don’t provide for the day you decide to stop using the service, so Black Belt Jones’ “Flyingcarpet” blog from 2004 404’s today (the correct URL is now http://www.blackbeltjones.com/work/). A simple redirect service from TypePad would keep them in the graces of their ex-customers (I ran into something similar when I dropped my Radio Userland site, which years later still shows up in Google results on “Ed Costello” though it is content-free).
I’d say, based entirely on informed guessing, that half of the blog sites I was following in 2004 have either shut down entirely or drastically changed focus.
Apparently in 2004 I was interested in politics and urban issues. I can’t tell why I bookmarked a number of the sites I did. One site I just checked is obviously a blog, but has no name (not even on the “About” page) and the writing, while good, isn’t of any interest to me today.
It’s depressing to see how many sites, which must have had some interesting thing in 2004, have become junk search portal pages (you know these, they have some generic bland design and the headline is “Resources about goshIGotADomainName.com”). Even Gene Kranz’s web site has become an SEO gateway page.
It was interesting to come across Jet Lag: How Boeing Blew It in light of this week’s NY Times article: A Humbled Airbus Learns Hard Lessons. Net: 2003’s loser is 2006’s winner.
Another surprising observation: how many professionally run sites (either by bozos like myself who should know by now how to run a web site, or commercial sites) don’t use redirects when they redesign or restructure. The content is still on the site, but the old bookmark just results in a “404 File Not Found” error, instead of a redirect (or a gateway page saying “Hey, we’ve restructured and don’t have a clue how to redirect you”). R.E.M. apparently had a MySpace page at http://myspace.com/rem/ in 2004. Since 2004 MySpace has apparently changed something, because that URL responds with a generic Microsoft IIS 404 page, while stripping the trailing “/” off yields the current R.E.M. page. Perfectly valid thing to do, but if you’ve structured the site one way and restructure it later, at least trap the older URL and issue a redirect, or something.
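For the curious, here is roughly what I mean by “trap the older URL and issue a redirect.” This is a minimal sketch in Python, not how MySpace or anyone else actually does it. The `LEGACY_REDIRECTS` table, the `resolve` function and the example paths are all names I made up for illustration. The idea: serve the page if it exists, check a table of retired paths next, then try the same path without its trailing slash, and only then give up with a 404.

```python
# Hypothetical table of retired paths and where their content lives now.
LEGACY_REDIRECTS = {
    "/flyingcarpet/": "/work/",
}


def resolve(path, live_paths, redirects=LEGACY_REDIRECTS):
    """Return (status, location) for a requested path.

    live_paths is the set of paths the site currently serves.
    """
    if path in live_paths:
        return 200, path
    # An old URL we know about: send a permanent redirect.
    if path in redirects:
        return 301, redirects[path]
    # Fallback: the same path minus trailing slashes may still exist.
    stripped = path.rstrip("/")
    if stripped and stripped in live_paths:
        return 301, stripped
    return 404, None


if __name__ == "__main__":
    site = {"/rem", "/work/"}
    for requested in ("/rem", "/rem/", "/flyingcarpet/", "/gone"):
        print(requested, resolve(requested, site))
```

A real site would do this in the web server’s rewrite rules or in the application’s routing rather than in a standalone function, but the order of checks would likely be the same.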
Digging deeper I found a number of links from my IBM.com days, both to IBM sites (apologies about knocking on the ibm.com staging site’s door there, who would have thought the URL wouldn’t change in 6 years!) and to sites I was apparently interested in during 1997-1999.
I found a link to the first phishing site I ever came across, the “IBM-AOL Rewards” scam from 1999. I don’t recall the exact details, but we came across it because people started sending email to our webmaster mail complaining that they had not received their “IBM AOL” reward. I could not believe, then, that people would believe that IBM and AOL would host a major corporate program on angelfire.com, but that didn’t trip people’s skepticism wire.
The file goes way back, covering the whole Y2K imbroglio, IBM.com’s “Bullseye” redesign, “Deep Blue”, various Olympic Games related links, various news articles. This one from ten years ago was humorous to read: Banner Ads on Internet Attract Users (December 3, 1996). I am surprised that the NYT allows access to that without the paywall popping up.
So, there’s still lots of interesting information in the file, but it is of little or no use to me today. I’ll keep it, but it’ll move into the archives on our Mac, safely off my laptop, perhaps never to be seen again. I don’t know if Google Desktop Search would help here (partly because it doesn’t even occur to me to search my own archives for stuff; I mean, why would I need to when most things I’m interested in are discoverable through a Google search of the open web?).
And there’s a problem: I keep all of this stuff, this digital crap, around because at the moment I decide to keep it I think it might be useful some day. But it rarely is, and if it was useful, it doesn’t even occur to me to look in my own archives. So why do we keep all of this stuff (and sign up for even more) if we don’t have a way of managing it and discovering it?