I have been doing some end-of-year digital cleanup. This
was prompted by discovering my 40GB laptop had 1GB of free space remaining.
Some of that space was Windows Update files, but I was curious how I’d filled
up what I thought was 15GB of free space.
Some years ago my boss at IBM and I killed an hour of an
afternoon figuring out how much storage was “under management” by the MVS
system we were using. I recall it being something like 1.5TB and that we were
both a bit floored (this wasn’t just DASD, er, disk, but also some tape
storage, but mostly disk).
I’m sitting here now at my desk
and am staring at…lessee,
maybe 1.2TB of disks attached to three systems. And while a lot of that disk
space is filled with media files (something like 15,000 mp3s, and maybe 100
videos offloaded from the Tivoim), a lot of the space is filled with backups
and digital cruft.
The cruft has built up over the years, and this is likely
not my first post about it. I have email dating back to the early 1990s (but
sadly lost my earliest email archives from CMU due to poor magnetic media
planning). I have digital photos dating back to 1996 (with a single weird
outlier from maybe 1993). I have multiple copies of programs, articles I’ve
written, backups of websites I’ve developed or had a hand in.
But back to the laptop: I started looking into what the
current crop of cruft contains and found a bookmarks.html file I’d saved (for
some reason) from 2004.
That does not seem that long ago; however, it represents the
end of an era for me. From maybe 1995-1996 I maintained some sort of bookmark
file, independent of the browser I was using, which I’d sort of categorized and
would zip through on a near daily basis to “keep up” with the web. This
bookmark file seems to have the remnants of that daily bookmark file as it even
has links to IBM internal sites which I’m sure had fallen silent by 2004 (I’d
left IBM in 2001 so can’t verify).
By 2004 I was using Bloglines to follow weblogs, but for other
sites I still had to manually go and look at the damn site to see what was
new. Some sites offered RSS headlines but no excerpts or copy.
Looking at this file I see that JoDI has fallen off my
radar. It’s a peer-reviewed electronic journal for digital information
studies. In 2004 it was located at http://jodi.ecs.soton.ac.uk/.
Since then it has moved twice, first to http://jodi.tamu.edu/
and then to its current home at http://journals.tdl.org/jodi.
Sadly it still does not provide a web feed (I’m ecumenical, I’ll take either
Atom or RSS), which is why I stopped keeping up with it.
A problem with using hosted services for your blog becomes
evident: they don’t provide for the day you decide to stop using the service,
so Black Belt Jones’ “Flyingcarpet” blog from 2004 404’s today (the correct URL
is now http://www.blackbeltjones.com/work/).
A simple redirect service from TypePad would keep them in the graces of their
ex-customers (I ran into something similar when I dropped my Radio Userland
site, which years later still shows up in Google results on “Ed Costello”
though it is content-free).
I’d say, based entirely on informed guessing, that half of
the blog sites I was following in 2004 have either shut down entirely or drastically
changed focus.
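That guess could be turned into a rough count. Below is a minimal sketch, assuming the saved file is in the usual Netscape-style bookmark format (a series of A tags with HREF attributes). The names `extract_urls`, `http_status`, and `tally` are my own inventions for illustration, not part of any tool mentioned here.

```python
from collections import Counter
from html.parser import HTMLParser
import http.client
import urllib.error
import urllib.request


class BookmarkLinks(HTMLParser):
    """Collects http(s) link targets from a bookmarks.html style file."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.urls.append(href)


def extract_urls(html_text):
    """Return the http(s) URLs found in the bookmark file, in file order."""
    parser = BookmarkLinks()
    parser.feed(html_text)
    parser.close()
    return parser.urls


def http_status(url, timeout=10.0):
    """Return the final HTTP status for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error such as 404
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, bad URL


def tally(urls, check=http_status):
    """Bucket each URL as alive, gone, unreachable, or other."""
    counts = Counter()
    for url in urls:
        status = check(url)
        if status is None:
            counts["unreachable"] += 1
        elif status in (404, 410):
            counts["gone"] += 1
        elif 200 <= status < 400:
            counts["alive"] += 1
        else:
            counts["other"] += 1
    return counts
```

Reading the file and calling `tally(extract_urls(text))` would give the buckets. The count is only a lower bound on the damage: a domain that now serves a parked search page still answers 200 and lands in "alive", and some servers refuse HEAD requests outright.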
Apparently in 2004 I was interested in politics and urban
issues. I can’t tell why I bookmarked a number of the sites I did. One
site I just checked is obviously a blog, but has no name (not even on the
“About” page) and the writing, while good, isn’t of any interest to me today.
It’s depressing to see how many sites, which must have had
some interesting thing in 2004, have become junk search portal pages (you know
these, they have some generic bland design and the headline is “Resources about
goshIGotADomainName.com”). Even Gene Kranz’s web site has become an SEO gateway
page.
It was interesting to come across “Jet Lag: How Boeing Blew It” in
light of this week’s NY Times article, “A
Humbled Airbus Learns Hard Lessons.” Net: 2003’s loser is 2006’s winner.
Another surprising observation: how many professionally run
sites (either by bozos like myself who should know by now how to run a
web site, or commercial sites) don’t use redirects when they redesign or
restructure. The content is still on the site, but the old bookmark just
results in a 404 – file not found error, instead of a redirect (or a gateway
page saying “Hey, we’ve restructured and don’t have a clue how to redirect
you”). R.E.M. apparently had a MySpace page at http://myspace.com/rem/ in 2004.
Since 2004 MySpace has apparently changed something because that URL responds
with a generic Microsoft IIS 404 page, while stripping the trailing “/” off
yields the current R.E.M. page. Perfectly valid thing to do, but if you’ve
structured the site one way and restructure it later, at least trap the older
URL and issue a redirect, or something.
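For what "trap the older URL and issue a redirect" might look like in practice, here is a minimal sketch using only Python's standard library. The paths in `MOVED` are hypothetical, as are the names `redirect_target`, `LegacyRedirectHandler`, and `serve`. A real site would more likely do this in its web server configuration, so treat this as an illustration of the idea rather than how any site mentioned here works.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical table of retired paths (stored without a trailing slash)
# and the places their content lives after a restructuring.
MOVED = {
    "/old-blog": "/work/",
    "/2004/links.html": "/archive/2004/",
}


def redirect_target(path: str, moved: dict = MOVED) -> Optional[str]:
    """Return the new location for a retired path, or None if we have no idea."""
    if path in moved:
        return moved[path]
    # Treat "/old-blog/" and "/old-blog" as the same old bookmark.
    trimmed = path.rstrip("/")
    if trimmed and trimmed in moved:
        return moved[trimmed]
    return None


class LegacyRedirectHandler(BaseHTTPRequestHandler):
    """Answers known old URLs with a permanent redirect, everything else with 404."""

    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string
        target = redirect_target(path)
        if target is not None:
            self.send_response(301)  # moved permanently
            self.send_header("Location", target)
            self.end_headers()
            return
        self.send_error(404, "File not found")


def serve(port: int = 8000) -> None:
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), LegacyRedirectHandler).serve_forever()
```

Calling `serve()` and requesting `/old-blog/` should produce a 301 pointing at `/work/`, while an unknown path still gets the 404. The table is the whole trick: someone has to write down where things went at the time of the redesign.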
Digging deeper I found a number of links from my IBM.com
days, both to IBM sites (apologies about knocking on the ibm.com staging site’s
door there, who would have thought the URL wouldn’t change in 6 years!) and
sites that apparently interested me in 1997-1999.
I found a link to the first phishing site I ever came
across, the “IBM-AOL Rewards” scam from 1999. I don’t recall the exact
details, but we came across it because people started sending email to our
webmaster mail complaining that they had not received their “IBM AOL” reward.
I could not believe, then, that people would believe that IBM and AOL would
host a major corporate program on angelfire.com, but that didn’t trip people’s
skepticism wire.
The file goes way back, covering the whole Y2K imbroglio, IBM.com’s “Bullseye” redesign, “Deep Blue”, various Olympic Games-related links, and various news articles.
This one from ten years ago was humorous to read:
“Banner Ads on Internet Attract Users” (December 3, 1996).
I am surprised that the NYT allows access to that without the paywall popping up.
So, there’s still lots of interesting information in the
file, but it is of little or no use to me today. I’ll keep it, but it’ll move
into the archives on our Mac, safely off my laptop, perhaps never to be seen
again. I don’t know if Google Desktop Search would help here (partly because
it doesn’t even occur to me to search my own archives for stuff, I mean, why
would I need to when most things I’m interested in are discoverable through
Google Search of the open web).
And there’s a problem: I keep all of this stuff, this
digital crap, around because at the moment I decide to keep it I think it
might be useful some day. But it rarely is, and even when it would be useful, it
doesn’t even occur to me to look in my own archives. So why do we keep all of
this stuff (and sign up for even more) if we don’t have a way of managing it
and discovering it?
e.p.c. posted this at 18:12 GMT on 16-Dec-2006 from Brooklyn, NY.