A glut of headlines relevant to the post-apocalyptic pirate internet has popped up over the last few weeks. Here's a quick review with commentary.
This first batch concerns the temporary loss of major online repositories for "user-generated content" (to invoke the cliché). Another post, discussing the Wikileaks saga in the context of the post-apocalyptic pirate internet, is forthcoming.
Users' trust is shaken by this sort of thing, and a day after the outage Tumblr released a backup application that lets users save all of their Tumblr posts to their hard disks.
Here’s the official line:
Unlike other publishing sites’ approach to backups, our goal was to create a useful copy of your blog’s content that can be viewed on any computer, burned to a CD, or hosted as an archive of static HTML files.
Wherever possible, we use simple file formats. Our backup structure is optimized for Mac OS X’s Spotlight for searching and Quick Look for browsing, and we’ll try to use the same structure and achieve the same benefits on other platforms.
To me this reads more like, “Keep uploading! If we implode, we won’t take your data with us.”
The backup app strikes me as a Hail Mary, executed in the interest of damage control (with the side effect of actually being good news for the survivability of the 2+ billion posts Tumblr hosts on its servers). Social media websites face a tension between giving users access to their own data (in the form of database dumps) and maximizing "lock-in," since giving users downloadable access to their data can provide an easy means of egress from one service and migration to a competitor. (Cf. Facebook's recent decision to let users dump their data in one step.)
Of course, like most prophylactics, the download tool would only be fully effective in the context of the post-apocalyptic pirate internet if 100% of Tumblr publishers used it 100% of the time. Nevertheless, the fact that this piece of preservationist infrastructure was officially released suggests that at least some Tumblr staff and users are paranoid enough to prepare for a data- or infrastructure-related disaster. The app also implicitly shifts the worst-case backup burden from the host to the client. (E.g., "Oops, we lost everything… what, you didn't back up your posts?") This represents a significant change to one of the basic contracts of Web 2.0: the idea that "files" as we know them on our PCs don't exist, that you don't have to worry about which directory things go in, that you don't plan for a day when you'll need to open Word 3.0 files, and that you certainly don't have to back up. The understanding between consumer and provider is that once something's uploaded, it's safe from loss due to technical failure, with every bit tucked away in multi-million-dollar data centers under the careful watch of bespectacled geeks pacing up and down miles of server racks.
In practice, that's not how things work out, but the "cloud = safe" truism will need to be proven catastrophically false before the basic tenet of the post-apocalyptic pirate internet, that local bits are safe bits, can take hold.
Another reasonably high-profile outage (although certainly not on the scale of Tumblr's) struck GitHub on November 14th. A botched command by a systems administrator wiped out a database and destroyed some data along the way. The site was unusable for about three hours.
GitHub is much more esoteric than Tumblr, but for the uninitiated it's basically a website that layers social-networking tools on top of Git. Git, in turn, is a piece of software that runs locally on your computer and tracks revisions to, and collaboration around, the source code written in the course of developing software.
Anyway, here’s what bad news looked like, as delivered by GitHub’s mascot, the Octocat:
The nature of Git (the version-control system) means that even a total loss of GitHub (the community built on Git) would be inconvenient, but not catastrophic. When you're working with a Git repository, you have a complete local copy on your hard disk that is periodically updated from, and synced to, the GitHub server.
If 50 people are working on a particular project, then 50 copies of that project exist on local hard disks in one corner of the world or another. Thus the degree to which a project is insured against disaster grows with its popularity and number of collaborators.
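One rough way to put numbers on this: assume, hypothetically, that each copy of a project is lost independently with the same probability p. That is a simplification, since clones on machines in the same office or with the same provider tend to fail together. Under that assumption the project vanishes only if all n copies go at once, which happens with probability p^n, so the risk shrinks geometrically with each added collaborator. A minimal sketch in Python; the function name and the 10% loss rate are illustrative assumptions, not measured figures:

```python
def prob_all_copies_lost(n_copies: int, p_loss: float) -> float:
    """Chance that every one of n_copies independent copies is lost.

    Hypothetical model: each copy fails independently with the same
    probability p_loss. Real-world copies often share failure modes.
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be between 0 and 1")
    if n_copies < 0:
        raise ValueError("n_copies must be non-negative")
    # All copies must fail for the project to disappear entirely.
    return p_loss ** n_copies


if __name__ == "__main__":
    # Illustrative 10% per-copy loss rate (an assumption, not data).
    for n in (1, 2, 5, 50):
        print(n, prob_all_copies_lost(n, 0.1))
```

With that hypothetical 10% per-copy loss rate, a lone copy leaves a 1-in-10 chance of total loss, while five copies cut it to about 1 in 100,000.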
So there are two particularly great things about the Git + GitHub combination that should be kept in mind as plans for the post-apocalyptic pirate internet are drawn up:
The same basic software (Git) is running on both your own computer and GitHub’s servers. In this sense, GitHub makes the most of the web when it’s available (by adding a social layer to Git), but Git itself doesn’t completely melt down in the absence of GitHub. In short, Git’s use of the centralized web is value added, not mission critical.
Local backups are generated automatically in the course of using GitHub, unlike Tumblr's proposed solution, which calls on users to make a conscious decision to back up at regular intervals if they want their data to be safe.