Dead Media Beat: speculating about migration hell

*Dream on. The industry doesn't do hundred-year periods, and that's why digital archives are no more stable than, say, the finance industry.

http://www.theregister.co.uk/2013/04/03/archive_technology_problem/

(...)

"100 year archive

"The problem is actually a very large one. Take the second-biggest company in the world in revenue terms; Shell, properly known as Royal Dutch Shell, which came into being 106 years ago. It has, in effect, a 100+ year archive consisting almost entirely (bar the last few years) of paper documents.

"Let's imagine a 100-year tape archive. How would that work?

"A little over every two years its tape format would advance a generation. LTO-1 was announced in 2000 with a 100GB capacity. Now, 13 years later, we have LTO-6 with 2.5TB capacity; that's 6 format generations over 13 years. Even delaying the format transitions for the archive to every five years (instead of every two years, as recent history shows us) would mean 20 tape format transitions in a century.

"As the archive capacity mounted up the bulk of the tape archive's work would increasingly consist of migrating the contents of old tapes to new ones. It would present ever fewer of its resources in response to archive users' data access requests. The bulk of the cost of the archive would be spent internally, having it chase its own data migration tail, and its cost per user access would skyrocket.

"We are not even thinking yet about how Word 2113 would be able to read a Word 2013 format document; that sort of problem would have to be dealt with possibly by a constant ongoing content format migration as well.

"In a word, this is nonsense. (((The idea may be nonsense; the reality is massive decay.)))

"Unless we reach a stage where archival technology becomes as stable as paper and printing had been for decades, centuries even, then we cannot, unquestioning, keep all the data we digitally collect. The oldest, least-wanted data, will have to be let go, deleted. Unless there is a clear need to keep it then some kind of digital filtering mechanism will have to be used to scrap the least-wanted data and delete it.

"The archive will have to be trawled by digital spider-bots; data killers, looking for useless data and destroying it to make space for wanted data.

"Somebody could make a business out of taking this old data and storing it in a kind of digital deep-freeze for potential re-activation. Maybe this could be on the Moon, in a nuclear-powered mega-flash-vault, with plenty of space to expand; there'll always be another crater ... but this is science fiction. (((Why stop at that?)))

"The real moral of this tale is that virtually no data is needed for ever. Big data bigots' mad claims notwithstanding, digital archives will have to be regularly cleared. Physical space runs out; digital space runs out; formats change; applications change; and preserving access to older and older data will become crushingly expensive.

"Technological change might come up with a solution to this problem, but it's a problem created by that very process of technological change. Beware what you wish for."