The term "Dark Age" usually conjures images of Western Europe in the Middle Ages: fumbling, stagnant, lost, without inspiration or hope. A modern dark age, however, may be on the horizon, according to Vint Cerf, vice president and chief internet evangelist at Google. In Cerf's eyes, civilization is nearing the beginning of a Digital Dark Age.
“It is a time when future members of society are unable to correctly interpret digital information created in the past for lack of access to software that can render previously recorded documents, programs, data, etcetera,” Cerf said in an email interview. On Friday, Cerf brought the issue to public attention at the annual meeting of the American Association for the Advancement of Science in San Jose, California.
The term, he said, refers to the early medieval period in Europe known as the Dark Ages, which lasted until the Renaissance. During that period, writing was largely inaccessible to the general population; only religious orders preserved and copied documents from the past, especially Greek and Roman texts.
The loss of old knowledge to incompatibility with newer formats is not an entirely new phenomenon, though, said Leonid Reyzin, a professor of computer science in Boston University’s College of Arts and Sciences. One entity that has fallen prey to uninterpretable files is the British Broadcasting Corporation.
“The BBC Domesday Project is an example of formats going obsolete,” Reyzin said. “It was a collection of contemporary UK, I believe in the 1980s, and it was later obsolete, and there were no devices that could read the collection. This is definitely a concern.”
Reyzin said the primary issue is in not preserving files in a way future generations will find valuable.
“You should keep your files in formats you can read. But on a large scale, or university collection, those kind of things, they have protocol,” he said. “For physical things, offsite storage in case of fire. For digital things, formats that will be maintained and will be useful later.”
As technology has advanced, many industries have adopted digital record-keeping. Though the impact of file loss varies by situation, Cerf said the risk is roughly equal across industries.
“I don’t think any industry is necessarily more vulnerable than another. We are all generally users of proprietary, commercial software, and if we cannot assure that the digital objects we create can be correctly understood by evolving software, [they] may someday no longer be useful if compilers for that source software are unavailable for new hardware,” he said. “We need a regime that offers privileges to organizations devoted to preserving bits, application programs, operating systems and hardware descriptions.”
For example, a decade ago, the Storage Networking Industry Association brought up the idea of a 100-year archive. However, the effort has not gained much traction, Cerf said. To counter the loss of files, Cerf proposed what he called digital vellum.
“Basically, one creates virtual machines capable of emulating older hardware and running older operating systems and applications,” he said. “As long as virtual machines can be created to run these emulations, they can run older operating systems and application software.”
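As a toy illustration of that emulation idea (hypothetical, and not drawn from any real preservation system), a few lines of Python can "emulate" an imaginary legacy stack machine, keeping its archived programs runnable on modern hardware:

```python
# Hypothetical sketch: an emulator for an imaginary legacy stack machine.
# Real digital-vellum systems would emulate actual historical hardware and
# operating systems; this toy only shows the principle that old programs
# stay runnable as long as an emulator for their platform exists.

def run_legacy_program(program):
    """Execute a program written for a made-up 1980s-style stack machine."""
    stack = []
    for op, *args in program:
        if op == "PUSH":      # push a constant onto the stack
            stack.append(args[0])
        elif op == "ADD":     # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":     # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "PRINT":   # emit the top of the stack as output
            return stack[-1]
    return None

# An "archived" program: compute (2 + 3) * 4 on the legacy machine.
old_program = [("PUSH", 2), ("PUSH", 3), ("ADD",),
               ("PUSH", 4), ("MUL",), ("PRINT",)]
```

Running `old_program` through the emulator yields 20, just as the long-gone original machine would have produced; the program itself never had to be rewritten.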
Cerf said that at the moment, Google has no role in developing digital vellum, although other organizations, such as the Digital Preservation Network, have started to devise their own preservation methods. The DPN is a 2-year-old organization that works with academic institutions to preserve scholarly work.
“If you think about what has typically happened over the centuries with scholarly work, they have been widely available, and typically, they have been published. You can look at it 100 years later and understand what was actually written and interpret it,” said Dave Pcolar, technical manager at DPN. “The problem with the digital world is the formats. If you lose access to scholarly work over time, you can’t verify if those works were correct or validate the hypothesis they put forward, and you can’t retest it the same way. You are basically relying on memory or other studies.”
Though many discussions regarding file loss have been centered on large institutions, individuals are just as susceptible, Pcolar said.
“We [lose] files when you move from one laptop to another. Music collections may be lost because they are misplaced and … there are complications with digital rights,” he said. “If you look at the Amazon Cloud, you are looking at a single vendor. So there are a lot of gray issues and moving parts that make it difficult to nail down, certainly more difficult than having a CD or DVD in your possession and being able to read that.”
Andrea Goethals, manager of digital preservation and repository services at Harvard Library in Cambridge, is also working on solutions. At Harvard Library, Goethals said, she relies on two general preservation strategies: migration and emulation. Migration converts content into formats that modern technology can read; emulation reproduces the functionality of an outdated system so the original files can still be opened. While Goethals credited Cerf with raising public awareness of the issue, she said the situation is not exactly as he made it seem.
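As a hypothetical sketch of the migration strategy (not Harvard Library's actual workflow), the Python snippet below re-saves a text file from an aging single-byte encoding, Latin-1, into UTF-8, the encoding modern software expects, without changing its content:

```python
# Hypothetical migration sketch: re-encode a Latin-1 text file as UTF-8.
# Real library workflows migrate far richer formats (images, word
# processor files, databases), but the principle is the same: decode
# with the old format's rules, re-save with a current format's rules.
from pathlib import Path

def migrate_to_utf8(src: Path, dst: Path) -> None:
    """Read a legacy Latin-1 file and write it back as UTF-8."""
    text = src.read_text(encoding="latin-1")   # decode the old format
    dst.write_text(text, encoding="utf-8")     # re-save in a current format

# Usage: create a "legacy" file, then migrate it.
legacy = Path("memo_1987.txt")
legacy.write_bytes("café".encode("latin-1"))   # old single-byte encoding
migrate_to_utf8(legacy, Path("memo_1987_utf8.txt"))
```

The migrated copy reads identically in any modern tool, while the original bytes would display a garbled character wherever Latin-1 is no longer assumed.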
“It is definitely a problem, but he made it seem like nobody is doing anything about it,” Goethals said. “He gave one example of a Carnegie Mellon project, but there’s actually been different organizations working on this for at least 15 years.”
That’s not to discount the threat, however. The future of digital preservation, Goethals said, lies in partnerships between commercial giants such as Google, smaller organizations such as DPN and research institutions such as BU.
“If we don’t preserve files, we lose our culture, we lose our art, we lose our history,” she said. “There would be nothing left to research in the future if we didn’t preserve the content now.”