À LA RECHERCHE DU TEMPS PERDU:
ETERNITY IN CYBERSPACE

by M. E. Kabay, PhD, CISSP-ISSMP

mekabay@gmail.com

Professor of Computer Information Systems

Department of Computer Information Systems

Norwich University

 

Everyone makes backups and stores them, right?  And everyone keeps archives of electronic data in accordance with legal requirements or organizational policy, right?

Well, no.

Many of us store records in ways that make it unlikely we will be able to read them over the long term that archival use requires. And archivists ought to know better.

Storing records is only half the task of records management; ensuring that the records remain available and usable is the essential function.  No one wants a WOM (write-only memory) for their records.  For short-term storage, there is no problem ensuring that stored information will be usable.  Even if a software upgrade changes file formats, the previous versions are usually readable.  Within a single year, technological changes such as new storage formats will not make older formats unreadable.

Over the medium term, up to five years, difficulties of compatibility do increase, although not catastrophically.  There are certainly plenty of five-year-old systems still in use, and it is unlikely that this level of technological inertia will be seriously reduced in the future.

Over the longer term, however, there are serious problems to overcome in maintaining the availability of electronic records.  Over the last ten to twenty years, certain forms of storage have become essentially unusable.  As an example, AES was a powerful force in the dedicated word-processor market in the 1970s; eight-inch disks held dozens or hundreds of pages of text and could be read in almost any office in North America.  By the late 1980s, AES had succumbed to word-processing packages running on general-purpose computers; by 1990, the last Canadian company supporting AES equipment closed its doors in Montreal.  Today, it would be extremely difficult to recover data from AES diskettes.

The problems of obsolescence include data degradation, software incompatibilities and hardware incompatibilities.

Magnetic media degrade over time.  Over a period of a few years, thermal disruption of magnetic domains gradually blurs the boundaries of the magnetized areas, making it harder for I/O devices to distinguish between the domains representing ones and those representing zeroes.  These problems affect tapes, diskettes and magnetic disks and cause increasing parity errors.  Specialized equipment and software can compensate for these errors and recover most of the data on such old media.
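
As a rough illustration of how parity reveals this kind of degradation, the sketch below stores one even-parity bit alongside each byte and rechecks it on read.  The function names are hypothetical, and real tape and disk formats use more elaborate error-detecting and error-correcting codes; this is only a minimal sketch of the principle.

```python
from typing import List, Tuple


def parity_bit(byte: int) -> int:
    """Return the even-parity bit for an 8-bit value: 1 if the byte has an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2


def write_with_parity(data: bytes) -> List[Tuple[int, int]]:
    """Simulate writing to a medium: store each byte together with its parity bit."""
    return [(b, parity_bit(b)) for b in data]


def find_parity_errors(stored: List[Tuple[int, int]]) -> List[int]:
    """Return the positions whose stored parity bit no longer matches the stored byte."""
    return [i for i, (b, p) in enumerate(stored) if parity_bit(b) != p]
```

A single flipped bit in a byte changes its parity and is reported; two flipped bits in the same byte cancel out and go unnoticed, which is one reason recovery from badly degraded media needs specialized tools.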

Tape media suffer from an additional source of degradation:  the metal oxide becomes friable and begins to flake off the Mylar® backing.  Such losses are unrecoverable.  They occur within a few years in media stored under inadequate environmental controls and within five to ten years for properly maintained media.  Regular regeneration, that is, copying the data to fresh media before the underlying medium disintegrates, prevents data loss.
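
Regeneration is only useful if the copy is verified.  The following sketch, which assumes the archived data are ordinary files reachable through the file system, copies a file and compares SHA-256 digests of the source and the copy.  The function names `sha256_of` and `regenerate` are hypothetical; the choice of SHA-256 is an assumption made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def regenerate(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm that the copy is bit-for-bit identical.

    Returns the verified digest, which can be recorded for the next regeneration cycle.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"copy of {source} does not match the original")
    return actual
```

Recording the returned digest with the archive lets the next regeneration detect silent corruption of the source itself, not just a faulty copy.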

Optical disks, which use laser beams to etch bubbles in the substrate, are much more stable than magnetic media.  Because CD-ROMs and laser disks are still so new, no one knows exactly how long optical disks will last.  There have been documented cases of fungal and bacterial degradation of the optical coating; in other cases, the use of multiple wavelengths of light for overlaying multiple tracks of data has caused interference and data integrity problems.  Nonetheless, technologists predict that the information will remain readable for decades or more.  In practice, however, the disks will remain readable only if future CD-ROM systems include backward compatibility.

Software incompatibilities arise in both the application software and the operating system.

The data may be readable, but will they be usable?  Manufacturers provide backward compatibility, but there are limits.  WordPerfect 6.1 can convert files from earlier versions of WordPerfect – but only back to version 4.2.  Over time, application programs evolve and drop support of the earliest data formats.  Database programs, e-mail, spreadsheets – all of today’s and tomorrow’s versions may have trouble interpreting data files correctly.

In any case, all conversion raises the possibility of data loss since new formats are not necessarily supersets of old formats.  For example, in 1972, RUNOFF text files on mainframe systems included instructions to pause a daisy-wheel impact printer so the operator could change daisy wheels – but there was no requirement to document the desired daisy wheel.  The operator made the choice.  What would document conversion do with that instruction?
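
One defensive answer is for a converter to report what it cannot translate rather than drop it silently.  The sketch below assumes a simple line-oriented source format in which lines beginning with a period are formatting directives.  The directive names and the target markup are invented for illustration and are not claimed to match any real RUNOFF dialect or any real converter.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from old-format directives to new-format markup.
DIRECTIVE_MAP: Dict[str, str] = {
    ".BREAK": "<br>",
    ".PAGE": "<pagebreak>",
}


def convert(lines: List[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Convert a document, returning the converted lines and a list of
    (line number, original line) pairs for directives with no equivalent."""
    converted: List[str] = []
    unconverted: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("."):
            directive = line.split()[0].upper()
            if directive in DIRECTIVE_MAP:
                converted.append(DIRECTIVE_MAP[directive])
            else:
                # No equivalent in the new format: record the loss explicitly.
                unconverted.append((number, line))
        else:
            converted.append(line)
    return converted, unconverted
```

A directive such as a printer pause has no entry in the map, so it ends up in the second list, where an archivist can at least see that the conversion discarded something.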

Even operating systems evolve.  Programs intended for the DOS of a decade ago do not necessarily function on today’s DOS version 6.20.  And the operating systems of yesteryear do not necessarily run on today’s hardware.  Even emulators can cause problems because, again, there is no guarantee of compatibility between the emulated system and the emulator.

Finally, even hardware eventually becomes impossible to maintain.  As mentioned above, it would be extremely difficult to retrieve and interpret data from word-processing equipment from even twenty years ago.  Hardly anyone other than museums and hobbyists can read an 800 bpi 9-track ½-inch magnetic tape from a 1980 HP3000 Series III minicomputer.  Over time, even such parameters as electrical power attributes may change, making obsolete equipment difficult to run even if it can be located.

The most robust method developed to date for long-term storage of data is COM (Computer Output to Microfilm).  Documents are printed to microfilm, appearing exactly as if they had been printed to paper and then microphotographed.  Storage densities are high, storage costs are low, and in the worst case, the images can be read with a source of light and a simple lens.

Information security demands that we be able to read old data: it is time for us to pay serious attention to long-term storage technologies.

_____________________________________________

The original version of this article appeared in an issue of the British Secure Computing magazine in 1995 and was later republished in For the Record magazine.