ON LINE OPINION - Australia's e-journal of social and political debate

Fading away: the problem of digital sustainability

By Danny Kingsley - posted Wednesday, 5 September 2007


Software - so last millennium

Software incompatibility is a serious problem for long-term data storage. Files created with some programs, such as WordStar or WordPerfect, are simply no longer readable. Even files created with earlier versions of current software are sometimes unreadable. It all comes down to backward compatibility - the ability of newer versions of programs to convert older files into the new format.

The 2007 versions of Microsoft Word, Excel and PowerPoint save files by default in a new format that earlier versions of the same programs cannot open. Anyone still running an older version of the software will need to obtain a “compatibility pack” to read files created with the new versions.

This is a problem for digital repositories. APSR is developing a program called the Automated Obsolescence Notification System (AONS), which scans a repository and alerts the repository manager to any items that have become, or are about to become, obsolete.


“The scanning runs overnight and is configured by the repository manager. Scans can occur daily, weekly or monthly,” explains Peter Raftos, Project Manager in Scholarly Technology Services at ANU. “We won’t know until the program is operational how often things will become obsolete. The terms ‘obsolete’ and ‘obsolescent’ depend as much on context as format.”
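To make the idea concrete, here is a minimal Python sketch of the kind of scan Raftos describes. It is not AONS code: the FORMAT_STATUS watch-list, its entries and the scan_repository function are hypothetical, and a real service would consult a maintained format registry and identify formats from file contents rather than trusting file extensions.

```python
from pathlib import Path

# Hypothetical, purely illustrative watch-list mapping file extensions to a
# preservation status. A real service would consult a maintained registry.
FORMAT_STATUS = {
    ".pdf": "current",
    ".jpg": "current",
    ".tif": "current",
    ".txt": "current",
    ".wpd": "obsolescent",  # illustrative only
}


def scan_repository(root):
    """Return (relative path, status) for every file whose format is not 'current'.

    Files whose extensions are missing from the watch-list are reported as
    'unknown' so that the repository manager can review them.
    """
    root = Path(root)
    alerts = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        status = FORMAT_STATUS.get(path.suffix.lower(), "unknown")
        if status != "current":
            alerts.append((path.relative_to(root).as_posix(), status))
    return alerts
```

Running a scan like this daily, weekly or monthly would be left to an ordinary scheduler such as cron, with the alerts emailed to the repository manager.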

Considering those home albums once again, the sustainability problem goes further than having hardware that can read your storage medium. What about the format the images have been saved in? Images are commonly saved as JPEG files, but JPEG is simply a standard method of image compression created by the Joint Photographic Experts Group committee, hence the name. (The video equivalent, MPEG, was created by the Moving Picture Experts Group.) There is no guarantee these standards will remain in use as digital image requirements change.
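A file's format is ultimately defined by its bytes, not its name. Most image formats begin with a fixed "signature", so a preservation check can tell what a file really is even when its extension is wrong or missing. The sketch below is an illustration rather than a complete format identifier: it recognises the standard signatures of JPEG, PNG, GIF and TIFF files, and identify_image_format is a hypothetical helper name.

```python
# Signature ("magic number") bytes that begin files in a few common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian ("Intel") byte order
    (b"MM\x00*", "TIFF"),  # big-endian ("Motorola") byte order
]


def identify_image_format(path):
    """Return the image format named by the file's leading bytes, or None."""
    with open(path, "rb") as f:
        header = f.read(8)  # long enough for every signature above
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None
```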

404 error - document not found

The Internet began in the 1980s, and the World Wide Web software was released together with the first web server in 1991. The World Wide Web Consortium was founded in 1994. The Internet was originally designed to withstand disaster by having multiple nodes, so that if one node was wiped out the information would still be available elsewhere. But the Internet presents its own challenges.

The web in itself is not a way to archive anything. Search engines, for example, regularly rewrite the past by updating their indices, overwriting their copies of old web pages with new ones. The “Wayback Machine” (www.archive.org) is one attempt to keep a record of what has passed: it takes snapshots of the web at given points in time and, unlike live web pages, these snapshots are preserved in the Internet Archive in California.
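Wayback Machine captures can be addressed directly by date. As a small illustration, assuming the commonly used web.archive.org/web/<timestamp>/<url> address pattern with the timestamp written as YYYYMMDDhhmmss, the helper below builds such an address; wayback_url is a hypothetical name. The archive generally redirects a request to the capture nearest the requested time, so the timestamp need not match a snapshot exactly.

```python
from datetime import datetime


def wayback_url(url, when):
    """Build a Wayback Machine address for `url` as captured around `when`."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14 digits: YYYYMMDDhhmmss
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Ask for a page as it stood on the day this article was posted.
print(wayback_url("http://www.example.com/", datetime(2007, 9, 5)))
# https://web.archive.org/web/20070905000000/http://www.example.com/
```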

Academic publishing is increasingly moving onto the web, and the issue of URL permanence gains importance as more articles include online citations. The phenomenon of URLs disappearing over time has been described as “link rot”. A recent study showed that over a four-year period more than 37 per cent of online citations in top refereed communication journals had disappeared.
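Link rot is easy to measure for your own reference list. The following Python sketch, with hypothetical function names, sends each cited URL a HEAD request and counts the ones that fail. It is deliberately simple: some servers refuse HEAD requests or block automated clients, so a URL reported dead here deserves a manual check before it is written off.

```python
import urllib.request


def url_is_alive(url, timeout=10.0):
    """Return True if `url` answers a HEAD request without an error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError and HTTPError (4xx and 5xx responses), DNS
        # failures and timeouts; ValueError covers strings that are not URLs.
        return False


def link_rot(urls, check=url_is_alive):
    """Return (dead URLs, percentage of URLs that are dead)."""
    dead = [url for url in urls if not check(url)]
    percent = 100.0 * len(dead) / len(urls) if urls else 0.0
    return dead, percent
```

The `check` parameter lets the same counting logic run against a cached set of results, or a stricter checker that retries with GET, without touching the network code.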

One attempt to address the problem of link rot is to use a globally unique persistent identifier. “These are in theory everlasting,” explains Scott Yeadon, a DSpace Committer based at ANU, which means he’s one of the few people in the world who can commit changes to the code behind the research-focused repository system. “As long as the identifier resolution service and object repository keeps running it’s not a problem.”
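The key idea is indirection: a citation names a permanent identifier, and a resolution service maps that identifier to wherever the object currently lives. When the object moves, only the service's record changes. The toy Python class below illustrates this; IdentifierResolver and the identifiers and URLs in the example are hypothetical stand-ins for a real, networked resolution service.

```python
class IdentifierResolver:
    """Toy resolution service: identifiers are permanent, locations are not."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # An identifier, once issued, is never reassigned to another object.
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = url

    def move(self, identifier, new_url):
        # The object has moved; only the resolver's record changes, so every
        # citation that uses the identifier keeps working.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        return self._locations[identifier]


resolver = IdentifierResolver()
resolver.register("example/42", "http://old-server.example.edu/thesis.pdf")
resolver.move("example/42", "https://repository.example.edu/items/42/thesis.pdf")
print(resolver.resolve("example/42"))
# https://repository.example.edu/items/42/thesis.pdf
```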


There are several persistent identifier systems, the best known being the Handle System. A handle provides a single URL, hosted on a separate server, that points to a document. The Digital Object Identifier (DOI) System, itself fairly well known, is a fee-based subset of the Handle System. “These systems are arguably better than using an arbitrary URL,” explains Yeadon. “They are external and the URL is meaningless, which avoids any problems when the meaning of the object changes.”
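In practice both kinds of identifier are turned into ordinary web links by public proxy servers: https://hdl.handle.net/ for handles and https://doi.org/ for DOIs. The Python sketch below assumes those two proxies; the helper names and the example handle are hypothetical.

```python
from urllib.parse import quote


def handle_url(handle):
    """Web link for a handle via the public proxy at hdl.handle.net."""
    return "https://hdl.handle.net/" + quote(handle, safe="/")


def doi_url(doi):
    """Web link for a DOI via the doi.org resolver.

    DOI names are handles whose prefix begins with "10.".
    """
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters such as '<' and '>', which some DOIs contain
    # but URLs do not allow unescaped; keep the prefix/suffix separator '/'.
    return "https://doi.org/" + quote(doi, safe="/")


print(handle_url("1234/5678"))  # hypothetical handle
# https://hdl.handle.net/1234/5678
print(doi_url("10.1000/182"))
# https://doi.org/10.1000/182
```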

Another approach comes from LOCKSS (Lots of Copies Keep Stuff Safe), a program that started at Stanford University, which creates a distributed, self-repairing, robust digital preservation system. LOCKSS uses a process called format migration, which converts material into a newer format that browsers do understand.
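LOCKSS peers protect content by comparing their copies with one another and repairing any copy that disagrees with the rest. The real LOCKSS polling protocol is considerably more elaborate and is designed to resist deliberate attack, so the Python sketch below shows only the core idea: a simple majority vote over SHA-256 hashes. The function and peer names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data):
    return hashlib.sha256(data).hexdigest()


def repair_by_majority(copies):
    """Repair copies that disagree with the majority version.

    `copies` maps a peer name to that peer's bytes and is updated in place.
    Returns the names of the peers whose copies were replaced.
    """
    if not copies:
        return []
    votes = Counter(digest(data) for data in copies.values())
    winning_hash, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        # Without a strict majority there is no safe way to pick the good copy.
        raise RuntimeError("no majority; cannot tell which copy is correct")
    good = next(data for data in copies.values() if digest(data) == winning_hash)
    repaired = []
    for peer, data in copies.items():
        if digest(data) != winning_hash:
            copies[peer] = good  # take a repair from a peer that agrees
            repaired.append(peer)
    return repaired
```

The more independent copies there are, the more damage the vote can outvote, which is the point of "lots of copies".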

Show me the money

Organisations and governments worldwide are grappling with this problem, and working towards solutions is proving costly. At the institutional level there is the general cost of setting up repositories. Even repositories based on open source software have set-up costs, and all repositories incur ongoing costs for running and maintenance.

The Federal Government has recognised the need to address sustainability issues and has started investing considerable sums in the area. The Systemic Infrastructure Initiative, announced in 2001, has funded several projects to the tune of tens of millions of dollars, including APSR, Australian Research Repositories Online to the World (ARROW), the Digital Theses Project and the Meta Access Management System (MAMS). Future funding for sustainability will come out of the National Collaborative Research Infrastructure Strategy (NCRIS).

But there is only so much money in the bucket and this is causing discomfort in research circles. “Researchers are concerned that funds will be taken away from their research to cover sustainability,” explains Henty.

Solutions?

There are no easy solutions to the problem of large-scale digital sustainability, but there are things you can do to look after your own digital information. Run regular back-ups, and when you upgrade your computer make sure you also carry all of your files across, converting them to current formats where needed. Deposit copies of your work in your institution's repository. And those baby photos? Print them out and put them in an album.
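Back-ups are only useful if they are intact. Digital preservation practitioners call this “fixity checking”: record a checksum of every file when you make the back-up, then recompute the checksums later to spot files that have gone missing or silently changed. The Python sketch below is one way to do this with SHA-256; the function names and the JSON manifest layout are hypothetical choices, not a standard. Keep the manifest outside the folder being backed up.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, manifest_path):
    """Record a checksum for every file under `root`, keyed by relative path."""
    root = Path(root)
    manifest = {p.relative_to(root).as_posix(): sha256_of(p)
                for p in sorted(root.rglob("*")) if p.is_file()}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_backup(backup_root, manifest_path):
    """Return (relative path, problem) for files that are missing or changed."""
    backup_root = Path(backup_root)
    manifest = json.loads(Path(manifest_path).read_text())
    problems = []
    for rel, expected in manifest.items():
        p = backup_root / rel
        if not p.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(p) != expected:
            problems.append((rel, "changed"))
    return problems
```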


First published in the Winter 2007 edition of the ANU Reporter and in ScienceAlert on August 23, 2007.




About the Author

Danny Kingsley is a PhD student looking into the barriers to the uptake of open access in Australia. She is a project officer at the Australian Partnership for Sustainable Repositories.


This work is licensed under a Creative Commons License.
