How to archive a website: A short overview with examples from the Environment & Society Portal

by Martin Spenger

Scholars frequently use and cite websites, yet sites can disappear without warning, and their content often disappears with them. If no preparations have been made to sustain a website, it may be lost forever. This post offers suggestions on how to avoid this and describes how we have addressed the problem for the Environment & Society Portal, the Rachel Carson Center’s open access digital archive and publication platform (with which Ant Spider Bee is affiliated). In addition to its two peer-reviewed born-digital publications (Arcadia and Virtual Exhibitions), the Portal curates a variety of scholarly and popular materials on the relationship between nature and society. The website is growing rapidly and receives monthly version updates.

Digital archiving is the process of preserving data for long-term retention. It can extend the life span of content and metadata and streamline the ways in which they are accessed. To avoid problems later, it makes sense to address the question of how to archive one’s work at the beginning of a project rather than at the end.

One of the first steps should be the search for a cooperation partner. Many libraries and similar institutions have strong archiving programs and are often very helpful in answering questions and assisting with sustainability concerns. One of the leading institutions in digital archiving is the Internet Archive, the largest web archiving institution in the world. Among other features, its Wayback Machine allows you to browse older versions of existing websites as well as websites that have since been deleted. The Library of Congress also runs a large project that archives websites and builds collections from them. The UK Web Archive does similar work and has a fast-growing archive. All these projects collect news coverage and governmental records that could be relevant for environmental historians.
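To check whether the Wayback Machine already holds a copy of a site, the Internet Archive offers a small availability endpoint that returns JSON. The following is a minimal sketch, assuming Python 3 and only the standard library; the function names (`build_query`, `closest_snapshot`, `lookup`) and the `example.org` address are illustrative choices, not part of any official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint of the Wayback Machine.
API_ENDPOINT = "https://archive.org/wayback/available"


def build_query(site_url, timestamp=None):
    """Return the request URL for a site and an optional YYYYMMDD timestamp."""
    params = {"url": site_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Extract (snapshot_url, timestamp) from a decoded API response.

    Returns None when the response lists no available snapshot.
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(site_url, timestamp=None):
    """Ask the Wayback Machine for the snapshot closest to the timestamp."""
    with urlopen(build_query(site_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))


# Hypothetical usage (requires network access):
#   lookup("example.org", "20140301")
```

Separating the URL construction and the response parsing from the network call keeps the two pure functions easy to check without contacting the Internet Archive.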

The German National Library selectively archives websites, and most of the larger academic libraries in Germany have similar programs. The Environment & Society Portal is glad to have the Bavarian State Library (BSB) as a partner. The Library Archiving and Access System (BABS) for Long-term Preservation at the Bavarian State Library is an exemplary service. Once a website URL has been submitted to the BSB, the site is professionally harvested by crawler software twice a year. The result can be accessed from anywhere in the world through the OPAC, and the content receives a persistent identifier in the form of a URN. The Environment & Society Portal has been archived since March 2014 and already has three complete backups. Many university libraries around the world offer similar options.


Screenshot of the catalogue record for the Environment & Society Portal website

In addition, requesting an ISSN may bring your website more visibility and recognition in the academic world. ISSNs, which are used for periodicals, can be assigned not only to printed publications but also to other media. A blog or a regular publication is eligible for an ISSN if it is properly marked as a periodical. Here you can find more information on how to do that. An ISSN also makes a website easier to archive: with it, your website can easily be added to a library catalogue or an Electronic Journals Library. Two sections of the Environment & Society Portal, Arcadia and the Virtual Exhibitions, have their own ISSNs.
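When an ISSN is copied into a website footer or a catalogue form, a typo is easy to miss. The eighth character of an ISSN is a check digit computed from the first seven (weights 8 down to 2, modulo 11, with "X" standing for 10), so a quick validity check is possible. The sketch below assumes Python 3; the function names are illustrative, and the ISSNs in the usage note serve only as checksum examples.

```python
def issn_check_digit(first_seven):
    """Compute the ISSN check character for the first seven digits."""
    if len(first_seven) != 7 or not first_seven.isdecimal():
        raise ValueError("expected exactly seven digits")
    # Weights run from 8 for the first digit down to 2 for the seventh.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn):
    """Return True if the string is a well-formed ISSN with a correct check digit."""
    compact = issn.replace("-", "").strip().upper()
    if len(compact) != 8 or not compact[:7].isdecimal():
        return False
    return issn_check_digit(compact[:7]) == compact[7]
```

For instance, `is_valid_issn("0378-5955")` returns `True`, while changing the last digit to 4 makes it return `False`.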

If there isn’t a partner institution that can help, there are simple ways to archive a website yourself. One very helpful step is to make your source code accessible. Especially if your website runs on custom-written code, you can share that code via GitHub or another source code hosting platform. One positive effect of making your code available is that future projects can use it, or parts of it, as a basis for their own work. Other researchers and programmers might also contribute to the code and help find bugs or optimize the website. The code is then documented for the future and can be easily accessed. We hope to make much of the Environment & Society Portal code available in 2016.

There are also several options for archiving a website for your own personal records. The recommended way is to back up the website through your server hosting company. Most hosts offer the option to export all data as a MySQL database file. This file is well suited to your own archiving records and can be helpful if you plan to move the website to another server or to relaunch it. It is also a valuable emergency backup. If you want, you can also use software to convert your website for offline use and save it on portable drives. In the past, some scholars included CDs of their websites as supplements to their books.
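A personal backup is easier to keep track of when the site files and the database export are bundled into one dated archive with a checksum. The following is a minimal sketch, assuming Python 3, a local copy of the site files, and a database export already downloaded from the host; the function name `archive_site`, the archive layout, and the file naming scheme are illustrative assumptions.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_site(site_dir, db_dump, out_dir):
    """Bundle site files and a database export into a timestamped .tar.gz.

    Returns the archive path and its SHA-256 digest. The output directory
    should lie outside site_dir so the archive does not include itself.
    """
    site_dir, db_dump, out_dir = Path(site_dir), Path(db_dump), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp in the file name keeps successive backups apart.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive_path = out_dir / f"site-backup-{stamp}.tar.gz"

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(site_dir, arcname="site")
        tar.add(db_dump, arcname=f"database/{db_dump.name}")

    # Store a checksum next to the archive so later copies can be verified.
    digest = hashlib.sha256(archive_path.read_bytes()).hexdigest()
    checksum_path = archive_path.with_name(archive_path.name + ".sha256")
    checksum_path.write_text(f"{digest}  {archive_path.name}\n")
    return archive_path, digest
```

The checksum file makes it possible to confirm, years later, that a copy on a portable drive is still identical to the original archive.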

There are many other ways to sustain your content. The following links might be helpful:

Web Archiving at the Library of Congress

nestor (German competence network for digital preservation)

The National Archives: Web Archiving Guidance (pdf)

Rockwell et al.: Burying Dead Projects: Depositing the Globalization Compendium

 

Martin Spenger is a Research Associate at the Rachel Carson Center for Environment and Society, where he is responsible for the Center’s library and works as a Library Liaison for the Environment & Society Portal. He is writing a dissertation about the American poet Gary Snyder.
