
RE: a preservation experience



I read with interest the recent thread pertaining to the long-term
preservation of electronic content.  Jim O'Donnell's story of discovery
had a happy ending, but I take from his "gobsmacked" response that he
felt lucky that it did.  I cannot help but wonder whether the story will
have the same happy ending 10 years from now.  In order for it to do so, we
need broad recognition of an important point that I have not yet seen raised in
this discussion:  the long-term preservation of electronic scholarly
resources will require deliberate, careful, and sustained effort that
extends beyond the harvesting of web pages or reliance upon any single
organization.

As a community, we are obviously still wrestling with how to preserve our
growing number of important electronic resources.  We are still working to
imagine what shape reliable archives of these materials might take.  The
Wayback Machine offers one example; LOCKSS, national libraries, and
institutional repositories offer other models.  The critical question now
is how will we assess the viability of any particular approach?  What
elements are necessary to ensure the long-term preservation of and access
to electronic scholarly materials? If we are to effectively preserve these
resources for the long term - to "archive" them - then as a community we
must have a broad-based and thorough understanding of the characteristics
of a trusted, credible archive.

There are several components which must be present in any trustworthy
archive.  The 1996 Report of the Task Force on Archiving of Digital
Information <http://www.rlg.org/ArchTF/> and the 2002 report Trusted
Digital Repositories: Attributes and Responsibilities
<http://www.rlg.org/longterm/repositories.pdf> offer clear and useful
descriptions of these elements.  My experience at JSTOR, where staff have
been creating an organizational context to support long-term preservation
of digital scholarly content since the inception of JSTOR, leads me to add
to this existing documentation and to present the following framework for your
consideration.  I offer it not to promote any particular implementation,
but to encourage us all to think about what might make an archive
trustworthy.

In our experience, the long-term preservation of and ongoing access to
digital materials requires, at a minimum, five organizational components
specifically dedicated to or consistent with the archival objective:
mission; business model; technological infrastructure; relationships with
libraries; and relationships with publishers.  Without at least these
five, the future of an electronic resource cannot be assured. There may be
other important components as well, but these offer a necessary
foundation.

1) Organizational mission:  This component is absolutely critical because
it drives the resource allocation, decision-making, and routine priorities
and activities of the organization.  When an organization's mission is to
be an archive it will by necessity dedicate its available resources to
this core activity, avoiding the all too frequent competition between
preservation needs and other priorities.  Similarly, when long-term
preservation is mission critical, preservation values and concerns will
necessarily inform the shape of an organization's routine procedures and
processes.

2) Business model:  An archive must generate a diverse revenue stream
sufficient to fund the archive, including both the considerable cost of
developing the archive's basic infrastructure and the ongoing operation of
the archive over the long term.  A single source of funding - a single
donor, a government agency, or a foundation - should be evaluated
carefully for its ability to support the longevity of the archive.  We
have all seen noble efforts come and go with the shifting priorities of
those who pull the purse strings.

3) Technological infrastructure:  This infrastructure must support content
ingest, verification, delivery, and multiple format migrations in
accordance with accepted models such as OAIS and best preservation
practices.  It must include and support the automated and manual quality
control processes necessary to protect the ongoing integrity of the
materials and to protect against format or hardware obsolescence.

4) Relationships with libraries:  The archive must meet the needs of the
library community, and it must find a way to balance these needs with
those of other participants in the scholarly communication process, taking
into account, for example, what content should be preserved for the long
term.

5) Relationships with content producers:  The archive must establish
agreements for the secure, timely, and reliable deposit of content, and
it must work with publishers and other content producers to secure the
rights necessary to archive the material entrusted to its care.

These components could be implemented in any number of organizational
models.  Indeed, the community will be best served by having multiple
organizations serving as trusted archives.  But if we are to develop a
network of trusted archives - and we have much work to do to reach this
point - we must first find a way to evaluate the efficacy and reliability
of proposed archiving models.  Doing so is an essential step toward an
important goal:  a trusted, reliable, and long-lived record of
scholarship.

Eileen Fenton
Executive Director
Electronic-Archiving Initiative
www.jstor.org/about/earchive.html
609/258-8355 or egfenton@jstor.org

-----Original Message-----

From: Anthony Watkinson [mailto:anthony.watkinson@btopenworld.com]
Sent: Thursday, October 23, 2003 10:31 PM
To: liblicense-l@lists.yale.edu
Subject: Re: a preservation experience

It would be interesting to know how many institutional archives have
long-term funding assured. I would certainly trust a national library
fulfilling the function of a national digital archive of published
material much more and I only wish they could start performing this
function quicker. They certainly have a track record in print.

----- Original Message -----

> At 17:37 21/10/03 -0400, James O'Donnell wrote:

>>It might not be irrelevant to this list's consideration of issues
>>surrounding digital resources and their preservation to hear a little
>>story of discovery.
>>
>>A colleague had 'published' an article in the proceedings of an
>>international conference about three years ago.  The proceedings were
>>only published on-line, and she had linked from her own home page to
>>the official version.  On looking for that article a couple of days
>>ago (to verify some quotations and figures), she discovered that the
>>original publisher had either moved or deleted the original file. A
>>moderately thorough search of the site showed that it was advertising
>>*next* year's conference in the same series, but the publication
>>itself was gone.  A Google search was no help.
>>
>>Consulted on this, I wondered what would happen if . . .  So I went to
>>the Internet Archive site (www.archive.org) and used their "Wayback
>>Machine": type in the URL of the desired resource and see what happens.
>>
>>In a few seconds (good DSL), I had the list.  Hits are listed by 
>>Wayback by date of archiving sweep -- thus, if the same file was 
>>modified over time, captures at different dates will capture different 
>>versions. There were 6 hits for the year 2001 and 1 for February 2002, 
>>none since (suggesting when the original was lost).  The first hit 
>>proved a null set -- file not found. The second through seventh were all 
>>gold:  the original file in its original 'published' form, complete with 
>>all graphics and links.
>>
>>I was gobsmacked!  It left me feeling as I do when I try some
>>improbable keystroke combination deep in the bowels of Microsoft Word,
>>and something I thought impossible suddenly happens.  I feel equally
>>sure that the achievement might be hard to reproduce.  (Naturally we
>>made a copy to hold onto.)
>>
>>Does this model suggest the value of a comprehensive Internet archive?
>>Does it exemplify the "Lots of Copies Keep Stuff Safe" principle?  Or
>>was it gross dumb luck?  I leave these questions to others to discuss.
>>
>>Jim O'Donnell
>>Georgetown University