
archiving thoughts



Recently on the e-collections list there's been a thread going about
archiving/perpetual access requirements for web subscriptions to
databases. I contributed something to that discussion that David Goodman
suggested would also interest the liblicense community, since archiving
is a topic we have discussed in the past.  So here's a revised version of
what got posted on e-collections.  Maybe it will start another lively
discussion on this list...

--Kimberly Parker


Since I think about the "archiving" question on and off regularly, I
occasionally come up with ideas to which I like to invite reactions.
Please keep in mind as you read that I'm not trying to say the technical
questions have been solved.  However, I think there are ways that we can
think about the issue that can inform the technical and economic
directions that companies and libraries will choose to pursue.  And I hope
this opens debate!

There are at least two aspects of the "archiving of e-resources" issue
that are different but interrelated.  One aspect is the
"archive/preservation/historical record." The other aspect is the
"perpetual access/ownership of subscribed-to content."

The approach to these questions is different for different types of
e-resources.  There are serial publications that have distinct and
discretely available chunks of content that become available sequentially.  
There are also coherent publications that, while they may be updated over
time, have integrated content.

In the former case ("e-serials" for short), we already see many solutions
in place. Companies and institutions are tackling the "e-serials" archive
and perpetual access problem by stating some level of commitment to
maintaining the content over time.  If enough people do this
and in enough places, the archive question is pretty much solved. (I say
that so blithely.)  If there is commitment to maintaining *subscription
records* (who subscribed to which years) for as long as the maintaining
company cares to restrict subscribers' access to *only* those years (I'm
assuming here that at some point, the whole older record might end up in
the public domain), then perpetual access becomes available in an
acceptable form.
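To make the subscription-records idea concrete, here is a minimal sketch in Python. The institution names, years, and the public-domain cutoff are all hypothetical; the point is just that access checks can be filtered against a record of who subscribed to which years, with older content opened to everyone:

```python
# Hypothetical subscription records: institution -> set of subscribed years.
subscription_records = {
    "Yale": {1998, 1999, 2000},
    "Harvard": {2000, 2001},
}

# Assumed cutoff: content older than this has entered the public domain.
PUBLIC_DOMAIN_BEFORE = 1950

def can_access(institution, volume_year):
    """An institution may view a volume if it falls in a year it
    subscribed to, or if the volume is old enough to be public domain."""
    if volume_year < PUBLIC_DOMAIN_BEFORE:
        return True
    return volume_year in subscription_records.get(institution, set())
```

As long as the maintaining company keeps the `subscription_records` table current, the check itself is trivial; the hard commitment is maintaining the records and the content, not the filtering.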

The latter case ("e-databases" for short) can be thought of in two ways.
One way is as regular "editions" of a work like an encyclopedia or a
directory.  Another way is as a loose-leaf publication which has parts
that are regularly edited or appended.  Let's take the "editions" model
first.

If the producing company is willing to take regular (once a year, once
every two years, once every five years?) snapshots of their product and
maintain those, they have content to which they can provide perpetual
access while still providing an economic incentive for people to keep
subscribing.  When an institution stops subscribing, they'd get access to
the next "oldest" snapshot until the "new" snapshot (containing, at least
partially, data they subscribed to) is available, at which point they'd
get access to that newer snapshot, and there their access would remain. In
this model, there's a tension between producing snapshots more often,
costing more money to maintain more versions, and waiting longer to
produce snapshots, thereby giving some subscribers better (more complete,
more up-to-date) data than they originally paid for.  Of course, a
producing company might choose to roll through snapshots, deleting older
ones, thereby reducing their load of versions to maintain, and constantly
providing perpetual access to all subscribers to the oldest version.  
This idea, however, would only solve perpetual access and would not solve
the archive question.
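The access rule in the snapshot model can be sketched in a few lines of Python. Dates are plain years here for simplicity, and the function name is my own invention; the logic is just the rule described above: a lapsed subscriber sees the newest snapshot taken before their subscription ended, until the first post-cancellation snapshot appears, at which point their access moves there permanently:

```python
def accessible_snapshot(snapshot_dates, sub_end, today):
    """Return the snapshot a lapsed subscriber may access as of `today`.

    Until a snapshot covering their subscription period exists, they see
    the newest snapshot taken before the subscription ended; once the
    first snapshot taken on or after the cancellation date appears, they
    move to it for good.  `snapshot_dates` is sorted ascending; dates are
    simple integers (years) for illustration.
    """
    taken = [d for d in snapshot_dates if d <= today]
    after = [d for d in taken if d >= sub_end]
    if after:
        return after[0]   # first snapshot containing subscribed-to data
    before = [d for d in taken if d < sub_end]
    return before[-1] if before else None
```

For example, with snapshots in 1995, 2000, and 2005, an institution that cancelled in 2002 would see the 2000 snapshot until 2005, then the 2005 snapshot ever after.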

In the "editions" model, the archive is intended to preserve the
historical record, and it is important to have all the different
"editions" available for a scholar to review.  Just as some libraries
collect every edition of Encyclopedia Britannica, or every fifth edition
of the CRC Handbook of Chemistry and Physics, the goal would be to have a
sequence of snapshots available for review.

The "loose-leaf" model requires more technical capability.  Here I am
assuming that every bit of data added into the coherent "e-database" would
have an invisible (or visible?) time-stamp/datestamp.  With appropriate
filters in the interface, what can be seen by any specified subscriber is
only that data that passes certain time/date-signature filters.  For
perpetual access, subscriber information needs to be maintained to record
what date filtered version is appropriate.  For the archive, the whole
database including these time/date signatures is maintained indefinitely
and future scholars can filter the data however they need in order to see what
information was available at any moment in time.
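The time-stamp filter at the heart of the loose-leaf model is also easy to sketch. The records and dates below are invented for illustration; the only real requirement is that every datum carries the date it entered the database, so that a single cutoff filter reconstructs what any subscriber (or future scholar) would have seen:

```python
from datetime import date

# Hypothetical records: each datum carries an (invisible) timestamp
# recording when it was added to the database.
records = [
    {"entry": "article A", "added": date(1999, 3, 1)},
    {"entry": "article B", "added": date(2000, 7, 15)},
    {"entry": "article C", "added": date(2001, 1, 10)},
]

def visible_to(records, cutoff):
    """Return the entries a viewer whose access is frozen at `cutoff`
    would see: only data added on or before that date passes the filter."""
    return [r["entry"] for r in records if r["added"] <= cutoff]
```

A subscriber frozen at the end of 2000 would see articles A and B but not C; a scholar replaying the database at a later cutoff would see the fuller record.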

So...are there gaping holes in my logic?  Besides the economic ones, that
is...


------------------------------------------------------------- 
Kimberly Parker 
Electronic Publishing and Collections Librarian 
Yale University Library 
130 Wall Street              Voice (203) 432-0067 
P.O. Box 208240              Fax (203) 432-8527 
New Haven, CT  06520-8240   
mailto:kimberly.parker@yale.edu 
-------------------------------------------------------------