Re: MARC records for LION database



Dear Corinna,

Since this mail has been posted to a number of listservs, I'm 
taking the liberty of posting a response to the LES, liblicense 
and AUTOCAT lists, with apologies to the many recipients who will 
therefore receive this message three times. We take your concerns 
very seriously, and would welcome any feedback from the library 
community on these issues, either in direct response to these 
listservs, or via the kind of collective consultation that you 
describe in your mail.

Please accept my apologies for the fact that your initial queries 
were not all dealt with in a timely manner, and my assurances 
that we aim to respond to all customer queries promptly. Your 
document raises a number of important issues, which deserve a 
full response: I have therefore attempted to address the broader 
editorial questions in the body of this email, and have attached 
a document which deals with each of the specific points. Some of 
these points are clearly errors on our part, which we are happy 
to correct (corrected records will be made available in our 
December application release); others, however, raise quite 
fundamental editorial or procedural questions, and it is in these 
areas that we would particularly welcome input from the members 
of these lists.

In response to your first point, about the choice of bibliographic unit,
I think the important distinction to be made is that Literature Online
(LION) is a specialist database of literary texts, unlike Early English
Books Online (EEBO), which is an archive of print volumes in digital
facsimile. The texts are of course originally taken from print volumes,
but the basic unit of the database is the new electronic file that we
have created, rather than, as it is in EEBO, the source printed volume.
In some cases, an electronic file in LION corresponds to one print
volume, but in many cases poems, plays and other works have been
extracted from larger print volumes to create individual files. I agree
that it is therefore misleading to describe our MARC records as
representing the 'volume' level, and we will correct the text on this
page accordingly. By 'volume' we meant electronic file (as opposed to,
say, the individual poems contained within a file), but this needs to
be corrected to avoid the inference that each MARC record corresponds in
all cases to a print volume.

Since EEBO is an archive of print volumes, its MARC records are 
effectively book records; LION's, by contrast, faithfully 
catalogue the electronic files, which are unique, editorially 
created entities. The conventions used in creating the records 
are therefore quite different from those adopted in EEBO. LION's 
MARC records are provided without charge as finding aids for the 
electronic texts in LION, and do not claim to contain the kind of 
expanded bibliographic information about the source volumes that you 
might expect from a complete catalogue record for a printed book. 
They are of course created in accordance with cataloguing 
standards, and contain the full bibliographic information 
contained within the source texts, but the way we represent the 
relationship between the file's contents and the original volume 
differs in many cases. The addition of further information that 
is not present in the source texts, such as uniform titles, 
subject headings, systematic identification of subtitles, and 
standardization of the use of brackets, would be a substantial 
undertaking, which would probably necessitate either charging for 
the records (as we do for the EEBO records), or entering into a 
partnership project with external cataloguers. We already know of 
at least one librarian who has undertaken some of this work on 
the LION MARC records, and we are keen to find the most 
appropriate way of sharing and disseminating this enriched data.

Most of the bibliographic inconsistencies which you have 
identified relate to how the data was created, rather than how 
the records were created. Literature Online was not created all 
of a piece: the 16,000 files are taken from 19 separate 
electronic collections, published over the course of 15 years, 
each of which had its own editorial policy, and most of which 
were originally published on CD-ROM with no expectation that they 
would one day be cross-searchable. Whereas the original 
collections are internally consistent, there are many 
editorial differences between them in areas such as the title field, often 
determined by issues such as the practicalities of searching in a 
drama database as opposed to a poetry database. In some cases the 
data was digitised by collaborating academic institutions, who 
made completely different editorial decisions in these areas, and 
we have preserved those decisions rather than standardising with 
our own policies.

Our current policy for new Chadwyck-Healey collections (such as 
the African Writers Series) is to include the full contents of 
the print volumes wherever possible. However, this was not 
feasible or appropriate for earlier collections, which has left 
us with a legacy of inconsistencies across the contents of LION. 
We have a long-term aim of standardizing the bibliographic data 
in LION: this would involve re-structuring the data and search 
functionality, modifying the file titles and bibliographic 
headers, and using this new structure as the basis for the MARC 
records and Z39.50 database. Clearly, this would be a 
considerable task: before embarking on it, we would need to be 
sure that we were taking the right approach and providing the 
data in the most useful way for our customers. We would therefore 
be grateful for any suggestions in this area.

I look forward to hearing your thoughts on these matters, and to working
with you and your colleagues to help improve the service that we
provide.

Best regards,

Matt Kibble,
Development Manager, Literature,
ProQuest Information and Learning
Cambridge, UK
http://lion.chadwyck.co.uk
http://lion.chadwyck.com

-----Original Message-----
From: owner-liblicense-l@lists.yale.edu On Behalf Of Corinna Baksik
Sent: 07 October 2006 01:16
To: liblicense-l@lists.yale.edu; AUTOCAT
Subject: MARC records for LION database

[please excuse cross-postings]

I would like to publicly raise concerns regarding the MARC 
records for the full-text titles in the LION database (Literature 
Online). The MARC records (over 16,000) are available from the 
vendor at no additional cost to subscribers to the full database.

Our intention at Harvard was to load these records into our 
catalog, but close analysis reveals that they are problematic and 
of poor quality. I have written a document describing the 
problems in detail and posted it here: 
<http://ois.harvard.edu/%7Ecorinna/docs/LION_problems.pdf>

I am interested in whether other libraries would like to approach 
the vendor as a group and work with them to address these issues. 
It is my understanding that this is a popular database and good 
MARC records would be very valuable to subscribers. I would 
appreciate any comments or suggestions you have. We are 
investigating whether resolution of these problems can be brought 
about through license negotiations, but the more subscribers that 
are concerned about the quality of the records, the better.

In short, there are three issues that concern me most:

1) The truncation of titles in the 245, e.g. the MARC record 
contains "The poems" when the original work is entitled "The 
poems of Maria Lowell."

2) The inconsistent use of brackets in the title field:
     [Poems, in] The loyalist poetry of the Revolution
     The word of Congress ; the factious demagogue. a portrait
       [In, The loyalist poetry of the Revolution]

3) Lack of uniform titles, e.g. MARC record contains "The 
tragedie of King Richard the Second" and no uniform title for 
"King Richard II."

Please feel free to contact me on or off list. I will summarize 
feedback.

Thank you,

Corinna Baksik
Systems Librarian
Harvard University Library