
Publisher Constraints and the Version of Record



Forwarding some more wise words from the Antipodean 
Archivangelist, Arthur Sale, about the Brisbane Declaration and 
why the OA IR deposit draft should be the author's final refereed 
postprint rather than the publisher's "version of record":

---------- Forwarded message ----------
Date: Fri, 10 Oct 2008 18:15:49 +1100
From: Arthur Sale <ahjs -- ozemail.com.au>
To: institutionalrepositoriescommunity-anz -- googlegroups.com
Subject: RE: [IRCommunity-ANZ] Re: Brisbane Declaration

Rebecca

Paula Callan has already replied to part of your letter, but the 
issue is so important that I think it deserves further 
elaboration. The Brisbane Declaration was worded as it was 
precisely to head off assumptions like yours that repositories 
should be filled with publisher's pdfs. I apologise to the list 
for the length of this reply, which stands in contrast to the 
succinctness of the Brisbane Declaration.

Let me use the NISO terms in this post: roughly "Accepted 
Manuscript" (AM) = author's final draft = postprint; "Version of 
Record" (VoR) = publisher's pdf; however, see NISO-RP-8-2008 for 
precise definitions. I prefer these terms because sometimes a 
Version of Record is not a pdf - in most open access journals 
that I publish in, the Version of Record is a collection of html 
files and images. Sometimes it is in XML, and the Version of 
Record can exist in multiple formats.

Now to the point. There are several reasons for the wording in 
the Declaration.

1.  The first is explained by Paula. The Version of Record is 
nearly always prohibited by the publisher from being made open 
access in a university repository. This is true even of some Open 
Access journals, who would prefer that readers access their open 
access website rather than a secondary repository's. Thus a 
repository that contains all or mostly Versions of Record is 
likely to have no claim to being an Open Access Repository - 
rather it is a record-keeping collection of little interest to 
the outside world. In such a case it is a pretty pointless 
activity, except that it will absolve the University of keeping 
paper copies of its HERDC research outputs for the Australian 
Government audits, which I suppose is some justification.

In contrast a far greater proportion of publishers are relaxed 
about the Accepted Manuscript being made open access, sometimes 
after a brief embargo period. I expect these embargos to 
disappear with time. A repository full of Accepted Manuscripts is 
substantially an open access repository.

I might just hazard a comment on the "prettiness" of a version. 
It is hard to imagine any researcher thinking that a blank screen 
with the text "Access Denied" is prettier than their Accepted 
Manuscript. If this occurs it is the result of misinformation or 
lack of awareness.

2.  The second concerns the legal status of the two versions. The 
Accepted Manuscript's copyright status rests solely with the 
author and/or his/her employer (generally a university). 
Accordingly, it is quite feasible and legally binding for the 
employer to make a prior claim on all Accepted Manuscripts of its 
employees, such as a mandate to deposit the Accepted Manuscript 
in its repository. Universities may also constrain their graduate 
students (eg theses and degree-related articles) under Rules of 
Degrees. In addition funders such as the ARC and NH&MRC may 
include a similar stipulation into the funding contract offered 
to researchers and their institutions. Accepting the grant 
carries with it a contractual obligation on both the university 
and the researcher which over-rides any subsequent contract.

All of these types of mandates are legally binding if worded 
appropriately. The author is rendered legally incapable of 
signing a contract with a publisher that purports to prevent 
deposit of the Accepted Manuscript, or if they do so sign then 
the contract is unenforceable in this regard. The prior contract 
takes precedence.

The situation with a Version of Record is different. However 
small (and sometimes it is only page numbering), the publisher 
has put some content into the Version of Record, and its 
copyright situation is joint in nature. Universities and funding 
agencies have no authority over publishers, and they therefore 
cannot mandate deposit of a Version of Record, except for private 
record-keeping purposes (like HERDC).

3.  Thirdly, I turn to what a Version of Record is (a fairly 
minor point, but illustrative). Strictly, for paper journals, the 
primary VoR is the printed pages; however most paper journals 
also have websites, from which a page-numbered electronic version 
of the article is downloadable, perhaps under licence. Such a 
file is of course a digital copy of the paper record, even if it 
may have preceded the paper copy in time and even if it has a 
lower reproduction quality (eg dpi). The practice has arisen of 
calling this "publisher's pdf" the Version of Record in an 
electronic world, though the term is somewhat misleading.

However, not all Versions of Record are available as pdfs. A 
journal which is published online only (and it may be a 
toll-access journal, an open access journal or any other type) 
may have the Version of Record as one or more html files 
accompanied by images. I have several articles of this sort. It 
would be intensely irritating to have a repository manager insist 
on having a pdf. Of course in the face of such insistence, one 
can simply comply and create a fake paginated pdf from the 
unpaginated html Version of Record. But it is stupid. Anyone who 
can access a pdf can surely access html.

4.  Finally, I turn to the other really important issue: pdfs are 
a dumb (obsolete?) format to disseminate research in. Recall that 
a pdf (portable document format) is a way of communicating the 
look of a printed page or pages. It lives uneasily in a digital 
world. The contents of a pdf document contain the text characters 
for sure. However, the non-text items (diagrams, charts, tables, 
captions) have much of the useful information residing in the 
original Accepted Manuscript thrown away. For example the numbers 
in tables are reduced in accuracy to what you can see; images are 
auto-reduced and compressed, charts are reduced to drawing 
instructions or images, and captions are difficult to associate 
with images. A pdf is intended to approximately reproduce a 
printed page, not to be electronically useful.

The reader gets to see what he or she would see on the printed 
page (and in many cases they print it if the paper sizes are 
compatible), but digging further into the document is difficult: 
finding the data that went into the chart, a full-res version of 
the CT-scan, or the full accuracy of the data in an important 
table. A robot (spider, crawler) is as helpless as the human 
reader or more so.

The format of choice is an XML version of the Accepted 
Manuscript. XML does not need to lose any information in 
conversion; it is preferred for preservation; it is easily read 
by viewers; it is easily generated from common document 
preparation programs (eg Word). Text-based search engines can as 
readily parse XML as html and pdf formats, so the indexing 
capability is not harmed. However, the embedded objects can also 
retain their full quality from the draft: numbers to full 
precision, data in charts, big images. A harvesting robot (or a 
researcher) can access these and extract the real data that 
underlies the paper, not a sanitized version. Such tools are in their 
infant stages, but they are coming (eg Google Images and xx).
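
As a rough illustration of the point about table precision, here 
is a minimal Python sketch. It assumes a made-up XML fragment for 
one table in an Accepted Manuscript; the element and attribute 
names (table, row, sample, value) are hypothetical and do not 
follow any particular publishing schema. The rounding step only 
mimics what a reader sees on a typeset page and is not a model of 
the pdf format itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment for one table in an Accepted Manuscript.
# Element and attribute names are invented for illustration only.
AM_TABLE_XML = """
<table id="t1">
  <caption>Measured decay constants</caption>
  <row sample="A"><value>0.1234567</value></row>
  <row sample="B"><value>2.7182818</value></row>
</table>
"""


def stored_values(xml_text):
    """Return the numbers as stored in the XML, keyed by sample name."""
    root = ET.fromstring(xml_text.strip())
    return {
        row.get("sample"): float(row.findtext("value"))
        for row in root.iter("row")
    }


def printed_values(values, decimals=2):
    """Mimic a typeset page: each number becomes a rounded string."""
    return {name: f"{v:.{decimals}f}" for name, v in values.items()}


full = stored_values(AM_TABLE_XML)
shown = printed_values(full)
print(full)   # values at the precision the author supplied
print(shown)  # values as they would appear once rounded for print
```

A harvesting robot given the XML can recover every digit the 
author supplied, and the caption sits in the same element as the 
rows it describes. Given only the rounded strings, the extra 
digits cannot be recovered.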

We also need to start looking beyond the current emphasis on 
collecting documents, vitally important as it is to achieve 100% 
Open Access to them as soon as possible. The Brisbane Declaration 
also talks about open access to research data. Some datasets 
(small ones) will find a home in an institutional repository. 
Larger datasets may require dedicated repositories. But in both 
cases pdfs are irrelevant and XML is the format of choice.

May I then turn to your other argument - that you have to do what 
your researchers want. This is a fallacy. You need to lead them, 
not follow them. All methods of convincing a substantial number 
of researchers to voluntarily self-deposit have failed, globally. 
No Australian university is going to make a break-through unless 
it is a very tiny institution (say 100 academics). The only way 
forward is to make self-deposit a routine matter of research 
activity. If it is routine, it gets done - there is nothing more 
than that. That is how HERDC works. This is what mandates are 
designed to do. The university, or the grant-giving body, simply 
says this is what you have to do if you are using our resources 
to do research. And the researchers do it. They don't even 
grumble (much!).

Returning to the Accepted Manuscript, this is the last point in 
time when researchers have hold of the born-digital file that 
constitutes their research output, and it makes a great deal of 
sense to capture it at that time, before it gets lost in the mess 
of researcher offices or disks. It also appears in the repository 
well before the VoR, and according to most citation research has 
a greater chance of attracting citations by the Early Advantage 
effect. Since the AM and the VoR differ in no essential respects 
(otherwise authors would be up in arms), the citation advantage 
should trump prettiness. You might also note that the National 
Institutes of Health (NIH) mandate in the USA asks for the 
Accepted Manuscript for all the above reasons.

Of course if you can get the rare permission to add a Version of 
Record to your Accepted Manuscript, go for it. It certainly does 
no harm and could be beneficial. But it should be an option only.

Arthur Sale


-----Original Message-----
From: institutionalrepositoriescommunity-anz -- googlegroups.com
On Behalf Of Rebecca Parker
Sent: Thursday, 9 October 2008 4:28 PM
Subject: [IRCommunity-ANZ] Re: Brisbane Declaration

Hi Arthur (and all)

I see that there is quite a lot of support for the Brisbane 
Declaration on this and other lists and blogs around the world. 
As someone who didn't attend the Open Access and Research 
Conference in Brisbane last month, I'd like some further 
clarification on one of the points below.

I wonder why the architects of the Brisbane Declaration want the 
'preferred' version of the work to be the author's final draft?

At Swinburne, wherever possible we archive the published version 
of the work. This is, after all, the definitive version---it 
looks more professional, hence authors prefer it. Where we are 
not able to provide access to the published version, we post the 
author's final draft instead. However, we (and more particularly 
our authors) regard this as a poor man's orange---a consolation 
prize. When we negotiate with publishers over permissions, we 
state our preference for the published version; we will accept 
the final manuscript only if there is no alternative.

The argument of the Declaration that the 'essence' of the work 
doesn't change from final draft to published version seems 
irrational---if the content doesn't alter between versions, then 
why not seek to present it in a less amateurish and more visually 
appealing format? PDF can be read by text-to-speech screen 
readers; if we're about opening up access to knowledge, we need 
to prioritise the accessibility needs of all our users, including 
the potential for use of repositories by visually-impaired 
researchers.

In response to the claim that PDF harms the potential for 
harvesting data, I would actively disagree. Swinburne (and other 
institutions) convert all author final drafts from Word (and 
LaTeX, don't forget) to PDF anyway; it's neater, platform 
independent, and currently best practice in the library industry 
in terms of preservation. While I'd rather that this discussion 
remains software independent, I do have to mention that the 
software Swinburne and all ARROW members use for their 
repositories automatically extracts a plain text version of every 
PDF uploaded to the repository. This means that each PDF is 
searchable, and appears in Google free from the proprietary, 
non-standard formatting contained in Microsoft Word documents.

I think it's excellent that Australian higher education 
stakeholders are taking open access so seriously. I'm very 
pleased that the Declaration goes against much of the established 
theory and makes provision for more than just peer-reviewed 
journal articles. After all, these are such a phenomenally small 
subset of the research output published at any university.

However, I'm afraid I can't personally commit to this Declaration 
as it stands.

As a repository manager, I act solely as an agent of my 
university's authors' wishes. All the theory in the world can't 
overturn the fact that without research content, there is no 
repository. And frankly, I think mandating deposit of a 
manuscript version of a work in a repository threatens to further 
reduce contribution rates nationally. Despite all the rhetoric 
about the 'success' of institutional repositories, we all know 
the truth is that most universities globally have woefully low 
self-deposit rates. If authors from backgrounds that don't 
already utilise preprint archives in their disciplines come to 
see their institutional repositories as a space for 
non-definitive works only, they may choose not to use them. 
Academics don't want what they regard as inferior versions of 
their work hosted on university-endorsed websites. It can be 
difficult enough to build a relationship of trust with 
researchers---I don't want to risk breaking that for something 
that at my institution has proved to work well enough already.

As long as I believe that the terms of the Declaration 
misrepresent the needs of my researchers, I'm afraid I'm not able 
to promote this Declaration to members of my university 
community. I absolutely respect the rights of other repository 
managers and IR stakeholders to disagree---it saddens me to have 
to take such a negative attitude to anything that furthers the 
cause of open access to knowledge. However, I'd be interested to 
see whether my colleagues at other higher education institutions 
who manage and promote active, successful and 
university-integrated repositories might endorse my point of view 
on this.
___________________________________

Rebecca Parker
Assistant Content Management Librarian
Swinburne University of Technology
John Street, Hawthorn 3122
Australia
Phone: +61 3 9214 4806
Email: rparker -- swin.edu.au
___________________________________