Re: Does the arXiv lead to higher citations and reduced publisher downloads?

This is a nice thought experiment, as in "What is the mass of an object as it approaches the speed of light?" Thus we imagine a world of 100% OA.

At 100% OA there are repositories of whatever kind, no publishers, and no peer review. An author deposits a paper in an archive somewhere, where it is indexed by Google or a Google-like mechanism. There is no peer review because no one will pay for it, and there are no libraries because the Google-like service has done away with the librarian's function. An author in this scenario is a more sophisticated version of a blogger.

I share Harnad's vision, except that I view it as dystopian.

Joe Esposito

----- Original Message -----
From: "Stevan Harnad" <harnad@ecs.soton.ac.uk>
To: <liblicense-l@lists.yale.edu>
Sent: Wednesday, March 15, 2006 5:13 PM
Subject: Re: Does the arXiv lead to higher citations and reduced publisher

On Tue, 14 Mar 2006, Phil Davis wrote:

Liblicense,

While our study confirms the same citation advantage reported by others, it attributes the additional citations not to Open Access but to Self-Selection. Open Access therefore may be a result, not a cause, of authors promoting higher-quality work.

Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles?
Authors: Philip M. Davis, Michael J. Fromerth
Date: March 14, 2006
http://arxiv.org/abs/cs.DL/0603056

The full text of Phil Davis's paper is not yet accessible, so I can only respond to the abstract.

There are many plausible components of the OA advantage, of which
self-selection (Quality Bias: QB) is certainly one -- but not the
only one, and unlikely to be the principal one, except under a
few special conditions. QB is a temporary phenomenon, obviously,
disappearing completely at 100% OA. The same is true for the
Competitive Advantage (CA) of (comparable) OA papers over non-OA
papers in the same journal issue, as well as the Arxiv Advantage
(AA: the advantage of appearing jointly in a central, widely
consulted repository).

Once 100% OA is reached, QB, CA and AA all vanish. (AA vanishes
because of OAI interoperability and central harvesting services.)

But there are three other components that remain even at 100% OA:

Early Access Advantage (EA): The permanent citation boost from earlier

Quality Advantage (QA): The permanent advantage of quality once the
playing field has been levelled and affordability/accessibility no
longer biases what is and is not accessible

Usage Advantage (UA): Average downloads for OA articles are at least
double those of non-OA articles

OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA
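
As a rough illustration of the bookkeeping in the formula above, here is a minimal Python sketch. It assumes, as the formula does, that the components simply add. The function names and the component sizes (all set to 1.0) are hypothetical placeholders, not measured values. The only things taken from the text are the six labels and which of them are said to vanish at 100% OA.

```python
# Labels are taken from the message above; True marks a component the
# message says persists at 100% OA, False one it says vanishes.
PERSISTS_AT_FULL_OA = {
    "EA": True,
    "AA": False,
    "QB": False,
    "QA": True,
    "CA": False,
    "UA": True,
}


def remaining_components(at_full_oa):
    """Return the component labels still contributing to the OA advantage."""
    return [
        name
        for name, persists in PERSISTS_AT_FULL_OA.items()
        if persists or not at_full_oa
    ]


def oa_impact_advantage(sizes, at_full_oa):
    """Sum the hypothetical sizes of the components that still apply."""
    return sum(sizes[name] for name in remaining_components(at_full_oa))


# Placeholder sizes (all 1.0), chosen only to make the sum easy to follow.
placeholder_sizes = {name: 1.0 for name in PERSISTS_AT_FULL_OA}

print(remaining_components(at_full_oa=False))
print(remaining_components(at_full_oa=True))
print(oa_impact_advantage(placeholder_sizes, at_full_oa=True))
```

With real estimates in place of the placeholders, the same sum would show how much of the advantage is expected to survive at 100% OA.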

An analysis of 2,765 articles published in four math journals
from 1997-2005 indicated that articles deposited in the arXiv
received 35% more citations on average than non-deposited
articles (an advantage of about 1.1 citations per article), and
this difference was most pronounced for highly-cited articles.
The most plausible explanation was not the Open Access or Early
View postulates, but Self-Selection, which has led to higher
quality articles being deposited in the arXiv.

Without seeing the full text one cannot be sure of how this was
ascertained, but let us assume that it was by correlation
(looking at the author's track record, and their comparable
non-OA articles, to show that there is a strong correlation
between prior author/article citation rates and probability of
later self-archiving).

There is no doubt at all that this is a causal factor, and indeed
it is the example set by the high-quality authors that helps
encourage other authors to self-archive.

But the only systematic way to show that QB is the *only*
component of the OA advantage, or the biggest one, is to test it
at all levels of self-archiving, from 1% to 99%. Obviously a
citation advantage that persists even as a larger and larger
proportion of the research in the field becomes OA is less and
less likely to be due to the fact that the best author/articles
are the ones being self-archived.

And it also has to be tested for articles at all citation levels
(i.e., for comparable low, medium, and high-citation articles).
The OA advantage is bigger at the higher citation levels, to be
sure, but if it is even present at the lower ones, that already
shows that QB is unlikely to be the only factor.

As to estimating the relative size of the causal contributions of
each of the 6 factors -- this will require a more fine-grained
analysis, taking into account not only %OA, citation level, and
article age, but also article deposit date. Equating average
citation levels for the authors and for the specialty domain will
be necessary in the comparisons, and a lot of journals will need
to be sampled, in diverse fields, to make sure patterns are not

Yet in spite of their citation advantage, arXiv-deposited
articles received 23% fewer downloads from the publisher's
website (about 10 fewer downloads per article) in all but the
most recent two years after publication. The data suggest that
arXiv and the publisher's website may be fulfilling distinct
functional needs of the reader.

That sounds like the Arxiv Advantage (AA) expressed in the
downloads (UA).
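
For a sense of scale, the figures quoted from the abstract (35% more citations, about 1.1 per article; 23% fewer downloads, about 10 per article) imply rough baseline averages. The Python sketch below works them out, assuming each percentage is expressed relative to the non-deposited articles' mean. That is an assumption, since the abstract does not state the reference group, and both absolute figures are themselves given only as "about". The helper name `implied_baseline` is hypothetical.

```python
def implied_baseline(abs_diff, rel_diff):
    """Baseline mean implied by a gap given both in absolute terms and
    as a fraction of that baseline (abs_diff = rel_diff * baseline)."""
    return abs_diff / rel_diff


# Citations: arXiv-deposited articles have 35% more, about 1.1 per article.
citations_non_deposited = implied_baseline(1.1, 0.35)
citations_deposited = citations_non_deposited + 1.1

# Publisher downloads: deposited articles have 23% fewer, about 10 per article.
downloads_non_deposited = implied_baseline(10, 0.23)
downloads_deposited = downloads_non_deposited - 10

print(f"citations per article: {citations_non_deposited:.1f} vs {citations_deposited:.1f}")
print(f"downloads per article: {downloads_non_deposited:.1f} vs {downloads_deposited:.1f}")
```

Under those assumptions the averages come out at roughly 3 citations and roughly 43 publisher downloads per non-deposited article, so the absolute gaps are small even where the percentages look large.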

Apart from total citation counts and downloads, other interesting
variables to look at (and compare for OA effects) include:
citation latency, citation longevity and other temporal measures;
same for downloads; also authority impact (similar to Google's
PageRank: citations by higher-cited citers count for more),
inbreeding/outbreeding coefficients, co-citations, and semantic

Stevan Harnad