
Re: McCabe and Snyder respond to criticism on their paper



On 2011-02-10, at 5:26 PM, Philip Davis wrote:
> "After reading the original post regarding our paper, and the
> subsequent comments, I thought it would be appropriate to
address
> the issue that is generating some heat here, namely whether our
> results can be extrapolated to the OA environment..."
>
> read full response here:
> http://j.mp/hGSY6Z

MCCABE: . . . I thought it would be appropriate to address the
issue that is generating some heat here, namely whether our
results can be extrapolated to the OA environment . . . . (1)
Selection bias and other empirical modeling errors are likely to
have generated overinflated estimates of the benefits of online
access (whether free or paid) on journal article citations in
most if not all of the recent literature.

If "selection bias" refers to author bias toward selectively
making their better (hence more citeable) articles OA, then this
was controlled for in the comparison of self-selected vs.
mandated OA, by Gargouri et al (uncited in the M & S article, but
known to the authors -- indeed the first author requested, and
received, the entire dataset for further analysis: we are all
eager to hear the results).
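For concreteness, the logic of that control can be sketched in a
few lines of Python. This is an illustration only, with
hypothetical column names and toy data, not the actual Gargouri
et al analysis: if the OA advantage were an artifact of authors
self-selectively making their best work OA, the advantage should
shrink or vanish where OA is mandated.

    import pandas as pd

    def oa_advantage(df: pd.DataFrame) -> pd.Series:
        """Ratio of mean citations, OA vs. non-OA, within each
        deposit group ('self-selected' vs. 'mandated')."""
        means = (df.groupby(["deposit_group", "is_oa"])["citations"]
                   .mean().unstack())
        return means[True] / means[False]

    # Toy data: a comparable OA advantage in both groups argues
    # against the self-selection explanation.
    toy = pd.DataFrame({
        "deposit_group": ["self-selected"] * 4 + ["mandated"] * 4,
        "is_oa":         [True, True, False, False] * 2,
        "citations":     [12, 10, 6, 5, 11, 9, 6, 4],
    })
    print(oa_advantage(toy))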

If "selection bias" refers to the selection of the journals for
analysis, I cannot speak for studies that compare OA journals
with non-OA journals, since we only compare OA articles with
non-OA articles within the same journal. And it is only a few
studies like Evans and Reimer's, that compare citation rates for
journals before and after they are made accessible online (or, in
some cases, freely accessible online). Our principal interest is
in the effects of immediate OA rather than delayed or embargoed
OA (although the latter may be of interest to the publishing
community).

MCCABE: 2. There are at least 2 "flavors" found in this
literature: 1. papers that use cross-section type data or a
single observation for each article (see for example, Lawrence
(2001), Harnad and Brody (2004), Gargouri, et. al. (2010)) and 2.
papers that use panel data or multiple observations over time for
each article (e.g. Evans (2008), Evans and Reimer (2009)).

We cannot detect any mention or analysis of the Gargouri et al.
paper in the M & S paper.
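To make the distinction concrete, the two data "flavors" look
like this in practice (a minimal Python sketch with made-up
values; the field names are illustrative, not M & S's):

    import pandas as pd

    # Flavor 1 -- cross-section: one row per article, citations
    # observed once (as in Lawrence 2001 or Gargouri et al 2010).
    cross_section = pd.DataFrame({
        "article_id": [1, 2, 3],
        "is_oa":      [True, False, True],
        "citations":  [15, 7, 22],
    })

    # Flavor 2 -- panel: one row per article per year (as in Evans
    # 2008), which is what permits article fixed effects and
    # secular trends to be modelled.
    panel = pd.DataFrame({
        "article_id": [1, 1, 1, 2, 2, 2],
        "year":       [2005, 2006, 2007, 2005, 2006, 2007],
        "is_oa":      [False, True, True, False, False, False],
        "citations":  [2, 6, 7, 3, 3, 4],
    })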

MCCABE: 3. In our paper we reproduce the results for both of
these approaches and then, using panel data and a robust
econometric specification (that accounts for selection bias,
important secular trends in the data, etc.), we show that these
results vanish.

We do not see our results cited or reproduced. Does "reproduced"
mean "simulated according to an econometric model"? If so, that
is regrettably too far from actual empirical findings to be
anything but speculation about what would be found if one were
actually to do the empirical studies.

MCCABE: 4. Yes, we only test online versus print, and not
OA versus online for example, but the empirical flaws in the
online versus print and the OA versus online literatures are
fundamentally the same: the failure to properly account for
selection bias. So, using the same technique in both cases should
produce similar results.

Unfortunately this is not very convincing. Flaws there may well
be in the methodology of studies comparing citation counts before
and after the year in which a journal goes online. But these are
not the flaws of studies comparing citation counts of articles
that are and are not made OA within the same journal and year.

Nor is the vague attribution of "failure to properly account for
selection bias" very convincing, particularly when the most
recent study controlling for selection bias (by comparing
self-selected OA with mandated OA) has not even been taken into
consideration.

Conceptually, the question of whether online access increases
citations over offline access is entirely different from the
question of whether OA increases citations over non-OA, because
(as the authors note) the online/offline effect concerns *ease*
of access: institutional users have either offline access or
online access, and, according to M & S's results in economics,
the increased ease of accessing articles online does not
increase citations.

This could be true (although the growth, across those same
years, of the tendency in economics to make prepublication
preprints OA through author self-archiving [harvested by RePEc],
much as physicists had started doing a decade earlier in arXiv,
and computer scientists even earlier [later harvested by
CiteSeerX], could be producing a huge background effect not
taken into account at all in M & S's painstaking temporal
analysis).

But any way one looks at it, comparing easy vs. hard access
(online vs. offline) is hardly the same thing as comparing
access with no access -- as it is when we compare OA vs. non-OA
for all those potential users at institutions that cannot afford
subscriptions (whether offline or online) to the journal in
which an article appears. The barrier, in other words (though
one should hardly have to point this out to economists), is not
an ease barrier but a price barrier: non-OA articles are not
just harder to access for users at nonsubscribing institutions:
they are *impossible* to access unless a price is paid.

(I certainly hope M & S will not reply with "let them use
interlibrary loan (ILL)"! A study analogous to M & S's
online/offline study, comparing citations for offline vs. online
vs. ILL access in the click-through age, would not only strain
belief if it too found no difference, but it too would fail to
address OA, since OA is about access once one has reached the
limits of one's institution's subscription/license/pay-per-view
budget. Hence it would again miss all the citations that an
article would have gained had it been accessible to all its
potential users and not just those whose institutions could
afford access, by whatever means.)

It is ironic that M & S draw their conclusions about OA
(predictably, as their interest is in modelling publication
economics) in terms of the costs/benefits, for an author, of
paying to publish in an OA journal, concluding that since they
have shown it will not generate more citations, it is not worth
the money.

But the most compelling findings on the OA citation advantage
come from OA author self-archiving (of articles published in
non-OA journals), not from OA journal publishing. Those are the
studies that show the OA citation advantage, and the advantage
does not cost the author a penny!

And the extra citations are almost certainly coming from users
for whom access to the article would otherwise have been
financially prohibitive. (Perhaps it's time for econometric
modeling from the user's point of view too.)

I recommend that M & S look at the studies of Michael Kurtz in
astrophysics. Those too were sophisticated long-term studies of
the effect of the wholesale switch from offline to online, and
Kurtz found that total citations were in fact slightly reduced,
overall! But astrophysics, too, is a field in which OA
self-archiving is widespread. Hence whether and when journals go
online is moot, insofar as citations are concerned. (The likely
hypothesis for the reduced citations -- compatible also with our
own findings in Gargouri et al -- is that OA levels the playing
field for users: OA articles are accessible to everyone, not just
those whose institutions can afford toll access. As a result,
users can *self-selectively* decide to cite only the best and
most relevant articles of all, rather than having to make do with
a selection among only the ones to which their institutions can
afford toll access. A corollary of this [though probably also a
spinoff of the Seglen/Pareto effect] is that the biggest
beneficiaries of the OA citation advantage will be the best
articles.)
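(The leveling hypothesis is easy to simulate. The following toy
Python sketch is our own illustration, not Kurtz's analysis or
our published model: each simulated user cites the best article
among those that are both relevant and reachable; raising
reachability to 100% concentrates citations on the globally best
articles.)

    import numpy as np

    rng = np.random.default_rng(1)
    quality = rng.pareto(2.0, size=1000)  # skewed, Seglen/Pareto-like

    def simulate(access: float, n_users: int = 5000) -> np.ndarray:
        """Each user cites the best of the ~50 articles relevant
        to them that they can reach; 'access' is the reachable
        fraction (1.0 = OA for everything)."""
        cites = np.zeros(quality.size)
        for _ in range(n_users):
            relevant = rng.choice(quality.size, size=50, replace=False)
            reachable = relevant[rng.random(relevant.size) < access]
            if reachable.size:
                cites[reachable[np.argmax(quality[reachable])]] += 1
        return cites

    toll, oa = simulate(0.3), simulate(1.0)
    best = np.argsort(quality)[-50:]  # top 5% by quality
    print(toll[best].sum() / toll.sum(), oa[best].sum() / oa.sum())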

MCCABE: 5. At least in the case of economics and business titles,
it is not even possible to properly test for an independent OA
effect by specifically looking at OA journals in these fields
since there are almost no titles that *switched* from
print/online to OA (I can think of only one such title in our
sample that actually permitted backfiles to be placed in an OA
repository). Indeed, almost all of the OA titles in econ/business
have always been OA and so no statistically meaningful before and
after comparisons can be performed.

The multiple conflation here is so flagrant that it is almost
laughable: online does not equal OA, and OA does not equal OA
journal.

First, the method of comparing the effect on citations before vs.
after the offline/online *switch* will have to make do with its
limitations. (We don't think it's of much use for studying OA
effects at all.) The method of comparing the effect on citations
of OA vs. non-OA within the same (economics/business,
toll-access) journals can certainly proceed apace in those
disciplines; the studies have been done, and the results are much
the same as in other disciplines.

M & S have our latest dataset: Perhaps they would care to test
whether the economics/business subset of it is an exception to
our finding that (a) there is a significant OA advantage in all
disciplines, and (b) it's just as big when the OA is mandated as
when it is self-selected.

MCCABE: 6. One alternative, in the case of cross-section type
data, is to construct field experiments in which articles are
randomly assigned OA status (e.g. Davis (2008) employs this
approach and reports no OA benefit).

And another one -- based on an incomparably larger N, across far
more fields -- is the Gargouri et al study that M & S fail to
mention in their article, and for which they have the full
dataset in hand, as requested.
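(As for the randomization logic itself, it is simple enough to
sketch; this is an illustration of the design, not Davis's actual
code or data. Random assignment removes the author's choice
entirely, so any remaining citation difference cannot be a
self-selection artifact -- but the test needs a large enough N to
detect it.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 200
    is_oa = rng.permutation(np.repeat([True, False], n // 2))

    # In the real experiment these would be citation counts
    # collected some years after randomization; simulated here
    # with no OA effect, so the test should (correctly) find none.
    citations = rng.poisson(lam=5.0, size=n)

    t, p = stats.ttest_ind(citations[is_oa], citations[~is_oa])
    print(f"t = {t:.2f}, p = {p:.3f}")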

MCCABE: 7. Another option is to examine articles before and after
they were placed in OA repositories, so that the likely selection
bias effects, important secular trends, etc. can be accounted for
(or in economics jargon, "differenced out"). Evans and
Reimer attempt to do this in their 2009 paper but only meet
part of the econometric challenge.

M & S are rather too wedded to their before/after method and
thinking! The sensible time for authors to self-archive their
papers is immediately upon acceptance for publication. That's
before the published version has even appeared. Otherwise one is
not studying OA but OA embargo effects. (But let me agree on one
point: unlike journal publication dates, OA self-archiving dates
are not always known or taken into account; so there may be some
drift there, depending on when the author self-archives. The
solution is not to study the before/after watershed, but to focus
on the articles that are self-archived immediately rather than
later.)
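For readers outside economics, what "differencing out" amounts to
can be sketched in a few lines (a minimal two-period illustration
of the general technique, not M & S's or Evans and Reimer's
actual specification):

    import pandas as pd

    def diff_in_diff(panel: pd.DataFrame) -> float:
        """Columns (hypothetical): 'treated' (True if the article
        becomes OA in the 'post' period), 'period' ('pre'/'post'),
        'citations'. Stable article-level selection effects cancel
        in the within-article differences."""
        m = (panel.groupby(["treated", "period"])["citations"]
                  .mean().unstack())
        return ((m.loc[True, "post"] - m.loc[True, "pre"])
                - (m.loc[False, "post"] - m.loc[False, "pre"]))

And this is exactly why the timing caveat above matters: if
deposit dates are unknown or deposits are delayed, the
"pre"/"post" split itself is mismeasured, and the differencing no
longer isolates an OA effect.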

Stevan Harnad

Gargouri, Y., Hajjem, C., Larivière, V., Gingras, Y., Brody, T.,
Carr, L. and Harnad, S. (2010) Self-Selected or Mandated, Open
Access Increases Citation Impact for Higher Quality Research.
PLoS ONE. http://eprints.ecs.soton.ac.uk/18493/