
Followup data on articles being made free



I've corresponded over the last two weeks with several librarians and
publishers about the statistics on "free back issues" that the
HighWire-hosted publishers provide. Since the program provides free full
text journal articles to the general public, it has some resemblance to
the public-access component of the NIH plan now under discussion. My
correspondents have asked me to post the data on liblicense. Here it is.
(This being election night/week/month/year in the US, I'm a little worried
that slinging stats will attract the attention of homeland security.)

First, some of the facts about the program itself. The Free Back Issues
program got started in 1997 when J Biol Chem, PNAS, and J Cell Biol
publishers agreed that articles "of a certain age" could benefit
educational non-subscribing institutions even if they weren't the "hot
articles" that cutting-edge researchers demanded. So these publishers and
others had HighWire develop the programming to release articles for public
access after 6 months, 12 months, etc. Free Back Issues now has 186
HighWire-hosted journals participating (some journal sites not associated
with HighWire have similar programs), and about 20 additional sites are
entirely free without delay. These sites are listed here:
http://highwire.stanford.edu/lists/freeart.dtl

Together, these journals have released over 777,000 full text articles to
date. The amount of free content grows not only as new articles are
published, but as back files are retrospectively scanned in. Somewhere
around 200,000 additional articles are made free each year.

What is of interest about this program in the current NIH Plan discussion?

1. We can see what portion of NIH-funded research is already being made
available to the public through this particular program.

2. We can see what portion of publishers release content in less than or
equal to six months, at 12 months, or later.

However, there are caveats associated with each of these (no data seem to
come without strings attached...).

1. NIH-funded Research

As previously reported to this list by Mark Funk, it is possible to search
PubMed to identify which articles are based on research funded by NIH/PHS.
The count we got for 2003 (with help from an NLM librarian :) is 64,879
articles. Of those, 26,004 (or 40%) were published in journals that are
hosted at HighWire (J Biol Chem alone, with 3,422 of those articles,
publishes 5% of the NIH-funded research results).

So what portion of those final articles (not the author mss, but the final
published paper) are made free eventually as things now stand? Answer:
35% of the NIH-funded research is already being made free through this
program (22,996 of 64,879 articles). (Of course, a lot of non-NIH-funded
articles are made free as well.) Another way of looking at it: of the
NIH-funded research that is reported by publishers who host with HighWire,
88% is eventually made free (22,996 of 26,004).
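For anyone who wants to recheck the percentages above, here is a small
sketch in Python. The counts are the ones quoted in this post; the variable
and function names are just illustrative, and nothing here queries PubMed.

```python
# Counts quoted in the post for 2003 (NIH/PHS-funded articles found in PubMed).
NIH_FUNDED_TOTAL = 64_879   # all NIH/PHS-funded articles identified
HIGHWIRE_HOSTED = 26_004    # of those, published in HighWire-hosted journals
MADE_FREE = 22_996          # of those, eventually made free
JBC_ARTICLES = 3_422        # J Biol Chem's share of the NIH-funded articles


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


shares = {
    "HighWire-hosted share of NIH-funded articles": percent(HIGHWIRE_HOSTED, NIH_FUNDED_TOTAL),
    "J Biol Chem share of NIH-funded articles": percent(JBC_ARTICLES, NIH_FUNDED_TOTAL),
    "NIH-funded articles eventually free": percent(MADE_FREE, NIH_FUNDED_TOTAL),
    "HighWire-hosted NIH-funded articles eventually free": percent(MADE_FREE, HIGHWIRE_HOSTED),
}

for label, value in shares.items():
    print(f"{label}: {value:.1f}%")
```

Rounded to whole numbers, these come out to the 40%, 5%, 35%, and 88%
figures given above.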

The caveats on this data: 1) Medline indexers can only identify
NIH-funded research if the authors/journals include that fact in the
papers. Journals HighWire works with are considering editorial policies
to ensure this information is included in article text for accurate
indexing. 2) The search technique used here combines NIH with PHS, and it
is non-trivial to tease the agencies apart.

2. Release Dates, a caveat about apples and oranges

I wrote about this in an earlier post to liblicense, responding to a query
from David Goodman:

-----

"Most (85% as I recall) of those 200+ journals make their content free
after a year (a few after 18 or 24 months, sometimes reviews journals have
longer delays). Included in that, 23, or 12%, of the 200 journals that
make content free after a delay period make it free in six months or less.
(I am excluding from the percentage the sites that are totally free, since
a number of them are not original-research journals; but there are 24 such
sites if you want to add them back in; that would make it 23% are free or
6 months or less.)"

-----

The caveat that I'd like to add here for these data: All of the
information I have about "free within nn months" involves making content
free after an *issue* has been out a certain number of months. A
significant number of journals we work with publish material online ahead
of print. It turns out that the "nn months" that NIH is counting is the
time since the *article* (not the issue) was published (according, I am
told, to a conversation David Lipman of NCBI had with Marty Frank of the
Amer. Physiol. Soc.).

The effect of this is that the delay figures I am reporting are not
strictly comparable to the delay period that NIH is requesting. So I am
not sure it makes sense to use my data in this context. It *is* a
material difference. Quite a few journals have articles appear online 8
weeks to even 4-5 months ahead of the print date. So if NIH makes an
article free 6 months after its online posting, that might only be 1-2
months after the publisher has printed it in an issue, and another 4-10
months before the publisher would be making it free.
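The arithmetic behind those ranges can be sketched in Python as follows.
The function name and parameters are made up for illustration, and the
inputs (an online lead of 4 months, publisher delays of 6 or 12 months
counted from the issue date) are assumptions chosen to match the example
above, not figures for any particular journal.

```python
def release_gap(online_lead: int, nih_delay: int, publisher_delay: int) -> tuple[int, int]:
    """Compare an article-based delay with an issue-based delay.

    online_lead:      months the article appears online ahead of its print issue
    nih_delay:        months after online posting when NIH would make it free
    publisher_delay:  months after the issue date when the publisher makes it free

    Returns (months after the issue date when NIH frees the article,
             further months until the publisher would have freed it).
    """
    nih_after_issue = nih_delay - online_lead
    months_before_publisher = publisher_delay - nih_after_issue
    return nih_after_issue, months_before_publisher


# Article posted online 4 months ahead of print, NIH delay of 6 months:
for publisher_delay in (6, 12):
    after_issue, gap = release_gap(online_lead=4, nih_delay=6, publisher_delay=publisher_delay)
    print(f"publisher delay {publisher_delay} mo: NIH frees it {after_issue} mo after "
          f"the issue, {gap} mo before the publisher would")
```

With a 4-month online lead, NIH release lands 2 months after the issue, and
4 or 10 months ahead of a publisher whose delay is 6 or 12 months from the
issue date.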

Publishers use the issue date, as I understand it, because it matches what
librarians subscribe to (a number of issues in a volume, not a number of
articles published in a year), but probably also because it is a
traditional marker.

My apologies for the long exposition. There may be simple explanations,
but there are no simple data it seems!

John


-------------------
John Sack, Director
HighWire Press, Stanford University
Phone: 650-723-0192; fax: 650-725-9335
http://highwire.stanford.edu/~sack
sack@stanford.edu