Ranking Web of Repositories: July 2010 Edition

From: Leslie Carr <lac -- ecs.soton.ac.uk>
Date: Fri, Jul 9, 2010 at 1:04 PM

On 9 Jul 2010, at 08:12, Isidro F. Aguillo wrote:

> However perhaps you will like this page we prepared for the
> University rankings related to UK universities commitment to
> OA: http://www.webometrics.info/openac.html

Thanks for preparing the page - it is very informative and
helpful in answering questions about the interpretation of the IR
ranking relating to the discrepancy between the relative ordering
of institutions in the IR list and other (independent) research

As you point out, much of the difference is explained by the
relative "openness" of each institution's literature. Since 50%
of the score is devoted to in-links, and there is little
motivation to link to an empty bibliographic record, a high
proportion of OA papers will tend to attract more links, more
traffic and hence a more "impactful" repository.

Some institutions have therefore benefited from their efforts to
deposit OA papers, becoming more visible and hence more highly
rated. Others are seeing the opposite effect - institutions that
would normally be at the top of any research list are much lower
down than expected. Some of these institutions don't have very
effective repositories and some do but hide them behind
firewalls. Either way the net effect is the same - not much
visible public literature to attract links or traffic.

I hope that the effect of this "league table" will be to
encourage institutions to redouble their efforts in regard to
Open Access. I also hope that it will be possible to have further
public dialogue so that the process can be increasingly open and
the community can better understand, verify and trust your

Thanks again for your contribution! -- Les Carr

>> Dear Isidro,
>> If I may intervene with some comments too, as this discussion
>> has some wider implications:
>> Yes, you are measuring both contents and visibility, but
>> presumably you want the difference between (1) the ranking of
>> the top 800 repositories and (2) the ranking of the top 800
>> *institutional* repositories to be based on the fact that the
>> latter are institutional repositories whereas the former are
>> all repositories (central, i.e., multi-institutional, as well
>> as institutional).
>> Moreover, if you list redundant repositories (some being the
>> proper subsets of others) in the very same ranking, it seems
>> to me the meaning of the ranking becomes rather vague.
>>> Certainly HyperHAL covers the contents of all its
>>> participants, but the impact of these contents depends on
>>> other factors. Probably researchers prefer to link to the
>>> paper in INRIA because of the prestige of this institution,
>>> the affiliation of the author or the marketing of their
>>> institutional repository.
>> All true, but perhaps the significance and usefulness of the
>> rankings would be greater if you either changed the weight of
>> the factors (volume of full-text content, number of links) or,
>> alternatively, you designed the rankings so the user could
>> select and weight the criteria on which the rankings are
>> displayed.
>> Otherwise your weightings become like the "h-index" -- an
>> a-priori combination of untested, unvalidated weights that
>> many users may not be satisfied with, or fully informed by...
>>> But here is a more important aspect. If I were the president
>>> of INRIA I would prefer people using my institutional
>>> repository instead of CCSD. No problem with the latter, they
>>> are doing a great job and increasing the reach of INRIA,
>>> but the papers deposited are a very important (the most
>>> important?) asset of INRIA.
>> But how much INRIA papers are linked, downloaded and cited is
>> not necessarily (or even probably) a function of their direct
>> locus!
>> What is important for INRIA (and all institutions) is that as
>> much as possible of their paper output should be OA,
>> simpliciter, so that it can be linked, downloaded, read,
>> applied, used and cited. It is entirely secondary, for INRIA
>> (and all institutions), *where* their papers are OA, compared
>> to the necessary condition *that* they are OA (and hence
>> freely accessible, usable, harvestable).
>> Hence (in my view) by far the most important ranking factor
>> for institutional repositories is how much of their full-text
>> institutional paper output is indeed deposited and OA. INRIA
>> would have no reason to be disappointed if the locus from
>> which its content is searched, retrieved and linked is some
>> other, multi-institutional harvester. INRIA still gets the
>> credit and benefits from all the links, downloads and
>> citations of INRIA content!
>> (Having said that, locus of deposit *does* matter, very much,
>> for deposit mandates. Deposit mandates are necessary in order
>> to generate OA content. And, for strategic reasons that are
>> elaborated in my reply to Chris Armbruster, it makes a big
>> practical difference for success in agreeing on the adoption
>> of a mandate that both institutional and funder mandates
>> should require convergent *institutional* deposit, rather than
>> divergent and competing institutional vs. institution-external
>> deposit. Here too, your repository rankings would be much more
>> helpful and informative if they gave a greater weight to the
>> relative size of each institutional repository's content and
>> eliminated multi-institutional repositories from the
>> institutional repository rankings -- or at least allowed
>> institutional repositories to be ranked independently on
>> content vs links.)
>> I think you are perhaps being misled here by the analogy with
>> your sister rankings (RWWU, http://www.webometrics.info/) of
>> universities rather than their repositories. In university
>> rankings, the links to the university site itself matter a
>> lot. But in repository rankings links matter much less than
>> *how much institutional content is accessible*. For the degree
>> of usage of that content, harvester sites may be more relevant
>> measures, and, after all, downloads and citations, unlike
>> links, carry their credits (to the authors and institutions)
>> with them no matter where the transaction happens to occur...
>>> Regarding the other comments we are going to correct those
>>> with mistakes but it is very difficult for us to realize that
>>> Virginia Tech University is "faking" its institutional
>>> repository with contents authored by external scholars.
>> I have called Gail McMillan at Virginia Tech about this, and
>> she has explained it to me. The question was never whether
>> Virginia Tech was "faking"! They simply host content over and
>> above Virginia Tech content -- for example, OA journals whose
>> content originates from other institutions.
>> As such, the Virginia Tech repository, besides providing
>> access to Virginia Tech content, is also a conduit or portal for
>> accessing the content of those other institutions. The
>> "credit" for providing the conduit goes to Virginia Tech, of
>> course. But the credit for the links, usage and citations goes
>> to those other institutions! (When an institutional repository
>> is also used as a portal for other institutions, its function
>> becomes a hybrid one -- both an aggregator and a provider. I
>> think it's far more useful and important to try to keep those
>> functions separate, in both the rankings and the weightings.)
>> Best wishes,
>> Stevan
>>> On 07/07/2010 23:03, Helene.Bosc wrote:
>>>> Isidro, Thank you for your Ranking Web of World Repositories
>>>> and for informing us about the best quality repositories!
>>>> Being French, I am delighted to see HAL so well ranked and I
>>>> take this opportunity to congratulate Franck Laloe for
>>>> having set up such a good national repository as well as the
>>>> CCSD team for continuing to maintain and improve it.
>>>> Nevertheless, there is a problem in your ranking that I have
>>>> already had occasion to point out to you in private
>>>> messages. May I remind you that:
>>>> Correction for the top 800 ranking:
>>>> The ranking should either index HyperHAL alone, or index
>>>> both HAL/INRIA and HAL/SHS, but not all three repositories
>>>> at the same time: HyperHAL includes both HAL/INRIA and
>>>> HAL/SHS.
>>>> Correction for the ranking of institutional repositories:
>>>> Not only does HyperHAL (#1) include both HAL/INRIA (#3) and
>>>> HAL/SHS (#5), as noted above, but HyperHAL is a
>>>> multidisciplinary repository, intended to collect all French
>>>> research output, across all institutions. Hence it should
>>>> not be classified and ranked against individual
>>>> institutional repositories but as a national, central
>>>> repository. Indeed, even HAL/SHS is multi-institutional in
>>>> the usual sense of the word: single universities or research
>>>> institutions. The classification is perhaps being misled by
>>>> the polysemous use of the word "institution."
>>>> Not to seem to be biassed against my homeland, I would also
>>>> point out that, among the top 10 of the top 800
>>>> "institutional repositories," CERN (#2) is to a certain
>>>> extent hosting multi-institutional output too, and is hence
>>>> not strictly comparable to true single-institution
>>>> repositories. In addition, "California Institute of
>>>> Technology Online Archive of California" (#9) is misnamed --
>>>> it is the Online Archive of California
>>>> http://www.oac.cdlib.org/ (CDLIB, not CalTech) and as such
>>>> it too is multi-institutional. And Digital Library and
>>>> Archives Virginia Tech University (#4) may also be
>>>> anomalous, as it includes the archives of electronic
>>>> journals with multi-institutional content. Most of the
>>>> multi-institutional anomalies in the "Top 800 Institutional"
>>>> seem to be among the top 10 -- as one would expect if
>>>> multiple institutional content is inflating the apparent
>>>> size of a repository. Beyond the top 10 or so, the
>>>> repositories look to be mostly true institutional ones.
>>>> I hope that this will help in improving the next release of
>>>> your increasingly useful ranking!
>>>> Best wishes
>>>> Helene Bosc