
Re: Impact Factor, Open Access & Other Statistics-Based Quality



On Wed, 26 May 2004, Michael Leach wrote:

> As we build institutional repositories (IR) and begin the process of
> linking these repositories, we could have the ability to create our own
> impact factors, linking the articles and citations among repositories all
> over the world.

This is not only already possible, but already happening. See:

OpCit: The Open Citation Project providing Reference Linking and Citation 
Analysis for Open Archives:  <http://opcit.eprints.org/>

Citebase: The Cross-OAI-Archive Citation and Download Ranking Search
Engine: <http://citebase.eprints.org/>

Citeseer: The oldest citation engine of them all, operating on harvested
non-OAI articles in computer science archived on arbitrary websites:
<http://citeseer.ist.psu.edu/cs>

and the Usage/Citation Correlator, which can be used to predict eventual
citations from current downloads:
<http://citebase.eprints.org/analysis/correlation.php>

Many other new forms of digitometric analyses and performance indicators
will emerge as the Open Access Corpus grows.
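
To make the correlator's idea concrete, here is a minimal sketch of the
kind of usage/citation correlation such a tool computes. The per-paper
download and citation figures below are invented for illustration;
Citebase's actual harvesting pipeline and interfaces differ.

    from statistics import mean

    # (downloads in first 6 months, citations after 2 years) per paper;
    # hypothetical figures for illustration only
    papers = [(120, 14), (45, 3), (300, 40), (80, 9), (15, 1), (210, 25)]

    downloads = [d for d, _ in papers]
    citations = [c for _, c in papers]

    def pearson(xs, ys):
        """Pearson correlation coefficient of two equal-length series."""
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx * vy) ** 0.5

    print(f"correlation r = {pearson(downloads, citations):.2f}")

    # Least-squares line: predict eventual citations from current downloads.
    mx, my = mean(downloads), mean(citations)
    slope = sum((x - mx) * (y - my) for x, y in zip(downloads, citations)) \
            / sum((x - mx) ** 2 for x in downloads)
    intercept = my - slope * mx
    print(f"predicted citations at 150 downloads: {slope * 150 + intercept:.1f}")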

>  Similarly, as IR administrators work with publishers
> (including open access as well as more traditional publishers) to directly
> deposit postprint copies of articles and other digital objects in IRs, the
> new IR-Impact Factors could gain a similar weight to the Thomson/ISI
> Impact Factor.  It is likely that the IR-Impact Factor could cover
> literature not currently covered by Thomson/ISI, so while the two Impact
> Factors overlap, they would provide some independent means of assessing a
> journal's or article's impact in a given community.

They can, and already do. Their only limit so far is the small size of
the OA corpus.
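
For concreteness: the classical two-year impact factor is the number of
citations received in year Y to a journal's articles from years Y-1 and
Y-2, divided by the number of articles the journal published in those
two years. A repository-based version could be computed from harvested
citation records along these lines (all records and counts below are
invented for illustration):

    # Two-year impact factor from (hypothetical) cross-repository
    # citation records: (citing_year, cited_journal, cited_pub_year)
    citation_records = [
        (2004, "Journal A", 2002),
        (2004, "Journal A", 2003),
        (2004, "Journal A", 2003),
        (2004, "Journal B", 2003),
    ]

    # citable items each journal published per year (also hypothetical)
    items_published = {("Journal A", 2002): 50, ("Journal A", 2003): 60}

    def impact_factor(journal, year):
        """Citations in `year` to the journal's items from the two
        preceding years, divided by the items published in those years."""
        cites = sum(1 for (cy, j, py) in citation_records
                    if cy == year and j == journal
                    and py in (year - 1, year - 2))
        items = sum(items_published.get((journal, y), 0)
                    for y in (year - 1, year - 2))
        return cites / items if items else 0.0

    print(impact_factor("Journal A", 2004))  # 3 citations / 110 items ~ 0.027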

> However, there may be another way to create an "Impact Factor-like"  
> statistic to analyze open access materials and other published works.  
> With the COUNTER standard and similar e-journal statistical tools, it is
> possible for a variety of libraries to merge their user access statistics
> and produce lists of "most accessed papers" or "most accessed ejournals"  
> for given fields.

These are the download statistics that Tim Brody's Citebase and
usage/citation correlator already gather. As the OA corpus grows, there
will no doubt be cross-archive arrangements for monitoring, storing and
harvesting download statistics along with citation statistics.
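
Pooling such COUNTER-style figures across archives is itself a small
computation. A sketch, with invented archive names and counts, of
merging per-archive download counts into a cross-archive "most accessed
papers" list:

    from collections import Counter

    # per-archive download counts (archive names and figures invented)
    archive_stats = {
        "archive-1": {"paper-A": 310, "paper-B": 120, "paper-C": 45},
        "archive-2": {"paper-A": 90, "paper-C": 200},
        "archive-3": {"paper-B": 60, "paper-C": 75},
    }

    pooled = Counter()
    for counts in archive_stats.values():
        pooled.update(counts)  # Counter.update adds counts rather than replacing

    # the cross-archive "most accessed papers" list
    for paper, total in pooled.most_common():
        print(f"{paper}: {total} downloads")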

> For instance, the NERL (NorthEast Research Library) Consortium could pool
> their statistics to produce such lists, or perhaps the top research
> institutes in a given field (e.g. MIT, Harvard, Stanford, CalTech, etc. in
> physics) could produce the lists.  Granted, this "ranking" would be less
> "scientific" than the current Thomson/ISI Impact Factor, but it may still
> serve the purpose our users and readers want, which is defining quality
> and relevance.

The only handicap OAI digitometrics has relative to ISI's measures is
the size and scope of the OA corpus so far. There is nothing less
"scientific" about it.

> License agreements would have to be adjusted with publishers to include a
> provision for publishing and pooling the statistical data.  Open access
> publishers would have to be willing and able to supply such data as well.

If we wait for OA journals to prevail in order to approach 100% OA
coverage, we will wait till doomsday. OA self-archiving will prevail far
earlier. I doubt that non-OA publishers will mind pooling usage data once
OA prevails, perhaps even earlier.

> The debate surrounding open access, in part, resides with quality and
> relevance issues.  Waiting five years for an Impact Factor, as IOP's New
> Journal of Physics did, could hinder the process of open access
> acceptance.  Creating other measures of quality, such as the "pooled
> statistics/ranking" or IR-Impact Factor model above could provide another
> measure, and an earlier one, for many new publications.  With many such
> quality models available, individual readers and authors could pick what
> works best for them in determining quality and relevance.

OA Eprint archives will provide early-days metrics and predictors, in
the form of download and citation counts, not only for the published
final drafts (postprints) but also for the even earlier pre-refereeing
preprints.

And other, richer digitometric measures will develop too: co-citation
statistics (already available in Citebase), Google PageRank-like
weightings that use citations rather than links, Hub/Authority analysis,
co-text semantic analysis, correlation and prediction, time-series
analysis, and much more. All of this awaits only the growth of the Open
Access Corpus.
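
As one illustration of the PageRank-like weighting: run the usual power
iteration over a citation graph instead of a link graph, so that a
citation from a highly weighted paper counts for more. The toy graph
below is invented; a real ranking would run over the harvested OA
citation corpus.

    DAMPING = 0.85

    # cites[x] = papers that x cites (a toy citation graph)
    cites = {
        "p1": ["p2", "p3"],
        "p2": ["p3"],
        "p3": ["p1"],
        "p4": ["p1", "p3"],
    }

    papers = list(cites)
    rank = {p: 1.0 / len(papers) for p in papers}

    for _ in range(50):  # power iteration; 50 rounds is ample here
        new = {p: (1 - DAMPING) / len(papers) for p in papers}
        for citing, cited in cites.items():
            share = DAMPING * rank[citing] / len(cited)
            for p in cited:
                new[p] += share
        rank = new

    for p, r in sorted(rank.items(), key=lambda kv: -kv[1]):
        print(f"{p}: {r:.3f}")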

Stevan Harnad