Re: RECENT MANUAL MEASUREMENTS OF OA AND OAA
- To: liblicense-l@lists.yale.edu
- Subject: Re: RECENT MANUAL MEASUREMENTS OF OA AND OAA
- From: Stevan Harnad <harnad@ecs.soton.ac.uk>
- Date: Thu, 12 Jan 2006 17:57:08 EST
- Reply-to: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
On Wed, 11 Jan 2006, David Goodman wrote (in liblicense-l):

> Within the last few months, Stevan Harnad and his group, and we in our
> group, have carried out together several manual measurements of OA (and
> sometimes OAA, Open Access Advantage). The intent has been to
> independently evaluate the accuracy of Chawki Hajjem's robot program,
> which has been widely used by Harnad's group to carry out similar
> measurements by computer.
>
> The results from these measurements were first reported in a joint
> posting on Amsci,* referring for specifics to a simultaneously posted
> detailed technical report,** in which the results of each of several
> manual analyses were separately reported.
>
> * http://listserver.sigmaxi.org/sc/wa.exe?A2=ind05&L=american-scientist-open-access-forum&D=1&O=D&F=l&P=96445
>
> ** "Evaluation of Algorithm Performance on Identifying OA" by Kristin
> Antelman, Nisa Bakkalbasi, David Goodman, Chawki Hajjem, Stevan Harnad
> (in alphabetical order), posted on ECS as
> http://eprints.ecs.soton.ac.uk/11689/
>
> From these data, both groups agreed that "In conclusion, the robot is
> not yet performing at a desirable level and future work may be needed
> to determine the causes, and improve the algorithm."

I am happy that David and his co-workers did an independent test of how
accurately Chawki's robot detects OA. The robot over-estimates OA (i.e.,
it miscodes many non-OA articles as OA: false positives, or false OA).

Since our primary interest was and is in demonstrating the OA citation
impact advantage, we had reasoned that any tendency to mix up OA and
non-OA would go against us, because we were comparing the relative number
of citations for OA and non-OA articles: the OA/non-OA citation ratio.
Mixing up OA and non-OA would simply dilute that ratio, and hence the
detectability of any underlying OA advantage. (But more on this below; a
small numeric sketch of the dilution point follows this passage.)

We were not particularly touting the robot's accuracy in and of itself,
nor its absolute estimates of the percentage of OA articles. There are
other estimates of %OA, and they all agree that it is roughly between 5%
and 25%, depending on field and year. We definitely do not think that
pinning down that absolute percentage accurately is the high-priority
research goal at this time.

In contrast, confirming the OA impact advantage (as first reported in
2001 by Lawrence for computer science) across other disciplines *is* a
high-priority research goal today (because of its importance for
motivating OA). And we have already confirmed that OA advantage in a
number of areas of physics and mathematics *without the use of a robot*:

Brody, T. and Harnad, S. (2004) Comparing the Impact of Open Access (OA)
vs. Non-OA Articles in the Same Journals. D-Lib Magazine 10(6).
http://eprints.ecs.soton.ac.uk/10207/

Harnad, S., Brody, T., Vallieres, F., Carr, L., Hitchcock, S., Gingras,
Y., Oppenheim, C., Stamerjohanns, H. and Hilf, E. (2004) The
Access/Impact Problem and the Green and Gold Roads to Open Access.
Serials Review 30(4). http://eprints.ecs.soton.ac.uk/10209/

Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as
Predictors of Later Citation Impact. Journal of the American Society for
Information Science and Technology (JASIST).
http://eprints.ecs.soton.ac.uk/10713/

For the OA advantage too, it is its virtually exception-free positive
polarity that is most important today -- less so its absolute value or
its variation by year and field.
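To make the dilution point concrete, here is a minimal sketch (the
citation means and error rates below are purely hypothetical, not data
from any of the studies discussed here), showing how misclassification in
either direction pulls the measured OA/non-OA citation ratio toward 1:

    # Minimal sketch with made-up numbers: how misclassification dilutes
    # the measured OA/non-OA citation ratio toward 1. The error rates here
    # are fractions of each robot-labelled group, not signal-detection
    # hit/false-alarm rates.

    def measured_ratio(oa_mean, noa_mean, false_oa_frac, missed_oa_frac):
        # Articles the robot calls "OA": some are really non-OA.
        oa_group = (1 - false_oa_frac) * oa_mean + false_oa_frac * noa_mean
        # Articles the robot calls "non-OA": some are really OA.
        noa_group = (1 - missed_oa_frac) * noa_mean + missed_oa_frac * oa_mean
        return oa_group / noa_group

    print(measured_ratio(6.0, 4.0, 0.0, 0.0))   # no errors: 1.5 (a 50% advantage)
    print(measured_ratio(6.0, 4.0, 0.4, 0.1))   # with errors: about 1.24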
The summary of the Goodman et al. independent signal-detection analysis
of the robot's accuracy is the following:

    This is a second signal-detection analysis of the accuracy of a robot
    in detecting open access (OA) articles (by checking by hand how many
    of the articles the robot tagged OA were really OA, and vice versa).
    A first analysis, on a smaller sample (Biology: 100 OA, 100 non-OA),
    had found a detectability (d') of 2.45 and a bias of 0.52 (hits 93%,
    false positives 16%; Biology %OA: 14%; OA citation advantage: 50%).
    The present analysis, on a larger sample (Biology: 272 OA, 272
    non-OA), found a detectability of 0.98 and a bias of 0.78 (hits 77%,
    false positives 41%; Biology %OA: 16%; OA citation advantage: 64%).
    An analysis in Sociology (177 OA, 177 non-OA) found near-chance
    detectability (d' = 0.11) and an OA bias of 0.99 (hits 9%, false
    alarms -2%; prior robot estimate of Sociology %OA: 23%; present
    estimate: 15%). It was not possible from these data to estimate the
    Sociology OA citation advantage.

    CONCLUSIONS: The robot significantly overcodes for OA. In Biology
    2002, 40% of identified OA was in fact OA. In Sociology 2000, only
    18% of identified OA was in fact OA. Missed OA was lower: 12% in
    Biology 2002 and 14% in Sociology 2000. The sources of the error are
    impossible to determine from the present data, since the algorithm
    did not capture URLs for documents identified as OA. In conclusion,
    the robot is not yet performing at a desirable level and future work
    may be needed to determine the causes, and improve the algorithm.

In other words, the second test, based on the better, larger sample,
finds lower accuracy and a higher false-OA bias. In Biology, the robot
had estimated 14% OA overall; the estimate based on the Goodman et al.
sample was instead 16% OA. (So the robot's *over*coding of OA had
actually resulted in a slight *under*estimate of %OA -- largely because
the population proportion of OA is so low: somewhere between 5% and 25%.)
The robot had found an average OA advantage of 50% in Biology; the
Goodman et al. sample found an OA advantage of 64%. (Again, there was not
much change, because the overall proportion of OA is still so low.)

Our robot's accuracy for Sociology (which we had not tested, so Goodman
et al.'s was the first test) turned out to be much worse, and we are
investigating this further. It will be important to find out why the
robot's accuracy in detecting OA would vary from field to field.

> Our group has now prepared an overall meta-analysis of the manual
> results from both groups.*** We are able to combine the results, as we
> all were careful to examine the same sample base using identical
> protocols for both the counting and the analysis. Upon testing, we
> found a within-group inter-rater agreement of 93% and a between-groups
> agreement of 92%.
>
> *** "Meta-analysis of OA and OAA manual determinations." David Goodman,
> Kristen Antelman, and Nisa Bakkalbasi,
> http://eprints.rclis.org/archive/00005327/

I am not sure about the informativeness of a "meta-analysis" based on two
samples, from two different fields, whose main feature is that there
seems to be a substantial difference in robot accuracy between the two
fields! Until we determine why the robot's accuracy would differ by
field, combining these two divergent results is like averaging over
apples and oranges. It is trying to squeeze too much out of limited data.
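For readers unfamiliar with the signal-detection figures above, here is a
minimal sketch of the conventional detectability calculation, d' = z(hit
rate) - z(false-alarm rate), applied to the rates quoted above (the
technical report's own rounding and conventions may differ slightly):

    # Standard signal-detection detectability: d' = z(hits) - z(false alarms),
    # using the inverse normal CDF. Rates are those quoted above.
    from statistics import NormalDist

    def d_prime(hit_rate, false_alarm_rate):
        z = NormalDist().inv_cdf
        return z(hit_rate) - z(false_alarm_rate)

    print(round(d_prime(0.93, 0.16), 2))  # first Biology sample: 2.47 (reported: 2.45)
    print(round(d_prime(0.77, 0.41), 2))  # larger Biology sample: 0.97 (reported: 0.98)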
Our own group is currently focusing on testing the robot's accuracy in
Biology and Sociology (see the end of this message), using a still larger
sample of each, and looking at other correlates, such as the number of
search-matches for each item. This is incomparably more important than
simply increasing the robot's accuracy for its own sake, or than trying
to get more accurate absolute estimates of the percentage of OA articles,
because if the robot's false-OA bias were large enough *and* were
correlated with the number of search-match items (i.e., if articles that
have more non-OA matches on the Web are more likely to be falsely coded
as OA), then this would compromise the robot-based OA-advantage
estimates.

> Between us, we analyzed a combined sample of 1198 articles in biology
> and sociology, 559 of which the robot had identified as OA, and 559 of
> which the robot had reported as non-OA.
>
> Of the 559 robot-identified OA articles, only 224 actually were OA (37%).
> Of the 559 robot-identified non-OA articles, 533 were truly non-OA (89%).
> The discriminability index, a commonly used figure of merit, was only 0.97.

It is not at all clear what these figures imply, if anything. What would
be of interest would be to calculate the OA citation advantage for each
field (separately, and then, if you wish, combined), based on the
citation counts for the articles now correctly coded by humans as OA and
non-OA in this sample, and to compare that with the robot-based estimate.
Further calculations of the robot's overall inaccuracy, averaged across
these two fields, do not in and of themselves provide any useful
information.

> (We wish to emphasize that our group's results find true OAA in biology
> at a substantial level, and we all consider OAA one of the many reasons
> that authors should publish OA.)

It would be useful to look at the OAA (OA citation advantage) for the
Sociology sample too, but note that the right way to compare OA and
non-OA citations is within the same journal/year. Here only one year is
involved, and perhaps even the raw OA/non-OA citation ratio will tell us
something, but not a lot, given that there can be journal bias, with the
OA articles coming from some journals and the non-OA ones coming from
different journals: journals do not all have the same average citation
counts.
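A minimal sketch of the within-journal/year comparison just described
(the journal names, record layout, and numbers are illustrative
assumptions, not either group's actual data):

    # Sketch: compute the OA/non-OA citation ratio within each journal/year
    # cell, so that between-journal citation differences are not mistaken
    # for an OA effect. All records below are made up for illustration.
    from collections import defaultdict
    from statistics import mean

    articles = [
        # (journal, year, is_oa, citations)
        ("Journal A", 2002, True, 12), ("Journal A", 2002, False, 7),
        ("Journal A", 2002, True, 9),  ("Journal A", 2002, False, 5),
        ("Journal B", 2000, True, 4),  ("Journal B", 2000, False, 3),
    ]

    cells = defaultdict(lambda: {"oa": [], "noa": []})
    for journal, year, is_oa, cites in articles:
        cells[(journal, year)]["oa" if is_oa else "noa"].append(cites)

    for (journal, year), c in sorted(cells.items()):
        if c["oa"] and c["noa"]:  # skip all-OA or all-non-OA cells
            ratio = mean(c["oa"]) / mean(c["noa"])
            print(journal, year, "OA/non-OA citation ratio: %.2f" % ratio)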
> In the many separate postings and papers from the SH group, such as
> **** and *****, done without our group's involvement, their authors
> refer only to the SH part of the small manual inter-rater reliability
> test. As it was a small and nonrandom sample, it yields an anomalous
> discriminability index of 2.45, unlike the values found for larger
> individual tests or for the combined sample. They then use that partial
> result by itself to prove the robot's accuracy.
>
> **** such as "Open Access to Research Increases Citation Impact" by
> Chawki Hajjem, Yves Gingras, Tim Brody, Les Carr, and Stevan Harnad
> http://eprints.ecs.soton.ac.uk/11687
>
> ***** "Ten-Year Cross-Disciplinary Comparison of the Growth of Open
> Access and How it Increases Research Citation Impact" by C. Hajjem, S.
> Harnad, and Y. Gingras, IEEE Data Engineering Bulletin, 2005,
> http://eprints.ecs.soton.ac.uk/11688/

No one is "proving" (or interested in proving) robot accuracy! In our
publications to date, we cite our results to date. The Goodman et al.
test results came out too late to be mentioned in the ***** published
article, but they will be mentioned in the **** updated preprint (along
with the further results from our ongoing tests).

> None of the SH group's postings or publications refer to the joint
> report from the two groups, of which they could not have been ignorant,
> as the report was concurrently being evaluated and reviewed by SH.

Are Goodman et al. suggesting that there has been some suppression of
information here -- information from reports that we have co-signed and
co-posted publicly? Or are Goodman et al. concerned that they are not
getting sufficient credit for something?

> Considering that both the joint ecs technical report ** and the
> separate SH group report ***** were posted on Dec. 16, 2005, we have
> here perhaps the first known instance of an author posting findings on
> the same subject, on the same day, as adjacent postings on the same
> list, but with opposite conclusions.

One of the postings is a published postprint and the other an unpublished
preprint! Again, what exactly is Goodman et al.'s point?

> In view of these joint results, there is good reason to consider all
> current and earlier automated results performed using the CH algorithm
> to be of doubtful validity. The reader may judge: merely examine the
> graphs in the original joint Technical Report **. They speak for
> themselves.

No, the robot accuracy tests do not speak for themselves. Nor does the
conclusion of Goodman et al.'s preprint (***) (which I am now rather
beginning to regret having obligingly "co-signed"!): "In conclusion, the
robot is not yet performing at a desirable level and future work may be
needed to determine the causes, and improve the algorithm."

What *I* meant in agreeing with that conclusion was that we needed to
find out why there were such big differences in the robot accuracy
estimates (between our two samples and between the two fields). The
robot's detection accuracy can and will be tightened, if and when it
becomes clear that it needs to be, for our primary purpose (measuring and
comparing the OA citation advantage across fields) or even our secondary
purpose (estimating the relative percentage of OA by field and year), but
not as an end in itself (i.e., just for the sake of increasing or
"proving" robot accuracy).

The reason we are doing our analyses with a robot rather than by hand is
to be able to cover far more fields, years and articles, more quickly,
than is possible by hand. The hand-samples are a good check on the
accuracy of the robot's estimates, but theirs is not necessarily a level
of accuracy we need to reach or even approach with the robot!

On the other hand, potential artifacts -- tending in opposite directions
-- do need to be tested for and, if necessary, controlled for (including
by tightening the robot's accuracy):

(1) To what extent is the OA citation "advantage" just a non-causal
self-selection quality bias, with authors selectively self-archiving
their higher-quality, hence higher citation-probability, articles?

(2) To what extent is the OA citation "advantage" just an artifact of
false positives by the robot?
(Because there will be more false positives when there are more matches
with the reference search from articles *other* than the article itself,
hence more false positives for articles that are more cited on the web;
that would make the robot-based outcome not an OA effect at all, and
circular.)

A third question (not about a potential artifact, but about a genuine
causal component of the OA advantage) is:

(3) To what extent is the OA advantage an Early (preprint) Advantage
(EA)?

For those who are interested in our ongoing analyses, I append some
further information below.

Stevan Harnad

Chawki: Here are the tests and controls that need to be done to determine
both the robot's accuracy in detecting and estimating %OA and the
causality of the observed citation advantage (an illustrative sketch of
the record-keeping and sampling appears at the end of this message):

(1) When you re-do the searches in Biology and Sociology (to begin with;
other disciplines can come later), make sure to (1a) store the number as
well as the URLs of all retrieved sites that match the reference-query,
and (1b) make the robot check the whole list (up to at least the
pre-specified N-item limit you used before) rather than stopping as soon
as it thinks it has found that the item is "OA," as in your prior
searches. That way you will have, for each of your Biology and Sociology
ISI reference articles, not only their citation counts, but also their
query-match counts (from the search engines) and the number and ordinal
position of every match the robot calls "OA." (One item might have, say,
k query-matches, with the 3rd, 9th and kth judged "OA" by the robot, and
the other k-3 judged non-OA.)

Both the number (and URLs) of query-matches and the ordinal position of
the first "OA"-call and the total number and proportion of OA-calls will
be important test data, to make sure that our robot-based OA citation
advantage estimate is *not* just a query-match-frequency and/or
query-match-frequency-plus-false-alarm artifact. (The potential artifact
is that the robot-based OA advantage is not an OA advantage at all, but
merely a reflection of the fact that more highly cited articles are more
likely to have online items that *cite* them, and that these online items
are the ones the robot is *mistaking* for OA full-texts of the *cited*
article itself.)

(2) As a further check on robot accuracy, please use a subset of URLs for
articles that we *know* to be OA (e.g., from PubMed Central, Google
Scholar, Arxiv, CogPrints) and try both the search engines (for %
query-matches) and the robot (for "%OA") on them. That will give another
estimate of the *miss* rate of the search engines, as well as of the
robot's algorithm for OA.

(3) While you are doing this, in addition to the parameters that are
stored with the reference article (the citation count, the URLs for every
query-match by the search, and the number, proportion, and ordinal
position of those matches that the robot tags as "OA"), please also store
the citation impact factor of the *journal* in which the reference
article was published. (We will use this to do sub-analyses to see
whether the pattern is the same for high- and low-impact journals, and
across disciplines; we will also look, separately, at %OA among articles
at different citation levels (1, 2-3, 4-7, 8-15, 16-31, 32-63, 64+),
again within and across years and disciplines.)
(4) The sampling for Biology and Sociology should of course be based on
*pairs* within the same journal/year/issue-number. Assuming that you will
be sampling 500 pairs (i.e., 1000 items) in each discipline (1000
Biology, 1000 Sociology), please first pick a *random* sample of 50 pairs
for each year, and then, within each pair, pick, at *random*, one OA and
one non-OA article from the same issue. Use only the robot's *first*
ordinal "OA" as your criterion for "OA" (so that you are duplicating the
methodology the robot had used); the criterion for non-OA is, as before,
that none is found among all of the search matches. If you feel you have
the time, it would also be informative to check the 2nd or 3rd "OA" item
if the robot found more than one. That too would be a good control datum
for evaluating the robot's accuracy under different conditions (number of
matches; number/proportion of them judged "OA").

(5) Count also the number of *journals* for which the robot judges that
it is at or near 100% OA (for those are almost certainly OA journals and
not self-archived articles). Include them in your %OA counts, but of
course not in your OA/NOA ratios. (It would be a good idea to check all
the ISI journal names against the DOAJ OA journals list -- about 2000
journals -- to make sure you catch all the OA journals.) Keep a count
also of how many individual journal *issues* have either 100% OA or 0% OA
(and were hence eliminated from the OA/NOA citation ratio). Those numbers
will also be useful for later analyses and estimates.

With these data we will be in a much better position to estimate the
robot's accuracy and some of the factors contributing to the OA citation
advantage.

http://eprints.ecs.soton.ac.uk/11687/
http://eprints.ecs.soton.ac.uk/11688/
http://eprints.ecs.soton.ac.uk/11689/
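Finally, here is the illustrative sketch mentioned above of the kind of
per-article record and paired sampling described in points (1)-(5). The
field names and helper functions are hypothetical (this is not Chawki's
actual code or data format); it is only meant to make the intended
bookkeeping concrete:

    # Hypothetical sketch of the bookkeeping described in points (1)-(5):
    # field names and helpers are illustrative, not the robot's actual code.
    import random
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ReferenceArticle:
        journal: str
        year: int
        issue: int
        citations: int                  # ISI citation count
        journal_impact_factor: float    # point (3)
        match_urls: List[str] = field(default_factory=list)         # all query-matches, point (1a)
        oa_call_positions: List[int] = field(default_factory=list)  # ordinal positions judged "OA", point (1b)

        @property
        def n_matches(self):
            return len(self.match_urls)

        @property
        def robot_says_oa(self):
            # Criterion from point (4): the robot's *first* ordinal "OA" call.
            return len(self.oa_call_positions) > 0

        @property
        def first_oa_position(self):
            return self.oa_call_positions[0] if self.oa_call_positions else None

    def sample_pairs(articles, pairs_per_year=50, seed=0):
        """Point (4): within each journal/year/issue, randomly pair one
        robot-OA article with one robot-non-OA article from the same issue,
        then keep at most pairs_per_year randomly chosen pairs per year."""
        rng = random.Random(seed)
        by_issue = {}
        for a in articles:
            by_issue.setdefault((a.journal, a.year, a.issue), []).append(a)
        pairs_by_year = {}
        for (journal, year, issue), group in by_issue.items():
            oa = [a for a in group if a.robot_says_oa]
            noa = [a for a in group if not a.robot_says_oa]
            if oa and noa:  # skip 100%-OA and 0%-OA issues, point (5)
                pairs_by_year.setdefault(year, []).append((rng.choice(oa), rng.choice(noa)))
        return {y: rng.sample(p, min(pairs_per_year, len(p)))
                for y, p in pairs_by_year.items()}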