Re: RECENT MANUAL MEASUREMENTS OF OA AND OAA
- To: liblicense-l@lists.yale.edu
- Subject: Re: RECENT MANUAL MEASUREMENTS OF OA AND OAA
- From: Stevan Harnad <harnad@ecs.soton.ac.uk>
- Date: Wed, 25 Jan 2006 18:01:39 EST
- Reply-to: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
On Mon, 23 Jan 2006, Phil Davis wrote:

> It would be much more constructive if Stevan spent time trying
> to find problems in their methodology and analysis...

As I said, the discrepancies between our test of the robot's accuracy and Goodman et al.'s prompted us to look for the basis of the discrepancy, and we think we have found it.

The robot sends ISI reference queries to several search engines and then tests up to the first 60 hits to see whether any of them is OA. It stops and returns "OA" as soon as its algorithm judges a hit to be OA, and it returns "NOA" if none of the (up to) 60 hits is OA.

The right way to check the robot's accuracy is to save all the hits and hand-check a sample of them, for a subset of references the robot judged "OA" and a subset it judged "NOA".

What we did instead in our own small test was to search by hand for a subsample of 100 references that the robot had judged OA and 100 that it had judged NOA (in Biology). Goodman et al. did the same for a sample about three times as large in Biology, as well as in Sociology. The three tests yielded very different accuracies.

The reason now seems clear: when one hand-checks the accuracy of a device, the check has to be done on the *device*'s sample, not on a different sample. All of us had used a different sample (and even different search engines). The right test of the robot's accuracy requires hand-checking the (up to) 60 hits that the robot actually sampled, processed, and judged OA or NOA. We are now redoing both the searches and the tests, this time saving the hits so that they can be hand-checked.

In other words, all three tests were biased against the robot: they were based on different samples, from different sources, linked only by whether the robot had judged the reference to have an OA version somewhere among the (up to) 60 hits in the *first* sample. We had not noticed the bias earlier because our test had yielded such high accuracy despite the (unnoticed) bias.
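A minimal Python sketch of the robot's early-stopping check as described above, extended to keep every hit it examines so that a hand-check sample can be drawn from the robot's own hits rather than from a fresh search. The OA test (`looks_oa`), the merged list of search-engine hits, and the names `classify_reference` and `audit_sample` are all hypothetical; the post does not say how the robot actually judges a hit to be OA.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

MAX_HITS = 60  # the post says the robot tests "up to the first 60 hits"


@dataclass
class RobotResult:
    reference: str
    verdict: str  # "OA" or "NOA"
    hits: List[str] = field(default_factory=list)  # every hit examined, kept for hand-checking


def classify_reference(reference: str,
                       hits: Iterable[str],
                       looks_oa: Callable[[str], bool]) -> RobotResult:
    """Return "OA" at the first hit judged OA, or "NOA" if none of the
    first MAX_HITS hits is judged OA. All examined hits are recorded."""
    seen: List[str] = []
    for hit in hits:
        if len(seen) >= MAX_HITS:
            break  # never look past the first MAX_HITS hits
        seen.append(hit)
        if looks_oa(hit):  # hypothetical OA predicate
            return RobotResult(reference, "OA", seen)
    return RobotResult(reference, "NOA", seen)


def audit_sample(results: List[RobotResult], per_verdict: int,
                 seed: int = 0) -> List[RobotResult]:
    """Draw a hand-check sample stratified by verdict, taken from the
    robot's own saved results (i.e. the device's sample)."""
    rng = random.Random(seed)
    sample: List[RobotResult] = []
    for verdict in ("OA", "NOA"):
        group = [r for r in results if r.verdict == verdict]
        sample.extend(rng.sample(group, min(per_verdict, len(group))))
    return sample


if __name__ == "__main__":
    # Toy predicate, assumed for illustration only.
    is_repository = lambda url: "eprints" in url
    r = classify_reference(
        "ref-1",
        ["http://publisher.example/abs/1", "http://eprints.example.ac.uk/42/"],
        is_repository,
    )
    print(r.verdict, r.hits)
```

Saving `hits` alongside each verdict is the point of the design: a hand-checker then judges exactly the hits the robot judged, instead of running a new search that may return different sources.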
As I said before, I am glad Goodman et al. did the further test, whose much weaker result alerted us that something was amiss. We think we have found what was amiss, and it was not in the robot's accuracy but in our test of the robot's accuracy.

Stay tuned for the results for both Biology and Sociology, which are being completely redone by the robot, this time saving all the hits; the robot accuracy test will soon be available for a still larger subsample of these same data. We are also saving all the hits (for all of Biology and Sociology, not just this larger subsample), so anyone else can hand-check them if they wish.

Stevan Harnad

> At 08:41 PM 1/22/2006, you wrote:
>>Before anyone gets too excited about the tiny Goodman et al. test
>>result, may I suggest waiting a couple of weeks, when we will be
>>reporting the results of a far bigger and more accurate test of
>>the robot's accuracy?
>>
>>Those who (for some reason) were hoping that the robot would
>>prove too inaccurate and that the findings on the OA advantage
>>would prove invalid may be disappointed with the outcome. I can
>>already say that overinterpretations of the tiny Goodman et al.
>>test as showing that the OA/OAA findings to date are "worthless"
>>are rather overstated even on the meagre evidence to date,
>>especially since two thirds of the published findings on the OA
>>citation advantage are not even robot-based!
>>
>>(This shrillness also seems to me to be trying to make rather
>>much out of having actually done rather little!)
>>
>>As to the separate issue of how to treat the OA journal article
>>counts (as opposed to the counts for the self-archived non-OA
>>journal articles): We count it all, of course, but only use the
>>non-OA journal article counts in calculating the OA advantage,
>>because those are (necessarily) within-journal ratios, and
>>citation ratios of zero and infinity are meaningless. Think about
>>it.
>
> [SNIP]
>
>>Stevan Harnad
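The quoted point about within-journal ratios can be illustrated with a minimal Python sketch. It assumes the OA advantage is measured per journal as the mean citations of OA (self-archived) articles divided by the mean citations of non-OA articles; the post does not give the exact formula, and the data layout and the name `within_journal_ratios` are hypothetical. An OA journal has no non-OA articles, so its ratio would be infinite or undefined, which is why such journals drop out.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: per journal, a list of (is_oa, citation_count) pairs.
Articles = List[Tuple[bool, int]]


def within_journal_ratios(journals: Dict[str, Articles]) -> Dict[str, float]:
    """OA/NOA mean-citation ratio per journal, computed only where the
    ratio is finite and nonzero (assumed definition, for illustration)."""
    ratios: Dict[str, float] = {}
    for name, articles in journals.items():
        oa = [c for is_oa, c in articles if is_oa]
        noa = [c for is_oa, c in articles if not is_oa]
        if not oa or not noa:
            # e.g. an OA journal: no non-OA articles to compare against
            continue
        m_oa, m_noa = mean(oa), mean(noa)
        if m_oa == 0 or m_noa == 0:
            continue  # the ratio would be zero or infinite
        ratios[name] = m_oa / m_noa
    return ratios


if __name__ == "__main__":
    data = {
        "Journal A": [(True, 6), (False, 3), (False, 3)],
        "OA Journal": [(True, 5), (True, 7)],
    }
    print(within_journal_ratios(data))
```

Only "Journal A" yields a ratio here; the all-OA journal is still counted in the totals but contributes nothing to the advantage estimate.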