Fwd: Re: Role of arXiv
- To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM@LISTSERVER.SIGMAXI.ORG
- Subject: Fwd: Re: Role of arXiv
- From: Stevan Harnad <harnad@ecs.soton.ac.uk>
- Date: Mon, 11 Oct 2010 19:44:58 EDT
- Reply-to: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
**Cross-Posted**

Begin forwarded message:

On 08/10/2010 12:56, "Stevan Harnad" <harnad@ecs.soton.ac.uk> wrote:

> On Fri, 8 Oct 2010, Monica Duke wrote:
>
>>> SH:
>>> Harvesting is cheap. And each university's IR will be a standard
>>> part of its online infrastructure.
>>
>> MD:
>> So far, do we have enough (or any) evidence that harvesting is cheap?
>> What sense of cheap did you mean?
>
> A harvester does not have to manage the deposit or host the content,
> as Arxiv does. It need only harvest and host the metadata. There are
> countless such OA harvesters sprouting all over (not to mention Google
> Scholar!) -- and that's on the sparse OA content that exists today
> (c. 5-25%). Harvesters will abound once the OA content rises toward
> 100%, thanks to OA self-archiving mandates by universities and funders.
>
> History will confirm that we are simply spinning our wheels as we keep
> banging on about publishing costs, repository costs, harvesting costs --
> while our annual research usage and impact burns, because we have not
> got round to mandating deposit...
>
> Stevan Harnad

From: Hugh Glaser hg -- ecs.soton.ac.uk
Date: October 10, 2010 6:06:16 PM EDT
To: JISC-REPOSITORIES -- JISCMAIL.AC.UK
Subject: Re: Role of arXiv

Spot on Stevan.

It is the work of a day or two to write a harvester for OAI-PMH from scratch (I know, I did it), although there are now pretty standard libraries for it. I know others who have done the same. I also wanted to translate the metadata into RDF, which added some effort.

It is then a case of letting it run and funding the maintenance and service. We have not bothered much to keep it up to date, but we use the metadata all the time for our applications, and it is an insignificant addition to all the other metadata we handle.

The biggest cost is dealing with repository software that does not conform to the accepted reading of OAI-PMH. Hopefully this will improve as more people harvest.
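To illustrate why a basic harvester is only a day or two of work, here is a minimal sketch in Python using only the standard library. It is not the harvester Hugh describes; it assumes each repository exposes the standard OAI-PMH `ListRecords` verb with the `oai_dc` (Dublin Core) metadata format, and the function names (`list_records`, `http_fetch`) and the fields it extracts are hypothetical choices for illustration. It pulls metadata only; the full texts stay in the repositories. It deliberately omits retries, rate limiting, incremental (`from=`) harvesting and workarounds for non-conforming repositories, which is where Hugh says the real cost lies.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

# XML namespaces defined by OAI-PMH 2.0 and by unqualified Dublin Core (oai_dc).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def http_fetch(url: str) -> bytes:
    """Fetch one OAI-PMH response over HTTP (hypothetical helper; no retries or back-off)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def list_records(base_url: str,
                 fetch: Callable[[str], bytes] = http_fetch,
                 metadata_prefix: str = "oai_dc") -> Iterator[dict]:
    """Yield one dict per record, following resumptionTokens until the list ends."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))

        # An empty repository answers with <error code="noRecordsMatch">, not an empty list.
        error = root.find("oai:error", NS)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")

        for rec in root.iterfind("oai:ListRecords/oai:record", NS):
            header = rec.find("oai:header", NS)
            yield {
                "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
                # Deleted records carry a status attribute and no metadata.
                "deleted": header.get("status") == "deleted",
                "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
                "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
            }

        # A missing or empty resumptionToken marks the last page of the list.
        token = root.findtext("oai:ListRecords/oai:resumptionToken",
                              default="", namespaces=NS).strip()
        if not token:
            return
        # Follow-up requests send only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

The `fetch` parameter is injectable so the paging logic can be exercised without a network; in practice one would call `list_records("https://repository.example/oai")` (a placeholder URL) and write each yielded record to storage or translate it into RDF.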
To be concrete, we harvested over 1000 repositories, automatically finding their details from the ROAR (Registry of Open Access Repositories) site. That seems to have produced 15 GB of data, which we then translated into about 24M triples and 21 GB of RDF. Twenty times that (using Stevan's lowest estimate, that only about 5% of content is currently OA) would be less than 1 TB, which is not really much of a cost: right now I could serve that, and probably run the whole system with harvesting, for around $100/year on my ISP. So after the initial costs (a month or two to do a great job?), it is a day a month plus $100 a year.

The crucial thing here is, as Stevan says, that we are only talking about metadata. The idea of the web is to avoid copying stuff, with the attendant storage costs and synchronisation problems, so the texts should be left where they lie.

Best,
Hugh Glaser
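The scaling in Hugh's estimate can be checked with a few lines of arithmetic. This sketch assumes that "twenty times that" applies to both the raw harvest and the RDF copy (the message does not say which), and it uses decimal units (1 TB = 1000 GB); the variable names are illustrative only. Either way the total stays under 1 TB.

```python
# Figures quoted for a harvest of ~1000 repositories found via ROAR.
raw_gb = 15        # harvested OAI-PMH metadata
rdf_gb = 21        # the same metadata translated into RDF
triples_m = 24     # millions of RDF triples

# Stevan's lowest estimate: about 5% of content is OA today,
# so full (100%) coverage is roughly 100 / 5 = 20 times larger.
scale = 100 // 5

rdf_only_gb = scale * rdf_gb               # if only the RDF copy is kept
both_gb = scale * (raw_gb + rdf_gb)        # if raw XML and RDF are both kept
triples_full_m = scale * triples_m

ONE_TB_GB = 1000   # assumption: decimal terabyte
print(scale, rdf_only_gb, both_gb, triples_full_m, both_gb < ONE_TB_GB)

# Ongoing cost after the initial build: "a day a month plus $100" a year.
maintenance_days_per_year = 12
hosting_usd_per_year = 100
```

Even keeping both copies, full coverage comes to roughly 720 GB and about 480M triples, consistent with the "less than 1 TB" figure.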