
Re: opening arXiv back-end was Re: Role of arXiv

I too think that in the unlikely event that Cornell did not 
continue to operate arXiv, it would be taken up by others in the 

arXiv has long supported a set of mirror machines distributed 
around the world and synchronized daily. We have 14 at present, 
which is probably rather more than we need for this purpose. Some 
time ago the mirrors also provided faster local access, but the 
evolution of internet topology means that almost everyone now has 
fast access to the US, so this no longer serves much purpose. 
Some mirrors (e.g. the LANL mirror) are backed up 
locally providing an additional safeguard separate from Cornell's 

We also support a couple of other independently operated services 
(the Front, and eprintweb.org) over the complete set of arXiv 
data using our mirroring mechanism. These are fed with daily 
updates too.

Since the early days of arXiv we have provided bulk data access 
to researchers by agreement but have felt that we cannot fund 
support for uncontrolled bulk download (server support and 
network fees for hundreds of people downloading 200GB 
periodically would not be insignificant). Technology and internet 
business models have come to the rescue in the form of the cloud. 
We recently started putting up the complete collection of arXiv 
PDFs (the most frequently requested part of the dataset) on 
Amazon S3. This seems to be well received, so we will likely 
extend it to include source files as well. For details see 
http://arxiv.org/help/bulk_data_s3 . We were also involved in the 
creation of the OAI-PMH standard and have supported metadata 
harvesting with daily updates since before the release of the 
standard (see http://arxiv.org/help/oa).
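
For readers unfamiliar with OAI-PMH, an incremental metadata harvest might look roughly like the sketch below. It is an illustration rather than arXiv's own tooling: the base URL and the helper names list_records_url and parse_list_records are assumptions that do not come from this message, and the current endpoint should be checked against the help page cited above. Only the request arguments (verb, metadataPrefix, from, resumptionToken) and the XML namespace are taken from the OAI-PMH 2.0 standard.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 standard.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# ASSUMPTION: this base URL is not given in the message; verify it
# against the arXiv help pages before relying on it.
BASE_URL = "http://export.arxiv.org/oai2"


def list_records_url(from_date=None, metadata_prefix="oai_dc",
                     token=None, base_url=BASE_URL):
    """Build a ListRecords request URL (hypothetical helper)."""
    if token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it may
        # only be combined with the verb.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for header in root.iter(OAI_NS + "header"):
        ident = header.find(OAI_NS + "identifier")
        if ident is not None and ident.text:
            identifiers.append(ident.text)
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A daily harvester would fetch list_records_url(from_date=...) with the date of its last run, parse the response, and keep requesting list_records_url(token=...) until no resumption token is returned.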


Alexandre Dulaunoy wrote:
> On Fri, Oct 8, 2010 at 4:42 AM, Joseph Esposito
> <espositoj@gmail.com> wrote:
>> If this funding were to disappear ..., would arXiv be 
>> resurrected by the community?
> This is an excellent question. I tried to find the source code 
> for the arXiv back-end software, and also a weekly data dump of 
> the whole repository, without success.
> A good opportunity for arXiv to ensure perennity would be to 
> provide the tools and the dataset to help the community to 
> build other arXiv or resurrect one if required.
> Maybe arXiv is working on something similar?
> Alexandre Dulaunoy (adulau)