internet archive (WAS: The Economist and e-Archiving)



I went to Brewster Kahle's talk at ALA/CLA on Monday, where archiving on
the Wayback Machine was discussed. An outstanding talk: "universal access
to all human knowledge"

The harvesting system at Internet Archive honors robot exclusions
*retroactively*. In other words, if you put a robots.txt file on your
server that excludes any files whose path starts with "/content/", then
they will remove from the archive any files from your server whose path
starts with "/content/".
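
To make the retroactive rule concrete, here is a minimal sketch in Python
using the standard library's urllib.robotparser. It is only an
illustration of the idea, not the Internet Archive's actual code: the
robots.txt content, the list of archived paths, and the helper name
retained_paths are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excluding everything under /content/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /content/
"""

def retained_paths(robots_txt, archived_paths, user_agent="*"):
    """Return the already-archived paths that the given robots.txt still allows.

    Applying the current exclusion rules to material collected earlier is
    what makes the exclusion retroactive.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in archived_paths if parser.can_fetch(user_agent, path)]

# Hypothetical paths harvested before the robots.txt file was put in place.
archived = [
    "/index.html",
    "/content/article1.html",
    "/about/contact.html",
    "/content/2003/issue.html",
]

kept = retained_paths(ROBOTS_TXT, archived)
print(kept)
```

Everything under "/content/" drops out of the retained list, even though
it was collected before the exclusion existed.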

Eric

At 6:37 PM -0400 6/24/03, informania@supanet.com wrote:
Regarding the Wayback Machine, yes, I have seen that they offer to desist
from archiving if someone shouts loudly enough. I wonder if they have the
technical capability to do so, however. The technology used is a
Heath-Robinson-ish chain of linked second-hand PCs, I believe, and
terabyte harvesting is (naturally) completely automated. I rather doubt
that they would pull the plug to extract a sliver of dubious Economist
text from the accumulating body of Internet history - remember that they
archive day by day, so the sliver would be repeated in successive
archivings until it was pulled.

As to Finland's cache (and there are a number of other national caches in
operation), this has staggering copyright implications, of course, but the
defence is that this is being provided for reasons of technical efficiency
(I have heard this argued at the World Intellectual Property
Organization). Again, nobody is going to delve into these caches, and I am
not even sure if they are being archived - and if so, for how long.

Not to mention the collection, sampling and storage of Internet materials
by such security efforts as Echelon.

In general (and this was the point of my intervention), I suspect that
everything that has ventured onto the Internet will continue to be
available somewhere, and for quite a long time, like it or not.

Chris Zielinski