
Caching and IP based authentication

Among other activities, the NBER operates a subscription-based web site
of economics working papers. This is a bit of boilerplate I have
written to handle a common query from English universities. Any
suggestions are welcome. As I write this, "Via" headers (described
below) seem to be the solution, though.


Why can't I access the full text of working papers for the NBER at
http://www.nber.org/wwp.html? My university is a subscriber. 


The problem is that traffic from English universities now goes through a
cache engine in the domain wwwcache.ja.net. Since this computer covers all
English universities, I can't put it in our authorization database.  We
never get a query directly from your computer or your university cache -
every query comes from the Janet engine.

One thing that won't work is for us to mark our pages non-cacheable. While
JANET will avoid caching pages so marked, the JANET cache is still seen
as the requesting host by our web server.

If you have any contacts at the network services office of your
university, you could ask them to modify the ACL on their cache engine so
as not to pass on requests for http://nberws.nber.org to the off-campus cache
engine. Looking through our logs, it is clear that most universities in
England have taken this step. It is fine for them to cache those pages
themselves, of course. Note that the address given is nberws.nber.org, not
www.nber.org. Of course, we would like to see a general solution that
would not require ACL modifications at every customer site.

One proposed solution, which works for some users, is to browse a special
web server we have set up on port 81 of our web server. The user would
then browse:


(where 0000 is the paper number) to actually download the paper. I
understand that some sites pass port 81 to the off-campus cache,
defeating this strategy. If any significant usage were to develop on this
port, we would modify our server to automatically recognize requests from
Janet, and automatically specify port 81 in HTML returned to those sites. 
This would make the procedure completely user-transparent. However at this
time there are no regular users of this server.
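
As a rough illustration of the automatic rewriting mentioned above, here
is a minimal Python sketch. It is not what our server does today. The
function names and the example path are hypothetical; only the cache
domain (wwwcache.ja.net) and the port number (81) come from the text.

```python
from urllib.parse import urlsplit, urlunsplit

JANET_CACHE_DOMAIN = "wwwcache.ja.net"  # cache domain named in the text
ALTERNATE_PORT = 81                     # special server port named in the text


def is_janet_cache(requesting_host):
    """True if the requesting host is in the JANET cache domain."""
    host = requesting_host.lower().rstrip(".")
    return host == JANET_CACHE_DOMAIN or host.endswith("." + JANET_CACHE_DOMAIN)


def link_for(requesting_host, url):
    """Return url unchanged, or pointed at port 81 for JANET requests."""
    if not is_janet_cache(requesting_host):
        return url
    parts = urlsplit(url)
    # Replace any existing port with the alternate one.
    rewritten = parts._replace(netloc=f"{parts.hostname}:{ALTERNATE_PORT}")
    return urlunsplit(rewritten)


# Hypothetical path, for illustration only.
EXAMPLE_URL = "http://nberws.nber.org/example/path.html"
janet_link = link_for("wwwcache.ja.net", EXAMPLE_URL)
direct_link = link_for("host.example.edu", EXAMPLE_URL)
```

Only links in pages served to the JANET cache would be rewritten, so
sites that reach us directly would see no change.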

An alternative that I have suggested in the past is for JANET to enable
the 'X-Forwarded-For:' header for requests.  This would give us the IP
address of the original requester, and we could authorize or not according
to our database.  Once we had authorized a request, JANET could service it
from its cache, minimizing overseas line charges. However, JANET has
made a policy decision not to provide that header, to enhance privacy.
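
For readers who want to see what such a check might look like, here is a
minimal Python sketch, assuming the cache did send the header. The
networks listed are documentation-only address ranges standing in for our
authorization database, and the function names are hypothetical. It also
assumes the usual convention that the left-most address in the header is
the original requester.

```python
import ipaddress

# Hypothetical stand-in for the authorization database
# (documentation-only address ranges, not real subscribers).
SUBSCRIBER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def original_client(x_forwarded_for):
    """Return the left-most address in the header value, or None."""
    first = x_forwarded_for.split(",")[0].strip()
    try:
        return ipaddress.ip_address(first)
    except ValueError:
        # Empty value, or a placeholder such as "unknown".
        return None


def is_authorized(x_forwarded_for):
    """True if the original requester falls in a subscriber network."""
    client = original_client(x_forwarded_for)
    if client is None:
        return False
    return any(client in network for network in SUBSCRIBER_NETWORKS)
```

A header value of "192.0.2.17, 203.0.113.5" would be authorized under
these assumptions, since the first address is in a listed network.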

A second proposal (made by Martin Hamilton of JANET) is for us to
recognize the HTTP "Via" header, which records the chain of caches through
which any request has come. Since we expect that most universities
maintain at least one on-campus cache that is consulted before queries are
passed on to the off-campus (JANET) cache, this should work, and should
provide access for the entire campus.  As I write this we have about 72
hours of experience with "Via" headers, and I can only see one request
with a chain leading back to an English university. Your network services
staff would be the ones to investigate where in the chain your university's
list of caches is being deleted. (It is supposed to be deleted only where a
request passes through a firewall.) It is not something we can determine
or fix from this end.
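
To make the idea concrete, here is a minimal Python sketch of how a "Via"
chain might be matched against subscriber domains. It is only an
illustration under stated assumptions: the domain "uni.example" and the
function names are hypothetical, and the parsing is simplified (it assumes
no commas inside comments and does not handle IPv6 literals).

```python
# Hypothetical subscriber domain, standing in for the real database.
SUBSCRIBER_DOMAINS = ["uni.example"]


def via_hosts(via_header):
    """Return the cache host names listed in a Via header value.

    Each comma-separated element looks like
    "<protocol-version> <host>[:<port>] [(comment)]".
    """
    hosts = []
    for element in via_header.split(","):
        fields = element.split()
        if len(fields) < 2:
            continue  # malformed or empty element
        host = fields[1].partition(":")[0]  # drop any ":port" suffix
        hosts.append(host.lower())
    return hosts


def is_subscriber_via(via_header):
    """True if any cache in the chain belongs to a subscriber domain."""
    for host in via_hosts(via_header):
        for domain in SUBSCRIBER_DOMAINS:
            if host == domain or host.endswith("." + domain):
                return True
    return False


example = ("1.0 cache.uni.example:3128 (Squid/1.1), "
           "1.0 wwwcache.ja.net:8080 (Squid/1.1)")
```

With the example value above, the on-campus cache appears first in the
chain, so the request would be treated as coming from the subscriber. A
chain listing only wwwcache.ja.net would not match.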

I am aware that HTTP headers are easily forged, but that is not a serious
concern to us at this time. We are more interested in encouraging
widespread use of the web site.

Daniel Feenberg
National Bureau of Economic Research