Re: Security Issues (was JSTOR) Pt. 2.
- To: liblicense-l@lists.yale.edu
- Subject: Re: Security Issues (was JSTOR) Pt. 2.
- From: "James A. Robinson" <jim.robinson@stanford.edu>
- Date: Fri, 13 Dec 2002 09:32:44 EST
- Reply-To: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
> The original question is this: if we know that IP authentication has this
> particular problem, then when should we continue to use it, and when/how
> should we be either abandoning it or trying to improve it?

I think a key point is that IP access is not authentication. It used to be that you could use it in place of authentication because it was hard to get around telling a computer your real IP address. I think it's still mostly this way.

The two problems I've seen come from open proxy servers and from renegade networks. By the former I mean proxy sites which do not do any authentication of their own; by the latter I mean sites where the people in charge of the network are acting in collusion with an abuser.

You can get around the first problem by having a proper authentication system set up at the proxy end. As I wrote in a previous note, it can be configured to be a minimum hassle for legitimate users, with just enough authentication to deter abusers. I don't think there is a solution for the second problem, other than simply blocking them and perhaps suing them if they are in a country which has copyright laws.

My personal opinion is that IP-based access is a good thing, and that librarians who maintain proxies simply need to be sure they know exactly how their software works. They need to have expertise in maintaining it, or have a responsive computer and networking staff which maintains it. If that doesn't happen, and rare abuse like this JSTOR incident becomes common (or is blown out of proportion), I'm worried that publishers might react in an extreme manner with regard to access policy, and start a downward spiral.

On the topic of people who might think "we don't care about *that* kind of gap in security," I wonder whether people are thinking about all the costs that go into this kind of service?
I saw something in the archive from a while ago, where someone asked a question along the lines of "why couldn't a patron walk in and photocopy every article out of an issue?" I wasn't sure how many responses there were, but the only ones I saw were those dealing with copyright. Please keep in mind that the publisher might have more than content rights in mind when it comes to abuse.

For example, many places have to pay fees based on the amount of network traffic their sites generate. Using some numbers I just pulled off Google, a publisher might pay a service provider $250.00 a month to host a site, with up to 300 megabytes of traffic per month included. In excess of 300 megabytes, a charge of 50 cents per megabyte might be applied.

To put that in perspective when it comes to spidering abuse, if for some reason HighWire allowed someone to spider all the content hosted by us, we would be talking about a terabyte of data, at a minimum. I believe we are already the largest user of Stanford's networking resources (either that, or number two). And we're tiny compared to some places.

OK, I'm going to get all long-winded at this point. Those of you who don't want to see me getting *really* preachy might want to hit the delete button now.

Clearly both sides want the same end result: legitimate users with unrestricted access to the material they are paying for. The publisher needs readers, the readers obviously need the content.

As someone who works for an aggregator I often feel stuck in the middle. I hear publishers who worry about their content being stolen, and also hear enormous frustration from librarians who are sometimes confronted with access policies which restrict IP access to only a handful of well-monitored computers.

At my place of work, we see some rather bad abuse coming from sites.
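
The hosting-fee example above can be worked through in a few lines. This is only a sketch of the arithmetic: the rates are the illustrative ones quoted in the message (not any real provider's or HighWire's actual contract), the names are made up for the example, and a terabyte is assumed to mean 1,000,000 megabytes.

```python
# Illustrative rates quoted in the message; not a real contract.
BASE_FEE_DOLLARS = 250.00      # flat monthly hosting fee
INCLUDED_MB = 300              # traffic covered by the flat fee
OVERAGE_DOLLARS_PER_MB = 0.50  # charge for each megabyte beyond that

# Assumption: decimal terabyte (1,000,000 MB).
TERABYTE_MB = 1_000_000


def monthly_cost(traffic_mb):
    """Return the monthly bill in dollars for the given traffic in megabytes."""
    overage_mb = max(0, traffic_mb - INCLUDED_MB)
    return BASE_FEE_DOLLARS + overage_mb * OVERAGE_DOLLARS_PER_MB


if __name__ == "__main__":
    print(monthly_cost(300))          # 250.0
    print(monthly_cost(TERABYTE_MB))  # 500100.0
```

Under those assumed rates, a single terabyte-sized spidering run would cost on the order of half a million dollars in traffic charges alone.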
We see essentially what JSTOR wrote about: a computer within a network which has access to content spiders one or more of our hosted sites, in their entirety, often trying to download multiple resources at the same time. I ended up having to implement software which frustrates some users, but saves us from the extremes of abuse.

What our sites have now is a simple counter which watches how many requests a single IP makes within any given minute. If that count exceeds a threshold, we cut off access for a period of time. We're always trying to fine-tune the program to block abuse and allow legitimate use. It's not easy, because the nature of HTTP does not lend itself to identifying unique clients making a request. Many times it is the very purpose of a proxy server to strip any such identifying characteristics out of a request before it reaches us. So, while this program saves us from massive abuse which can bring an entire machine down, it is extremely annoying to some sites with proxy servers.

It does not help that some companies have forever tainted the use of cookies in the eyes of many internet users. As a result, we often cannot use a simple, non-personally-identifying method to track whether multiple requests from a single IP are emanating from a single user or from multiple users behind a proxy.

For example, there are some enormous networks out there which route all HTTP access to a handful of proxy servers. The administrators of those sites express their frustration when our software kicks in, blocking 1/3 of their user base (i.e., we block one of three proxy servers which handle thousands of users). We tell them of some options which could reduce the problem, and they essentially tell us they are unwilling to change anything. They have privacy concerns which supersede any issues we are trying to resolve with regard to abuse. It's a deadlock.
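
The per-IP counter described above might look roughly like the following. This is a minimal sketch and not HighWire's actual software: the class name, the threshold of 60 requests per minute, and the ten-minute block are all assumptions chosen for illustration. The caller passes in the current time in seconds so the behaviour is easy to follow.

```python
class PerIpRateLimiter:
    """Counts requests per IP per clock minute and blocks IPs that exceed a limit.

    Hypothetical sketch; the default numbers are assumptions, not real settings.
    """

    def __init__(self, max_requests_per_minute=60, block_seconds=600):
        self.max_requests = max_requests_per_minute
        self.block_seconds = block_seconds
        self.window = {}         # ip -> (minute index, requests seen in that minute)
        self.blocked_until = {}  # ip -> time (seconds) at which the block ends

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` should be served."""
        # Still inside a block period: refuse without counting.
        if now < self.blocked_until.get(ip, 0.0):
            return False

        minute = int(now // 60)
        window_minute, count = self.window.get(ip, (minute, 0))
        if window_minute != minute:
            count = 0  # a new minute has started, so reset the counter
        count += 1
        self.window[ip] = (minute, count)

        # Threshold exceeded: refuse this request and start a block period.
        if count > self.max_requests:
            self.blocked_until[ip] = now + self.block_seconds
            return False
        return True


if __name__ == "__main__":
    limiter = PerIpRateLimiter(max_requests_per_minute=3, block_seconds=120)
    proxy_ip = "192.0.2.10"  # documentation-range address standing in for a proxy
    print([limiter.allow(proxy_ip, t) for t in (0, 1, 2, 3)])
    # [True, True, True, False]
```

Because the only key is the IP address, every user behind the same proxy shares one counter, which is exactly why a busy proxy trips the limit even when no individual user is misbehaving.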
We're not willing to let performance for our other users come to a crawl, and they are not willing to budge on privacy concerns.

Another source of frustration comes from a lack of response. Sometimes we will notice a pattern of abuse from a set of IP addresses. We will contact that site's administrators, and receive no reply. Nothing even indicating the message was received. At that point, we have to block the site by default. In other words, wait for someone to contact us.

The librarians naturally see the problem from the opposite end. If their site isn't a source of problems, all they might end up seeing is a publisher who decides "all of a sudden" that a site license now means paying X thousand dollars to license three computers at a library, or limiting access to ten users at any one time, and so forth.

So the problem is complicated, and I think there is a very large social component to the issue. I've only worked at two places in my life; both were college environments. In both places the library had a lot of clout when it came to computer and networking issues. They were able to talk to networking and security about issues. If that's the case in most places (and I have no idea whether or not it is), the solution might simply be better communication. If a publisher looks at access logs and sees that an institution's usage has gone up by a factor of three hundred, perhaps the site administrator and publisher could work together to determine whether this was abuse of access?

I guess my long-winded point is that it's better if both sides work together. A solution implemented with only the publisher's input is bound to leave the library unhappy. An unhappy library might not renew subscriptions, which is bound to leave the publisher unhappy. And hey, that might end up putting the aggregator out of business, and we wouldn't want that now, right? Right? Please?

Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                   jim.robinson@stanford.edu
Stanford University HighWire Press  http://highwire.stanford.edu/
650-723-7294 (W) 650-725-9335 (F)