Revitalizing Older Published Literature: Preliminary Lessons from

Open Proxy Servers
Kevin Guthrie
ALA, January 2003
Outline
• Background: what are “open proxies”?
• What’s the exposure?
• What happened?
• How was it done?
• Not an isolated case
• What to do
JSTOR – January 2003
What has been taken: 51,392 articles from 11 titles

Journal               # of articles  Pct. of Run
Sociology Journal 1           4,997          95%
Sociology Journal 2          11,340          87%
Economics Journal 3           5,514          77%
Sociology Journal 4             349          73%
Economics Journal 2             402          71%
Sociology Journal 5          14,537          65%
Economics Journal 3           3,619          55%
Statistics Journal 1          6,555          44%
Economics Journal 4             120           3%
Sociology Journal 6           3,728          23%
Economics Journal 4             231          <1%
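The headline figure can be checked against the rows of the table. The Python sketch below sums the per-title counts. The journal labels are copied exactly as they appear on the slide, including the two that repeat.

```python
# Per-title article counts, copied from the table above.
# A list of pairs (not a dict) because two labels repeat on the slide.
ARTICLES_TAKEN = [
    ("Sociology Journal 1", 4_997),
    ("Sociology Journal 2", 11_340),
    ("Economics Journal 3", 5_514),
    ("Sociology Journal 4", 349),
    ("Economics Journal 2", 402),
    ("Sociology Journal 5", 14_537),
    ("Economics Journal 3", 3_619),
    ("Statistics Journal 1", 6_555),
    ("Economics Journal 4", 120),
    ("Sociology Journal 6", 3_728),
    ("Economics Journal 4", 231),
]


def total_articles(rows):
    """Sum the article counts over all rows."""
    return sum(count for _, count in rows)


if __name__ == "__main__":
    print(f"{total_articles(ARTICLES_TAKEN):,} articles from {len(ARTICLES_TAKEN)} rows")
```

Running it prints `51,392 articles from 11 rows`, which matches the headline.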
Proxy Servers
A proxy server is a web server that acts as an
intermediary or relay station between a
workstation user and the Internet.
[Diagram: user workstation (IP 1.2.3.4) → proxy.inst.edu (IP 2.3.4.5) → www.jstor.org]
Proxy Servers
Common Reasons for Their Use
• Caching
• Remote access
• Usage tracking
• Controlled access
• Approved filtering
What is an “open” proxy server?
• There is a configuration process to specify
who is authorized to access the server. It is
similar to the configuration process for any
web server.
• When a proxy server is not set up with the
appropriate access controls, anyone can
access that machine and “assume its
identity”.
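The access-control step described above amounts to an allowlist check on the client’s address. Below is a minimal Python sketch. It assumes a proxy that should serve only a hypothetical set of campus networks, and the RFC 5737 documentation ranges stand in for real ones. Real proxy software expresses the same rule in its own configuration language.

```python
import ipaddress
from typing import Sequence

# Hypothetical campus address blocks; RFC 5737 documentation ranges as placeholders.
AUTHORIZED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_authorized(client_ip: str,
                  networks: Sequence[ipaddress._BaseNetwork] = AUTHORIZED_NETWORKS) -> bool:
    """Return True only if client_ip falls inside one of the allowed networks.

    An empty allowlist denies everyone, so a missing configuration fails
    closed instead of leaving the proxy "open".
    """
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: deny
    # Membership is simply False when IP versions differ (IPv4 vs IPv6).
    return any(addr in net for net in networks)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, "allowed" if is_authorized(ip) else "denied")
```

The key design choice is the default. Anything not explicitly listed is refused, which is the opposite of an “open” proxy.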
“Open” Proxy Servers:
How and Why They Are Created
• Some are organizational or departmental
proxy servers incorrectly configured.
• Some are set up intentionally to provide
access to restricted resources (probably for
convenience).
• We believe many are set up accidentally, as
an unnoticed by-product of setting up a web
server.
What’s the Exposure?
[Screenshots: a web search for lists of open proxy servers; the lists it finds; lists of open proxy servers by domain type; a list of open .edu proxies.]
[The server hostnames have been edited to protect the
institutions with open proxy servers listed on this page.]
What Happened and How It Was Discovered
JSTOR Monitors Use
• We have triggers to alert us to unusual
levels of usage activity
• We investigate when usage seems unusual
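The slide does not describe how JSTOR’s triggers actually work. The Python sketch below illustrates the idea under a simple assumption: a fixed, hypothetical limit on article downloads per client address per day. The limit value and the sample data are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical threshold: article downloads per client address per day.
DAILY_DOWNLOAD_LIMIT = 500


def flag_unusual_usage(
    downloads: Iterable[Tuple[str, str]], limit: int = DAILY_DOWNLOAD_LIMIT
) -> Dict[Tuple[str, str], int]:
    """Return the (ip, day) pairs whose download count exceeds `limit`.

    `downloads` yields one (ip, day) pair per article downloaded; `day`
    can be any label such as "Aug-22".
    """
    counts = Counter(downloads)
    return {key: n for key, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Illustrative data only: one busy proxy address, one ordinary user.
    sample = [("2.3.4.5", "Aug-22")] * 2_500 + [("1.2.3.4", "Aug-22")] * 40
    for (ip, day), n in flag_unusual_usage(sample).items():
        print(f"ALERT: {ip} downloaded {n} articles on {day}")
```

A fixed limit is the simplest possible trigger. In practice a service would likely compare against each site’s normal usage rather than a single global number.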
The Abuse
What Happened
August 22nd to 27th: 13,413 articles are downloaded
from Proxy #1.
August 27th: we deny this IP address access to JSTOR.

August 26th to September 4th: 3,859 articles are
downloaded from Proxy #2 at a different
participating site.
September 4th: we deny the IP address of this second
proxy.
The Abuse
What Happened
• It appeared the two abuse situations were
related:
1. There was an overlap in journals downloaded,
but not an overlap in articles downloaded.
2. Analysis of our log files showed that the
URLs being downloaded via Proxy #2 were
created through use at Proxy #1.
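The slide does not describe the log format or the analysis tools. The Python sketch below shows how the two findings above could be reproduced. It assumes a hypothetical chronological access log in which each request records the client address, the URL, the journal, and the article (if any). The `Request` record and its field names are illustrative, not JSTOR’s.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Request(NamedTuple):
    """One record from a hypothetical access log."""
    client: str             # requesting address, e.g. a proxy's IP
    url: str                # requested URL (may embed a search or session token)
    journal: str
    article: Optional[str]  # None for non-article pages such as tables of contents


def compare_proxies(log: List[Request], a: str, b: str) -> Dict[str, Set[str]]:
    """Compare traffic from client `a` with traffic from client `b`.

    `log` must be in chronological order. Returns the journals and articles
    both clients fetched, plus URLs requested via `b` that had already
    appeared in requests via `a`.
    """
    journals: Dict[str, Set[str]] = {a: set(), b: set()}
    articles: Dict[str, Set[str]] = {a: set(), b: set()}
    seen_via_a: Set[str] = set()
    reused: Set[str] = set()
    for req in log:
        if req.client not in journals:
            continue  # ignore all other clients
        journals[req.client].add(req.journal)
        if req.article is not None:
            articles[req.client].add(req.article)
        if req.client == a:
            seen_via_a.add(req.url)
        elif req.url in seen_via_a:
            reused.add(req.url)
    return {
        "shared_journals": journals[a] & journals[b],
        "shared_articles": articles[a] & articles[b],
        "urls_first_seen_via_a": reused,
    }
```

In the pattern reported above, the result would show shared journals and no shared articles. It would also show a non-empty set of URLs that first appeared in traffic through Proxy #1.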
The Abuse
The Pattern Continues
• Between August 27th and October 31st,
downloads occurred from 27 open proxy
servers at 16 different sites.
• As JSTOR staff denied each proxy server,
the abuse moved on.
~51,000 articles downloaded from 11 journals
How Is It Done?
Automate The Process
• Download lists of open proxies
• Automate a process to probe each to see if
there is access to restricted resources
• Identify a set of open proxy servers with
such access and set them aside
• Automate a process to download content
• From the “confirmed” list – commence
downloading.
Not an Isolated Case
We have found web pages providing explicit
instructions for others to help them exploit
open proxies in order to download content.
Not an Isolated Case - Translations
– “The Bible for Downloading Journal Articles”
– “To be blunt about it, you find an overseas proxy. The
institution that the proxy server belongs to has spent
money to buy the electronic edition of some journal,
and then you use this proxy, (so) of course you can
download the entire text of that journal!”
– “I cannot deny that some servers can download
complete texts from many journals, but please,
everyone, let’s not grab onto the ones which are easy to
use and use them madly. The result of doing so will be
to hasten the death of that server! So when you are
using them, it’s best to do so equitably!”
Questions & Discussion
What to do?
• Shibboleth
http://shibboleth.internet2.edu/
• DLF Certificates
http://www.diglib.org/architectures/digcert.htm
• Education
• Drive all campus access through a set of
properly authenticated proxy servers
http://www.jstor.org/