Scalability and planning for growth

WUCM1
Possible server organisation
• Diagram: three servers, each with its own role
– Development server: site developers explore significant structural and technology changes on this private server; once finished, new features are rolled out to the staging server
– Staging server: holds site changes, including material from content contributors, that are later synchronised with the public site
– Production (“Live”) server: the live site feeds content to users
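The "synchronise content" step from staging to live comes down to working out which files differ. A minimal Python sketch, assuming each server's document tree is represented as a hypothetical dict of relative path to file contents (a real site would more likely use a tool such as rsync); `plan_sync` and `digest` are illustrative names, not part of any product:

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint; comparing hashes lets two machines spot changes cheaply."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(staging: dict[str, bytes], live: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Return (paths to copy to live, paths to delete from live).

    Staging is treated as the source of truth: new or changed files are
    copied, and files that no longer exist on staging are removed from live.
    """
    to_copy = sorted(
        path
        for path, data in staging.items()
        if path not in live or digest(live[path]) != digest(data)
    )
    to_delete = sorted(path for path in live if path not in staging)
    return to_copy, to_delete


if __name__ == "__main__":
    copy, delete = plan_sync(
        {"index.html": b"v2", "news.html": b"new"},   # staging server
        {"index.html": b"v1", "old.html": b"x"},      # live server
    )
    print(copy)    # ['index.html', 'news.html']
    print(delete)  # ['old.html']
```

Because developers and contributors only ever write to staging, the live server can be updated in one controlled step.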
Apache configuration issues 1
• Apache directives with performance implications:
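– KeepAlive on|off
• Enables persistent connections, so a client can make several requests over one TCP connection
– KeepAliveTimeout seconds
• Maximum time to wait for the next request on a persistent connection before closing it
– MaxKeepAliveRequests number
• Maximum number of requests allowed on one persistent connection; stops a single client hogging a server process
– HostNameLookups [on|off|double]
• ‘on’ puts the client's hostname in the log instead of its IP address, at the cost of a reverse DNS lookup per request; ‘double’ adds a forward lookup to check the result; ‘off’ is fastest
– MaxClients number
• Limits the number of requests handled at once by the server (renamed MaxRequestWorkers in Apache 2.4)
– MaxRequestsPerChild number
• Each Apache child process handles this many requests, then exits and is replaced, which tidies up memory leaks; 0 means no limit (renamed MaxConnectionsPerChild in Apache 2.4)
– ThreadsPerChild number
• Only relevant on Win32, where one multi-threaded child process serves every request. Default 50 in Apache 1.3; may need increasing for many simultaneous hits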
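MaxClients and MaxRequestsPerChild both involve simple memory arithmetic. A minimal Python sketch, assuming the common rule of thumb that MaxClients should be low enough for every child process to fit in RAM without swapping; the function names and every figure used below are hypothetical:

```python
def max_clients_for_memory(total_mb: int, reserved_mb: int, per_child_mb: int) -> int:
    """Rule-of-thumb MaxClients: how many children fit in the RAM left for Apache."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = max(total_mb - reserved_mb, 0)   # RAM not needed by the OS and other services
    return available // per_child_mb


def worst_case_leak_mb(leak_kb_per_request: float, max_requests_per_child: int) -> float:
    """Upper bound on memory one child can leak before MaxRequestsPerChild recycles it."""
    if max_requests_per_child == 0:              # 0 means the child is never recycled
        return float("inf")
    return leak_kb_per_request * max_requests_per_child / 1024


if __name__ == "__main__":
    print(max_clients_for_memory(2048, 512, 12))  # 128
    print(worst_case_leak_mb(4, 10000))           # 39.0625
```

Setting MaxClients above the first figure risks swapping under load, which slows every request rather than just the extra ones.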
Apache configuration issues 2
• Other Apache directives:
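– UseCanonicalName on|off|dns
• Controls how Apache builds URLs that refer back to itself (e.g. in redirects): ‘on’ uses ServerName and Port, ‘off’ uses the name the client supplied, ‘dns’ needs a reverse DNS lookup and so costs the most
– FollowSymLinks
• An Option; without it (or with SymLinksIfOwnerMatch) Apache wastes time checking every directory in the path for symbolic links, but enabling it can be a security risk
– Logging of all kinds slows Apache down
– .htaccess files add overhead: Apache looks for one in every directory on the path of each request unless AllowOverride None is set
– Large configuration files also slow Apache, so trimming them is a good idea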
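The per-request cost of .htaccess files comes from the number of places Apache must look. A minimal Python sketch that lists those places, assuming overrides are allowed all the way down from the filesystem root; `htaccess_candidates` is a hypothetical helper, and the example path mirrors the one used in the Apache .htaccess tutorial:

```python
from pathlib import PurePosixPath


def htaccess_candidates(requested_file: str) -> list[str]:
    """List the .htaccess files Apache would check before serving requested_file."""
    directory = PurePosixPath(requested_file).parent
    # .parents runs from the nearest directory up to "/", so reverse it to get
    # root-first order, then add the file's own directory at the end.
    chain = list(directory.parents)[::-1] + [directory]
    return [str(d / ".htaccess") for d in chain]


if __name__ == "__main__":
    for path in htaccess_candidates("/www/htdocs/example/index.html"):
        print(path)
    # /.htaccess
    # /www/.htaccess
    # /www/htdocs/.htaccess
    # /www/htdocs/example/.htaccess
```

Every one of these lookups happens on every request, which is why moving the same settings into the main configuration and setting AllowOverride None is faster.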
General server configuration issues
• CGI programs influence the performance of
the website:
– Each CGI request starts a new process; consider FastCGI or mod_perl, which keep programs loaded between requests
– Writing efficient code is always important
• Other tricks
– Force popular files to be memory resident
• The operating system's file cache may do this for you
– Force secure transfers to have more bandwidth
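Keeping popular files memory resident can be sketched as a small least-recently-used cache in front of the disk. A minimal Python sketch, assuming a hypothetical `load` callback that reads a file from disk; as the slide notes, the operating system's file cache often gives the same benefit without any code:

```python
from collections import OrderedDict
from typing import Callable


class PopularFileCache:
    """Keep the most recently requested files in memory, evicting the least recently used."""

    def __init__(self, capacity: int, load: Callable[[str], bytes]):
        self.capacity = capacity
        self.load = load                 # hypothetical loader, e.g. reads the file from disk
        self._files = OrderedDict()      # path -> contents, oldest first
        self.disk_reads = 0

    def get(self, path: str) -> bytes:
        if path in self._files:
            self._files.move_to_end(path)    # mark as most recently used
            return self._files[path]
        data = self.load(path)
        self.disk_reads += 1
        self._files[path] = data
        if len(self._files) > self.capacity:
            self._files.popitem(last=False)  # evict the least recently used file
        return data


if __name__ == "__main__":
    cache = PopularFileCache(2, lambda p: p.encode())
    for p in ["a", "b", "a", "c", "a"]:
        cache.get(p)
    print(cache.disk_reads)  # 3
```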
Proxy server performance issues
• An Apache proxy can:
– Cache for speed
– Filter for security or decency
• Apache's proxy functionality is encapsulated in
mod_proxy
• To let mod_proxy act as a forward proxy for your clients, use the directive
– ProxyRequests on|off
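The "cache for speed" role can be sketched as a store of responses that are reused until they expire. A minimal Python sketch, assuming a hypothetical `fetch` callback that contacts the origin server and a fixed time-to-live; a real proxy cache must also honour HTTP cache-control headers, which this ignores:

```python
import time
from typing import Callable


class CachingProxy:
    """Serve repeat requests for a URL from a local copy until it expires."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch               # hypothetical origin fetcher
        self.ttl = ttl_seconds
        self.clock = clock               # injectable so expiry can be tested
        self._cache = {}                 # url -> (time stored, body)
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]              # fresh copy: no trip to the origin
        body = self.fetch(url)
        self.origin_requests += 1
        self._cache[url] = (now, body)
        return body
```

Every client behind the proxy shares the cache, so a page fetched once can serve many users until it goes stale.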
Proxy customisation
• To block particular sites from your clients:
ProxyBlock www.badsite.com baddomain.co.uk badword
• This will block the specific site, the whole domain, or any
site whose name contains ‘badword’
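The matching rule behind ProxyBlock can be sketched as a substring test on the requested host name. A minimal Python sketch of that idea only; `is_blocked` is hypothetical, and Apache additionally resolves listed host names to IP addresses at startup, which this ignores:

```python
from urllib.parse import urlsplit


def is_blocked(url: str, block_list: list[str]) -> bool:
    """True if the URL's host name contains any blocked word, host or domain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(item.lower() in host for item in block_list)


if __name__ == "__main__":
    blocks = ["www.badsite.com", "baddomain.co.uk", "badword"]
    print(is_blocked("http://shop.baddomain.co.uk/", blocks))  # True
    print(is_blocked("http://www.port.ac.uk/", blocks))        # False
```

A plain word such as ‘badword’ is the bluntest entry: it catches every host containing it, including innocent ones.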
Hiding servers with a proxy
• Suppose there are two extra servers, parallel to
the www.tech.port.ac.uk server
• Add the ProxyPass directive to the main
www.tech.port.ac.uk server configuration file
ProxyPass /users/ http://users.tech.port.ac.uk/
ProxyPass /secure/ http://secure.tech.port.ac.uk/
• This makes users.tech.port.ac.uk and
secure.tech.port.ac.uk appear as directories on
the main server, e.g. www.tech.port.ac.uk/users/
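The two ProxyPass lines above amount to a prefix lookup: the matching prefix is swapped for the back-end URL. A minimal Python sketch of that mapping, assuming the first matching rule wins (Apache checks ProxyPass rules in configuration order); `map_proxy_pass` is a hypothetical helper:

```python
from typing import Optional


def map_proxy_pass(path: str, rules: list[tuple[str, str]]) -> Optional[str]:
    """Return the back-end URL for a request path, or None to serve it locally."""
    for prefix, backend in rules:
        if path.startswith(prefix):
            return backend + path[len(prefix):]
    return None


RULES = [
    ("/users/", "http://users.tech.port.ac.uk/"),
    ("/secure/", "http://secure.tech.port.ac.uk/"),
]

if __name__ == "__main__":
    print(map_proxy_pass("/users/jbloggs/index.html", RULES))
    # http://users.tech.port.ac.uk/jbloggs/index.html
```

Paths that match no rule are served by www.tech.port.ac.uk itself, so clients never need to know the other two servers exist.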
Still not enough performance?
• Two further possibilities to boost
performance:
– Replace the server hardware with a more
powerful machine
– Add more servers and distribute the load of client
requests amongst them
Benefits of multiple servers
• Server machines can be cheaper and easily
replaceable
• Individual servers can fall over without the
website becoming unavailable
• Increase capacity by adding another server
and synchronising the data
• No need to alter or reconfigure any of the
existing servers
Clustering 1
• Cannot just add extra servers
– Each would have its own IP address, so clients would have to choose between them
• Set of servers needs to be established as a cluster
so that:
– For external clients it should appear as one big fast
server with one domain name
– Clients should not be aware that the load is being
shared by a cluster of servers
– Content on the multiple servers must be synchronised
Clustering 2
• Two basic ways of approaching clustering:
1. DNS load sharing
2. Web server clustering
DNS load sharing
• Most common approach is Round-Robin DNS
distribution
• It works by specifying multiple IP addresses
for the same host name (in BIND zone-file syntax; the
trailing dot marks a fully qualified name)
www.tech.port.ac.uk. 60 IN A 148.197.203.1
www.tech.port.ac.uk. 60 IN A 148.197.203.2
www.tech.port.ac.uk. 60 IN A 148.197.203.3
DNS load sharing
[Diagram of DNS load sharing; source: O’Reilly Books]
Round-Robin DNS sharing 1
• Each DNS request for www.tech.port.ac.uk
returns the next IP in sequence
• Set a short time-to-live (TTL), such as the 60 seconds in the records above, so that resolvers discard cached answers quickly
• A lower TTL would
– Improve web server load sharing
– But increase the load on DNS server
• Attraction of round-robin DNS is its simplicity
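The rotation a round-robin DNS server performs can be sketched in a few lines. A minimal Python sketch, assuming the server returns all the A records but rotates their order by one on each query, so successive clients try a different address first; `RoundRobinResolver` is a hypothetical class, not part of BIND:

```python
from collections import deque


class RoundRobinResolver:
    """Answer each query with the address list rotated one step further."""

    def __init__(self, addresses: list[str]):
        self._ring = deque(addresses)

    def resolve(self) -> list[str]:
        answer = list(self._ring)
        self._ring.rotate(-1)   # the next query starts with the following address
        return answer


if __name__ == "__main__":
    dns = RoundRobinResolver(["148.197.203.1", "148.197.203.2", "148.197.203.3"])
    for _ in range(4):
        print(dns.resolve()[0])
    # 148.197.203.1
    # 148.197.203.2
    # 148.197.203.3
    # 148.197.203.1
```

The TTL trade-off follows from this: a resolver that caches one answer for 60 seconds sends all its clients to the same first address for that time, so a shorter TTL spreads the load more evenly but means more queries reach the DNS server.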
Round-Robin DNS sharing 2
• Not true load balancing, only load sharing
• The round-robin takes no account of:
– which servers are loaded
– which are free
– which are actually up and running
• Round-robin DNS makes keeping state for a
user more difficult
– A user may get a different server from last time
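By contrast, true load balancing looks at the servers before choosing. A minimal Python sketch using least-connections selection, a technique swapped in here for illustration rather than one the slides prescribe; the `Backend` record, its fields and the numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Backend:
    address: str
    active_connections: int   # how loaded the server currently is
    healthy: bool             # result of a hypothetical health check


def pick_least_loaded(backends: list[Backend]) -> Backend:
    """Choose the healthy server with the fewest active connections."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy back-end servers")
    return min(candidates, key=lambda b: b.active_connections)


if __name__ == "__main__":
    pool = [
        Backend("148.197.203.1", 40, True),
        Backend("148.197.203.2", 5, False),   # least loaded, but down
        Backend("148.197.203.3", 12, True),
    ]
    print(pick_least_loaded(pool).address)  # 148.197.203.3
```

Round-robin DNS would still hand out 148.197.203.2 to a third of clients; this selector skips it because it accounts for load and availability, the two things round-robin ignores.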
Hardware load balancing
• Needs a specialist device, a dedicated box with its own software, to redirect
requests
• For example:
– LocalDirector and DistributedDirector were
products from Cisco (http://www.cisco.com).
– LocalDirector, for example, rewrites IP headers to redirect a
connection to a local server
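Rewriting IP headers only works if every packet of a connection goes to the same real server. A minimal Python sketch of that connection-tracking idea, assuming a hypothetical virtual IP address and private back-end addresses; a real device also rewrites the source address of reply packets and expires table entries when connections close, both omitted here:

```python
import itertools


class NatBalancer:
    """Map each client connection to one real server and keep it there."""

    def __init__(self, virtual_ip: str, real_servers: list[str]):
        self.virtual_ip = virtual_ip                    # the single address clients see
        self._next_server = itertools.cycle(real_servers)
        self._table = {}                                # (client ip, client port) -> real server

    def rewrite_destination(self, client_ip: str, client_port: int, dest_ip: str) -> str:
        """Return the destination address the packet should be rewritten to."""
        if dest_ip != self.virtual_ip:
            return dest_ip                              # not for the cluster: leave unchanged
        key = (client_ip, client_port)
        if key not in self._table:
            self._table[key] = next(self._next_server)  # new connection: assign a server
        return self._table[key]
```

The table lookup is what keeps a TCP connection intact even though several servers sit behind one address.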
Clustering with Apache 1
• Apache provides a way to cluster servers using
the features of mod_rewrite and mod_proxy
together
• This avoids the DNS caching problems and the
cost of hardware solutions
• Need a machine as a proxy server, handling
requests to several back-end servers on which
the website is actually loaded
Clustering with Apache 2
• E.g. the proxy takes the master name
www.tech.port.ac.uk and the backend servers
might be www1 to www6
• Wainwright (1999) sets out a method of
setting up Apache using two parts:
– Use mod_rewrite to randomly select a back-end
server for the client request
– Use mod_proxy’s ProxyPassReverse directive to
rewrite redirects from a back-end server so they carry the master name, disguising the back-end server's URL
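In Apache the random choice is usually made with a RewriteMap of type rnd:, whose file lists a key followed by alternatives separated by "|", and a RewriteRule with the [P] flag that hands the request to mod_proxy. A minimal Python sketch of the two parts, assuming a hypothetical map file and back-end names www1 to www6; it mimics the behaviour and is not Apache's own code:

```python
import random

MAP_FILE = """\
# hypothetical rnd: map: key, then alternatives separated by |
www  www1|www2|www3|www4|www5|www6
"""


def parse_rnd_map(text: str) -> dict[str, list[str]]:
    """Parse 'key value1|value2|...' lines, skipping blanks and comments."""
    table = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].startswith("#"):
            continue
        table[parts[0]] = parts[1].split("|")
    return table


def choose_backend(table: dict[str, list[str]], key: str, rng=random) -> str:
    """Part 1: pick one back-end at random, as the rnd: map type does."""
    return rng.choice(table[key])


def proxy_pass_reverse(location: str, backend_base: str, public_base: str) -> str:
    """Part 2: rewrite a back-end redirect so the client only sees the master name."""
    if location.startswith(backend_base):
        return public_base + location[len(backend_base):]
    return location


if __name__ == "__main__":
    table = parse_rnd_map(MAP_FILE)
    host = choose_backend(table, "www")
    print(host in table["www"])  # True
    print(proxy_pass_reverse("http://www3.tech.port.ac.uk/news/",
                             "http://www3.tech.port.ac.uk/",
                             "http://www.tech.port.ac.uk/"))
    # http://www.tech.port.ac.uk/news/
```

Without the second step, a back-end redirect such as a missing trailing slash would send the browser straight to www3, bypassing the proxy.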
Summary
• Configuration issues for scalability and
performance
• Proxy Servers – filter and cache
• DNS (round robin) clustering
• Hardware clustering
• Proxy based clustering