Lately, we have been involved in a project where our clients needed a site capable of serving a large number of anonymous users and a reasonable number of concurrent logged-in users. To reach these goals, we looked to the cloud. We first added as much caching as possible, since caching is relatively simple and goes a long way. We then built a distributed system. This post describes how we got it to work. A diagram of our architecture is attached, and the various configurations are summarized at the bottom.
First, anonymous user caching. Anonymous users all view the same content, so if we cache a static HTML page, we can serve it without involving PHP at all. We use the Boost module to generate these static pages, and nginx to serve them and proxy all other requests to Apache. Since nginx scales without much of a memory hit, it is much better suited to serving large numbers of static files, leaving Apache to handle logged-in users and uncached page requests. For anonymous users, the bottleneck now becomes the network: in a localhost test, ab recorded well over ten thousand hits per second served by a 2 GB Rackspace instance.
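For reference, a test along those lines can be reproduced with ApacheBench; the request count and concurrency below are illustrative rather than our exact parameters:
ab -n 100000 -c 100 http://localhost/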
On to logged-in caching. We use APC as an opcode cache, which saves the server from recompiling the PHP code on every page load; the compiled code fits easily in RAM (we typically give APC 128 MB). This drastically decreases CPU usage, and logged-in users can now browse the site much faster. But we can still only handle a limited number of them, so we can do a bit better: instead of querying MySQL every time we hit Drupal's cache tables, we can store those tables in memory. Enter memcached and the Cache Router module.
Now, if you've looked at the nginx conf below, you may have noticed that it also acts as a load balancer: we run Drupal on multiple nodes. The first step was putting MySQL on its own node (which does require hardening it) and having Apache live on separate machines. To make sure that user-uploaded files and Boost's cached files are available on all Apache servers, we use GlusterFS to replicate them across all machines. We also replicate the code base over GlusterFS so that changes propagate quickly, but because GlusterFS slows down file operations, we rsync the code to the local file system; the PHP code is never run directly from GlusterFS.
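As a sketch, the sync from the shared mount to the local web root might look like this (both paths are hypothetical):
rsync -a --delete /mnt/glusterfs/drupal/ /var/www/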
Putting it all together: the architecture. The attached diagram shows the full picture. We deploy all our servers on Rackspace hosting, starting from an Ubuntu Karmic image. There are three types of nodes: load balancers and static file servers, which we'll refer to as nginx nodes; application servers running Apache, which we'll refer to as apache nodes; and the database node(s), which we'll refer to as mysql nodes.
The nginx nodes have nginx, memcached and GlusterFS installed. They serve static files from a shared folder on a GlusterFS mount. Any request that is neither cached nor found among the static files is proxied to the pool of apache nodes. The memcached daemon is part of a pool in which the apache nodes also participate, and which Cache Router uses to distribute the cached MySQL queries and cache tables. The nginx nodes can be replicated for high availability, since the files they serve are replicated in real time via GlusterFS.
The apache nodes have Apache with mod_php and PHP 5.2 installed, as well as GlusterFS, APC and memcached. We can spin up new instances quickly and add them to the pool: once GlusterFS is mounted, a new node quickly syncs up the files from the other nodes as necessary and is ready to receive its share of requests. All the Drupal nodes talk to the MySQL node for the database, and the MySQL node can also be replicated for high availability.
Deploying rapidly: what is the point of a distributed architecture in the cloud if we cannot scale quickly? We use Puppet to configure a freshly spun-up node and add it to the nginx or apache pool, as sketched below.
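A rough sketch of what this looks like in a Puppet site manifest (the node patterns and class names here are hypothetical, not our actual manifests):
# site.pp
node /^web\d+$/ {
  include apache_node   # apache, mod_php, apc, memcached, glusterfs client
}
node /^lb\d+$/ {
  include nginx_node    # nginx, memcached, glusterfs client
}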
Wrapping it up: we should be able to follow up soon with a post on performance. Testing so far indicates that the system does scale up quite well. We have also compared Rackspace hosting to EC2, and the numbers show that Rackspace is much faster for Drupal, mostly due to lower network latency. We will soon have numbers and graphs to show it all.
Configuring APC: we set the memory size to 128 MB in a single shared memory segment.
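In php.ini terms, that corresponds to something like the following (standard APC directives; the values simply mirror the settings described above):
extension=apc.so
apc.shm_segments=1
apc.shm_size=128    ; size in MB, in APC's pre-3.1 integer syntax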
Configuring Cache Router: version 6.x-1.x-dev (rather than 6.x-1.0-rc1)
* The dev version had some bug fixes for the memcached engine at the time we installed it.
Append the following to your Drupal site's settings.php:
# Cacherouter
$conf['cache_inc'] = './sites/all/modules/cacherouter/cacherouter.inc';
$conf['cacherouter'] = array(
  'default' => array(
    'engine' => 'memcached',
    'servers' => array(
      // memcached servers in the pool (default port 11211 assumed)
      'web01',
      'web02',
      'web03',
    ),
    'shared' => TRUE,
    'prefix' => '',
    'path' => '',
    'static' => FALSE,
    'fast_cache' => FALSE,
  ),
);
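To sanity-check that Cache Router is actually using the pool, memcached's counters can be queried directly (hostname as in the config above); rising cmd_get and cmd_set values confirm that cache reads and writes are going to memcached rather than MySQL:
echo stats | nc web01 11211 | grep -E 'cmd_get|cmd_set'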
Configuring Boost: most of Boost's default settings are fine. We turned on gzip and enabled CSS and JS caching. We also ignore Boost's .htaccess rules, since we use nginx to serve the cached HTML files.
Configuring nginx (version 0.7.62):
In nginx.conf, in the "http" section:
upstream apaches {
  #ip_hash;  # uncomment to pin each client to one backend
  server web01;
  server web02;
  server web03;
}
In the virtual host conf, in the "server" section:
server {
  listen 80;
  proxy_set_header Host $http_host;

  gzip on;
  gzip_static on;
  gzip_proxied any;
  gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

  set $myroot /var/www;

  # deny access to files beginning with a dot (.htaccess, .git, ...)
  # the pattern matches against the URI, which always starts with a slash
  location ~ /\. {
    deny all;
  }

  # deny access to Drupal code and other sensitive files
  location ~ \.(engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)$|^(code-style\.pl|Entries.*|Repository|Root|Tag|Template)$ {
    deny all;
  }

  # Boost: accumulate flags in $boost -- G for a GET request, D for an anonymous
  # user (no DRUPAL_UID cookie), Q for an empty query string, F if the cached file
  # exists. Only a full "GDQF" match is served from the static cache.
  set $boost "";
  set $boost_query "_";
  if ($request_method = GET) {
    set $boost G;
  }
  if ($http_cookie !~ "DRUPAL_UID") {
    set $boost "${boost}D";
  }
  if ($query_string = "") {
    set $boost "${boost}Q";
  }
  if (-f $myroot/cache/normal/$http_host$request_uri$boost_query$query_string.html) {
    set $boost "${boost}F";
  }
  if ($boost = GDQF) {
    rewrite ^.*$ /cache/normal/$http_host$request_uri$boost_query$query_string.html break;
  }
  if (-f $myroot/cache/perm/$http_host$request_uri$boost_query$query_string.css) {
    set $boost "${boost}F";
  }
  if ($boost = GDQF) {
    rewrite ^.*$ /cache/perm/$http_host$request_uri$boost_query$query_string.css break;
  }
  if (-f $myroot/cache/perm/$http_host$request_uri$boost_query$query_string.js) {
    set $boost "${boost}F";
  }
  if ($boost = GDQF) {
    rewrite ^.*$ /cache/perm/$http_host$request_uri$boost_query$query_string.js break;
  }

  # static files: serve from disk if present, otherwise proxy to Apache
  location ~* \.(txt|jpg|jpeg|css|js|gif|png|bmp|flv|pdf|ps|doc|mp3|wmv|wma|wav|ogg|mpg|mpeg|mpg4|htm|zip|bz2|rar|xls|docx|avi|djvu|mp4|rtf|ico)$ {
    root $myroot;
    expires max;
    add_header Vary Accept-Encoding;
    if (-f $request_filename) {
      break;
    }
    if (!-f $request_filename) {
      proxy_pass http://apaches;
      break;
    }
  }

  # don't let clients cache the boosted pages themselves
  location ~* \.(html(\.gz)?|xml)$ {
    add_header Cache-Control "no-cache, no-store, must-revalidate";
    root $myroot;
    if (-f $request_filename) {
      break;
    }
    if (!-f $request_filename) {
      proxy_pass http://apaches;
      break;
    }
  }

  # everything else goes to the Apache pool
  location / {
    access_log /var/log/nginx/localhost.proxy.log proxy;
    proxy_pass http://apaches;
  }
}
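After any change, the configuration can be checked and reloaded without dropping connections (standard nginx commands; the init script path is from our Ubuntu Karmic images):
nginx -t && /etc/init.d/nginx reload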
Configuring GlusterFS (version 3.0.3): there are two files. glusterfsd.vol defines the local "brick"; glusterfs.vol describes how to mount and combine the bricks.
glusterfsd.vol
# Generated by Puppet
volume posix
  type storage/posix
  option directory ####
end-volume

volume locks
  type features/locks
  option mandatory-locks on
  subvolumes posix
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 16
  subvolumes locks
end-volume

volume server-tcp
  type protocol/server
  subvolumes iothreads
  option transport-type tcp
  option auth.login.iothreads.allow ####
  option auth.login.####.password ####
  option transport.socket.listen-port 6996
  option transport.socket.nodelay on
end-volume
glusterfs.vol
# Generated by Puppet
volume vol-0
  type protocol/client
  option transport-type tcp
  option remote-host ####
  option transport.socket.nodelay on
  option remote-port 6996
  option remote-subvolume iothreads
  option username ####
  option password ####
end-volume

... # one "protocol/client" volume per apache node + one per nginx node

volume vol-3
  type protocol/client
  option transport-type tcp
  option remote-host ####
  option transport.socket.nodelay on
  option remote-port 6996
  option remote-subvolume iothreads
  option username ####
  option password ####
end-volume

volume mirror-0
  type cluster/replicate
  subvolumes vol-0 vol-1 vol-2 vol-3
  option read-subvolume vol-0
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 4MB
  # option flush-behind on  # improves performance when handling lots of small files
  subvolumes mirror-0
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 16  # default is 16
  subvolumes writebehind
end-volume

volume iocache
  type performance/io-cache
  option cache-size 412MB
  option cache-timeout 30
  subvolumes iothreads
end-volume

volume statprefetch
  type performance/stat-prefetch
  subvolumes iocache
end-volume