Caching With Apache's mod_cache On Debian Lenny

Version 1.0
Author: Falko Timme
Follow me on Twitter

This article explains how you can cache your web site contents with Apache's mod_cache on Debian Lenny. If you have a high-traffic dynamic web site that generates lots of database queries on each request, you can decrease the server load dramatically by caching your content for a few minutes or more (that depends on how often you update your content).

I do not issue any guarantee that this will work for you!

 

1 Preliminary Note

I'm assuming that you have a working Apache2 setup (Apache 2.2.x - prior to that version, mod_cache is considered experimental) from the Debian repositories - the Apache version in the Debian Lenny repositories is 2.2.9 so you should be good to go.

I'm using the document root /var/www here for my test vhost - you must adjust this if your document root differs.

 

2 Enabling mod_cache

mod_cache has two submodules that manage the cache storage, mod_disk_cache (for storing contents on the hard drive) and mod_mem_cache (for storing contents in memory which is faster than disk caching). Decide which one you want to use and continue either with chapter 2.1 (mod_disk_cache) or 2.2 (mod_mem_cache).

 

2.1 mod_disk_cache

The mod_disk_cache configuration is stored in /etc/apache2/mods-available/disk_cache.conf, so let's edit that one:

vi /etc/apache2/mods-available/disk_cache.conf

Make sure you uncomment the CacheEnable disk / line, so that the minimal configuration looks as follows:

<IfModule mod_disk_cache.c>
# cache cleaning is done by htcacheclean, which can be configured in
# /etc/default/apache2
#
# For further information, see the comments in that file,
# /usr/share/doc/apache2.2-common/README.Debian, and the htcacheclean(8)
# man page.

        # This path must be the same as the one in /etc/default/apache2
        CacheRoot /var/cache/apache2/mod_disk_cache

        # This will also cache local documents. It usually makes more sense to
        # put this into the configuration for just one virtual host.

        CacheEnable disk /

        CacheDirLevels 5
        CacheDirLength 3
</IfModule>

You can find explanations for these configuration options and further configuration options on http://httpd.apache.org/docs/2.2/mod/mod_disk_cache.html.

Now we can enable mod_cache and mod_disk_cache:

a2enmod cache
a2enmod disk_cache

/etc/init.d/apache2 restart 

To make sure that our cache directory /var/cache/apache2/mod_disk_cache doesn't fill up over time, we have to clean it with the htcacheclean command. That command is part of the apache2-utils package which we install as follows:

aptitude install apache2-utils 

Afterwards, we can start htcacheclean as a daemon like this:

htcacheclean -d30 -n -t -p /var/cache/apache2/mod_disk_cache -l 100M -i

This will clean our cache directory every 30 minutes and make sure that it will not get bigger than 100MB. To learn more about htcacheclean, take a look at

man htcacheclean

Of course, you don't want to start htcacheclean manually each time you reboot the server - therefore we edit /etc/rc.local...

vi /etc/rc.local

... and add the following line to it, right before the exit 0 line:

[...]
/usr/sbin/htcacheclean -d30 -n -t -p /var/cache/apache2/mod_disk_cache -l 100M -i
[...]

This will start htcacheclean automatically each time you start the server.

 

2.2 mod_mem_cache

The mod_mem_cache configuration is located in /etc/apache2/mods-available/mem_cache.conf:

vi /etc/apache2/mods-available/mem_cache.conf   
<IfModule mod_mem_cache.c>
        CacheEnable mem /
        MCacheSize 4096
        MCacheMaxObjectCount 100
        MCacheMinObjectSize 1
        MCacheMaxObjectSize 2048
</IfModule>

This is the default configuration - if you like you can modify it. A list of configuration directives for mod_mem_cache is available here: http://httpd.apache.org/docs/2.2/mod/mod_mem_cache.html

Now let's enable mod_cache and mod_mem_cache as follows:

a2enmod cache
a2enmod mem_cache

/etc/init.d/apache2 restart

That's it already! With mod_mem_cache, you don't have to clean up any cache directories.

 

3 Testing

Unfortunately mod_cache doesn't provide any logging functionalities which is bad if you want to know if logging is working. Therefore I create a small PHP test file, /var/www/cachetest.php, that sends out HTTP headers that tell mod_cache that it should cache the file for 300 seconds, and that simply prints the timestamp:

vi /var/www/cachetest.php
<?php
header("Cache-Control: must-revalidate, max-age=300");
header("Vary: Accept-Encoding");
echo time()."<br>";
?>

Now call that file in a browser - it should display the current time stamp. Then click in the browser's address bar and press ENTER so that the page gets loaded again (don't press F5 or the reload button - this will always fetch a fresh copy from the server instead of the cache!) - if all goes well, you should still see the old, cached timestamp. If you wait 300 seconds, you should get a fresh copy from the server instead of the cache.

 

4 HTTP Headers

Caching doesn't work out-of-the-box - you must modify your web application so that caching can work (it is possible that your web application already supports caching - please consult the documentation of your application to find out). mod_cache will cache web pages only if the HTTP headers sent out by your web application tell it to do so.

Here are some examples of headers that tell mod_cache not to cache:

  • Expires headers with a date in the past: "Expires: Sun, 19 Nov 1978 05:00:00 GMT"
  • Certain Cache-Control headers: "Cache-Control: no-store, no-cache, must-revalidate" or "Cache-Control: must-revalidate, max-age=0"
  • Set-Cookie headers: a page will not be cached if a cookie is set.

So if you want mod_cache to cache your pages, modify your application to not send out such headers.

If you want mod_cache to cache your pages, you can set an Expires header with a date in the future, but the recommended way is to use max-age:

"Cache-Control: must-revalidate, max-age=300"

This tells mod_cache to cache the page for 300 seconds (max-age) - unfortunately mod_cache doesn't know the s-maxage option (see http://www.mnot.net/cache_docs/#CACHE-CONTROL), that's why we must use the max-age option (which also tells your browser to cache - please keep this in mind if you get unexpected results!). If mod_cache knew the s-maxage option, we could use "Cache-Control: must-revalidate, max-age=0, s-maxage=300" which would tell mod_cache, but not the browser, to cache the page.

Of course, this header is useless if you send out one of the non-caching headers (Expires in the past, Set-Cookie, etc.) from above at the same time!

Another very important header for caching is this one:

"Vary: Accept-Encoding"

This makes mod_cache keep two copies of each cached page, one compressed (gzip) and one uncompressed so that it can deliver the right version depending on the capabilities of the user-agent/browser. Some user-agents don't understand gzip compression, so they should get the uncompressed version.

So here's the summary: use the following two headers if you want mod_cache to cache:

"Cache-Control: must-revalidate, max-age=300"
"Vary: Accept-Encoding"

and make sure that no Expires with a date in the past, cookies, etc. are sent.

If your application is written in PHP, you can use PHP's header() function to send out HTTP headers, e.g. like this:

header("Cache-Control: must-revalidate, max-age=300");
header("Vary: Accept-Encoding");

This page is a must-read if you want to learn more about HTTP headers and caching: http://www.mnot.net/cache_docs/

 

Share this page:

1 Comment(s)