How To Set Up A Caching Reverse Proxy With Squid 2.6 On Debian Etch
Version 1.0
Author: Falko Timme
This article explains how you can set up a caching reverse proxy with Squid 2.6 in front of your web server on Debian Etch. If you have a high-traffic dynamic web site that generates lots of database queries on each request, you can decrease the server load dramatically by caching your content for a few minutes or more (that depends on how often you update your content).
I do not issue any guarantee that this will work for you!
1 Preliminary Note
In this guide I will call the web site that I want to cache www.example.com, and it's running on Apache2. I will install Squid on the same server and configure Apache to listen on port 8080 and Squid on port 80 so that all HTTP requests go to Squid which then passes them on to Apache (unless it can satisfy the request from its cache).
Of course, you are free to install Squid on another system - you could then let Apache run on port 80.
2 Preparing The Backend Web Server (Apache)
Squid will pass on the original user's IP address in a field called X-Forwarded-For to the backend web server (Apache). Of course, the backend web server should log the original user's IP address in its access log instead of the IP address of our Squid proxy. Therefore we must modify the LogFormat line in /etc/apache2/apache2.conf and replace %h with %{X-Forwarded-For}i:
vi /etc/apache2/apache2.conf
[...]
#LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
[...]
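If you prefer not to edit the file by hand, the same replacement can be scripted. The following is only a minimal sketch: use_forwarded_for is a hypothetical helper name, and it assumes the active combined LogFormat line starts with "%h as in the stock Debian file. Unlike the manual edit, it does not keep the old line as a comment, so make a copy of apache2.conf first.

```shell
#!/bin/sh
# Sketch: replace %h with %{X-Forwarded-For}i in the active "combined"
# LogFormat line. use_forwarded_for is a hypothetical helper, not an Apache tool.
use_forwarded_for() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    # Only touch uncommented LogFormat lines that end in "combined";
    # other formats (e.g. "common") and commented lines are left alone.
    sed '/^[[:space:]]*LogFormat .* combined[[:space:]]*$/ s/"%h /"%{X-Forwarded-For}i /' \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A possible invocation would be: cp -p /etc/apache2/apache2.conf /etc/apache2/apache2.conf_orig && use_forwarded_for /etc/apache2/apache2.conf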
Next we open /etc/apache2/ports.conf...
vi /etc/apache2/ports.conf
... and change the port that Apache should listen on to 8080:
Listen 8080
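The port change can also be made and checked from the shell. This is a rough sketch under the assumption that ports.conf contains a plain "Listen <port>" line without an IP address; set_listen_port and listen_ports are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: rewrite a bare "Listen <port>" line and report the configured ports.
# Both helpers are hypothetical; "Listen 1.2.3.4:80" style lines are not handled.
set_listen_port() {
    conf="$1"
    port="$2"
    tmp="$(mktemp)" || return 1
    sed "s/^\([[:space:]]*Listen[[:space:]][[:space:]]*\)[0-9][0-9]*[[:space:]]*\$/\1$port/" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Print the argument of every active Listen directive, one per line.
listen_ports() {
    awk '$1 == "Listen" { print $2 }' "$1"
}
```

For example, set_listen_port /etc/apache2/ports.conf 8080 followed by listen_ports /etc/apache2/ports.conf should then print the new port.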
Afterwards we restart Apache:
/etc/init.d/apache2 restart
3 Installing And Configuring Squid
Squid can be installed as follows:
apt-get install squid
Next we make a backup of the original squid.conf file (/etc/squid/squid.conf):
cd /etc/squid/
mv squid.conf squid.conf_orig
squid.conf_orig is more than 4000 lines long - it contains all valid Squid 2.6 configuration options together with lots of comments. Although this is a lot to read, you should definitely take the time to study it!
Now we can create a squid.conf file for our server:
vi squid.conf
cache_mgr root

# Basic parameters
visible_hostname www.example.com

# This line indicates the server we will be proxying for
http_port 80 defaultsite=www.example.com vhost

# And the IP Address for it - adjust the IP and port if necessary
cache_peer 127.0.0.1 parent 8080 0 no-query originserver login=PASS

acl apache rep_header Server ^Apache
broken_vary_encoding allow apache

# Where the cache files will be, memory and such
cache_dir ufs /var/spool/squid 10000 16 256
cache_mem 256 MB
maximum_object_size_in_memory 128 KB

# Log locations and format
#logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st %Ss:%Sh
logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log combined

# Example how to configure Squid to not log certain requests
#acl dontlog urlpath_regex ^\/monit\/token$
#acl dontlog urlpath_regex ^\/server-status$
#acl dontlog urlpath_regex ^\/admedia\/reste_300x250.php$
#acl dontlog urlpath_regex ^\/admedia\/reste_120x600.php$
#acl dontlog urlpath_regex ^\/admedia\/reste_728x90.php$
#acl dontlog urlpath_regex ^\/geoip\/rectangle_forum_postbit.html$
#acl dontlog urlpath_regex ^\/geoip\/banner_iframe.php
#acl dontlog urlpath_regex ^\/js\/amazon.js
#acl dontlog urlpath_regex .js$
#acl dontlog urlpath_regex .css$
#acl dontlog urlpath_regex .png$
#acl dontlog urlpath_regex .gif$
#acl dontlog urlpath_regex .jpg$
#access_log /var/log/squid/access.log combined !dontlog

cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log
logfile_rotate 10
## put this in crontab to rotate logs at midnight:
## 0 0 * * * /usr/sbin/squid -k rotate &> /dev/null

hosts_file /etc/hosts

# Basic ACLs
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl Safe_ports port 80
acl purge method PURGE
acl CONNECT method CONNECT

http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access allow localhost
http_access allow all
http_access allow all
http_reply_access allow all
icp_access allow all

cache_effective_group proxy
coredump_dir /var/spool/squid
forwarded_for on
emulate_httpd_log on
redirect_rewrites_host_header off
buffered_logs on

# Do not cache cgi-bin, ? urls, posts, etc.
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
acl POST method POST
no_cache deny QUERY
no_cache deny POST

# Example how to configure Squid to not cache certain URLs
#acl adminurl urlpath_regex ^/myadminpanel
#no_cache deny adminurl
#acl phpmyadminurl urlpath_regex ^/phpmyadmin
#no_cache deny phpmyadminurl
This is standard stuff. Make sure you adjust the hostname and IP address, if necessary. The login=PASS option in the cache_peer line ensures that HTTP authentication (e.g. password protection set up in an .htaccess file) is passed through the cache to the backend. I've added some (commented out) lines that show how you can configure Squid to not log certain requests in its access log and how to tell it to not cache certain URLs.
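One detail about the commented-out dontlog examples: in a regular expression an unescaped dot matches any character, so a pattern such as .js$ also matches paths that merely end in "js". The sketch below illustrates this with grep -E as a stand-in for Squid's regex matching (an assumption; Squid's own matcher may differ in details). The helper name matches and the sample paths are made up for illustration.

```shell
#!/bin/sh
# Sketch: test a URL path against an extended regular expression.
# matches is a hypothetical helper; grep -E only approximates urlpath_regex.
matches() {
    pattern="$1"
    path="$2"
    printf '%s\n' "$path" | grep -Eq -- "$pattern"
}

# Unescaped dot: also matches a path that only ends in "js".
matches '.js$' '/feed/newsjs' && echo "'.js\$' matches /feed/newsjs"

# Escaped dot: matches real .js files only.
matches '\.js$' '/feed/newsjs' || echo "'\\.js\$' does not match /feed/newsjs"
matches '\.js$' '/js/amazon.js' && echo "'\\.js\$' matches /js/amazon.js"
```

If you enable the dontlog lines, escaping the dots (\.js$, \.css$, and so on) keeps the patterns limited to the file extensions they are meant for.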
This configuration assumes that all your users see the same content, i.e., you don't have logged-in users who should see something different from anonymous users. If you have logged-in users that should not see cached content, read on - I'll come to that in a moment (chapter 5).
Now change the permissions of squid.conf and restart Squid:
chmod 600 squid.conf
/etc/init.d/squid restart
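To see whether a response was served from the cache, you can look at the X-Cache header that Squid adds to its replies (HIT or MISS, followed by the proxy's hostname). The following is a small sketch, not part of the original setup; cache_status is a hypothetical helper that reads response headers on standard input.

```shell
#!/bin/sh
# Sketch: print the status word (HIT or MISS) from the first X-Cache header.
# cache_status is a hypothetical helper; it expects raw HTTP response headers
# on stdin, e.g. from "curl -sI http://www.example.com/" (assumes curl is installed).
cache_status() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk 'tolower($1) == "x-cache:" { print $2; exit }'
}
```

A typical check would be to run curl -sI http://www.example.com/ | cache_status twice in a row; for a cacheable page the first request is usually a MISS and a later one a HIT, depending on the headers your backend sends.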
3.1 Log Rotation
Let's assume that you want Squid to start new log files (access.log, cache.log, store.log) each day at midnight because your log analysis program needs it that way. To do this, you need the logfile_rotate directive in squid.conf (see above) with a number of old log files to keep (e.g., if you specify 10, Squid would keep the last ten log files, access.log.0 - access.log.9, cache.log.0 - cache.log.9, and store.log.0 - store.log.9).
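The naming scheme can be spelled out with a short loop. This is only an illustration of the file names described above; rotated_names is a hypothetical helper and does not touch any real log files.

```shell
#!/bin/sh
# Sketch: list the rotated file names Squid keeps for one log file,
# given the value of logfile_rotate. rotated_names is a hypothetical helper.
rotated_names() {
    base="$1"
    keep="$2"
    i=0
    while [ "$i" -lt "$keep" ]; do
        printf '%s.%s\n' "$base" "$i"
        i=$((i + 1))
    done
}

# With logfile_rotate 10 this prints access.log.0 through access.log.9.
rotated_names access.log 10
```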
In order to tell Squid to start a new log file, we must create the following cron job that runs each day at midnight:
crontab -e
0 0 * * * /usr/sbin/squid -k rotate &> /dev/null