How To Set Up A Caching Reverse Proxy With Squid 2.6 On Debian Etch

Submitted by falko (Contact Author) (Forums) on Tue, 2008-12-23 19:03. :: Debian

Version 1.0
Author: Falko Timme <ft [at] falkotimme [dot] com>
Last edited 12/17/2008

This article explains how you can set up a caching reverse proxy with Squid 2.6 in front of your web server on Debian Etch. If you have a high-traffic dynamic web site that generates lots of database queries on each request, you can decrease the server load dramatically by caching your content for a few minutes or more (how long depends on how often you update your content).

I do not issue any guarantee that this will work for you!

 

1 Preliminary Note

In this guide I will call the web site that I want to cache www.example.com; it is running on Apache2. I will install Squid on the same server and configure Apache to listen on port 8080 and Squid on port 80, so that all HTTP requests go to Squid, which then passes them on to Apache (unless it can satisfy the request from its cache).

Of course, you are free to install Squid on another system - you could then let Apache run on port 80.

 

2 Preparing The Backend Web Server (Apache)

Squid passes the original user's IP address to the backend web server (Apache) in a request header called X-Forwarded-For. The backend web server should log this address in its access log instead of the IP address of our Squid proxy. Therefore we must modify the LogFormat line in /etc/apache2/apache2.conf and replace %h with %{X-Forwarded-For}i:

vi /etc/apache2/apache2.conf

[...]
#LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
[...]
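
If you would rather script this change than edit the file by hand, the substitution can be sketched with sed. This is only a sketch under assumptions: the helper name use_forwarded_for is made up for illustration, and the demonstration runs on a scratch file created with mktemp, not on the live configuration. Back up /etc/apache2/apache2.conf before pointing anything like this at it.

```shell
#!/bin/sh
# Sketch: replace %h with %{X-Forwarded-For}i in the "combined" LogFormat line.
# use_forwarded_for is a hypothetical helper name, not part of Apache or Debian.
use_forwarded_for() {
    # $1 = path to an Apache configuration file; the rewritten text goes to stdout
    sed '/^LogFormat .* combined$/ s/"%h /"%{X-Forwarded-For}i /' "$1"
}

# Demonstration on a scratch file instead of the live configuration
sample=$(mktemp)
cat > "$sample" <<'EOF'
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
EOF
use_forwarded_for "$sample"
rm -f "$sample"
```

Only the line ending in "combined" is rewritten; other LogFormat lines pass through unchanged. The function writes to stdout, so you can review the result before replacing the real file.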

Next we open /etc/apache2/ports.conf...

vi /etc/apache2/ports.conf

... and change the port that Apache should listen on to 8080:

Listen 8080
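
To double-check the edit, you could print the arguments of all Listen directives in the file. The following is a minimal sketch; list_listen_ports is a hypothetical helper name, and the sample file stands in for /etc/apache2/ports.conf.

```shell
#!/bin/sh
# Sketch: print the argument of every Listen directive in a file.
# list_listen_ports is a hypothetical helper name.
list_listen_ports() {
    # $1 = ports.conf-style file; comment lines are skipped because
    # their first field is "#..." rather than "Listen"
    awk '$1 == "Listen" { print $2 }' "$1"
}

# Demonstration on a scratch file standing in for ports.conf
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Apache listens on 8080 so that Squid can take port 80
Listen 8080
EOF
list_listen_ports "$sample"
rm -f "$sample"
```

If the output still contains 80, Apache and Squid would compete for the same port once Squid is started.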

Afterwards we restart Apache:

/etc/init.d/apache2 restart

 

3 Installing And Configuring Squid

Squid can be installed as follows:

apt-get install squid

Next we make a backup of the original squid.conf file (/etc/squid/squid.conf):

cd /etc/squid/
mv squid.conf squid.conf_orig

squid.conf_orig is more than 4000 lines long - it contains all valid Squid 2.6 configuration options together with lots of comments. Although this is a lot to read, you should definitely take the time to study it!

Now we can create a squid.conf file for our server:

vi squid.conf

cache_mgr root

# Basic parameters
visible_hostname www.example.com

# This line indicates the server we will be proxying for
http_port 80 defaultsite=www.example.com vhost

# And the IP Address for it - adjust the IP and port if necessary
cache_peer 127.0.0.1 parent 8080 0 no-query originserver login=PASS

acl apache rep_header Server ^Apache
broken_vary_encoding allow apache

# Where the cache files will be, memory and such
cache_dir ufs /var/spool/squid 10000 16 256
cache_mem 256 MB
maximum_object_size_in_memory 128 KB

# Log locations and format
#logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st %Ss:%Sh
logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh

access_log /var/log/squid/access.log combined

# Example how to configure Squid to not log certain requests
#acl dontlog urlpath_regex ^\/monit\/token$
#acl dontlog urlpath_regex ^\/server-status$
#acl dontlog urlpath_regex ^\/admedia\/reste_300x250\.php$
#acl dontlog urlpath_regex ^\/admedia\/reste_120x600\.php$
#acl dontlog urlpath_regex ^\/admedia\/reste_728x90\.php$
#acl dontlog urlpath_regex ^\/geoip\/rectangle_forum_postbit\.html$
#acl dontlog urlpath_regex ^\/geoip\/banner_iframe\.php
#acl dontlog urlpath_regex ^\/js\/amazon\.js
#acl dontlog urlpath_regex \.js$
#acl dontlog urlpath_regex \.css$
#acl dontlog urlpath_regex \.png$
#acl dontlog urlpath_regex \.gif$
#acl dontlog urlpath_regex \.jpg$
#access_log /var/log/squid/access.log combined !dontlog

cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log
logfile_rotate 10
## put this in crontab to rotate logs at midnight:
## 0 0 * * * /usr/sbin/squid -k rotate > /dev/null 2>&1

hosts_file /etc/hosts

# Basic ACLs
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl Safe_ports port 80
acl purge method PURGE
acl CONNECT method CONNECT

http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access allow localhost
http_access allow all
http_reply_access allow all

icp_access allow all

cache_effective_group proxy

coredump_dir /var/spool/squid

forwarded_for on

emulate_httpd_log on

redirect_rewrites_host_header off

buffered_logs on

# Do not cache cgi-bin, ? urls, posts, etc.
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
acl POST method POST
no_cache deny QUERY
no_cache deny POST

# Example how to configure Squid to not cache certain URLs
#acl adminurl urlpath_regex ^/myadminpanel
#no_cache deny adminurl
#acl phpmyadminurl urlpath_regex ^/phpmyadmin
#no_cache deny phpmyadminurl

This is standard stuff. Make sure you adjust the hostname and IP address, if necessary. The login=PASS option in the cache_peer line makes Squid pass HTTP authentication credentials (for example for directories protected via .htaccess) through to the backend. I've added some (commented out) lines that show how you can configure Squid to not log certain requests in its access log and how to tell it to not cache certain URLs.
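
Before you enable any urlpath_regex ACLs, it can help to try the patterns against a few URL paths. The sketch below approximates Squid's matching with grep -E; that grep -E behaves closely enough to Squid's regex matcher for simple patterns like these is an assumption, and the helper name matches as well as the sample paths are made up for illustration. It also shows why the dot in a file extension should be escaped: an unescaped dot matches any character.

```shell
#!/bin/sh
# Sketch: approximate a Squid urlpath_regex ACL with grep -E.
# Assumption: grep -E (POSIX extended regex) is close enough to Squid's
# matcher for simple patterns like the ones in this guide.
matches() {
    # $1 = regex, $2 = URL path; exit status 0 if the path matches
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Hypothetical sample paths, tried against three patterns
for path in /js/amazon.js /static/nojs /server-status /server-status/extra; do
    for re in '\.js$' '.js$' '^/server-status$'; do
        if matches "$re" "$path"; then
            echo "$re matches $path"
        fi
    done
done
```

The path /static/nojs is caught by .js$ but not by \.js$, which is the kind of accidental match the escaped form avoids.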

This configuration assumes that all your users see the same content, i.e., you don't have logged-in users who should see something different from anonymous users. If you have logged-in users that should not see cached content, read on - I'll come to that in a moment (chapter 5).

Now change the permissions of squid.conf and restart Squid:

chmod 600 squid.conf
/etc/init.d/squid restart
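
Once Squid is running, the last field of each line in /var/log/squid/access.log (the %Ss:%Sh part of the logformat defined above) tells you whether a request was answered from the cache. The following sketch counts the result codes with awk. The helper name count_results and the sample log lines are invented for illustration; real lines will differ in detail.

```shell
#!/bin/sh
# Sketch: count Squid result codes taken from the last field (%Ss:%Sh).
# count_results is a hypothetical helper name.
count_results() {
    # $1 = access log written in the "combined" logformat from squid.conf
    awk '{ split($NF, f, ":"); n[f[1]]++ }
         END { for (k in n) print k, n[k] }' "$1" | sort
}

# Invented sample lines; a real log would be /var/log/squid/access.log
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.0.2.10 - - [23/Dec/2008:19:03:00 +0100] "GET http://www.example.com/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11)" TCP_MISS:FIRST_UP_PARENT
192.0.2.11 - - [23/Dec/2008:19:03:05 +0100] "GET http://www.example.com/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11)" TCP_HIT:NONE
192.0.2.12 - - [23/Dec/2008:19:03:09 +0100] "GET http://www.example.com/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11)" TCP_HIT:NONE
EOF
count_results "$sample"
rm -f "$sample"
```

A growing share of TCP_HIT and TCP_MEM_HIT entries compared to TCP_MISS suggests that the cache is taking load off Apache.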

 

3.1 Log Rotation

Let's assume that you want Squid to start new log files (access.log, cache.log, store.log) each day at midnight because your log analysis program needs it that way. To do this, you need the logfile_rotate directive in squid.conf (see above), set to the number of old log files to keep (e.g., if you specify 10, Squid keeps the last ten log files, access.log.0 - access.log.9, cache.log.0 - cache.log.9, and store.log.0 - store.log.9).
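
To make the numbering scheme concrete, here is a small illustration that mimics the renaming on scratch files. It is not something you need to run on your server, because Squid does the renaming itself when it is told to rotate. The helper name rotate_sketch is made up, and the sketch assumes that the number of files to keep is at least 1.

```shell
#!/bin/sh
# Illustration only: mimic the renaming scheme described for logfile_rotate.
# rotate_sketch is a hypothetical name; assumes $2 is at least 1.
rotate_sketch() {
    # $1 = log file, $2 = number of old log files to keep
    log=$1
    i=$(( $2 - 1 ))
    while [ "$i" -gt 0 ]; do
        prev=$(( i - 1 ))
        # shift log.(i-1) to log.i; the oldest file gets overwritten
        if [ -f "$log.$prev" ]; then mv -f "$log.$prev" "$log.$i"; fi
        i=$prev
    done
    if [ -f "$log" ]; then mv -f "$log" "$log.0"; fi
    : > "$log"   # start a fresh, empty log
}

# Demonstration in a scratch directory, keeping three old files
dir=$(mktemp -d)
for day in 1 2 3 4; do
    echo "day $day" > "$dir/access.log"
    rotate_sketch "$dir/access.log" 3
done
ls "$dir"
cat "$dir/access.log.0"
rm -rf "$dir"
```

After four rotations with three files kept, access.log.0 holds the newest entries and the log from the first day is gone.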

In order to tell Squid to start a new log file, we must create the following cron job that runs each day at midnight:

crontab -e

0 0 * * * /usr/sbin/squid -k rotate > /dev/null 2>&1
