Squid as a Reverse Proxy for ISPconfig on the same machine
RotHorseKid
28th November 2005, 17:14
Hello All.
I would like to set up Squid as a reverse proxy for the sites I administer with ISPconfig.
I am using SuSE 9.3 with the perfect setup.
By browsing this forum I think I have already found out what to do; I just want to check that my idea is not completely braindead:
1. Change make_vhost() in /root/ispconfig/scripts/lib/config.lib.php so the sites get created on a different port (say, 8080)
2. Change /etc/apache2/listen.conf so the webserver listens on port 8080 for all my IPs
3. Change /etc/apache2/vhosts/Vhosts_ispconfig.conf so that the existing vhosts listen on port 8080
4. Install and configure Squid with httpd_accel_port set to 8080 and listening on all my IPs
Please let me know if this approach makes sense. Did I miss anything?
I would also appreciate any insights on whether this setup causes problems when updating ISPconfig, and if so, which ones.
Regards,
RHK
falko
28th November 2005, 18:56
Looks good! :)
Regarding ISPConfig updates, whenever /root/ispconfig/scripts/lib/config.lib.php is changed, you have to edit it again and change the ports to 8080...
Tenaka
28th November 2005, 20:19
Have you already made a case study? Do you know for which kind of traffic Squid makes sense as an accelerator? I was interested too because I am hosting a site with a huge image gallery, but I have not found an answer as to when it is advisable to use Squid as an accelerator...
RotHorseKid
29th November 2005, 17:25
I just tested the setup with one of my domains.
It seems to work as expected.
But - I am not completely sure if the traffic restrictions for sites and clients still work, as much of the traffic is now handled by Squid. The same is true for Webalizer Stats.
Any ideas?
till
29th November 2005, 17:28
Only the traffic that goes through Apache is counted. Why do you want to use Squid? Is there so much traffic that Apache can't handle it?
falko
29th November 2005, 19:39
But - I am not completely sure if the traffic restrictions for sites and clients still work, as much of the traffic is now handled by Squid. The same is true for Webalizer Stats.
As long as Apache is logging to the same log file as before everything should be fine. :)
Tenaka
29th November 2005, 22:27
Maybe this helps:
emulate_httpd_log on
The option emulate_httpd_log, if set to ON, specifies that Squid should emulate the log file format of the Apache web server. This is very useful if you want to use a third party program like Webalizer to analyze the Web Server httpd log file.
Taken from here (http://www.faqs.org/docs/securing/chap28sec231.html). See here (http://www.squid-cache.org/Doc/FAQ/FAQ-20.html) and here (http://www.squid-cache.org/Doc/FAQ/FAQ.html#toc20) for more info.
;)
I'll try it too if you succeed and you think it's worth the hassle AND if I manage to get more free time, so maybe next year :mad:
###edit###
just answered one of my own questions:
The cache serves references to cachable objects, such as HTML pages and GIFs, and the true httpd (on port 81) serves references to non-cachable objects, such as queries and cgi-bin programs. If a site's usage characteristics tend toward cachable objects, this configuration can dramatically reduce the site's web workload.
So yes, squid will help my site's performance because I am hosting a site which gets around 50-100GB traffic / month and mainly consists of a picture gallery..
RotHorseKid
30th November 2005, 13:56
Only the traffic that goes through apache is counted.
That's what I thought.
emulate_httpd_log on
Been there, done that.
Could I just let Squid write its httpd-emulated logs to the respective web_log files for the sites?
If that's feasible, how would I make Squid write different logs based on IP/domain? I have not found a way to do this.
Why do you want to use Squid? Is there so much traffic that Apache can't handle it?
In tests, I found that Squid decreases latency, especially for pages with lots of graphics (as Tenaka already found). This is simply because, with 50+ different sites on one server, Squid does a better job of serving pages and graphics from memory than Linux and Apache alone. So by not going through Apache, I avoid the latency introduced by disk I/O.
This is good for commercial pages, where the user's impression of the "fastness" or "responsiveness" of a web site can mainly be expressed as the number of milliseconds between the user entering a URL or clicking a bookmark/link and something starting to render in the browser.
Secondly, I have some sites with high-latency DB connections. Much near-static data comes from these DBs. By tuning the web applications to send the correct Cache-Control headers for these pages, I can minimize the number of actual DB queries made.
At least these were my findings, I am open to discussion here.
Tenaka
30th November 2005, 14:45
That's what I thought.
a) Could I just let Squid write its httpd-emulated logs to the respective web_log files for the sites?
b) In tests, I found that Squid decreases latency, especially for pages with lots of graphics (as Tenaka already found). This is simply because, with 50+ different sites on one server, Squid does a better job of serving pages and graphics from memory than Linux and Apache alone. So by not going through Apache, I avoid the latency introduced by disk I/O.
c) Secondly, I have some sites with high-latency DB connections. Much near-static data comes from these DBs. By tuning the web applications to send the correct Cache-Control headers for these pages, I can minimize the number of actual DB queries made.
d) At least these were my findings, I am open to discussion here.
Originally Posted by till
Only the traffic that goes through apache is counted.
I do not understand this fully. Isn't traffic still going through Apache? The only difference I see is that Apache isn't delivering to the client but to Squid. And Squid, I suppose, is requesting from Apache in the same way a client would, so the logfiles should still be OK as usual? Or is there a major point I am missing here?
a) see my above lines
b) great ;-)
c) Can you explain this in a little more detail? Are you talking about optimizing MySQL settings?
d) Please keep in mind that I am talking about theory, so be patient. I have not yet tested this; I have just been reading about it and am considering implementing it right now.
RotHorseKid
30th November 2005, 15:10
I do not understand this fully. Isn't traffic still going through Apache? The only difference I see is that Apache isn't delivering to the client but to Squid. And Squid, I suppose, is requesting from Apache in the same way a client would, so the logfiles should still be OK as usual? Or is there a major point I am missing here?
The problem is, AFAIK ISPconfig is using the Apache logs to find out which site is generating how much traffic.
With Squid in front of Apache, Squid serves the content it has cached directly, Apache does not know about that, therefore it will not be in the logs (the log/web_log files found beneath the web directories).
BUT the traffic is generated anyway (at the external interface of my server, where my ISP measures the traffic, they don't care if it's cached or if apache served it), and I (resp. my clients) still have to pay for that.
At least I believe that is how it works. Tell me if I am wrong.
c) Can you explain this in a little more detail? Are you talking about optimizing MySQL settings?
Not exactly.
I serve a web page that has some content coming from a DB. Connecting to this DB is VEEERY expensive, latency-wise (it's Oracle, perhaps you know what I mean, in my case there are round-trips of 1000-1500ms).
What I know is that some of these pages are static over a long period of time (like the items in a webshop for example). So I go ahead and serve these pages with a Cache-Control: public http header. (Think header() in PHP for example).
Squid caches these pages now, and there are no DB connections.
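To make the header idea concrete, here is a minimal sketch in Python rather than PHP, purely for illustration. It is a tiny WSGI app that marks a slow, rarely changing page as cacheable. Only "Cache-Control: public" comes from the description above; the max-age value, the function names and the fake page renderer are assumptions and not taken from a real site.

```python
from wsgiref.util import setup_testing_defaults


def render_catalog_page():
    # Hypothetical stand-in for the expensive database round trip.
    return b"<html><body>Shop items</body></html>"


def app(environ, start_response):
    """WSGI app that lets a caching proxy in front of it keep the page."""
    body = render_catalog_page()
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # "public" is from the post; max-age=3600 is an assumed lifetime.
        ("Cache-Control", "public, max-age=3600"),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(path="/"):
    """Invoke the app once without a server; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling call_app() shows the headers a proxy would see. How long Squid actually keeps the page depends on its own configuration as well as on these headers.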
till
30th November 2005, 15:26
The problem is, AFAIK ISPconfig is using the Apache logs to find out which site is generating how much traffic.
With Squid in front of Apache, Squid serves the content it has cached directly, Apache does not know about that, therefore it will not be in the logs (the log/web_log files found beneath the web directories).
BUT the traffic is generated anyway (at the external interface of my server, where my ISP measures the traffic, they don't care if it's cached or if apache served it), and I (resp. my clients) still have to pay for that.
At least I believe that is how it works. Tell me if I am wrong.
You are about 98% right :)
Logs and traffic counting for websites in ISPConfig:
1) Apache writes one big logfile for all sites. The logfile has the name of the day (cronolog), e.g. /var/log/httpd/ispconfig_access_log_2005_11_30
2) This log is split nightly by the script /root/ispconfig/scripts/shell/logs.php. The split logfile parts are appended to the logs inside the website directory. The logs.php script also counts the traffic while it is splitting the logfile.
3) Later, the webalizer.php script is run to generate the website statistics based on the split logfiles for every website.
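The splitting and counting in step 2 can be pictured roughly like this. It is a sketch in Python, not the actual logs.php, and it assumes that every line of the big logfile starts with the vhost name and the byte count, separated by "||||", followed by the normal access log line. The function and variable names are made up for illustration.

```python
from collections import defaultdict

SEP = "||||"


def split_and_count(lines):
    """Group log lines per vhost and sum the bytes sent for each one."""
    per_site_lines = defaultdict(list)
    per_site_bytes = defaultdict(int)
    for line in lines:
        # Split only twice so the rest of the log line stays intact.
        parts = line.rstrip("\n").split(SEP, 2)
        if len(parts) != 3:
            continue  # skip lines that do not have the expected prefix
        vhost, size, rest = parts
        per_site_lines[vhost].append(rest)
        # Apache's %b logs "-" when no body bytes were sent.
        per_site_bytes[vhost] += int(size) if size.isdigit() else 0
    return dict(per_site_lines), dict(per_site_bytes)
```

The first result would be appended to each site's own log, the second is the traffic total per site. Anything Squid answers from its cache never shows up in the input, which is exactly the gap discussed in this thread.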
Tenaka
30th November 2005, 17:17
With Squid in front of Apache, Squid serves the content it has cached directly, Apache does not know about that, therefore it will not be in the logs (the log/web_log files found beneath the web directories).
BUT the traffic is generated anyway (at the external interface of my server, where my ISP measures the traffic, they don't care if it's cached or if apache served it), and I (resp. my clients) still have to pay for that.
At least I believe that is how it works. Tell me if I am wrong.
you're right, I didn't realize that
Not exactly.
I serve a web page that has some content coming from a DB. Connecting to this DB is VEEERY expensive, latency-wise (it's Oracle, perhaps you know what I mean, in my case there are round-trips of 1000-1500ms).
What I know is that some of these pages are static over a long period of time (like the items in a webshop for example). So I go ahead and serve these pages with a Cache-Control: public http header. (Think header() in PHP for example).
Squid caches these pages now, and there are no DB connections.
thx for the explanation
Now I would be interested in some feedback if you get the logging done, so that you can allocate traffic to the customers although traffic is going through Squid.
RotHorseKid
30th November 2005, 17:25
1) Apache writes one big logfile for all sites. The logfile has the name of the day (cronolog), e.g. /var/log/httpd/ispconfig_access_log_2005_11_30
So, to get correct traffic reports, I have to make Squid log (in httpd compat mode) to the symbolic link
/var/log/httpd/ispconfig_access_log
and ISPconfig will take care of the rest?
That almost sounds too easy. But ISPconfig is a fscking great piece of software, so it might just work...
Tell me. If it works like this (or almost like this), it would be easy to offer Squid acceleration for ISPconfig as an option...
Tenaka
30th November 2005, 17:35
This might be a little off topic, but did anyone try Apache 2's mod_proxy together with mod_cache? http://httpd.apache.org/docs/2.0/mod/mod_cache.html (mod_cache is still experimental!)
falko
30th November 2005, 18:04
So, to get correct traffic reports, I have to make Squid log (in httpd compat mode) to the symbolic link
/var/log/httpd/ispconfig_access_log
and ISPconfig will take care of the rest?
That almost sounds too easy. But ISPconfig is a fscking great piece of software, so it might just work...
Almost...;)
ISPConfig uses a special log format; it is defined in the httpd.conf and looks like this:
LogFormat "%v||||%b||||%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_ispconfig
CustomLog "|/root/ispconfig/cronolog --symlink=/var/log/httpd/ispconfig_access_log /var/log/httpd/ispconfig_access_log_%Y_%m_%d" combined_ispconfig
So a log line in /var/log/httpd/ispconfig_access_log would look like this:
www.example.com||||2846||||1.2.3.4 - - [30/Nov/2005:17:01:50 +0100] "GET /bla/icons_expanded.png HTTP/1.1" 200 2846 "http://www.example.com/bla" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.12) Gecko/20050919 Firefox/1.0.7"
So you have to configure Squid so that it prepends the web's domain (www.example.com) and the file size to every line.
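A rough sketch of that conversion, in Python and untested against a real Squid. It assumes that the httpd-emulated Squid log line has the form client, ident, user, [time], "METHOD full-URL PROTOCOL", status, bytes, that the site's domain can be taken from the full URL, and that Referer and User-Agent are not available and are written as "-". The function name is made up.

```python
import re
from urllib.parse import urlsplit

# client ident user [time] "METHOD URL PROTO" status bytes ...
LINE_RE = re.compile(
    r'^(?P<prefix>\S+ \S+ \S+ \[[^\]]+\]) '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def to_ispconfig_line(squid_line):
    """Prepend vhost and size so the line matches combined_ispconfig."""
    m = LINE_RE.match(squid_line)
    if m is None:
        return None  # not in the assumed format
    parts = urlsplit(m["url"])
    host = parts.hostname
    if not host:
        return None  # no full URL, so the vhost cannot be determined
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = f'{m["method"]} {path} {m["proto"]}'
    # Referer and User-Agent are assumed unavailable and logged as "-".
    return (
        f'{host}||||{m["size"]}||||{m["prefix"]} "{request}" '
        f'{m["status"]} {m["size"]} "-" "-"'
    )
```

The converted lines could then be appended to the file behind /var/log/httpd/ispconfig_access_log. Whether logs.php and Webalizer are happy with the result would still have to be tested.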
RotHorseKid
7th December 2005, 19:24
Thanks for the info. I will report back once I have it implemented.
Regards,
RHK