Splitting Apache Logs With vlogger
Version 1.0
Author: Falko Timme
Vlogger is a little tool with which you can write Apache logs broken down by virtual hosts and days. With vlogger, we need to put just one CustomLog directive into our global Apache configuration, and it will write access logs for each virtual host and day. Therefore, you do not have to split Apache's overall access log into access logs for each virtual host each day, and you do not have to configure Apache to write one access log per virtual host (which could make you run out of file descriptors very fast).
At the end of this tutorial I will show you how to use webalizer to create statistics from the Apache access logs.
I do not issue any guarantee that this will work for you!
1 Preliminary Note
I have tested vlogger on a Debian Etch system where Apache2 is already installed and working.
2 Installing And Configuring vlogger
To install vlogger, we simply run
apt-get install vlogger
Afterwards, we have to change the LogFormat line (there are multiple LogFormat lines - at least change the one that is named combined) in /etc/apache2/apache2.conf. We must add the string %v at the beginning of it:
vi /etc/apache2/apache2.conf
[...]
#LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
[...]
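If you would rather not edit the file by hand, the same change could be scripted. The following is only a minimal sketch: it assumes the combined LogFormat line has the stock form shown above, and the function name add_vhost_field is made up for this example. It reads a configuration from standard input and prints the modified version, leaving lines that already start with %v alone.

```shell
#!/bin/sh
# Hypothetical helper: prepend %v to the "combined" LogFormat line.
# Reads an Apache configuration on stdin and writes the result to stdout.
add_vhost_field() {
    # First expression: skip lines that already begin with %v.
    # Second expression: insert "%v " right after the opening quote.
    sed -e '/^LogFormat "%v /b' \
        -e '/^LogFormat ".*" combined$/s/^LogFormat "/LogFormat "%v /'
}
```

You could run it as add_vhost_field < /etc/apache2/apache2.conf > /tmp/apache2.conf.new and compare both files with diff before replacing anything.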
Then add the following CustomLog line to the same file (you can put it directly after the LogFormat line):
vi /etc/apache2/apache2.conf
[...]
CustomLog "| /usr/sbin/vlogger -s access.log /var/log/apache2" combined
[...]
That's the only CustomLog directive that we need in our whole Apache configuration. Please disable all other CustomLog directives, especially in your virtual host configurations!
The advantage of having Apache write just one access log stream is that Apache needs only a single log handle, no matter how many virtual hosts it serves. This can lower the load on the server, especially if you have some high-traffic sites on it.
Now restart Apache:
/etc/init.d/apache2 restart
Vlogger will now create subdirectories in the /var/log/apache2 directory, one per virtual host, and it will create access logs that contain the current date in the file name. It will also create a symlink called access.log that points to the current log file.
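To picture what happens to each log line, here is a rough shell model of the splitting step. It is not vlogger's own code, just a sketch under two assumptions: the first field of every line is the %v virtual host name, and that field is dropped before the line is written, so the per-host files contain ordinary combined-format lines. The function name split_by_vhost and its arguments are invented for this illustration, and the symlink handling is left out.

```shell
#!/bin/sh
# Hypothetical model of per-virtual-host log splitting.
# $1 = output directory, $2 = date stamp used in the file name (e.g. 06062007)
split_by_vhost() {
    outdir=$1
    stamp=$2
    while IFS= read -r line; do
        host=${line%% *}   # first field: the virtual host name
        rest=${line#* }    # everything after the first space
        # Skip lines without a second field and unsafe host names.
        [ "$host" != "$line" ] || continue
        case $host in
            ''|*/*|.*) continue ;;
        esac
        mkdir -p "$outdir/$host"
        printf '%s\n' "$rest" >> "$outdir/$host/$stamp-access.log"
    done
}
```

Feeding it a few sample lines on standard input, with a scratch directory as the first argument, produces the same kind of layout that vlogger creates under /var/log/apache2.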
Let's assume we have two virtual hosts, www.example.com and www.test.tld. Then this is what the /var/log/apache2 directory will look like:
/var/log/apache2/
    www.example.com/
        06042007-access.log
        06052007-access.log
        06062007-access.log
        access.log -> 06062007-access.log
    www.test.tld/
        06042007-access.log
        06052007-access.log
        06062007-access.log
        access.log -> 06062007-access.log
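The file names in this listing follow a month-day-year pattern, so the path of the log for a given host and day can be computed. The sketch below assumes GNU date (as shipped with Debian) and the naming shown in the listing; the function name vlogger_logfile and the LOGDIR variable are made up for this example.

```shell
#!/bin/sh
# Base directory passed to vlogger in the CustomLog line.
LOGDIR=${LOGDIR:-/var/log/apache2}

# Hypothetical helper: print the log file path for a host and a day.
# $1 = virtual host name, $2 = any date string GNU date understands
#      (defaults to "today")
vlogger_logfile() {
    host=$1
    day=${2:-today}
    stamp=$(date -d "$day" +%m%d%Y) || return 1
    printf '%s/%s/%s-access.log\n' "$LOGDIR" "$host" "$stamp"
}
```

For example, vlogger_logfile www.example.com "1 day ago" prints the path of yesterday's log for www.example.com.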
To learn what other vlogger command line options you can put into the CustomLog line, take a look at
man vlogger
3 Creating Statistics With webalizer
In this chapter I will show you how you can create statistics from the split log files with webalizer. Again, I'm assuming that you have two virtual hosts, www.example.com and www.test.tld, and that these virtual hosts have the document roots /var/www/www.example.com/web and /var/www/www.test.tld/web (it's important that the server names are in the document root paths, otherwise the following procedure won't work). I'd like to put the statistics into the directories /var/www/www.example.com/web/stats and /var/www/www.test.tld/web/stats, so these directories must already exist.
First, let's install webalizer:
apt-get install webalizer
Take a look at
man webalizer
to see how webalizer works. Basically, to create statistics for www.example.com from yesterday's access log, you can use this command:
/usr/bin/webalizer -c /etc/webalizer/webalizer.conf -n www.example.com \
-s www.example.com -r www.example.com -q -T -o /var/www/www.example.com/web/stats \
/var/log/apache2/www.example.com/`/bin/date -d "1 day ago" +%m%d%Y`-access.log
(/etc/webalizer/webalizer.conf is the location of Debian's default webalizer.conf. /bin/date -d "1 day ago" +%m%d%Y prints yesterday's date exactly the way we need it so that we can pass yesterday's access.log to webalizer without needing to know the exact date.)
Of course, we don't want to run such a command manually for each virtual host, therefore we write a little shell script that reads the /var/log/apache2 directory and creates statistics for each virtual host that has logs in that directory. I name the script webstats and place it in the /usr/local/sbin directory:
vi /usr/local/sbin/webstats
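#!/bin/sh
# Create webalizer statistics for every virtual host that has a log directory.

logdir=/var/log/apache2
webalizerconf=/etc/webalizer/webalizer.conf
# Yesterday's date in the format used in the log file names (MMDDYYYY)
yesterdaysdate=`/bin/date -d "1 day ago" +%m%d%Y`

cd "${logdir}" || exit 1
for directory in *
do
  # Each subdirectory is named after a virtual host
  if [ -d "${directory}" ]; then
    /usr/bin/webalizer -c "${webalizerconf}" -n "${directory}" \
      -s "${directory}" -r "${directory}" -q -T -o "/var/www/${directory}/web/stats" \
      "${logdir}/${directory}/${yesterdaysdate}-access.log"
  fi
done
exit 0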
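The script above calls webalizer for every subdirectory, even if a virtual host received no requests yesterday and therefore has no log file for that day. If you want to skip such hosts, a filter along the following lines could be used. This is only a sketch; the function name hosts_with_log is an assumption for this example and not part of vlogger or webalizer.

```shell
#!/bin/sh
# Hypothetical helper: list the virtual hosts that have a non-empty
# log file for a given date stamp.
# $1 = log directory, $2 = date stamp as used in the file names (MMDDYYYY)
hosts_with_log() {
    logdir=$1
    stamp=$2
    for dir in "$logdir"/*/; do
        # If the glob matched nothing, $dir is the literal pattern.
        [ -d "$dir" ] || continue
        # -s is true only for files that exist and are not empty.
        if [ -s "$dir$stamp-access.log" ]; then
            basename "$dir"
        fi
    done
}
```

The output is one host name per line, so it could replace the "for directory in *" list, for example: for directory in `hosts_with_log /var/log/apache2 "$yesterdaysdate"`.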
We must make that script executable:
chmod 755 /usr/local/sbin/webstats
Finally, we create a cron job that calls the /usr/local/sbin/webstats script every night at 04:00:
crontab -e
0 4 * * * /usr/local/sbin/webstats > /dev/null 2>&1
After the cron job has run for the first time, you can go to www.example.com/stats and www.test.tld/stats to see the statistics in your browser. It's a good idea to password-protect the stats directories with .htaccess/.htpasswd.
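As an illustration of the password protection, a .htaccess file with basic authentication could be generated like this. It is a minimal sketch: the function name write_stats_htaccess and the password file location used in the usage note are assumptions, and the virtual host must allow these directives in .htaccess files (AllowOverride AuthConfig).

```shell
#!/bin/sh
# Hypothetical helper: write a basic-auth .htaccess into a stats directory.
# $1 = stats directory, $2 = path of the password file created with htpasswd
write_stats_htaccess() {
    statsdir=$1
    passwdfile=$2
    cat > "$statsdir/.htaccess" <<EOF
AuthType Basic
AuthName "Statistics"
AuthUserFile $passwdfile
Require valid-user
EOF
}
```

For example, write_stats_htaccess /var/www/www.example.com/web/stats /var/www/www.example.com/.htpasswd writes the file for the first virtual host. The password file itself is created separately with htpasswd -c followed by the file path and a user name, and it should live outside the document root.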
4 Links
- vlogger: http://n0rp.chemlab.org/vlogger
- webalizer: http://www.mrunix.net/webalizer
- Debian: http://www.debian.org