How To Tell Apache To Not Log Certain Requests In Its Access Log

Submitted by falko on Fri, 2007-09-28 09:25. :: Apache

Version 1.0
Author: Falko Timme <ft [at] falkotimme [dot] com>
Last edited 09/19/2007

Normally Apache logs every request in its access log. This can distort your page view statistics if you use a tool like Webalizer or AWStats that builds its statistics from Apache's access log. Typical causes are lots of visits from search engine spiders or from a certain IP address (e.g. your own), or pages that each include another page from your web site (e.g. in an iframe), which would instantly double your page views. This short guide shows how to use Apache's SetEnvIf directive to prevent Apache from logging such requests.

This document comes without warranty of any kind! I do not issue any guarantee that this will work for you!

 

1 Using SetEnvIf

The SetEnvIf directive can be used in the following contexts in your Apache configuration: in the global Apache configuration (if the directive should be valid for the whole server), in vhost configurations (if the directive should be valid only for that specific vhost), between <Directory ...></Directory> (if the directive should be valid only for a certain directory and its subdirectories), and in .htaccess files (AllowOverride FileInfo must be set).
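As a rough illustration of these contexts, the sketch below places the directive at three levels. The vhost name and directory path are hypothetical, and dontlog is simply the variable name used throughout this guide; the form of the directive itself is explained further down.

```apache
# Global server configuration: applies to every request.
SetEnvIf Request_URI "^/robots\.txt$" dontlog

<VirtualHost *:80>
    # Hypothetical vhost name, for illustration only.
    ServerName www.example.com

    # Applies only to requests handled by this vhost.
    SetEnvIf Remote_Addr "^192\.168\.0\.154$" dontlog

    # Applies only to this (hypothetical) directory and its subdirectories.
    <Directory /var/www/example/iframe>
        SetEnvIf Request_URI "\.html$" dontlog
    </Directory>
</VirtualHost>
```

Note that comments in Apache configuration files must be on their own lines; a # after a directive on the same line is not treated as a comment.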

With SetEnvIf, you can prevent requests from getting logged based on the following criteria (among others - see http://httpd.apache.org/docs/2.0/mod/mod_setenvif.html for more details):

  • Host
  • User-Agent
  • Referer
  • Accept-Language
  • Remote_Host: the hostname (if available) of the client making the request.
  • Remote_Addr: the IP address of the client making the request.
  • Server_Addr: the IP address of the server on which the request was received (only with versions later than 2.0.43).
  • Request_Method: the name of the method being used (GET, POST, etc.).
  • Request_Protocol: the name and version of the protocol with which the request was made (e.g., "HTTP/0.9", "HTTP/1.1", etc.).
  • Request_URI: the resource requested on the HTTP request line - generally the portion of the URL following the scheme and host portion without the query string.

The SetEnvIf directive has the following form:

SetEnvIf attribute regex env-variable

where attribute is one of the criteria I've just mentioned, regex is a Perl compatible regular expression, and env-variable is the name of the environment variable that is set when the regex matches.
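According to the mod_setenvif documentation, env-variable can be written as varname, varname=value, or !varname. The following sketch shows the three forms; the variable name object_is_image and the domain are only examples and are not used elsewhere in this guide.

```apache
# Set the variable; without an explicit value it is set to "1".
SetEnvIf Request_URI "\.gif$" object_is_image

# Set the variable to an explicit value.
SetEnvIf Request_URI "\.gif$" object_is_image=gif

# Remove the variable again if an earlier directive has set it.
SetEnvIf Referer "www\.mydomain\.example\.com" !object_is_image
```

For the purpose of this guide, the plain varname form is all that is needed.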

Now let's assume that Monit is requesting the file /monit/token once a minute to check if Apache is still running. Obviously we don't want to log these requests because they are not from a real user. Therefore we use the following SetEnvIf directive:

SetEnvIf Request_URI "^/monit/token$" dontlog

^ means that the Request_URI must begin with /monit/token, and $ means that it must also end there, so only /monit/token itself matches this regular expression. If we used "^/monit/token", any URL beginning with /monit/token would match, e.g. /monit/token/example.html; "/monit/token$" would match any URL ending in /monit/token, e.g. /example/monit/token.
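Because Request_URI does not include the query string, the fully anchored pattern also matches requests that carry one. A minimal sketch of the variants discussed above follows; the query string ?check=1 is a made-up example, and normally only one of these lines would be active.

```apache
# Matches /monit/token and also /monit/token?check=1,
# because the query string is not part of Request_URI.
SetEnvIf Request_URI "^/monit/token$" dontlog

# Would additionally match /monit/token/example.html.
#SetEnvIf Request_URI "^/monit/token" dontlog

# Would additionally match /example/monit/token.
#SetEnvIf Request_URI "/monit/token$" dontlog
```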

Now we have an iframe in /iframe/iframe.html that we don't want to log either. This is what we'd use:

SetEnvIf Request_URI "^/iframe/iframe\.html$" dontlog

Now we must tell Apache not to log any request labelled with dontlog. Find the CustomLog directive in your Apache configuration, e.g.

CustomLog /var/log/apache2/access.log combined

or

CustomLog "|/usr/bin/cronolog --symlink=/var/log/httpd/access.log /var/log/httpd/access.log_%Y_%m_%d" combined

and add env=!dontlog to the line:

CustomLog /var/log/apache2/access.log combined env=!dontlog

or

CustomLog "|/usr/bin/cronolog --symlink=/var/log/httpd/access.log /var/log/httpd/access.log_%Y_%m_%d" combined env=!dontlog

Restart Apache afterwards. From now on it will no longer log any request that is labelled with dontlog.
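To check which requests are being filtered out, one option is to send them to a second log file instead of discarding them. The sketch below assumes a hypothetical file name, excluded.log; env=dontlog (without the exclamation mark) logs only those requests for which the variable is set.

```apache
# Regular access log: everything except requests labelled with dontlog.
CustomLog /var/log/apache2/access.log combined env=!dontlog

# Hypothetical second log: only the requests labelled with dontlog.
CustomLog /var/log/apache2/excluded.log combined env=dontlog
```

Once you are satisfied that only the intended requests end up in the second file, that line can be removed again.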

Here are some further examples that I've found on these pages:

To prevent all requests made with a certain browser, e.g. Internet Explorer, from getting logged, you could use:

SetEnvIf User-Agent "MSIE" dontlog

To not log requests from any client whose hostname ends in bla.example.com, use:

SetEnvIf Remote_Host "bla\.example\.com$" dontlog

To not log requests from any client whose hostname begins with example, use:

SetEnvIf Remote_Host "^example" dontlog

To not log requests from a certain IP address, use something like:

SetEnvIf Remote_Addr "^192\.168\.0\.154$" dontlog

If you don't want requests of your robots.txt to get logged, use:

SetEnvIf Request_URI "^/robots\.txt$" dontlog

SetEnvIf matches the regular expression case-sensitively. If you need case-insensitive matching, use SetEnvIfNoCase instead.

For example, in order not to log certain search engine spiders, you could use:

SetEnvIfNoCase User-Agent "Slurp/cat" dontlog
SetEnvIfNoCase User-Agent "Ask Jeeves/Teoma" dontlog
SetEnvIfNoCase User-Agent "Googlebot" dontlog
SetEnvIfNoCase Remote_Host "fastsearch\.net$" dontlog

Or to not log certain file extensions, use something like this:

SetEnvIfNoCase Request_URI "\.(gif|jpg|png|css|js|ico|eot)$" dontlog
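The alternatives must share one group so that both the leading \. and the trailing $ apply to every extension; otherwise a pattern such as (js) on its own would match anywhere in the URL. If a single file should still be logged despite such a rule, one possible approach is to unset the variable again with the !varname form. The sketch below assumes a hypothetical file /images/banner.png; since the directives are evaluated in the order in which they appear, the exception has to come after the general rule.

```apache
# General rule: do not log requests for static files.
SetEnvIfNoCase Request_URI "\.(gif|jpg|png|css|js|ico|eot)$" dontlog

# Hypothetical exception: still log requests for this one image.
SetEnvIf Request_URI "^/images/banner\.png$" !dontlog
```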

To not log certain referrals (e.g. from your own domain), use something like:

SetEnvIfNoCase Referer "www\.mydomain\.com" dontlog

 

2 Links


Submitted by Biren Desai (not registered) on Mon, 2013-06-24 12:01.
Nice post. Keep sharing such this kinds of post here and aware us from it. Keep it up.
Submitted by Egbert O'Foo (not registered) on Fri, 2013-05-17 19:07.
> This document comes without warranty of any kind!

Ah, good ole BSD licensed, eh?

No worries ... I didn't need a warranty, just these instructions, and my years of previous experience with sysadmin work.

Worked for me with BSD!
Submitted by Anonymous (not registered) on Sun, 2013-03-10 12:38.
great, thanks a lot. very elegant & shows deep understanding of those piles of heaps of options!
Submitted by Frank (not registered) on Sun, 2009-02-08 11:10.

Hi and thanks,

 this saved me a lot of unnecessary log file space (5 GB a day).

 keep up the good work!