Possible enhancements in future versions of mod_gzip
Author: Michael Schröpl
This document describes some possible functional enhancements that could hopefully be implemented in the current version 1.3.26.1a of mod_gzip without too much effort and that would improve the usability of this module.
Whether a document's content qualifies for compression by mod_gzip is ultimately decided while evaluating the filter rules defined by the directives mod_gzip_item_include and mod_gzip_item_exclude, which are checked by the function mod_gzip_validate1.
However, this function is invoked at no fewer than five places in the source code, each time with different parameters that restrict which rule classes are checked at that moment. In some of these cases the restricted parameters make it clear which rule class must have led to the result (but not which rule value!). In other cases (such as the simultaneous test of all rules of the classes file, uri, mime and handler), not even the class of the decisive rule can be determined, because mod_gzip_validate1 returns such an unspecific result value that the caller cannot tell what exactly has happened; in some cases even internal errors are reported as if an exclude rule had made the decision. In these cases not even a meaningful evaluation of the mod_gzip status code is possible.
Yet defining an appropriate rule set is the most important step in the entire (and currently still rather complicated) mod_gzip configuration procedure. Information about which of the defined rules decided, in each individual case, whether a document's content was compressed would often be very helpful to the user.
On the other hand, mod_gzip already provides processing transparency after handling a document by setting some variables that can be used in Apache log formats (the processing status, the document sizes before and after compression, and the volume saving in percent, the last one erroneously always rounded up). In the same way, mod_gzip could (at the very moment the decision about compressing the document's content is made) easily store the class and content of the decisive rule in two additional log variables, which could then be addressed within a log format via the names mod_gzip_rule_class and mod_gzip_rule_content.
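A minimal sketch of a rule evaluation that returns the decisive rule together with its decision, instead of an unspecific result code. All type and function names here (filter_rule, evaluate_rules, decisive_rule_class, decisive_rule_content) are hypothetical and not taken from mod_gzip's source. The sketch assumes that exclude rules take precedence over include rules and uses POSIX regcomp/regexec for matching; in the module, the two returned strings would be stored under the proposed names mod_gzip_rule_class and mod_gzip_rule_content wherever the existing log variables are set.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical rule classes, mirroring the classes named in the text. */
typedef enum { CLASS_FILE, CLASS_URI, CLASS_MIME, CLASS_HANDLER, CLASS_COUNT } rule_class;

typedef struct {
    int include;          /* 1 = include rule, 0 = exclude rule */
    rule_class cls;       /* which request property the rule tests */
    const char *pattern;  /* POSIX extended regular expression */
} filter_rule;

typedef struct {
    int compress;                 /* final decision: 1 = compress */
    const filter_rule *decisive;  /* rule that made the decision, or NULL */
} filter_result;

static const char *const rule_class_name[CLASS_COUNT] = {"file", "uri", "mime", "handler"};

/* Returns 1 if the rule's pattern matches the value of its class.
   For brevity the regex is compiled on every call; a real module
   would compile it once at configuration time. */
static int rule_matches(const filter_rule *r, const char *const values[CLASS_COUNT]) {
    regex_t re;
    const char *v = values[r->cls];
    if (v == NULL || regcomp(&re, r->pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    int hit = regexec(&re, v, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}

/* Exclude rules win over include rules (an assumption of this sketch);
   the first matching rule of the winning kind is reported as decisive. */
filter_result evaluate_rules(const filter_rule *rules, size_t n,
                             const char *const values[CLASS_COUNT]) {
    filter_result res = {0, NULL};
    for (size_t i = 0; i < n; i++)
        if (!rules[i].include && rule_matches(&rules[i], values)) {
            res.decisive = &rules[i];
            return res;               /* excluded: compress stays 0 */
        }
    for (size_t i = 0; i < n; i++)
        if (rules[i].include && rule_matches(&rules[i], values)) {
            res.compress = 1;
            res.decisive = &rules[i];
            return res;
        }
    return res;                       /* no rule fired: not compressed */
}

/* Values intended for the proposed log variables. */
const char *decisive_rule_class(const filter_result *res) {
    return res->decisive ? rule_class_name[res->decisive->cls] : "none";
}

const char *decisive_rule_content(const filter_result *res) {
    return res->decisive ? res->decisive->pattern : "";
}
```

With an include rule for the MIME type ^text/ and an exclude rule for file names matching \.js$, a request for /js/app.js would report the class file and the content \.js$, telling the administrator exactly which rule prevented compression.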
Currently mod_gzip always uses gzip compression level 6; this is hard-coded by the assignment gz1->level = 6 inside the function gz1_init.
The higher the compression level (gzip allows values between 1 and 9), the better the compression, but also the higher the CPU time consumption. By adjusting the compression level, users could balance CPU load against bandwidth savings according to their own requirements. The author's own experiments have shown that level 3 already comes close to the effect of level 6, so at least the choice between these two values should be left to the user.
It would therefore be reasonable to make the compression level configurable via an additional directive, mod_gzip_compression_level.
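A minimal sketch of how the argument of the proposed mod_gzip_compression_level directive could be validated. The function name parse_compression_level and the constants are hypothetical; in the module, such a check would presumably live in the directive's configuration handler, and the stored value would replace the constant in gz1->level = 6.

```c
#include <errno.h>
#include <stdlib.h>

#define GZIP_LEVEL_MIN 1
#define GZIP_LEVEL_MAX 9
#define GZIP_LEVEL_DEFAULT 6 /* the value currently hard-coded in gz1_init */

/* Hypothetical parser for the directive argument. Returns the level,
   or -1 if the argument is not an integer in the range 1..9 (a real
   module would report a configuration error to Apache instead). */
int parse_compression_level(const char *arg) {
    char *end;
    long v;
    if (arg == NULL || *arg == '\0')
        return -1;
    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || v < GZIP_LEVEL_MIN || v > GZIP_LEVEL_MAX)
        return -1;
    return (int)v;
}
```

Keeping level 6 as the default would leave the behaviour of existing configurations unchanged when the directive is absent.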
However, a closer look by Christian Kruse led to the conclusion that Kevin Kiley's compression code for mod_gzip does not appear to support all nine gzip compression levels, but essentially only the parts required for level 6 (although the program structure was designed to allow all nine levels to be implemented later). Therefore simply changing the constant mentioned above is not sufficient; in fact, mod_gzip is no longer usable at all after such a modification.
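To illustrate what "supporting all nine levels" involves, the following sketch reproduces the per-level parameter table found in zlib's deflate.c. There, levels 1 to 3 use a separate fast matching routine without lazy matching, while levels 4 to 9 use lazy matching, each with its own search limits. Which of these parts are actually missing from mod_gzip's compressor is not stated here; the type and function names are hypothetical, and only the numeric values are taken from zlib.

```c
#include <stddef.h>

/* Which matching routine a level uses (zlib: deflate_stored,
   deflate_fast, deflate_slow). */
typedef enum { STRATEGY_STORED, STRATEGY_FAST, STRATEGY_LAZY } deflate_strategy;

typedef struct {
    unsigned short good_length; /* reduce lazy search above this match length */
    unsigned short max_lazy;    /* lazy-search limit (levels 4-9) or max insert length (1-3) */
    unsigned short nice_length; /* stop searching once a match this long is found */
    unsigned short max_chain;   /* maximum hash chain length to search */
    deflate_strategy strategy;
} level_config;

/* Values as in zlib's configuration_table. */
static const level_config level_table[10] = {
    /* good lazy nice chain */
    {0,    0,   0,    0, STRATEGY_STORED}, /* 0: store only */
    {4,    4,   8,    4, STRATEGY_FAST},   /* 1: max speed, no lazy matches */
    {4,    5,  16,    8, STRATEGY_FAST},
    {4,    6,  32,   32, STRATEGY_FAST},
    {4,    4,  16,   16, STRATEGY_LAZY},   /* 4: lazy matches */
    {8,   16,  32,   32, STRATEGY_LAZY},
    {8,   16, 128,  128, STRATEGY_LAZY},   /* 6: the level mod_gzip uses */
    {8,   32, 128,  256, STRATEGY_LAZY},
    {32, 128, 258, 1024, STRATEGY_LAZY},
    {32, 258, 258, 4096, STRATEGY_LAZY},   /* 9: max compression */
};

/* Returns the parameters for a level, or NULL if it is out of range. */
const level_config *config_for_level(int level) {
    if (level < 0 || level > 9)
        return NULL;
    return &level_table[level];
}
```

The table shows why changing one constant is not enough: switching from level 6 to level 3 changes not only the search limits but also the matching routine itself.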
Because of the way it is embedded in Apache's request handling, mod_gzip 1.3.26.1a uses a complex, two-level filter procedure to decide whether the result of a request should be compressed. (With the modified architecture of Apache 2.0, a simpler way of embedding might be possible.)
Currently a request is accepted by mod_gzip if and only if, in each of the two decision phases, at least one include rule fires and no exclude rule does. As these rules accept regular expressions as parameter values, quite powerful conditions can be expressed.
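The acceptance condition described above can be written down compactly. In this sketch, each phase is summarized by how many include and exclude rules fired; the phase_outcome type and the function names are hypothetical, and which rule classes belong to which phase is deliberately left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one decision phase. */
typedef struct {
    size_t includes_fired;
    size_t excludes_fired;
} phase_outcome;

/* A phase passes if at least one include rule and no exclude rule fired. */
static bool phase_passes(phase_outcome p) {
    return p.includes_fired > 0 && p.excludes_fired == 0;
}

/* The request is compressed exactly if both phases pass. */
bool request_accepted(phase_outcome phase1, phase_outcome phase2) {
    return phase_passes(phase1) && phase_passes(phase2);
}
```

Note that a phase in which no rule fires at all rejects the request just like a phase in which an exclude rule fires.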
Nevertheless, all rules are independent of each other: the currently available directives do not allow the user to express that certain MIME types should be delivered in compressed form only to certain browsers. What is missing is an AND combination of several rules.
There seem to be only rare occasions where this would be really helpful. At present it would mainly be needed to fine-tune the filter rules so as to work around the various bugs of Netscape 4, instead of excluding this browser from compression entirely. As time passes (and browsers become more standards-compliant), this feature may become obsolete.
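A minimal sketch of such an AND combination: a combined rule consists of several conditions, and it fires only if every condition matches. The types, function names and property indices are hypothetical, and the Netscape 4 user-agent pattern used in the usage note is only illustrative, not a reliable browser detection.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request properties a condition can test. */
enum { PROP_MIME, PROP_USER_AGENT, PROP_COUNT };

typedef struct {
    size_t property;      /* index into the request's property values */
    const char *pattern;  /* POSIX extended regular expression */
} condition;

/* A combined rule fires only if ALL its conditions match. */
typedef struct {
    bool include;         /* include or exclude rule */
    const condition *conds;
    size_t n_conds;
} combined_rule;

static bool cond_matches(const condition *c, const char *const *values) {
    regex_t re;
    const char *v = values[c->property];
    if (v == NULL || regcomp(&re, c->pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool hit = regexec(&re, v, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}

bool combined_rule_fires(const combined_rule *r, const char *const *values) {
    for (size_t i = 0; i < r->n_conds; i++)
        if (!cond_matches(&r->conds[i], values))
            return false;   /* one failing condition rejects the rule */
    return r->n_conds > 0;  /* an empty rule never fires */
}
```

For example, an exclude rule with the conditions MIME type ^(text/css|application/x-javascript)$ and user agent ^Mozilla/4\.[0-9]+ \[ would withhold compression only for style sheets and scripts requested by such browsers, while HTML pages could still be compressed for them.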
According to RFC 2616, for a request using the HEAD method an HTTP/1.1-compliant implementation should send HTTP headers identical to those that would have been sent for a GET request for the same resource.
For mod_gzip this would mean that even for a HEAD request the content would have to be compressed (or, in the case of a statically precompressed file, read) just to be able to generate a correct Content-Length HTTP header.
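A minimal sketch of the two cases, assuming POSIX. For a statically precompressed file, this sketch takes the Content-Length from stat() without reading the content; for dynamically compressed content, the compressor would still have to run, but its output could be passed to a sink that merely counts bytes. The names precompressed_content_length, output_sink, byte_counter and count_bytes are hypothetical; mod_gzip's compressor is not assumed to offer such a callback interface.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Size of a precompressed file, i.e. the Content-Length a GET response
   would carry. Returns -1 if the path is missing or not a regular file. */
long long precompressed_content_length(const char *gz_path) {
    struct stat st;
    if (stat(gz_path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    return (long long)st.st_size;
}

/* Hypothetical callback through which a compressor emits its output. */
typedef void (*output_sink)(void *ctx, const unsigned char *buf, size_t len);

typedef struct { long long total; } byte_counter;

/* Sink for HEAD requests: the compressed bytes are discarded and only
   their number is accumulated for the Content-Length header. */
void count_bytes(void *ctx, const unsigned char *buf, size_t len) {
    (void)buf;
    ((byte_counter *)ctx)->total += (long long)len;
}

const output_sink head_request_sink = count_bytes;
```

The sketch makes the cost visible: the static case is cheap, whereas the dynamic case saves only the network transfer, not the CPU time spent on compression.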
At present, mod_gzip does not handle this (extremely rare) case, probably for performance reasons. According to RFC 2119, however, this may make the module's behaviour only conditionally compliant with HTTP/1.1, because there has to be a valid reason for not following a recommended ("SHOULD") requirement.