mod_gzip - serving compressed content by the Apache webserver - Page 5
Author: Michael Schröpl
How to install mod_gzip
Performing the installation itself isn't difficult, but finding the method that best suits the needs of your Apache installation may take some time.
Therefore it is highly recommended that you read this chapter completely and become aware of the pros and cons of the different options before you select the operation method and perform the installation.
This document in particular covers the internal processing model of mod_gzip as an Apache module and may thus provide information that is helpful for understanding how mod_gzip evaluates its configuration directives.
The Apache web server supports two different methods of integrating a module into its program code: static integration and dynamic integration.
Depending on the operation concept used for your Apache server and the given requirements
- one of these two operation methods for mod_gzip has to be selected and
- the set of files required for this method has to be downloaded.
Static integration means that the module becomes a permanent part of the linked program binary httpd which implements the Apache server.
For this the Apache server source code has to be
Normally each administrator may want to use a different set of features for his Apache server adapted to his own requirements; therefore it doesn't seem feasible to provide program files ready to run on a multitude of platforms for download.
Dynamic integration means that the module can be loaded from a separate module file as shared object when starting the Apache process.
- the shared object file of this module for the required target platform has to be
- the Apache configuration file has to be extended by a directive to load this module.
Most Apache modules consist of one single source code file only. This file can be compiled by invoking the apxs program (with corresponding parameter values); for the installation of the shared object created this way one more invocation of apxs (with different parameter values) would be required.
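For a single-file module, the two apxs invocations mentioned above might look like the following sketch. The module name mod_example.c is hypothetical, and the APXS variable is only an assumption of this sketch so that the path to apxs can be overridden. The options -c (compile), -i (install) and -a (add a LoadModule line to httpd.conf) are standard apxs options.

```shell
# Path to apxs; override it if apxs is not on your PATH (assumption of this sketch).
APXS=${APXS:-apxs}

# Compile a single-file module and install the resulting shared object.
# $1: module source file, e.g. the hypothetical mod_example.c
build_single_file_module() {
    src=$1
    # First invocation: compile the source file into a shared object.
    "$APXS" -c "$src" || return 1
    # Second invocation: copy the shared object into Apache's module
    # directory (-i) and add the matching LoadModule line to httpd.conf (-a).
    "$APXS" -i -a "${src%.c}.so"
}
```

Calling build_single_file_module mod_example.c would then run both steps in sequence and stop if the compilation fails.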
From version 220.127.116.11a on, mod_gzip's source code is divided into three separate files:
- mod_gzip.c (about 8000 lines) contains all functions that are necessary to implement the processing logic of the mod_gzip Apache module.
This file is very much dependent on the module interface of the Apache version 1.3 (which didn't change for many years).
- mod_gzip_debug.c (about 500 lines) contains functions that are only required by the developer for debugging; some of these functions are not even included in a mod_gzip compiled 'the normal way' (depending on the symbolic constants defined via compiler options of the -D type).
- mod_gzip_compress.c (about 3000 lines) contains Kevin Kiley's implementation of the gzip compression function, the one that 'actually does the work'.
This part does not depend on any specific Apache version and (from a purely technical point of view) could be used by other compression tools as well (like mod_deflate, which currently uses zlib for compression).
This structure of the source code makes the mod_gzip maintenance a little easier - and the installation a little more complicated (as now several source files have to be compiled instead of just one). Therefore mod_gzip now provides Makefiles to simplify this installation process.
Depending on which of the operation concepts named above is to be run, different files (suitable for the respective purpose of use) are to be used.
As of this writing the following files are available for each mod_gzip version at the download page for the mod_gzip project:
If mod_gzip is to be run with dynamic integration on some other platform then the shared module file for this platform has to be created by the administrator.
The procedure for installing the Apache webserver on a UNIX machine documented by the Apache Group reads like this (in the short version):
- download the archive with the Apache source code from the WWW
- unpack the archive
- navigate into the directory created by the previous operation
- read and understand the file INSTALL
- start the shell script ./configure with the appropriate parameter values (this will cause the creation of Makefile files in a large number of subdirectories)
- make install (this will cause the compilation and installation of the Apache server including its online documentation)
If the Apache webserver has been created this way then the shipped modules normally become static parts of the created program file httpd in the Apache program directory - unless you specified something different in the parameters of the configure call.
(The actual call of configure may become very long, depending on how far you deviate from the standard parameter values. I recommend storing this call in a small shell script, which also documents how the installation was performed.)
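Such a wrapper script could be as small as the following sketch. The function name run_configure, the variables CONFIGURE and PREFIX, and the chosen prefix path are assumptions for illustration; --prefix and --enable-module are regular parameters of the Apache 1.3 configure script.

```shell
# build-apache.sh (hypothetical name): records the configure call
# that was used for this Apache installation.

# Both values are assumptions of this sketch; adapt them to your system.
CONFIGURE=${CONFIGURE:-./configure}
PREFIX=${PREFIX:-/usr/local/apache}

# Run configure with the recorded parameter values.
# Any additional parameters are passed through unchanged.
run_configure() {
    "$CONFIGURE" \
        --prefix="$PREFIX" \
        --enable-module=so \
        "$@"
}
```

Calling run_configure from the top of the unpacked Apache source tree, followed by make install, then reproduces the installation; further parameters can be appended to the call as needed.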
The source code of the official Apache modules is contained in the src/modules subdirectory of the unpacked tar archive of the Apache software.
To let mod_gzip be treated like a standard Apache module by this mechanism, the following preparations are necessary:
- Uncompress and unpack the content of the download archive containing the mod_gzip source code (which will create a directory mod_gzip-versionnumber),
- Create a directory src/modules/gzip within the directory tree of the Apache source
- Copy all files with the extensions *.c, *.h and *.tmpl into this new gzip directory.
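The last two preparation steps can be sketched as a small shell function. The function name stage_mod_gzip and both directory arguments are hypothetical; the sketch assumes that the mod_gzip archive has already been unpacked.

```shell
# Copy the mod_gzip sources into the Apache source tree.
# $1: unpacked mod_gzip directory (mod_gzip-versionnumber)
# $2: top directory of the unpacked Apache source code
stage_mod_gzip() {
    mod_gzip_dir=$1
    apache_dir=$2
    target="$apache_dir/src/modules/gzip"

    # Create the new module directory inside the Apache source tree.
    mkdir -p "$target" || return 1

    # Copy only *.c, *.h and *.tmpl; the shipped Makefile stays behind
    # because it is meant for building the shared object only.
    for f in "$mod_gzip_dir"/*.c "$mod_gzip_dir"/*.h "$mod_gzip_dir"/*.tmpl; do
        if [ -e "$f" ]; then
            cp "$f" "$target/" || return 1
        fi
    done
}
```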
As the next step, extend the configure call by the parameter --activate-module=src/modules/gzip/mod_gzip.c. Now the configure script will find the mod_gzip source code and create a suitable Makefile from the shipped file Makefile.tmpl - logged by the messages
+ activated gzip module (modules/gzip/mod_gzip.c)
Creating Makefile in src/modules/gzip
- the latter one just like for Apache's own modules. (The Makefile shipped with mod_gzip is not suitable for this type of installation - this one is only for the creation of a shared object file.)
Now the Apache installation will work as usual - and mod_gzip will be treated like a normal Apache module.
But configure knows that a module integrated via the --activate-module parameter is a 3rd-party module that may have specific requirements, and thus loads mod_gzip automatically on top of the module stack so that it gets access to the incoming HTTP request before all other modules - which is exactly what mod_gzip needs.
On some platforms Apache's configure doesn't seem to automatically set the value of the $(LIBEXT) environment variable to the proper value of .a. In this case the compilation of mod_gzip will fail. The exact reason for this behaviour is unknown as of now; as workaround you may replace the line
within the shipped file Makefile.tmpl, i. e. insert the proper value manually.
Be sure to use an editor that doesn't expand tab characters to spaces for this task!
(To be tested: What happens when integrating more than one 3rd-party module with --activate-module? Is the order of the parameter values relevant in this case?)
To check whether mod_gzip actually has been integrated into the Apache program code as requested, the Apache server provides the httpd -l command. This will display a list of all integrated modules (in the order in which they will be loaded); mod_gzip.c should be the last entry displayed there.
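This check can be scripted. The sketch below assumes that httpd -l prints one indented file name ending in .c per module, possibly surrounded by other lines; the function names are hypothetical.

```shell
# Read the output of "httpd -l" from standard input and print the name
# of the last module listed (assumed format: indented "name.c" lines).
last_compiled_in_module() {
    grep -E '^[[:space:]]+[A-Za-z0-9_]+\.c$' | tail -n 1 | tr -d '[:space:]'
}

# Succeed only if mod_gzip.c is the last entry of the module list.
gzip_is_last_module() {
    [ "$(last_compiled_in_module)" = "mod_gzip.c" ]
}
```

A call like httpd -l | gzip_is_last_module && echo "mod_gzip is on top of the module stack" would then report a successful static integration.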
The Apache webserver supports the concept of loadable modules.
Nearly every Apache module can be
- compiled as a shared object and then
- dynamically loaded into Apache's address space at the start of the Apache server (by use of the corresponding configuration directives).
The handling of loadable modules requires additional knowledge about the Apache configuration (because the order in which these modules are loaded may be significant for their functioning) but allows the Apache server's functional scope to be changed without having to recompile its source code.
On platforms like Windows (where not many Apache administrators have a C development environment at hand to compile and link the Apache code) the use of loadable modules may often be the only possibility to enlarge the functional scope of the Apache server.
The Apache 1.3 documentation provides the following articles about this topic:
- Dynamic Shared Object (DSO) Support - the description of the corresponding concept for the Apache webserver
- Module mod_so - the description of the Apache module for loading other modules and the required configuration directives
To dynamically add the mod_gzip shared object to the Apache code, one of the following configuration directives is required:
# load a DLL / Windows:
LoadModule gzip_module modules/ApacheModuleGzip.dll
# load a DSO / UNIX:
LoadModule gzip_module modules/mod_gzip.so
# (neither of the two if the module is statically integrated)
The actual file name can be freely selected - it only has to match the name of the file effectively used. On the other hand, this name can depend on the operating system platform and even on the compilation method used for this module - in this case either the directive shown above has to be adapted or the file has to be renamed accordingly.
The Apache server can dynamically load any number of modules. While doing so, the corresponding LoadModule directives are processed in the order of their occurrence within the configuration file.
But the modules are loaded on a stack within the working memory: The module that has been loaded last will be the first to get access to handling the corresponding HTTP request to the Apache webserver - and may then decide whether to consider itself responsible for handling this request or not.
Only one of all modules in question can be responsible for handling a request in Apache 1.3 - subsequent modules will not even be asked.
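To see which dynamically loaded module gets the first chance to handle a request, the LoadModule directives of a configuration file can be listed in reverse order. The following is only a rough sketch with a hypothetical function name: it ignores statically linked modules as well as any ClearModuleList and AddModule directives, which can change the effective order.

```shell
# Print the module names of all active LoadModule directives in reverse
# order of occurrence, i.e. the module loaded last is printed first.
# Reads the given configuration file(s), or standard input if none is given.
request_order() {
    awk '
        $1 == "LoadModule" { names[++n] = $2 }   # remember names in load order
        END { for (i = n; i >= 1; i--) print names[i] }
    ' "$@"
}
```

For a correctly configured server, request_order httpd.conf should print gzip_module on its first line.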
Therefore, to be able to process the output of arbitrary modules, mod_gzip has to do something that actually contradicts the Apache 1.3 architecture: It has to 'handle' a request but subsequently decline responsibility for handling it. Only this way can the module that is actually responsible for the request still be activated by the Apache server at all.
In this first phase, 'handling' the request does not mean that mod_gzip compresses the page content to be served, because this content doesn't even exist yet; it still has to be generated by another module. Instead, at this point mod_gzip just arranges to be asked again, after the page content has been created, whether it wants to do anything. Only in this second phase of its activation (when the content of the HTTP response is available) can mod_gzip perform its essential task: compressing the content of an HTTP response packet (and modifying certain HTTP headers).
This 'registration' for later postprocessing the HTTP response performed by mod_gzip is necessary only if mod_gzip cannot already determine at this stage that it definitely won't be interested in processing the response content anyway.
Thus in this first phase mod_gzip already performs a part of the evaluation of the filter directives specified in the Apache configuration: It checks those rules where it can do this based upon the request description alone (i. e. the content of the corresponding HTTP headers). This applies to the mod_gzip_item_include/mod_gzip_item_exclude rules of the type
- reqheader (content of the HTTP request headers of the request),
- url (URL of the requested HTTP resource),
- file (name of the file affected by this request, after evaluation of all Alias translations etc.) and
- handler (name of the handler responsible for evaluating this request, according to the Apache configuration).
If the evaluation of these filter rules already proves that this request's result must not be compressed, i. e. if
- at least one exclude rule is satisfied or
- none of the include rules is satisfied or
- any other condition for performing the compression isn't satisfied (e. g. at this stage it can already be verified whether the client has permitted the serving of compressed data at all by sending the Accept-Encoding: gzip HTTP header)
then it is not necessary for mod_gzip to check the remaining rules after the creation of the response content - so this won't happen then, because mod_gzip remembers the result of the first evaluation phase for each request and terminates the second phase immediately in this case.
Otherwise in the second phase of its operation mod_gzip checks the remaining filter rules that can be evaluated only based on the actual content of the generated response packet:
- rspheader (content of the HTTP response headers) as well as
- mime (HTTP content type of the result).
Furthermore, some other conditions are tested now, such as the size of the response packet (directives mod_gzip_minimum_file_size and mod_gzip_maximum_file_size).
And only if all of these tests lead to a positive result will the compression of the response packet actually be performed.
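To find out which rules of an existing configuration are checked in which phase, the rule types listed above can be mapped to their phase. The following sketch uses a hypothetical function name and assumes the rule type is the second word of each directive. It also accepts the spelling uri for the url type, which is an assumption of this sketch.

```shell
# Annotate each mod_gzip_item_include / mod_gzip_item_exclude rule with
# the evaluation phase in which mod_gzip can check it:
#   phase 1 = based on the request alone, phase 2 = needs the response.
# Reads the given configuration file(s), or standard input if none is given.
rule_phases() {
    awk '
        $1 == "mod_gzip_item_include" || $1 == "mod_gzip_item_exclude" {
            type = $2
            if (type == "reqheader" || type == "url" || type == "uri" ||
                type == "file" || type == "handler")
                phase = 1
            else if (type == "rspheader" || type == "mime")
                phase = 2
            else
                phase = "?"          # unknown rule type
            print "phase " phase ": " $0
        }
    ' "$@"
}
```

Rules marked as phase 1 can end the evaluation early; only if none of them rules out compression do the phase 2 rules and the size limits matter.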
To be able to perform all the tasks described above, the mod_gzip module must get access to the HTTP request before every other Apache module whose output it is meant to handle. Because Apache modules get access to a request in the reverse of their load order, mod_gzip should be loaded last of all Apache modules.
For the static integration this module order is defined by the 'blueprint' of the httpd program during the compilation of the Apache source code. The procedure for compiling the Apache source code shipped by the Apache Group, activated by the configure shell script, knows all dependencies between the shipped modules (and ensures a corresponding order of these modules) but not the requirements of 3rd party modules like mod_gzip which are integrated into the compilation process by the configure parameter --add-module=file. To give these 3rd party modules a maximum of influence, they are loaded last onto the module stack.
So if mod_gzip is to be integrated into an Apache server as the only 3rd party module then configure automatically does the right thing. In case of using more than one 3rd party module the administrator is responsible for ordering these modules (maybe by the order of his --add-module= values? I didn't test this yet).
If an Apache server is operated to support the dynamic integration of modules (i. e. uses the mod_so module) then a utility program named apxs will be generated in Apache's bin directory during the Apache installation.
This program allows its user to compile the program source code of an Apache module (using a C compiler) and to create a corresponding shared object file without requiring the complete source code of the Apache server to be available: apxs knows all required Apache program interfaces and supplies the C compiler with the necessary information.
To save the user from finding out how exactly mod_gzip has to be compiled and installed completely when using apxs, a file named Makefile is provided within the source code archive.
Using this Makefile reduces the installation procedure to the following steps:
- Extract the files from the downloaded mod_gzip source code archive file into a (new, temporary) directory of your choice and change your current directory position into there.
- Find out the path name of the program apxs from your Apache installation.
- Perform the compilation running the command
make APXS=your_apxs_pathname
This will create the shared object file mod_gzip.so within the current directory.
(This step of the operating sequence may be omitted as it will then be covered by the subsequent step.)
- Perform the installation running the command
make install APXS=your_apxs_pathname
This will not only copy the shared object file into the corresponding directory of the Apache installation but also automatically extend the Apache configuration file httpd.conf by the required directives LoadModule and AddModule ... if you don't like foreign programs rewriting your precious configuration files you might prefer to perform this final step manually, or at least make a backup of your Apache configuration first.
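The steps above, including the recommended backup of the configuration file, can be combined as in the following sketch. The function name, the MAKE variable and the backup file name httpd.conf.bak are assumptions for illustration; the function has to be called from the directory containing the extracted mod_gzip sources.

```shell
# make program to use; overridable (assumption of this sketch).
MAKE=${MAKE:-make}

# Back up httpd.conf, then compile and install mod_gzip via its Makefile.
# $1: path name of the apxs program of your Apache installation
# $2: path name of your httpd.conf
install_mod_gzip() {
    apxs_path=$1
    httpd_conf=$2

    # Keep a copy of the configuration before "make install" rewrites it.
    cp -p "$httpd_conf" "$httpd_conf.bak" || return 1

    # Compile the shared object, then install it.
    "$MAKE" APXS="$apxs_path" && "$MAKE" install APXS="$apxs_path"
}
```

If the changes made to httpd.conf are not what you want, the backup copy can simply be moved back into place.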
Besides these necessary steps the Makefile supports the following commands (which might rather be of interest to developers):
- make clean removes all created object module files of the mod_gzip source code files from the current directory (i. e. all files with the name pattern *.o).
- make distclean additionally removes the created shared object file mod_gzip.so as well.