The Magic of mod_perl

by Frank Wiles
NOTE: This article first appeared in SysAdmin Magazine in January 2005. The mod_perl API has changed since that time, and the following code examples will require slight modification to work.

I often run into people who are confused about what mod_perl is. Some think mod_perl is only useful for speeding up CGI scripts. Some oddly believe it to be a heretical and incompatible version of the Perl programming language. Others think mod_perl is just another fancy way of saying Perl/CGI. Fortunately, all of them are wrong. With the upcoming release of mod_perl 2.0 (which is very near release as of this writing and will likely be out before this article is published), I wanted to better explain mod_perl.

In short, mod_perl embeds a Perl interpreter directly into your Apache web server. It does have the wonderful added benefit of speeding up your CGI scripts, but that is just a taste of its power. The real power of mod_perl is that it lets you use the entire Apache API directly from Perl. This is also what sets mod_perl apart from similar technologies such as PHP (mod_php) and Python (mod_python), which only allow you to control the content or response phase of the Apache server.

When a browser requests a page from an Apache server, the request goes through several processing phases. Some are related to access control, authentication, and logging, but the most commonly used is the response phase. The response phase is what you work with when building a Perl CGI or a PHP script. It is the part of the process that generates the actual HTML page and returns it to the browser.

The power of mod_perl is that it gives you the ability to replace the default behaviors of any of these phases with your own phase handlers. mod_perl handlers can be thought of as true Apache modules, plugged directly into the server, rather than a script or other outside process. Each handler is a different Perl module that you have written to deal with a particular phase. This can also help code reuse by allowing you to share a particular logging or authentication handler on many different sites or dedicated servers without having to alter any other aspects of the Apache process.

Here are some examples of mod_perl's abilities:

  • Log all requests to http://www.domain.com/admin/ into a SQL database, capturing the information we care about and still logging the rest of the site to the normal Apache log file for traffic analysis.
  • Replace Apache's flat file Basic auth with a SQL database that controls access both by username/password as well as by date and time of the request. This could be used to allow employees access to an application only during office hours.
  • mod_perl gives you the ability to configure your Apache web server with Perl code. Use this to easily configure a large number of VirtualHosts by querying a database instead of manually configuring them in httpd.conf.
  • Apache Filters allow you to filter the output of any flat file, a script written in another language, or even another mod_perl handler before sending the page on to the browser. This can be used to clean up the output of a legacy system without having to modify the original code.
  • Because Apache 2.0 is protocol agnostic, you can even make your server speak protocols other than HTTP. An example of this would be to build an SMTP server in mod_perl and a corresponding web application to control how it operates.
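
To make the office-hours example above a little more concrete, here is a minimal sketch of the decision such an authentication handler might make. It is plain Perl with no Apache calls; the function name and the 09:00 to 17:00 weekday policy are assumptions made for illustration, not part of any mod_perl API.

```perl
use strict;
use warnings;

# Hypothetical policy: weekdays, 09:00 up to (not including) 17:00.
use constant OPEN_HOUR  => 9;
use constant CLOSE_HOUR => 17;

# Return 1 when the given time falls inside office hours, else 0.
# Takes the hour and weekday fields as returned by localtime(),
# where the weekday is 0 for Sunday through 6 for Saturday.
sub within_office_hours {
    my ($hour, $wday) = @_;
    return 0 if $wday == 0 || $wday == 6;    # weekend
    return ($hour >= OPEN_HOUR && $hour < CLOSE_HOUR) ? 1 : 0;
}

# Usage: decide based on the current local time.
my ($hour, $wday) = (localtime(time))[2, 6];
print within_office_hours($hour, $wday) ? "allow\n" : "deny\n";
```

In a real handler this check would sit next to the username/password lookup, and the handler would refuse the request when either test fails.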

Installation

Installing mod_perl is a relatively easy task. If you are using a recent Linux distribution, you may have it installed already. mod_perl 2.0 does have some prerequisites, namely a recent Perl and Apache 2.x. If both of these are already installed, all that is required is downloading the mod_perl source code from http://perl.apache.org/download and issuing the following commands:

    # tar -xvzf mod_perl-2.x.xx.tar.gz
    # cd mod_perl-2.x.xx
    # perl Makefile.PL MP_APXS=/path/to/apxs MP_INST_APACHE2=1
    # make
    # make test
    # make install

Configuration

Once you have mod_perl installed you will need to configure it in your httpd.conf by adding the following configuration options:

    LoadModule perl_module modules/mod_perl.so
    PerlRequire /path/to/perl/libs/startup.pl

The startup.pl script allows you to set up your @INC library path and preload any modules that you want shared among your Apache server children. For the examples that follow, I'm using this minimal startup.pl:

    use lib qw(/path/to/perl/libs);
    use Apache2;
    1;

Speeding up CGIs

mod_perl speeds up your CGI scripts by getting rid of the infamous "fork, compile, execute" problem. When running a normal CGI, the Apache web server forks a Perl interpreter, which in turn compiles and executes the Perl source. This process is repeated for every request made to the CGI. With mod_perl we remove the expensive forking step by having an interpreter embedded in the web server. mod_perl also compiles the Perl source once and keeps it in memory, leaving only the actual execution of the code on each request.

As I'm sure you can imagine, this greatly increases the speed of most CGIs, often to as much as 100 times their original speed. To configure this for all of your Perl CGIs in the directory /modperl/, simply add this to your Apache's httpd.conf:

    <Location /modperl/>
        SetHandler perl-script
        PerlResponseHandler ModPerl::Registry
        PerlOptions +ParseHeaders
        Options +ExecCGI
    </Location>

This instructs mod_perl to compile the Perl scripts in the /modperl/ directory once for each Apache child and store them in memory. If a script is edited on disk, mod_perl is smart enough to recompile it on the next request to reflect your changes.
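
The compile-once, recompile-on-change behavior can be illustrated with a small cache keyed on each file's modification time. This is a simplified sketch of the idea in plain Perl, not the actual ModPerl::Registry code; the %cache hash and the compiled_script function are names invented for this example.

```perl
use strict;
use warnings;

my %cache;    # path => { mtime => ..., code => coderef }

# Return a compiled coderef for the script at $path, recompiling only
# when the file's modification time differs from the cached one.
sub compiled_script {
    my ($path) = @_;

    my $mtime = (stat $path)[9];
    die "cannot stat $path: $!" unless defined $mtime;

    my $entry = $cache{$path};
    return $entry->{code} if $entry && $entry->{mtime} == $mtime;

    open my $fh, '<', $path or die "cannot open $path: $!";
    my $source = do { local $/; <$fh> };
    close $fh;

    # Wrap the script body in an anonymous sub so that it can be
    # called on every request without being parsed again.
    my $code = eval "sub {\n$source\n}";
    die "compile error in $path: $@" unless $code;

    $cache{$path} = { mtime => $mtime, code => $code };
    return $code;
}
```

Calling compiled_script() twice for an unchanged file returns the same coderef, while editing the file causes the next call to compile it afresh. Because each Apache child has its own memory, each child would keep its own copy of such a cache.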

Extending the Apache Web Server

One challenge that faces many web site administrators is that of reworking your file system layout without breaking existing bookmarks and deep links into your site. Sometimes these can be fixed with redirects or mod_rewrite rules, but these can quickly become unwieldy to manage on large sites.

Suppose you run a news website where your articles are stored in a different directory each day and you want to change this layout slightly. You want to be able to change requests for http://www.example.com/20041106/article-title.html into something like http://www.example.com/archive/2004/11/06/article-title.html on the fly. URIs are mapped to filenames in the translation phase of the Apache life cycle, which mod_perl exposes as PerlTransHandler. Using mod_perl we can easily change the default behavior with a translation handler like this:

    package My::LayoutChanger;

    use strict;
    use warnings;

    use Apache::RequestRec ();
    use Apache::Const -compile => qw(DECLINED);

    sub handler {
        my $r = shift;

        # See if the requested URI follows our old style of having
        # an eight digit directory in the form of /YYYYMMDD/
        if( $r->uri =~ m|^/\d{8}/| ) {

            # Extract the parts of the date and the filename from the
            # requested URI
            my ($year, $month, $day, $file) =
                $r->uri =~ m|^/(\d\d\d\d)(\d\d)(\d\d)/(.*?)$|;

            # Replace the URI transparently
            $r->uri("/archive/$year/$month/$day/$file");
        }

        # Return DECLINED so that other trans handlers can be
        # called if necessary
        return Apache::DECLINED;
    }

    1;

You configure this in Apache's httpd.conf with the following directives:

    PerlModule My::LayoutChanger
    PerlTransHandler +My::LayoutChanger

A handler like this could easily be extended to handle multiple site redesigns in the same module, or you can stack handlers so that each new filesystem layout is a different Perl module, each handling a different set of URI rewrites. You can also map an existing static HTML site into a new dynamic application by rewriting the URI and setting the HTTP query string with $r->args.
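
As a rough sketch of the "multiple redesigns in one module" idea, the rewriting logic can be pulled out into a table of rules. The code below is plain Perl so that it can be tried outside Apache; the rewrite_uri function, the @rules table, and the second /news/ layout are all hypothetical and only serve to illustrate the approach.

```perl
use strict;
use warnings;

# Each rule pairs a pattern for an old layout with a sub that builds
# the new URI from the captured pieces. The first rule is the
# /YYYYMMDD/ layout from the handler above; the second is an
# invented layout added for illustration.
my @rules = (
    [ qr{^/(\d{4})(\d{2})(\d{2})/(.*)\z},
      sub { "/archive/$_[0]/$_[1]/$_[2]/$_[3]" } ],
    [ qr{^/news/(\d{4})-(\d{2})-(\d{2})/(.*)\z},
      sub { "/archive/$_[0]/$_[1]/$_[2]/$_[3]" } ],
);

# Return the rewritten URI, or the original URI when no rule matches.
sub rewrite_uri {
    my ($uri) = @_;
    for my $rule (@rules) {
        my ($pattern, $build) = @$rule;
        if (my @parts = $uri =~ $pattern) {
            return $build->(@parts);
        }
    }
    return $uri;
}

print rewrite_uri('/20041106/article-title.html'), "\n";
# prints /archive/2004/11/06/article-title.html
```

Inside a translation handler the result would be handed back with $r->uri(rewrite_uri($r->uri)), and supporting another old layout becomes a matter of adding one more entry to the table.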

I/O Filtering

One of mod_perl's most interesting features is I/O filtering. Filtering can be used to modify static files on the fly, or even the output of another program. If, for example, you would like to automatically add the last modified date and time to the bottom of all of the pages in a particular directory, you would use a filter much like this:

    package My::Filter;

    use strict;
    use warnings;

    use base qw(Apache::Filter);
    use Apache::RequestRec ();      # For $r->finfo and $r->headers_out
    use APR::Finfo ();              # For file information
    use APR::Table ();              # For $f->r->headers_out->unset

    use Apache::Const -compile => qw(OK);

    use constant BUFFER => 1024;

    sub handler {
        my $f = shift;              # Our filter object
        my $r = $f->r;              # Our request object
        my $finfo = $r->finfo;      # Our file info

        # Convert last modified time into a human readable format
        my $time = localtime($finfo->mtime);

        # Unset our Content-Length header on the first call, since
        # we will be changing the content's length
        unless( $f->ctx ) {
            $r->headers_out->unset('Content-Length');
            $f->ctx(1);
        }

        # Read the content 1024 bytes at a time
        while( $f->read(my $buf, BUFFER) ) {

            # Insert our last modified date and time before the
            # closing BODY tag. A tag that is split across two
            # reads will not be matched by this simple version.
            $buf =~ s/<\/BODY>/Last modified: $time<\/BODY>/i;

            $f->print($buf);
        }

        return Apache::OK;
    }

    1;

To configure this filter for the /content/ directory you would add the following to your httpd.conf:

    PerlModule My::Filter
    <Directory /content/>
        SetHandler modperl
        PerlOutputFilterHandler My::Filter
    </Directory>
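
Because the filter reads its input in fixed-size pieces, a closing BODY tag can end up split between two reads and be missed. One possible way around this is to hold back a short tail of each piece until the next one arrives. The sketch below shows that technique in plain Perl, outside Apache; make_footer_inserter is a name made up for this example and is not part of the mod_perl filter API.

```perl
use strict;
use warnings;

my $TAG = '</body>';

# Build a streaming rewriter. The returned sub accepts one chunk at a
# time and returns the text that is safe to send on. Call it with
# undef at end of input to flush whatever is still being held back.
sub make_footer_inserter {
    my ($footer) = @_;
    my $held = '';                   # tail carried between chunks
    my $keep = length($TAG) - 1;     # longest possible partial tag

    return sub {
        my ($chunk) = @_;

        if (!defined $chunk) {       # end of stream: flush the tail
            my $rest = $held;
            $held = '';
            return $rest;
        }

        my $buf = $held . $chunk;
        $buf =~ s/(\Q$TAG\E)/$footer$1/i;

        # Hold back a tail that is too short to contain a whole tag
        # but long enough to contain any partial one.
        my $cut = length($buf) > $keep ? length($buf) - $keep : 0;
        $held = substr($buf, $cut);
        return substr($buf, 0, $cut);
    };
}

# Usage: the closing tag is split across the two chunks.
my $insert = make_footer_inserter('Last modified: today');
print $insert->($_) for '<html><body>hi</bo', 'dy></html>';
print $insert->(undef), "\n";
```

In a real filter the held-back tail would presumably be stored in $f->ctx between invocations and flushed once $f->seen_eos reports the end of the stream.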
        

Conclusion

While mod_perl gives you easy access to tweaking how your Apache server behaves, it also proves to be an efficient and scalable platform on which to build large websites and enterprise applications. mod_perl is used by such high-traffic sites as www.slashdot.org, www.livejournal.com, and www.ticketmaster.com. I personally have been using mod_perl for years to build LAMP (Linux, Apache, mod_perl, PostgreSQL) applications and have never found myself limited by it.

I hope this introduction to mod_perl 2.0 has piqued your interest. If so I encourage you to visit the mod_perl homepage at http://perl.apache.org. You will find it a wonderful resource with a large collection of online documentation, links to print resources, and the users mailing list that will help you take full advantage of mod_perl.

If you get stuck solving a mod_perl problem, please send a question to the mod_perl users mailing list, where answers often come back within a few short minutes. If you don't mind a slower response time, feel free to email me directly with any questions you may have.