Monday, September 8, 2008

Some useful things you can do with mod_rewrite

On my current project we are implementing a strangler application to replace a legacy mod_perl application with Ruby on Rails. Our first release does not replace all of the functionality, so to get the new application to interact seamlessly with the vintage code base we are making heavy use of mod_rewrite. I figured I would share some of the more interesting rewrites we have.

Proxy all non-public content to another server, for instance an HAProxy.

This is a very common rule for Rails applications that use the Apache, HAProxy, Mongrel stack. The goal is for Apache to serve all static content, i.e. everything under /public in our Rails app, because Apache serves files from disk much more efficiently than a Mongrel would.

Here our RewriteCond evaluates to true if the requested resource does not exist on the file system. Those requests are then proxied to our HAProxy instance, here running on port 4000.

DocumentRoot /path/to/rails-app/public

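# true when no file under DocumentRoot matches the requested path
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# P hands the request to mod_proxy instead of redirecting the browser
RewriteRule ^.*$ http://localhost:4000%{REQUEST_URI} [P,QSA,L]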
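
For context, a fuller virtual host built around the same rule might look like the sketch below. The host name www.example.com and the ProxyPassReverse line are assumptions for illustration, not part of our actual configuration, and the [P] flag assumes mod_proxy and mod_proxy_http are loaded.

```apache
<VirtualHost *:80>
    # www.example.com is a placeholder host name
    ServerName www.example.com
    DocumentRoot /path/to/rails-app/public

    RewriteEngine On

    # Anything that is not a real file under public/ goes to HAProxy
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
    RewriteRule ^.*$ http://localhost:4000%{REQUEST_URI} [P,QSA,L]

    # Assumption: fix up Location headers on backend redirects so they
    # point at Apache rather than at localhost:4000
    ProxyPassReverse / http://localhost:4000/
</VirtualHost>
```

Without RewriteEngine On in the virtual host, none of the rules in this post will fire.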

Put the protocol into an environment variable based upon a header set by a load balancer

Did you ever get the "This page contains secure and insecure items" warning? Yeah it's an annoyance. To make matters even worse, you have a butt-load of legacy webapps that now need to point to assets from your site but are too brittle and cumbersome to change directly. To make matters double worse, these legacy webapps serve up secure and insecure content (http & https). To make matters triple worse, all of these applications are behind a hardware load balancer that manages the SSL negotiation so all traffic behind it is requested as http.

We thought this problem was unsolvable, but found out that our load balancer could set a header indicating whether an SSL negotiation had happened or not. So we put something like this in our vintage Apache configuration:

# set the 'protocol' environment variable to default to http
RewriteRule .* - [E=protocol:http]

# if the header exists with the value active, change the protocol variable to https
RewriteCond %{HTTP:X-SSL-State-MTG} ^active$ [NC]
RewriteRule .* - [E=protocol:https]

# REQUEST_URI already begins with a slash, so none is needed after the host name
RewriteRule ^/images/.* %{ENV:protocol}://newapp.com%{REQUEST_URI}
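
If the protocol is only needed for this one redirect, the same behaviour can probably be had without the environment variable. A sketch, assuming the same X-SSL-State-MTG header and newapp.com host as above; the explicit [R=302,L] flags are an addition for clarity, not something from our configuration:

```apache
# Requests that arrived over SSL at the load balancer are sent to https
RewriteCond %{HTTP:X-SSL-State-MTG} ^active$ [NC]
RewriteRule ^/images/ https://newapp.com%{REQUEST_URI} [R=302,L]

# Everything else falls through to plain http
RewriteRule ^/images/ http://newapp.com%{REQUEST_URI} [R=302,L]
```

The environment variable version pays off once several rules need to know the protocol, since the header check then lives in one place.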

Transform part of a URL from upper to lowercase

Linux can be a tricky beast. We do all of our development on Macs, so you'd think you'd have your bases covered with operating system incompatibilities. And you'd be wrong. The default OS X file system is not case-sensitive, so if you are running an Apache locally and you have an asset named foo.gif and you try to access it through http://localhost/FOO.gif it will work fine. However, once you deploy to your server, in our case running Enterprise SUSE, you will get a 404 - Not Found.

There are multiple ways to solve this problem. For one, you could install mod_speling, which can make your URLs effectively case-insensitive. We chose to go another route, mostly because we didn't want to install yet another Apache module and we had a pretty simple case: all of our legacy URLs were in upper case and all our resources were in lowercase.

#define a function 'lowercase' that is an alias of the internal tolower function
RewriteMap lowercase int:tolower

# rewrite the image names for everything in the teams folder to lowercase and redirect to the new application
# e.g. /images/teams/ATL.gif => http://www.newapp.com/images/atl.gif
RewriteRule ^/images/teams/(.*)\.gif$ http://www.newapp.com/images/${lowercase:$1}.gif
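
One thing to watch: RewriteMap can only be declared in the server or virtual host configuration, not in an .htaccess file. If the legacy URLs were not all .gif files, a looser variant could lowercase the whole remainder of the path, extension included. This is a sketch rather than what we run, and the R=301 status is an assumption:

```apache
# RewriteMap must live in the server or virtual host config
RewriteMap lowercase int:tolower

# e.g. /images/teams/ATL.PNG => http://www.newapp.com/images/atl.png
RewriteRule ^/images/teams/(.+)$ http://www.newapp.com/images/${lowercase:$1} [R=301,L]
```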

Show a maintenance page if it exists

Capistrano provides some handy tasks to enable and disable your web site. It does this by creating a maintenance.html file in public/system/. In order for your webapp to respect that file you need a rewrite rule like the following:

RewriteCond %{DOCUMENT_ROOT}/system/maintenance.html -f
RewriteRule ^.*$ /system/maintenance.html [L]
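
If the maintenance page pulls in its own stylesheet or images, those requests need to be let through as well. A slightly more defensive sketch, where the list of extensions is an assumption about what such a page would reference:

```apache
# Let the maintenance page load its own assets, skip the page itself,
# and only kick in while the file exists
RewriteCond %{REQUEST_URI} !\.(css|gif|jpg|png)$
RewriteCond %{REQUEST_URI} !^/system/maintenance\.html$
RewriteCond %{DOCUMENT_ROOT}/system/maintenance.html -f
RewriteRule ^.*$ /system/maintenance.html [L]
```

When this shares a virtual host with the proxy rule from the first section, it has to come before that rule, otherwise requests for non-existent files are proxied to the backend before the maintenance check is ever reached.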

Show a default image if the requested image does not exist

Here is a crazy one. Our legacy site is going to request images based upon some key in the database. This information in the database is fairly volatile, so the likelihood that there are missing images is high. This is unacceptable, so we need to show some sort of default image if the specific one is not available. Since the legacy site does not have access to the filesystem where the images live, there is no way for it to know if that image exists before it writes the image tag.

We can move this logic into the Apache configuration of our new application. If the requested logo file does not exist, we serve up mlb.jpg instead.

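# only consider requests for logos
RewriteCond %{REQUEST_URI} ^/images/logos/.*
# whose file is missing under DocumentRoot
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# and redirect those to the default logo
RewriteRule ^/.*$ /images/logos/mlb.jpg [R,QSA,L]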
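
The [R] flag costs the browser an extra round trip for every missing logo. An internal rewrite is one possible alternative; this is a sketch and not what we deployed:

```apache
# Serve the default logo in place, without a redirect, when the file is missing
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteRule ^/images/logos/ /images/logos/mlb.jpg [L]
```

The trade-off is that the browser then caches the default image under the URL of the missing logo, so a logo that shows up later may not appear until that cache entry expires.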
