What's on Your "Sorry" Server?
There comes a time when we need to redirect everything to a single page on an Apache-driven site. There are several reasons for doing this: a data center migration, retiring a site, or notifying users of site maintenance by way of a "sorry" server.
"Sorry" server is an industry term for a web site meant to convey temporary outage notifications to our users. There are some guidelines we should follow when setting one up. It isn't necessarily as mundane as uploading an HTML page saying we're going to be down for some period of time, because our sites are constantly being accessed by things other than humans, and we don't want those things to penalize us simply because we're offline for a few hours.
This post assumes that you've already got Apache up and running. It's natural for busy administrators not to want to go overboard setting up a web server that (hopefully!) won't be accessed all that frequently, but we really should treat our sorry servers just like any other production server. We certainly wouldn't want our sorry servers to get hacked or defaced, and we wouldn't want one of our maintenance pages to show up in Google's listings.
The HTML below is a no-frills page meant to be used as an example and not for production use. We want our designers to create an actual, branded HTML page that looks just as nice as any of the other pages on our sites.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<title>Site Maintenance</title>
</head>
<body>
<p>We're sorry. We're down.</p>
</body>
</html>
There are a few things missing from this page that we should add. We generally don't want this page cached, because we don't have any control over third-party devices that might cache our maintenance page. If we know how long we're going to be offline, we could set an 'expires' header or meta tag; on a really busy site, allowing caching while setting an expiration is a reasonable compromise. We also don't want robots indexing the page, because it shouldn't show up in search engines. Below is the updated HTML from above, sprinkling in some meta tags to prevent caching and indexing by robots:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<meta http-equiv="Cache-Control" content="no-cache"/>
<meta http-equiv="Pragma" content="no-cache"/>
<meta name="robots" content="noindex,nofollow"/>
<meta name="googlebot" content="noarchive"/>
<title>Site Maintenance</title>
</head>
<body>
<p>We're sorry. We're down.</p>
</body>
</html>
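If the maintenance window has a known end time, the expiration idea mentioned above can also be handled at the Apache level with mod_expires rather than meta tags. A minimal sketch, assuming mod_expires is loaded; the three-hour window is purely illustrative:

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # Hypothetical three-hour maintenance window; mod_expires
    # emits both Expires and Cache-Control: max-age headers
    ExpiresByType text/html "access plus 3 hours"
</IfModule>
```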
I decided that I didn't want to cache any requests because I want to see every page request show up in the web server logs. Now that we've got our page ready, save it as index.html and upload it to the sorry server's document root. The next thing that should be done is to upload a robots.txt file into the site's document root as well. The robots.txt file will only include the following lines:
User-agent: *
Disallow: /
The usual caveats apply with this file: It will only be honored by robots that actually conform to the robots.txt standard.
Now we're ready to modify our Apache configuration. In our example site, we want ANY POSSIBLE page request to get redirected to our site-down index.html page. The mod_rewrite rule is fairly simple:
# Turn on mod_rewrite
# if not already on
RewriteEngine On

# Don't rewrite image requests
# or stylesheet requests
# Add js if necessary
RewriteRule \.(css|jpe?g|gif|png)$ - [L]

# Rewrite everything except the maintenance page and
# robots.txt (without the second condition, robots.txt
# requests would be redirected to index.html too)
RewriteCond %{REQUEST_URI} !^/index.html
RewriteCond %{REQUEST_URI} !^/robots.txt
RewriteRule /(.*) /index.html [R=302,L]
This set of rewrite rules is pretty self-explanatory, but if you are new to mod_rewrite, here is a walk-through.

RewriteEngine On

turns on mod_rewrite.

RewriteRule \.(css|jpe?g|gif|png)$ - [L]

tells mod_rewrite not to do anything to requests for images or stylesheets, and not to process any more rules for this type of request.

RewriteCond %{REQUEST_URI} !^/index.html

means "if the request is not for /index.html".

RewriteRule /(.*) /index.html [R=302,L]

means "issue a 302 redirect, send the browser to /index.html, and don't process any more rules after this one."
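One variation worth considering: instead of a 302, the rewrite can answer with a 503 Service Unavailable, the status code search engines treat as a temporary outage. A sketch of that approach (mod_rewrite's R flag accepts non-redirect status codes, in which case the substitution is ignored; the Retry-After value is just an example and requires mod_headers):

```apache
# Alternative: answer everything with a 503 instead of a 302
# so crawlers know the outage is temporary
ErrorDocument 503 /index.html
RewriteEngine On
RewriteRule \.(css|jpe?g|gif|png)$ - [L]
RewriteCond %{REQUEST_URI} !^/index.html
RewriteCond %{REQUEST_URI} !^/robots.txt
RewriteRule .* - [R=503,L]
# Hint at when to come back (in seconds); illustrative value
Header always set Retry-After "3600"
```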
I like compressing at the Apache level, and even though we are working with one HTML page that will probably be relatively small, I'm going to compress it anyway. Here are the settings I use as a starting point (these require that mod_deflate be installed):
## HTTP compression on html, js and css files
DeflateBufferSize 8096
DeflateCompressionLevel 4
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio
SetOutputFilter DEFLATE
Header append Vary User-Agent env=!dont-vary
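Note that SetOutputFilter DEFLATE compresses every response, not just the html, js and css files the comment mentions. If you want the behavior the comment describes, one option (assuming Apache 2.x with mod_deflate) is to filter by content type instead:

```apache
# Compress only the types named in the comment above
AddOutputFilterByType DEFLATE text/html text/css application/javascript
# Broken older browsers are commonly excluded with BrowserMatch
# directives; see the mod_deflate documentation for examples
```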
We aren't finished yet. We should add some customized logging to Apache so we can see how many visitors are coming to the site. Normally a combined log format would be ok, but I'm also interested in seeing the Host header from the browser so I can tell which site visitors were trying to reach (if multiple sites were going to be offline for whatever reason). I also want to log compression rates in case later optimization is necessary. Finally, I want to grab the X-Forwarded-For IP if that is being passed instead. I'm using the logging setup from a recent How-To I published:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Host}i\" \"%{Referer}i\" \"%{User-Agent}i\" %{outstream}n/%{instream}n (%{ratio}n%%)" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Host}i\" \"%{Referer}i\" \"%{User-Agent}i\" %{outstream}n/%{instream}n (%{ratio}n%%)" proxy
SetEnvIf X-Forwarded-For "^.*\..*\..*\..*" forwarded
CustomLog "logs/access_log" combined env=!forwarded
CustomLog "logs/access_log" proxy env=forwarded
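One small caveat on the SetEnvIf line: that pattern matches any header value containing three dots, not just an IP address. If you want something a little tighter, a sketch (still a loose IPv4 check, not real validation):

```apache
# Match four dot-separated number groups at the start of the header
SetEnvIf X-Forwarded-For "^([0-9]{1,3}\.){3}[0-9]{1,3}" forwarded
```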
What's on your sorry server? I'd love to compile everyone's best practices that could be used as an implementation template.