Improving user experience during a server outage

While often cast as a passive, yet pernicious deviation from intended results, error can also signal a potential for a strategy of misdirection, one that invokes a logic of control to create an opening for variance, play, and unintended outcomes. - Mark Nunes, Error: Glitch, Noise, and Jam in New Media Cultures

While we work hard every day to prevent and mitigate consumer-facing outages, there are times when a portion of our user base experiences a problem. For web properties, we can give those users a better experience by serving something meaningful during one of these events, rather than a generic and mostly meaningless error message. There is some irony in the fact that developers spend an inordinate amount of time polishing the guts of the content (the underlying HTML, XML, JSON, CSS, etc.) into an appealing look and feel, while most error messages show the user cryptic, confusing details.

A practice we have been using for some time on our high-profile sites is something we call "Riviera". (It is humorously named after William Gibson's sociopathic character Peter Riviera in Neuromancer, who can project holographic images using cybernetic implants.) The basic idea is to take a snapshot of content that was originally generated dynamically, store it in a separately hosted environment, and serve it to the user as a replacement when an error is encountered. The AOL.com portal operates this way today and is ideally suited to the approach, as nearly every link on the page takes the user to content served from other web properties. Where a piece of content depends on the affected environment, a choice has to be made between creating a static snapshot of that feature and removing it from the Riviera snapshot.

The same approach could be applied to the hundreds of Patch sites. For a site like Leesburg Patch, which has a handful of updates each day, you could create a branded version of the most recent day's content. Compared to the site operating without errors, the Riviera site isn't as feature-rich, but it is a marked improvement over a standard error page. In Patch's case we may not have comments for articles, for example, but we could certainly serve the entire article on a branded site with advertising and links to other content. I'd also argue for a banner message indicating that our staff is aware of the issue and is working to restore the full functionality of the site.
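The snapshot-and-fallback idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Riviera is actually built: the names `SnapshotStore`, `take_snapshot`, `serve`, and `add_banner`, the banner wording, and the in-memory store standing in for the separately hosted environment are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical banner text; the wording is an assumption for this sketch.
BANNER = (
    '<div class="riviera-banner">We know about the problem and are '
    "working to restore the full site.</div>"
)

# A renderer takes a path and returns (HTTP status, HTML body).
Renderer = Callable[[str], Tuple[int, str]]


class SnapshotStore:
    """In-memory stand-in for the separately hosted snapshot environment."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    def save(self, path: str, html: str) -> None:
        self._pages[path] = html

    def load(self, path: str) -> Optional[str]:
        return self._pages.get(path)


def take_snapshot(store: SnapshotStore, path: str, render: Renderer) -> bool:
    """Store the page only when the dynamic site rendered it successfully."""
    status, body = render(path)
    if status == 200:
        store.save(path, body)
        return True
    return False


def add_banner(html: str) -> str:
    """Insert the banner right after <body>, or prepend it if no such tag exists."""
    if "<body>" in html:
        return html.replace("<body>", "<body>" + BANNER, 1)
    return BANNER + html


def serve(store: SnapshotStore, path: str, render: Renderer) -> Tuple[int, str]:
    """Serve the live page, falling back to the snapshot on a server error."""
    try:
        status, body = render(path)
    except Exception:
        # Treat a crashed or unreachable origin like a 503.
        status, body = 503, ""
    if status < 500:
        return status, body
    snapshot = store.load(path)
    if snapshot is None:
        return status, body  # no snapshot, so pass the error through
    return 200, add_banner(snapshot)
```

In this sketch the fallback answers with status 200 so that the user sees a normal page. Keeping the original 5xx status while still sending the snapshot body is an equally reasonable choice if caches or crawlers should not treat the snapshot as the live page.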

From sampling real users' feedback about the errors they encounter in our systems, it is clear that a little improvement could make a big impact. For example, the feedback quoted below, from a recent issue affecting a small percentage of people, suggests that the user likely navigated away to another site in search of whatever content they had been interested in:

When I click on the icon, its takes a few moments, the screen turns white and then a message that says the system is maxed and to try later.

Using standard web traffic reporting, a site's most popular content could be targeted for the Riviera approach. That would give users a more robust, branded experience than an unfulfilling, machine-generated message, while also letting them know that we have been alerted to the reduced functionality and will restore full service in short order.
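As a rough illustration of picking snapshot targets from traffic data, the sketch below counts requests per path and keeps the busiest ones. It assumes the traffic report can be reduced to a plain sequence of requested paths; the function name `most_popular_paths` and the sample paths are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def most_popular_paths(hits: Iterable[str], limit: int) -> List[str]:
    """Return up to `limit` paths, ordered from most to least requested."""
    counts = Counter(hits)
    return [path for path, _ in counts.most_common(limit)]


# Hypothetical sample of requested paths taken from a traffic report.
sample_hits = ["/", "/news/a", "/", "/news/b", "/", "/news/a"]
snapshot_targets = most_popular_paths(sample_hits, 2)
```

The resulting list could then be fed to whatever job produces the snapshots, so that the pages users request most often are the ones still available during an outage.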