Eliminating Outages

Typically, clients come to us with a redesign, re-platform, or HELP! type project in mind. 303 Magazine was in the latter category, hoping to do a redesign once the performance issues were resolved. Shortly after our initial conversations, we took over support and maintenance of the self-hosted (AWS) site and put plans in place to fix the performance issues and migrate the site to Pantheon.

Project Goal

This one was simply defined: fix the site. The site had not gone a full day without an outage in weeks, which was badly hurting traffic and ad revenue.

Our Solution

We identified the site's premium theme as the root cause of the performance issues. Using New Relic, we quickly addressed the main issues and brought the site back to life within five hours.

Results

Before our engagement even began, we had their team install New Relic on the AWS servers to start collecting metrics we could later use as a baseline. We could tell almost immediately that the issues were within the theme, and gaining access to the code and content confirmed this. If you've never used a tool like New Relic, it essentially breaks down your application's time into areas such as the database, application code, external requests, and others. Here are two of the graphs we saw:

Figure: New Relic database performance metrics for 303 Magazine
Figure: New Relic transaction times for 303 Magazine

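
To make the idea of "slicing up" a request a little more concrete, here is a toy sketch in Python. It is not how New Relic's agent works internally, and it is not the site's own code; the `RequestProfile` class and `simulate_request` function are hypothetical. Each block of work is timed under a category label, and the profiler reports each category's share of the total measured time, which is roughly the kind of breakdown the graphs above show.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RequestProfile:
    """Toy profiler: accumulates wall-clock time per category for one request.

    Hypothetical illustration only; unlike a real APM agent, it does not
    compute exclusive time for nested segments.
    """

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def segment(self, category):
        # Time the enclosed block and add it to the category's running total.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[category] += time.perf_counter() - start

    def breakdown(self):
        """Return each category's share of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {cat: secs / total for cat, secs in self.totals.items()}


def simulate_request():
    """Hypothetical page load: sleeps stand in for real work."""
    profile = RequestProfile()
    with profile.segment("database"):
        time.sleep(0.05)  # stand-in for many slow queries
    with profile.segment("application code"):
        time.sleep(0.01)  # stand-in for template rendering
    with profile.segment("external"):
        time.sleep(0.005)  # stand-in for a remote API call
    return profile


if __name__ == "__main__":
    for category, share in simulate_request().breakdown().items():
        print(f"{category:>16}: {share:.0%}")
```

When the database slice dominates a breakdown like this, that is the signal to start counting queries per page load.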
Gaining Access and More Metrics

Now the fun begins: we get the keys and, of course, the responsibility to make some adjustments. Looking back through New Relic, we could see an extremely large number of queries per page load: over 1,000. I've been building Drupal sites for a while, and even that number shocked me! So our next step was to get the site running in a local environment, so we could validate our changes before making any on production. Did I mention we only had a production server to work with? That's why, in our next post, you'll see our steps to migrate to Pantheon.

One of the main culprits behind the database queries was a call to get_pages. On a basic site this wouldn't be an issue, but this site has over 1,000 ‘pages’ and over 100,000 posts. Later, the theme code loops through $pages and runs a query for each page (hence the 1,000+ queries per page load). Once we addressed this, the much happier graphs looked like this:

Figure: New Relic database graph for 303 Magazine after the performance improvements
Figure: New Relic transaction times for 303 Magazine after the performance fix

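
The loop described above is the classic "N+1 query" pattern: one query to fetch the list of pages, then one more query per page. The post doesn't show the theme's code or the exact fix, so the following is only a Python/SQLite sketch of the pattern and one common remedy (fetching the same data in a single JOIN). The simplified `posts`/`postmeta` schema, the `menu_label` meta key, and all function names are hypothetical and are not WordPress's real tables or API; get_pages itself is a PHP function in WordPress.

```python
import sqlite3


class CountingDB:
    """Wraps an in-memory SQLite connection and counts queries issued via query()."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def seed(db, num_pages, num_posts):
    """Create a hypothetical, simplified posts/postmeta schema (not WordPress's real tables)."""
    db.conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_type TEXT)")
    db.conn.execute("CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT)")
    for page_id in range(1, num_pages + 1):
        db.conn.execute("INSERT INTO posts VALUES (?, 'page')", (page_id,))
        db.conn.execute(
            "INSERT INTO postmeta VALUES (?, 'menu_label', ?)", (page_id, f"Label {page_id}")
        )
    for post_id in range(num_pages + 1, num_pages + num_posts + 1):
        db.conn.execute("INSERT INTO posts VALUES (?, 'post')", (post_id,))
    db.conn.commit()


def labels_n_plus_one(db):
    """N+1 pattern: one query for all pages, then one more query per page."""
    pages = db.query("SELECT id FROM posts WHERE post_type = 'page'")
    labels = {}
    for (page_id,) in pages:
        rows = db.query(
            "SELECT meta_value FROM postmeta WHERE post_id = ? AND meta_key = 'menu_label'",
            (page_id,),
        )
        labels[page_id] = rows[0][0] if rows else None
    return labels


def labels_batched(db):
    """Same result from a single JOIN, so the query count no longer grows with page count."""
    rows = db.query(
        "SELECT p.id, m.meta_value FROM posts p "
        "LEFT JOIN postmeta m ON m.post_id = p.id AND m.meta_key = 'menu_label' "
        "WHERE p.post_type = 'page'"
    )
    return dict(rows)


if __name__ == "__main__":
    db = CountingDB()
    seed(db, num_pages=1000, num_posts=5000)
    slow = labels_n_plus_one(db)
    print("N+1 queries:", db.queries)
    db.queries = 0
    fast = labels_batched(db)
    print("batched queries:", db.queries, "| same result:", slow == fast)
```

With 1,000 pages, the N+1 version issues 1,001 queries while the batched version issues 1, and both return the same labels. Depending on the data, other remedies (caching, or not loading every page at all) may fit better.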
Still Not as Performant as We Would Like

This was a huge improvement, and amazing things happen when your site becomes faster (and stable)… if you build (or fix) it, they will come. Traffic continued to increase after our fix, and then, of course, other issues surfaced. The theme's mega menu, with its "latest" and "random" functionality, likely worked great for sites with hundreds of posts, but our posts table had over 100,000 entries. Ndevr wrapped the menus in transient cache calls to avoid the unnecessary queries on every page load. This change brought database queries down from 200 to 300 per page to around 70.
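
In WordPress, transients are stored via `set_transient($key, $value, $expiration)` and read back with `get_transient($key)`, which returns `false` once the value is missing or expired. The sketch below mirrors that "check the cache, rebuild only on a miss" pattern in Python so it can run standalone; `TransientCache`, `cached_menu`, the `mega_menu_latest` key, and the 300-second expiry are all hypothetical, since the post doesn't say which expiration was used.

```python
import time


class TransientCache:
    """Toy in-memory stand-in for WordPress transients: values expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # WordPress's get_transient() returns false here instead
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave as a cache miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


def cached_menu(cache, key, build_menu, ttl_seconds=300):
    """Return the cached menu, running the expensive build_menu() only on a miss."""
    menu = cache.get(key)
    if menu is None:
        menu = build_menu()
        cache.set(key, menu, ttl_seconds)
    return menu


if __name__ == "__main__":
    now = [0.0]
    cache = TransientCache(clock=lambda: now[0])
    builds = [0]

    def build_latest_menu():
        builds[0] += 1  # stands in for the expensive "latest posts" queries
        return ["Latest post A", "Latest post B"]

    for _ in range(100):  # 100 page loads within the TTL
        cached_menu(cache, "mega_menu_latest", build_latest_menu)
    print("builds after 100 page loads:", builds[0])
    now[0] = 301.0  # jump past the hypothetical 300-second expiry
    cached_menu(cache, "mega_menu_latest", build_latest_menu)
    print("builds after expiry:", builds[0])
```

One trade-off worth noting: caching a "random" menu means every visitor sees the same random selection until the transient expires, so the expiry time becomes a balance between freshness and database load.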