Speed up page loads using caching in PHP

At work we pull ranks once a week to see how key-phrases rank on certain search engines. We have many clients, each client has multiple key-phrases, and each key-phrase has ranks for multiple search engines. All of this is stored in the database, and it keeps growing: the longer a client is with us, the more data there is. Calculating and displaying all of it every time a web page loads becomes cumbersome. It taxes the web server, which has to do the calculations, and it taxes the database, which has to return all of the data. The solution to this problem is caching. There are two types of caching we use at work: memcache and file caching.

Memcache stores the cache in the web server's RAM. This is good because retrieval from the cache is very fast. The downsides of memcache are that the information stored in RAM is lost if the web server machine or the memcache service is restarted, and that the amount of information you can cache is limited by the size of the RAM on your web server. The RAM limitation can be overcome by having a dedicated memcache server.

File caching stores the information in a file on the web server. This is slower than memcache because the information is read from the web server's hard drive instead of its RAM. On the other hand, the cached information is not lost when the web server is restarted, and the total size of the cache is limited by the size of the hard drive rather than the size of the RAM. Now that solid state drives are coming out and getting cheaper all the time, file caches can probably be loaded at close to the speed of memcache.

My first example will be memcaching. In order to do memcaching on your web server you will have to make sure you have both the memcached service and the PHP memcache extension installed. I am running Ubuntu with LAMP, where the command line call to install them is ‘sudo apt-get install memcached php5-memcache’ (package names may differ on other releases). You can download my example file memcache_example.php.tar so that you can follow along easier.
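
Before running the example it can help to confirm that PHP actually sees the extension. The following is a minimal sketch; memcache_is_available() is a hypothetical helper name of my own, while extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Sketch: confirm the PHP memcache extension is loaded before relying on it.
// memcache_is_available() is a hypothetical helper, not part of PHP itself.
function memcache_is_available() {
    return extension_loaded('memcache') && function_exists('memcache_connect');
}

if (memcache_is_available()) {
    echo "memcache extension is loaded\n";
} else {
    echo "memcache extension is missing; install it or use file caching\n";
}
```

If the check fails, the memcache_connect() call in the example below would end in an undefined function error, so this is a cheap thing to verify first.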

I will now put all the code together for the memcaching example first and then break it down:


<?php
/**
 * @date 2011-09-30
 * @brief example of memcaching
 */
echo "
    <html>
        <head>
            <title>MemCache Example</title>
        </head>
        <body>
            <h1>MemCache Example</h1>
";
if( isset($_GET['client_id']) && is_numeric($_GET['client_id'])) {
    $client_id = $_GET['client_id'];
} else {
    $client_id = rand(1,20);
}

define('RANK_CACHE_DATE_STRING', 'next Saturday');

$memcache_obj = memcache_connect('localhost');
$cache_till = strtotime(RANK_CACHE_DATE_STRING);
$cache_name = 'rank_table_html_'.$client_id;
if( ! $html = memcache_get($memcache_obj, $cache_name)) {
    $html = get_rank_table_html($client_id);
    memcache_set($memcache_obj, $cache_name, $html, 0, $cache_till);
    echo "<p>";
    echo "There was NOT a cache of the HTML so new ";
    echo "HTML was created and cached for next time ";
    echo "this client is loaded.";
    echo "</p>";
} else {
    echo "<p>There was a cache of the HTML</p>";
}
echo $html;

function get_rank_table_html($client_id) {
    return "
                <p>
                    Here is where the table of ranks
                    would be for client# $client_id
                </p>
    ";
}
?>

So in this example the first thing I do is define the client_id. I have made it so you can use $_GET data or just let a random client id be created:

echo "
    <html>
        <head>
            <title>MemCache Example</title>
        </head>
        <body>
            <h1>MemCache Example</h1>
";
if( isset($_GET['client_id']) && is_numeric($_GET['client_id'])) {
    $client_id = $_GET['client_id'];
} else {
    $client_id = rand(1,20);
}
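
Because the client id ends up inside a cache key (and later inside a file name), it can be worth validating it more strictly than is_numeric() does. Here is a sketch using PHP's filter_var() with FILTER_VALIDATE_INT; parse_client_id() is a hypothetical helper of my own, and the 1 to 20 fallback simply mirrors the example above.

```php
<?php
// Sketch: stricter client id parsing than is_numeric().
// parse_client_id() is a hypothetical helper; pass it $_GET in a real page.
function parse_client_id(array $query) {
    if (isset($query['client_id'])) {
        $id = filter_var(
            $query['client_id'],
            FILTER_VALIDATE_INT,
            array('options' => array('min_range' => 1))
        );
        if ($id !== false) {
            return $id; // a real positive integer, safe to use in a cache name
        }
    }
    return rand(1, 20); // same random fallback as the example above
}

var_dump(parse_client_id(array('client_id' => '7'))); // int(7)
var_dump(is_numeric('1e3'));                           // bool(true)
var_dump(filter_var('1e3', FILTER_VALIDATE_INT));      // bool(false)
```

The last two lines show the difference: is_numeric() accepts strings such as '1e3', while FILTER_VALIDATE_INT only accepts plain integers.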

This next line of code should be in your settings file, where you define constants and system settings. The RANK_CACHE_DATE_STRING constant defines the string that PHP's strtotime() function will use to get the end date of the cache as a timestamp. Since we pull ranks on Friday, the string is ‘next Saturday’, so the new cache starts Saturday morning at 12am. Because it is defined once in your settings, it can be changed in one place and the change applies throughout your system:

define('RANK_CACHE_DATE_STRING', 'next Saturday');
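
To see what ‘next Saturday’ resolves to, you can pass strtotime() a base timestamp as its second argument. This is only a sketch: the two base dates and the UTC timezone are my own choices for illustration.

```php
<?php
date_default_timezone_set('UTC'); // fixed timezone so the output is reproducible

define('RANK_CACHE_DATE_STRING', 'next Saturday');

// Hypothetical "current" times used as the base for strtotime().
$friday   = strtotime('2011-09-30 15:00:00'); // a Friday afternoon
$saturday = strtotime('2011-10-01 09:00:00'); // the Saturday right after it

echo date('D Y-m-d H:i', strtotime(RANK_CACHE_DATE_STRING, $friday)), "\n";
// Sat 2011-10-01 00:00
echo date('D Y-m-d H:i', strtotime(RANK_CACHE_DATE_STRING, $saturday)), "\n";
// Sat 2011-10-08 00:00
```

So a cache created any time from Saturday through Friday expires at the start of the following Saturday, which is right after the Friday rank pull.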

Next we will instantiate the memcache object by connecting to the memcache server (http://us.php.net/manual/en/book.memcache.php):

$memcache_obj = memcache_connect('localhost');

Here we will use the RANK_CACHE_DATE_STRING constant to create the $cache_till variable as a timestamp using PHP's strtotime() function:

$cache_till = strtotime(RANK_CACHE_DATE_STRING);

Now we will define the $cache_name variable, making sure it is unique to the cache it refers to:

$cache_name = 'rank_table_html_'.$client_id;

Here is the conditional where we check to see if there is already a cache of the HTML we are looking for. If there is, we use that cached HTML; otherwise we create the HTML and cache it for next time:

if( ! $html = memcache_get($memcache_obj, $cache_name)) {
    $html = get_rank_table_html($client_id);
    memcache_set($memcache_obj, $cache_name, $html, 0, $cache_till);
    echo "<p>";
    echo "There was NOT a cache of the HTML so new ";
    echo "HTML was created and cached for next time ";
    echo "this client is loaded.";
    echo "</p>";
} else {
    echo "<p>There was a cache of the HTML</p>";
}

Now just echo out the HTML:

echo $html;
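
One caveat about the cache check used above: memcache_get() returns false on a miss, but the ! operator also treats an empty string or "0" as a miss. The sketch below shows the difference without needing a memcache server; fake_cache_get() and $store are hypothetical stand-ins that I made up for the demonstration.

```php
<?php
// Stand-in for memcache_get(): returns the stored value, or false on a miss.
// fake_cache_get() and $store are hypothetical, for illustration only.
function fake_cache_get(array $store, $key) {
    return array_key_exists($key, $store) ? $store[$key] : false;
}

$store = array('rank_table_html_5' => ''); // a cached, but empty, HTML string

$loose_miss  = ! fake_cache_get($store, 'rank_table_html_5');
$strict_miss = fake_cache_get($store, 'rank_table_html_5') === false;

var_dump($loose_miss);  // bool(true): the empty string is treated as a miss
var_dump($strict_miss); // bool(false): the cached value is recognised
var_dump(fake_cache_get($store, 'rank_table_html_6') === false); // bool(true)
```

For the rank table this rarely matters because the HTML is never empty, but comparing with === false is the safer habit if you reuse the pattern for other values.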

Here is the function that creates the HTML. This is just generic for this example.

function get_rank_table_html($client_id) {
    // pull ranks from database
    // calculate ranks
    // create html from calculated ranks
    // return html
    return "
                <p>
                    Here is where the table of ranks
                    would be for client# $client_id
                </p>
    ";
}
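
To give a rough idea of what the "create html from calculated ranks" step could look like, here is a sketch that builds a table from an array. The array shape (key_phrase, engine, rank), the made-up sample rows, and the build_rank_table_html() name are all assumptions for illustration and not our real schema or code.

```php
<?php
// Sketch only: $ranks stands in for rows pulled from the database.
// The array keys used here are assumptions, not the real schema.
function build_rank_table_html($client_id, array $ranks) {
    $html  = "<table>\n";
    $html .= "<caption>Ranks for client# " . (int) $client_id . "</caption>\n";
    $html .= "<tr><th>Key-phrase</th><th>Search engine</th><th>Rank</th></tr>\n";
    foreach ($ranks as $row) {
        $html .= "<tr>"
            . "<td>" . htmlspecialchars($row['key_phrase']) . "</td>"
            . "<td>" . htmlspecialchars($row['engine']) . "</td>"
            . "<td>" . (int) $row['rank'] . "</td>"
            . "</tr>\n";
    }
    return $html . "</table>\n";
}

// Made-up sample rows, only there to exercise the function.
$sample = array(
    array('key_phrase' => 'php caching', 'engine' => 'Google', 'rank' => 3),
    array('key_phrase' => 'php caching', 'engine' => 'Bing',   'rank' => 5),
);
echo build_rank_table_html(7, $sample);
```

Whatever the real function does, the point of the cache is that this work only happens once per client per week.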

My second example will be file caching. In order to do file caching you will have to make sure that the web server user has permissions to read and write to the folder that will contain the file caches. You can download my example file_cache_example.tar so that you can follow along easier.

I will now put all the code together for the file caching example first and then break it down:

echo "
    <html>
        <head>
            <title>File Cache Example</title>
        </head>
        <body>
            <h1>File Cache Example</h1>
";
if( isset($_GET['client_id']) && is_numeric($_GET['client_id'])) {
    $client_id = $_GET['client_id'];
} else {
    $client_id = rand(1,20);
}
define('RANK_CACHE_DIRECTORY', 'cache/');
define('RANK_CACHE_DATE_STRING', 'next Saturday');
$cache_location_and_name =
    RANK_CACHE_DIRECTORY.
    date('Y-m-d', strtotime(RANK_CACHE_DATE_STRING)).
    '_'.
    'rank_table_html_'.
    $client_id;
if(file_exists($cache_location_and_name)) {
    $html = file_get_contents($cache_location_and_name);
    echo "<p>There was a cache of the HTML</p>";
} else {
    $html = get_rank_table_html($client_id);
    file_put_contents(
       $cache_location_and_name,
       $html
    );
    echo "<p>";
    echo "There was NOT a cache of the HTML so new ";
    echo "HTML was created and cached for next time ";
    echo "this client is loaded.";
    echo "</p>";
}
echo $html;

function get_rank_table_html($client_id) {
    return "
                <p>
                    Here is where the table of ranks
                    would be for client# $client_id
                </p>
    ";
}

So again in this example the first thing I do is define the client_id. I have made it so you can use $_GET data or just let a random client id be created:

echo "
    <html>
        <head>
            <title>File Cache Example</title>
        </head>
        <body>
            <h1>File Cache Example</h1>
";
if( isset($_GET['client_id']) && is_numeric($_GET['client_id'])) {
    $client_id = $_GET['client_id'];
} else {
    $client_id = rand(1,20);
}

Since we are caching to a folder on the web server, we will define this location as a constant. Like the other constants, it belongs in your settings file:

define('RANK_CACHE_DIRECTORY', 'cache/');

Again, this next line of code should be in your settings file, where you define constants and system settings. The RANK_CACHE_DATE_STRING constant defines the string that PHP's strtotime() function will use to get the end date of the cache as a timestamp. Since we pull ranks on Friday, the string is ‘next Saturday’, so the new cache starts Saturday morning at 12am. Because it is defined once in your settings, it can be changed in one place and the change applies throughout your system:

define('RANK_CACHE_DATE_STRING', 'next Saturday');

Now we will create a variable to hold a string defining the location and name of the cache. You will notice that the name of the cache starts with a date. That date is when the cache expires. And again, by putting the client id in the name of the cache we are making the name unique:

$cache_location_and_name =
    RANK_CACHE_DIRECTORY.
    date('Y-m-d', strtotime(RANK_CACHE_DATE_STRING)).
    '_'.
    'rank_table_html_'.
    $client_id;
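
To check how the file name behaves around the weekly boundary, the same expression can be wrapped in a function that takes the current time as a parameter. This is a sketch: rank_cache_path() is a hypothetical helper name, and the two timestamps and the UTC timezone are chosen only for the demonstration.

```php
<?php
date_default_timezone_set('UTC'); // fixed timezone so the output is reproducible

// rank_cache_path() is a hypothetical wrapper around the expression above.
// $now is passed in so specific dates can be tried out.
function rank_cache_path($directory, $date_string, $client_id, $now) {
    return $directory
        . date('Y-m-d', strtotime($date_string, $now))
        . '_rank_table_html_'
        . $client_id;
}

$friday   = strtotime('2011-09-30 23:59:00'); // last minute of Friday
$saturday = strtotime('2011-10-01 00:01:00'); // first minutes of Saturday

echo rank_cache_path('cache/', 'next Saturday', 7, $friday), "\n";
// cache/2011-10-01_rank_table_html_7
echo rank_cache_path('cache/', 'next Saturday', 7, $saturday), "\n";
// cache/2011-10-08_rank_table_html_7
```

The name changes as soon as Saturday begins, so the previous week's file simply stops being looked up.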

Here we use a conditional to check for a cache of the HTML. If a file with the name and location we just defined exists, we get the HTML contents of that file. Otherwise we create the HTML and save it in a file as a cache for next time. The ingenious part is that the expiry date is part of the file name: once that date arrives, ‘next Saturday’ resolves to a new date, the file name changes, no file with the new name exists yet, and so a new cached file is created.

if(file_exists($cache_location_and_name)) {
    $html = file_get_contents($cache_location_and_name);
    echo "<p>There was a cache of the HTML</p>";
} else {
    $html = get_rank_table_html($client_id);
    file_put_contents(
       $cache_location_and_name,
       $html
    );
    echo "<p>";
    echo "There was NOT a cache of the HTML so new ";
    echo "HTML was created and cached for next time ";
    echo "this client is loaded.";
    echo "</p>";
}
echo $html;

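One thing the conditional above does not guard against is two requests hitting the same client at the same moment, where one could read a file the other is still writing. A possible way around that, sketched here, is to write to a temporary file and rename it into place. write_cache_atomically() is a hypothetical helper name, and the throwaway directory under sys_get_temp_dir() is only used so the sketch can run anywhere.

```php
<?php
// Sketch: write the cache to a temporary file and rename it into place, so a
// concurrent request does not read a half-written cache file.
// write_cache_atomically() is a hypothetical helper name.
function write_cache_atomically($path, $contents) {
    $tmp = tempnam(dirname($path), 'tmp_'); // temp file in the same directory
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) === false) {
        unlink($tmp);
        return false;
    }
    return rename($tmp, $path); // swaps the finished file in under the final name
}

// Usage with a throwaway directory instead of the real cache/ folder.
$dir = sys_get_temp_dir() . '/file_cache_sketch_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
$path = $dir . '/2011-10-01_rank_table_html_7';

var_dump(write_cache_atomically($path, '<p>cached</p>')); // bool(true)
echo file_get_contents($path), "\n";                      // <p>cached</p>

// Clean up the throwaway files.
unlink($path);
rmdir($dir);
```

The rename is only a single-step swap when the temporary file and the final file are on the same filesystem, which is why the temporary file is created next to the cache file.
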
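The cleanup script described in the next section uses the following function as the array_filter() callback in its first foreach. It drops the directory entries that are not cache files (the ., .. and .svn entries):

function remove_invalid_files($file) {
    // keep everything except the ".", ".." and ".svn" entries
    return ! preg_match(
        '/^(\.|\.\.|\.svn)$/',
        $file
    );
}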

One of the problems with file caching is that old cache files will build up on your web server over time. To resolve this problem I have created a script that removes old file caches from the web server. This script is run once a week from the web server's crontab. You can download my example file_cache_example.tar so that you can follow along easier.

The first thing we want to do is define all the locations of the file caches that we want cleared out:

$cache_locations = array(
    // paths are relative to the folder the script is run from
    'cache_folder' => 'cache/',
    //'rank_table'   => 'cache/rank_tables/',
    //'rank_chart'   => 'cache/rank_charts/'
);

Now we will create an array of files for each location:

$cached_file_lists = array();
foreach($cache_locations as $cache_name => $cache_location) {
    $files = scandir($cache_location);
    $files = array_filter($files, 'remove_invalid_files');
    $cached_file_lists[$cache_name]['location'] = $cache_location;
    $cached_file_lists[$cache_name]['files']    = $files;
}

Next we will do a foreach on each location and then do a foreach on each file in that location. We will check each file to see if the date at the front of the file name is before now. If it is, we delete the file.

foreach($cached_file_lists as $cached_file_list_name => $data) {
    $location = $data['location'];
    foreach($data['files'] as $file_name) {
        preg_match(
            '(^[0-9]{4}-[0-9]{2}-[0-9]{2})',
            $file_name, $matches
        );
        if( ! empty($matches[0]) ) {
            $cache_end_date = strtotime($matches[0]);
        } else {
            $cache_end_date = FALSE;
        }
        if($cache_end_date && $cache_end_date < strtotime('now')) {
            unlink($location.$file_name);
        }
    }
}
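
The date check inside the loop can also be pulled out into a small function, which makes it easy to try against specific dates before letting it delete anything. This is a sketch under my own assumptions: cache_file_is_expired() is a hypothetical helper name, and the file names and the fixed "now" are made up for the demonstration.

```php
<?php
date_default_timezone_set('UTC'); // fixed timezone so the output is reproducible

// cache_file_is_expired() is a hypothetical helper wrapping the check above.
// $now is a parameter so specific dates can be tried without waiting a week.
function cache_file_is_expired($file_name, $now) {
    if (!preg_match('/^[0-9]{4}-[0-9]{2}-[0-9]{2}/', $file_name, $matches)) {
        return false; // no date prefix: not one of our cache files, leave it
    }
    $cache_end_date = strtotime($matches[0]);
    return $cache_end_date !== false && $cache_end_date < $now;
}

$now = strtotime('2011-10-01 04:00:00'); // a Saturday at 4am

var_dump(cache_file_is_expired('2011-09-24_rank_table_html_7', $now)); // bool(true)
var_dump(cache_file_is_expired('2011-10-01_rank_table_html_7', $now)); // bool(true)
var_dump(cache_file_is_expired('2011-10-08_rank_table_html_7', $now)); // bool(false)
var_dump(cache_file_is_expired('readme', $now));                       // bool(false)
```

Note that the file dated 2011-10-01 counts as expired on that Saturday morning, while the file for the coming week (2011-10-08) is kept.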

Now I will put all the code together for the file cache clearing example:

$cache_locations = array(
    // paths are relative to the folder the script is run from
    'cache_folder' => 'cache/',
    //'rank_table'   => 'cache/rank_tables/',
    //'rank_chart'   => 'cache/rank_charts/'
);
$cached_file_lists = array();
foreach($cache_locations as $cache_name => $cache_location) {
    $files = scandir($cache_location);
    $files = array_filter($files, 'remove_invalid_files');
    $cached_file_lists[$cache_name]['location'] = $cache_location;
    $cached_file_lists[$cache_name]['files']    = $files;
}
foreach($cached_file_lists as $cached_file_list_name => $data) {
    $location = $data['location'];
    foreach($data['files'] as $file_name) {
        preg_match(
            '(^[0-9]{4}-[0-9]{2}-[0-9]{2})',
            $file_name, $matches
        );
        if( ! empty($matches[0]) ) {
            $cache_end_date = strtotime($matches[0]);
        } else {
            $cache_end_date = FALSE;
        }
        if($cache_end_date && $cache_end_date < strtotime('now')) {
            unlink($location.$file_name);
        }
    }
}
function remove_invalid_files($file) {
    // keep everything except the ".", ".." and ".svn" entries
    return ! preg_match(
        '/^(\.|\.\.|\.svn)$/',
        $file
    );
}

So now on your web server create a crontab entry that will clear out all the old file caches every Saturday morning at 4am. The crontab file is usually located in the /etc folder, so from the command line, if you have vim installed, run sudo vim /etc/crontab. Add the following lines:

# every Saturday Morning at 4am clear out all the old file caches.
0 4 * * 6 www-data cd /var/www/file_cache_example && php file_cache_clearing_example.php

Make sure that the user the cron job runs as (www-data in the line above) has permissions to read and write the cache folders.
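
A quick way to verify the permissions is to run a small check as the same user the cron job uses. This is a sketch: unusable_cache_folders() is a hypothetical helper name, and the two folders passed in are placeholders, so substitute your own cache folders.

```php
<?php
// Sketch: report cache folders the current user cannot read or write.
// unusable_cache_folders() is a hypothetical helper name.
function unusable_cache_folders(array $folders) {
    $bad = array();
    foreach ($folders as $folder) {
        if (!is_dir($folder) || !is_readable($folder) || !is_writable($folder)) {
            $bad[] = $folder;
        }
    }
    return $bad;
}

// Placeholder folders: the system temp dir and a path that should not exist.
$folders = array(sys_get_temp_dir(), '/no/such/cache/folder/');
print_r(unusable_cache_folders($folders));
```

An empty result means the clearing script should be able to scan and delete files in every folder listed.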

This concludes today's lesson on caching. Hope this has helped someone out there. Please leave a comment if it has, or if you know a better way.
