Magento Cache Prefix and Multi-server Configuration
I recently spent some time investigating strange, seemingly random issues with caches not being cleared when expected. The behaviour could not be replicated locally in my Vagrant-based development environment. The Magento site in question was running on multiple servers and using Redis as the cache backend. Whilst the development environment also used Redis, it was only a single box. Since this was a difference between the two environments, perhaps it was the cause? But why would having multiple servers affect the caching behaviour?
I started out by connecting to Redis from the command line, and checking what keys were in the live environment vs my local development environment.
    redis-cli -p 6380
    select 1
    keys *
After running these commands in both environments, the first thing I noticed was that there were a lot more keys in the live environment. Each category and product can generate its own unique layout cache entries, and traffic on my local copy is much lower, so I wasn’t entirely surprised. I decided to narrow down my search by looking for a specific entry. When I looked on the live site for a key from my local install, I got no results.
    127.0.0.1:6380> keys zc:ti:8ce_MAGE
    (empty list or set)
Hm. So I knew the zc:ti: prefix marks the Redis key holding the tag information for cache entries, and I knew that MAGE was the cache tag, so it was time for a more generic search.
    127.0.0.1:6380> keys zc:ti:*_MAGE
    1) "zc:ti:56a_MAGE"
    2) "zc:ti:513_MAGE"
That’s interesting. This Redis DB is only used for a single instance of Magento, yet there are two entries, neither of which matches my local entry. Time to dive into the caching code to see where that three-character prefix comes from. After a bit of searching, I came across the constructor for Mage_Core_Model_Cache.
    if (empty($this->_idPrefix)) {
        $this->_idPrefix = substr(md5(Mage::getConfig()->getOptions()->getEtcDir()), 0, 3).'_';
    }
The prefix comes from a file system path. It was then that it dawned on me: the multi-server setup had a dedicated box for the administration panel, and that box used a different file path for the document root! This meant that clearing the cache from the admin panel didn’t clear the same cache entries that were used on the frontend of the store! Only flushing the whole cache without specifying keys really cleared it.
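To make the effect concrete, here is a minimal sketch that applies the same expression to two plain strings. The helper name idPrefixFor() and both paths are hypothetical, made up for illustration; only the substr(md5(...), 0, 3) expression comes from the constructor above.

```php
<?php
// Sketch only: mirrors the fallback expression from the Mage_Core_Model_Cache
// constructor, applied to plain strings instead of Mage::getConfig().

/**
 * Derive the three-character cache id prefix for a given etc directory.
 * idPrefixFor() is a hypothetical helper, not part of Magento.
 */
function idPrefixFor($etcDir)
{
    return substr(md5($etcDir), 0, 3) . '_';
}

// Hypothetical etc directories for the frontend and admin boxes.
$paths = array(
    'frontend' => '/var/www/shop/app/etc',
    'admin'    => '/srv/admin/shop/app/etc',
);

foreach ($paths as $server => $etcDir) {
    // Each server derives its prefix from its own path.
    echo $server, ': ', idPrefixFor($etcDir), PHP_EOL;
}
```

Running something like this with the real etc dir of each server should show quickly whether the boxes agree on a prefix. With only three hex characters two different paths can occasionally collide, but in general they will not match.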
Now that I knew what the problem was, finding the solution was fairly simple. You can see from the same constructor that the file system path is only used if the value isn’t passed in. We can therefore simply add an id_prefix node to our local.xml file and presto-chango, both the admin panel and the frontend are using the same cache prefix again!
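As a rough sketch, the node could look something like the fragment below. The placement under global/cache is my understanding of where Magento 1 reads its cache options from, and the value mystore_ is a made-up example; any string works as long as every server shares it.

```xml
<config>
    <global>
        <cache>
            <!-- existing backend settings (Redis etc.) stay as they are -->
            <!-- hypothetical value: must be identical on every server -->
            <id_prefix>mystore_</id_prefix>
        </cache>
    </global>
</config>
```

Entries written under the old path-derived prefixes will no longer be matched once the prefix changes, so a full cache flush after deploying the change is a sensible precaution.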
This probably sounds like an unlikely situation for a developer to run into. After all, running different paths on different servers is not exactly standard, is it? However, upon discussing the issue with a fellow developer, Winston, I found he was seeing similar cache behaviour despite using the same path on all servers. To help him diagnose his issues I dug a little further to see how the etc dir path is generated. This led to Mage_Core_Model_Config_Options, which shows it’s a hard-coded path relative to a root path, and that root is calculated using dirname($appRoot). Aha. The use of dirname suggests a real filesystem path. A common practice amongst agencies is to use Capistrano-style deployments, which means the Apache / nginx document root is a symlink pointing to the directory of a specific build. The path of that directory is generally kept unique by use of either the deployment time or some kind of commit hash. After confirming with Winston that they were using this approach, I knew we had the answer.
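The symlink effect can be reproduced outside Magento. The sketch below builds a throwaway Capistrano-style tree in the temp directory and resolves the etc dir through a current symlink after each "deploy". The directory layout and release names are invented for illustration, and it assumes, as inferred above, that the path being hashed is the resolved one.

```php
<?php
// Sketch only: builds a throwaway Capistrano-style tree in the temp directory.
// The directory names and release timestamps are made up for illustration.
$base = sys_get_temp_dir() . '/deploy_demo_' . uniqid();
$releases = array('20150101120000', '20150102093000');
foreach ($releases as $release) {
    mkdir("$base/releases/$release/app/etc", 0777, true);
}

$resolved = array();
foreach ($releases as $release) {
    // Point the "current" symlink at this release, as a deploy would.
    @unlink("$base/current");
    symlink("$base/releases/$release", "$base/current");
    clearstatcache(true); // drop PHP's cached realpath for the old target

    // realpath() follows the symlink, so the release directory leaks into
    // the string that would be hashed for the cache prefix.
    $etcDir = realpath("$base/current/app/etc");
    $resolved[$release] = $etcDir;
    echo $etcDir, ' => ', substr(md5($etcDir), 0, 3) . '_', PHP_EOL;
}
```

Although the document root is always current, each release resolves to a different real path, so the hashed string changes with every deployment.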
With an elastic scaling system, additional servers are brought up as needed for current / expected load levels. When these new servers are provisioned, the filesystem path of the code can be different to that on the servers already online. This means that any event designed to clear the cache will only clear it for servers that share the same path, typically those brought up at the same time, leaving older / newer servers with stale copies of the cache.
So long story short, if you run multiple servers and there’s any chance of the etc dir having a different path on different servers, manually specify an id_prefix!