
Caching FAQs
Dawn Parzych, 2014-01-05
One of the most mysterious parts of the BIG-IP Application Acceleration Manager (AAM) is caching. Rarely is it explained,
and there are very few documents that describe why you would or would not use one of the BIG-IP's caching facilities.
Even harder to find is some kind of description of what numbers you should use, or whether or not to push some
specific caching button when trying to configure your AAM policies or applications. So here's an overview of a select few
bits of frequently asked AAM caching questions, and some explanation of why you would or would not do something
with those pretty buttons and number fields.
To be clear, AAM does not use fast Cache; it has two entirely separate and distinct caching systems of its own: Metastor
and the Small Object Cache. In this posting, however, we'll be talking about them, mostly, as if they are one and the same.
The 4 most commonly asked questions we get regarding caching are as follows:
· Why is there an option to turn off cache on first hit, and why would I ever enable this?
· What does Queue Parallel Requests do?
· Why would I ever set the maximum object size to anything less than infinity?
· OK, a maximum object size makes sense, but what about the minimum object size?
Each question is addressed using an analogy of putting marbles into a mason jar. We are, of course, talking about web
objects and bytes of data, not marbles and weight.
1) "Why is there an option to turn off cache on first hit, and why would I ever do so?"
OK, well, let's start with a simple mental model of a cache. Imagine your website as just a bunch of marbles. To keep it
simple, all your marbles are the same size. Now think of a cache as being like a Mason jar. Imagine if the Mason jar is just
big enough to hold exactly one marble. You can think of the BIG-IP as a super-fast copying machine that can copy
marbles, and store one copy of one marble.
Finally, imagine a single user sending requests for marbles to your website through the BIG-IP, where every policy node
has "Cache marbles on first hit" turned on, and every marble is cacheable, and cached if requested. Pretty simple, right?
If you have "Cache marble on first hit" turned on, then the very first request your user
makes for a marble will cause the BIG-IP to turn around, get that marble from the
website, copy it, put that copy into the Mason jar, and then hand the original marble to
your user. At this point, the Mason jar is full.
If the next request your user makes is for a different marble, then the first marble must be removed from the jar in order to
make room for the one just requested.
Sadly, the effort and time it took to copy the first marble and put it into the Mason jar was entirely wasted, and the user got
both of his marbles later, and more slowly, than he would have if the BIG-IP had simply taken them from the website and
handed them to him.
If the third request the customer makes is for the first marble, then again the Mason jar has to be emptied and the first
marble cached (remember only a single marble can be cached at any time). The BIG-IP is churning away, copying then
putting a marble into the Mason jar, then emptying out the Mason jar, but never actually getting any value out of having
that Mason jar.
If the user keeps switching back and forth between requesting the first marble and the second marble, the jar will never
hold the marble being requested, and the load on the back-end servers is not reduced at all. This is considered a
zero-cache scenario, where all the work of caching is done but none of the benefit is realized.
But imagine if "Cache marble on first hit" is turned off. Now the same marble has to be requested twice before the BIG-IP
will copy it and put the copy in the Mason jar.
So now, on the first request, say for a blue marble, the BIG-IP does nothing but pass the marble along; however, it
remembers that the blue marble has been requested once. A second request, this time for a red marble, is likewise just
passed along, with the BIG-IP remembering that the red marble has been requested once. At this point, if the user goes
back and asks for the blue marble again, it has been requested twice, so it will be copied and stored in the Mason jar.
If the user then asks for a green marble, the BIG-IP remembers that the request was made, but does not discard the
marble in the jar, as this is only the green marble's first request. If the user requests the blue marble once more, the user
will get a copy of it from the Mason jar, not from your website. You now have an effective cache where 1 in 5 requests
has been offloaded from the origin server, as the sketch below illustrates.
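To make the two admission policies concrete, here is a minimal Python sketch of the one-marble jar (illustrative only,
not BIG-IP code), comparing "cache on first hit" against waiting for a second request:

    # A one-slot cache: 'cached' is the single marble in the Mason jar.
    # With cache_on_first_hit=False, a marble is only cached once it has
    # been requested twice (the 'seen' set is the BIG-IP's memory).
    def simulate(requests, cache_on_first_hit):
        cached = None
        seen = set()
        hits = 0
        for marble in requests:
            if marble == cached:
                hits += 1                  # served from the jar
            elif cache_on_first_hit or marble in seen:
                cached = marble            # copy into the jar, evicting the old marble
            else:
                seen.add(marble)           # remember the first request, don't cache yet
        return hits

    # Flip-flopping between two marbles: caching on first hit never helps.
    print(simulate(["blue", "red"] * 4, True))    # 0 hits out of 8
    # The walkthrough above: blue, red, blue (cached), green, blue (hit).
    print(simulate(["blue", "red", "blue", "green", "blue"], False))  # 1 hit out of 5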
In summary, turn off "Cache object on first hit" for policy nodes where the objects either change very quickly, or where
the time between requests is relatively long. This will prevent the cache from discarding an object that your users will
hopefully be requesting again soon, in favor of one that may never be requested a second time.
Obviously, the flip side of that coin is that the BIG-IP will have to get the same object from your website twice, so if you
are sure that the objects matched by a particular policy node are really popular, and that they will be requested quite
frequently, (such as the company logo and navigation buttons) then copy 'em and dump them in the cache the first time
they are requested.
2) What is "Queue Parallel Requests" and why would I turn it on?
Queuing parallel requests is interesting, as it interacts with caching, but it really only helps when you have a lot of users
trying to get the same marble at the same time, and that marble is being cached for the first time.
A cache is kind of stupid, and it doesn't remember the marbles it threw away. As a result, any marble being put into it
looks like it is being stored "for the first time", even when it is actually being put into the jar for the hundredth time.
"Queue Parallel Requests" basically makes all the users who are requesting the same marble wait for it to be fetched off
of your website, and then copied once for each user by the BIG-IP.
That doesn't sound too interesting or useful until you realize that if you don't turn this on, then between the time you
start the process of requesting that marble from your website and finish putting it into the jar, every other request for that
same marble will have to be forwarded to your website. Imagine a scenario where a server takes 2 ms to respond to a
request for an object, and every millisecond 2 new users request that object. In the time it takes the server to respond to
the first request, 3 additional requests will have been sent for the server to process.
This has created unnecessary demand on the servers. With queuing turned on all subsequent requests for the object will
be placed into a parking area to wait for the original response to be returned and cached.
Four requests doesn't sound like enough to overload a server, but what if it isn't 4 requests but 400? Suddenly,
queuing sounds like a better idea, right? It is, but like any other feature, it is not a panacea. Turn it on for new, shareable,
highly popular objects that remain the same for a relatively long time.
More to the point, however, if the web server that is giving one marble to the BIG-IP to copy and give to a bunch of users
hiccups (say, you decide to take down one of the web servers in your pool, or, as luck would have it, one of them fails in
the middle of handing over that marble), all of those users will get part of a marble, and that is all. You are trading less pool
traffic for what our engineers like to call a "single point of failure" risk.
But if you have a really rare and valuable marble that everyone wants a copy of, all at the same time, and your website
pool is pretty stable and handing out marbles pretty efficiently, then request queuing will really reduce the traffic on your
web servers!
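For the curious, here is a minimal Python sketch of the queuing idea, modeled as request coalescing with threads. The
names and structure are assumptions for illustration, not BIG-IP internals:

    import threading
    import time

    cache = {}
    in_flight = {}          # key -> Event signalled when the one fetch completes
    lock = threading.Lock()
    origin_fetches = 0

    def fetch_from_origin(key):
        global origin_fetches
        origin_fetches += 1
        time.sleep(0.002)               # the 2 ms origin response time from above
        return "body-of-" + key

    def get(key):
        with lock:
            if key in cache:
                return cache[key]       # served from the jar
            event = in_flight.get(key)
            if event is None:           # first requester: do the fetch ourselves
                event = in_flight[key] = threading.Event()
                fetcher = True
            else:                       # someone is already fetching: queue up
                fetcher = False
        if fetcher:
            body = fetch_from_origin(key)
            with lock:
                cache[key] = body
                del in_flight[key]
            event.set()                 # wake everyone parked in the queue
            return body
        event.wait()                    # park until the single fetch completes
        return cache[key]

    threads = [threading.Thread(target=get, args=("marble",)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(origin_fetches)   # 1 -- without queuing, all 4 requests would hit the origin

Note that if the one origin fetch fails in this sketch, every queued request is stranded behind it, which is exactly the
single-point-of-failure trade-off described above.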
3) There is an option to set the minimum and maximum cacheable object size. Why would I ever set the
maximum object size to anything less than infinity?
Yeah, that's a tough one. First, go read the answer to "Why turn off Cache content on first hit". Then, let's
imagine a Mason jar where instead of one marble, we have a jar big enough to store one thousand marbles.
In this scenario, however, we are going to assume exactly 16 simultaneous users, and also that the marbles they are
requesting are in the jar. Obviously, the web servers in your pool are getting zero requests. Cool, right!? When caching is
working, it can be really handy!
But now let us change one assumption: let's allow your web site objects to vary in size. We still have 16 users,
but there is one marble that is twice the diameter of the marbles in our first example, so it takes up the space of four
regular marbles. When this marble is cached, it reduces the total number of marbles that can be cached: only 13 of the
original 16 requests can be served from the jar, and the other 3 requests have to go to the server pool.
If every marble in the cache is twice the diameter of the marbles in our first example, twelve of the 16 requests being
made have to go to your pool.
At the extreme, if one object completely fills the Mason jar, that marble (well, bowling ball, really!) is the only object that
can be served from cache; the other 15 requests have to go to your pool.
So you limit the maximum size of the marbles that can be stored in your Mason jar in order to configure the BIG-IP to
serve the number of simultaneous users you expect, and wish, to serve. As an emergent property of the system, it turns
out that large objects are oftentimes not that popular anyway, unless you are running a web server whose job is to
serve large patch files to end users.
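Here is a small Python sketch of that arithmetic (illustrative only; it treats the double-diameter marble as occupying the
space of four regular ones, which is what the 13-of-16 figure above implies):

    # Fill a fixed-size jar in request order and count how many of the
    # requested objects can be served from cache rather than the pool.
    def served_from_cache(jar_capacity, object_sizes, max_object_size=float("inf")):
        used = cached = 0
        for size in object_sizes:
            if size <= max_object_size and used + size <= jar_capacity:
                used += size
                cached += 1
        return cached

    print(served_from_cache(16, [1] * 16))               # 16: every request from cache
    print(served_from_cache(16, [4] + [1] * 15))         # 13: the big marble squeezed out 3
    print(served_from_cache(16, [4] + [1] * 15, max_object_size=1))  # 15: the cap keeps it out
    print(served_from_cache(16, [4] * 16))               # 4: twelve requests go to the pool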
4) OK, a maximum object size makes sense. So why have a minimum object size?
OK, now we have to get explicit about the jar, and about knowing what has been requested, copied, and stored in the
jar. Assume that we have a peg board that has exactly one thousand holes in it. Each time we dump a marble in the jar,
we write out a tag that describes the marble, tie it to a peg, then put that peg into the peg board.
When we remove a marble from the jar, we remove its associated peg from the board. When the peg board is full, we
can't store any more marbles in the Mason jar.
Now, what if your minimum size is that of a grain of sand, but your Mason jar is big enough to hold 100 marbles with a
diameter of 2 inches? If what is popular, and requested quite frequently, is a bunch of grains of sand, you can end up
running out of peg board space long, LONG before you even finish coating the bottom of your Mason jar with sand.
Handing your customers copies of those grains of sand will happen often, but the sand will by definition account for a
smaller percentage of the total volume of traffic than if you made your minimum size larger, provided you still have
enough marbles of at least that minimum size on your website to fill your cache.
Another way of looking at it is in terms of a collection of marbles of all sizes. If a large marble is in cache, and it has to be
displaced to make room on the peg board for a tag that records the information for a grain of sand, and then the grain of
sand has to be displaced to make room for the large marble, you will have to get both off of your origin servers.
If you don't try to cache the sand grain, then when a user asks for the larger marble, the total weight of marbles
requested from your server is going to be smaller. Even if that grain of sand has to be served from your server several
times in order to keep the larger marble in the jar, far fewer total grams of marbles will be moved, copied, and stored in
or retrieved from the jar.
Obviously, there is a trade-off here between the number of requests and the total weight of the marbles being
requested; the sketch below puts rough numbers to it.
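Here is a small Python sketch of the peg board trade-off (illustrative only; the jar and peg capacities are made-up
numbers in the spirit of the examples above):

    # Two limits: bytes in the jar, and metadata entries (pegs) on the board.
    JAR_BYTES = 1000 * 64      # room for 1000 marbles of 64 "grams" each
    PEG_HOLES = 1000           # one tag per cached object

    def fill(object_sizes, min_object_size=0):
        used = pegs = 0
        for size in object_sizes:
            if size < min_object_size:
                continue                     # too small: don't spend a peg on it
            if pegs == PEG_HOLES or used + size > JAR_BYTES:
                break                        # out of pegs or out of jar
            pegs += 1
            used += size
        return pegs, used

    grains_then_marbles = [1] * 5000 + [64] * 1000
    print(fill(grains_then_marbles))                      # (1000, 1000): pegs gone, jar ~2% full
    print(fill(grains_then_marbles, min_object_size=64))  # (1000, 64000): jar full of marbles

With the minimum in place, the same thousand pegs index a jar full of marbles instead of a thin layer of sand.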
Putting it all together
Knowing when and what to cache is an important step in ensuring that BIG-IP and your application are performing
optimally. Setting a parameter to the wrong value can have negative effects, increasing the traffic on your origin
servers and consuming resources unnecessarily on BIG-IP. Think about what you are trying to achieve, what other
optimization features are enabled, and the traffic patterns of your site when configuring the cache settings.
Thank you to my colleague John Stevens for assistance in writing this article.