What is Cache / How Cache Works

We're talking about the HTTP response cache here; Magnolia has other caches as well.

What is being cached is rendered Magnolia output.

Magnolia's default cache is based on Ehcache.

Optional: How Ehcache Works

Ehcache is embedded: it runs as a library inside the same JVM as Magnolia.

A cache can be organized into tiers, based on storage type:

  • heap
    • Non-serialized objects on the JVM heap. This is the fastest tier, since there is no (de)serialization overhead, but it can be affected by garbage collection, and its size depends on what is available in the JVM. Can be limited by size or by number of entries. ← smallest tier
  • off heap
    • Serialized objects stored outside the JVM heap, so they are not subject to garbage collection. Can be limited by overall size, but not by number of entries.
  • disk
    • Serialized objects stored on the file system. The slowest option, but it can be sped up by using storage dedicated to this purpose. ← largest tier

You specify which tiers are available (if you use more than one tier, you must also specify a heap tier), but Ehcache decides how to use them during normal operation. One factor is how often an item in the cache is accessed: something accessed very often ("hot") should go into the heap tier, and something rarely accessed ("cold") should go to the disk tier.
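To make the tiering idea concrete, here is a toy two-tier cache in plain Java. It is not Ehcache's implementation or API: the "heap" tier is an access-ordered LinkedHashMap capped by number of entries, and a plain map stands in for the lower (off heap or disk) tier, where real Ehcache would store serialized values. Entries evicted from the heap tier are demoted rather than dropped, and entries read from the lower tier are promoted back, which roughly approximates the hot/cold behavior described above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy two-tier cache: a small, entry-limited "heap" tier in front of a larger "lower" tier. */
class TieredCache<K, V> {
    private final Map<K, V> lowerTier = new HashMap<>(); // stands in for off heap / disk
    private final Map<K, V> heapTier;

    TieredCache(int heapEntries) {
        // accessOrder = true: the least recently used entry is the "eldest" one
        this.heapTier = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > heapEntries) {
                    // demote the coldest entry instead of discarding it
                    lowerTier.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        lowerTier.remove(key);
        heapTier.put(key, value);
    }

    V get(K key) {
        V value = heapTier.get(key);
        if (value == null) {
            value = lowerTier.remove(key);
            if (value != null) {
                heapTier.put(key, value); // promote a "hot" entry back into the heap tier
            }
        }
        return value;
    }

    /** True if the key currently lives in the heap tier (does not change access order). */
    boolean inHeap(K key) {
        return heapTier.containsKey(key);
    }
}
```

For example, with a heap tier of two entries, putting "a", "b", and "c" demotes "a"; reading "a" again promotes it and demotes "b".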

Cache Persistence

We may not want to incur the performance penalty of re-warming the cache when Magnolia starts up, but how can we persist the cache?

The in-memory tiers cannot be persisted, so all we have to work with is the disk tier. But do we actually need cache persistence, and if so, when?

  • During development we don't want caching at all, so persistence is irrelevant there.
  • In production we might want it:
    • on restart, but when do we restart?
      • on update
        • code, configuration, or templates changed in the update, and Magnolia forces a cache flush on update anyway, so persistence isn't needed
      • on crash
        • the cache is persisted as part of the shutdown sequence, and after a crash we can't guarantee that sequence completed, so persistence doesn't help here either

Configuring Cache Persistence

To persist a cache, set its disk tier to "true":

/modules/cache/config/cacheFactory/delegateFactories/ehcache3/caches/<cacheName>/resourcePoolsBuilder/pools/<diskTierName>="true"

The disk store location defaults to:

magnolia.cache.startdir=${magnolia.home}/cache

You can put the cache somewhere else by overriding this property in magnolia.properties, or in configuration at /modules/cache/config/cacheFactory/delegateFactories/cacheFactory@diskStorePath. Use an absolute path there: a relative path is treated as relative to magnolia.cache.startdir (${magnolia.home}/cache by default). To use a remote service (covered later), a different tool is needed.
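The path rule above can be sketched with java.nio.file: an absolute diskStorePath is used as-is, and a relative one is resolved against the cache start directory. The class and method names are hypothetical; this only illustrates the resolution rule, not Magnolia's own code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class DiskStorePaths {
    /**
     * Hypothetical helper: absolute paths are kept, relative paths are resolved
     * against the value of magnolia.cache.startdir.
     */
    static Path resolveDiskStorePath(String cacheStartDir, String diskStorePath) {
        Path configured = Paths.get(diskStorePath);
        if (configured.isAbsolute()) {
            return configured.normalize();
        }
        return Paths.get(cacheStartDir).resolve(configured).normalize();
    }
}
```

So with a start directory of /opt/magnolia/cache, a diskStorePath of "pages" ends up at /opt/magnolia/cache/pages, which is rarely what you want for a dedicated cache disk.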

The cache filter is configured at /server/filters/cache.

Content caching is configured at /modules/cache/config/contentCaching, which defines:

  • what to cache
  • when to flush the cache (commands: flushAll, flushByUUID, flushNamedCache)
  • what headers to pass to browsers

/modules/cache/config/contentCaching/defaultPageCache/cachePolicy

  • Request hits → browser cache policy:
    • not modified: HTTP 304
    • modified or not in the cache: → server cache policy

      Server cache policy

      "Should we cache this or not?" ← decided by voters

      The default is: if the content is not in the cache yet, cache it. You can change this default to "never" cache.

      By default, all content on public instances is cached, except /.magnolia.

      By default, /.resources on author instances is cached ← this makes little sense

      You may generate your own cache keys instead of using the Magnolia defaults.

    • not available: → Magnolia → browser cache policy

      Browser cache policy

      In headers, you can set different cache policies for different content types ← voters

      In headers, you can set how long the browser may cache each content type: FixedDuration or Never.
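The request flow above can be summarized as a small decision function. This is a sketch only: the enum values mirror the executors useCache, store, and bypass described under Executors, but the method signature, the If-Modified-Since comparison, and the URI voter are assumptions for illustration, not Magnolia's CachePolicy API. The voter reproduces the public-instance default of caching everything except /.magnolia.

```java
import java.time.Instant;
import java.util.Map;

/** Possible outcomes of the cache decision (hypothetical names). */
enum CacheDecision { NOT_MODIFIED_304, USE_CACHE, STORE, BYPASS }

class CacheFlowSketch {
    /** Hypothetical voter: on a public instance, cache everything except /.magnolia. */
    static boolean cacheableOnPublic(String uri) {
        return !(uri.equals("/.magnolia") || uri.startsWith("/.magnolia/"));
    }

    /**
     * @param uri             request URI, used here as the cache key
     * @param ifModifiedSince the browser's If-Modified-Since value, or null if absent
     * @param cachedAt        cache key -> time the entry was stored in the server cache
     */
    static CacheDecision decide(String uri, Instant ifModifiedSince, Map<String, Instant> cachedAt) {
        Instant stored = cachedAt.get(uri);
        // 1. Browser cache policy: the browser's copy is still current -> HTTP 304
        if (stored != null && ifModifiedSince != null && !stored.isAfter(ifModifiedSince)) {
            return CacheDecision.NOT_MODIFIED_304;
        }
        // 2. Server cache policy: serve from the cache if the entry exists ...
        if (stored != null) {
            return CacheDecision.USE_CACHE;
        }
        // ... otherwise let the voter decide whether the rendered output may be stored
        return cacheableOnPublic(uri) ? CacheDecision.STORE : CacheDecision.BYPASS;
    }
}
```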


Cache Flush Policy

When to flush

  • By default, workspaces are observed, and the cache is flushed upon activation when new content is detected in them. You can choose to flush partially, completely, or not at all.
  • Every module may have its own cache flush policy.
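A sketch of the three flush choices above (completely, partially, or not at all), using a plain map as the cache and tagging each entry with the UUIDs of the content nodes it was rendered from. The comments relate the modes to the flushAll and flushByUUID commands from the contentCaching configuration; the classes, the tagging scheme, and the onActivation hook are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

enum FlushMode { ALL, BY_UUID, NONE }

class FlushPolicySketch {
    /** cache key -> rendered page */
    final Map<String, String> entries = new HashMap<>();
    /** cache key -> UUIDs of the content the page was rendered from (hypothetical tagging) */
    final Map<String, Set<String>> dependencies = new HashMap<>();

    void store(String key, String page, Set<String> uuids) {
        entries.put(key, page);
        dependencies.put(key, uuids);
    }

    /** Called when content with the given UUID is activated in an observed workspace. */
    void onActivation(String uuid, FlushMode mode) {
        switch (mode) {
            case ALL: // comparable to flushAll: empty the whole cache
                entries.clear();
                dependencies.clear();
                break;
            case BY_UUID: // comparable to flushByUUID: drop only pages built from that node
                dependencies.entrySet().removeIf(e -> {
                    boolean stale = e.getValue().contains(uuid);
                    if (stale) {
                        entries.remove(e.getKey());
                    }
                    return stale;
                });
                break;
            case NONE:
            default: // keep everything
                break;
        }
    }
}
```

A partial flush keeps unrelated pages warm, at the cost of tracking which content each cached page depends on.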

Executors

Once a cache decision has been made, actions are taken by executors:

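  • useCache: serve the response from the cache
  • store: render the response and store it in the cache
  • bypass: render the response without caching it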

Executors also configure expiry headers.
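The browser cache policy options mentioned earlier, FixedDuration and Never, could translate into expiry headers roughly as below. The header names and formats are standard HTTP (Cache-Control, and Expires as an RFC 1123 date), but the class and method are hypothetical; this is not how Magnolia's executors are implemented.

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

class ExpiryHeaders {
    /**
     * Hypothetical mapping of a browser cache policy to response headers.
     * maxAge == null stands for "Never": the browser must revalidate before reusing its copy.
     */
    static Map<String, String> forPolicy(Duration maxAge, ZonedDateTime now) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (maxAge == null) {
            headers.put("Cache-Control", "no-cache");
        } else {
            headers.put("Cache-Control", "max-age=" + maxAge.getSeconds());
            // Expires is an absolute date in GMT, e.g. "Sun, 3 Dec 2017 05:04:15 GMT"
            headers.put("Expires", DateTimeFormatter.RFC_1123_DATE_TIME
                    .format(now.plus(maxAge).withZoneSameInstant(ZoneOffset.UTC)));
        }
        return headers;
    }
}
```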

Note: when you flush the cache, every public instance has to render pages again on subsequent requests, which increases load.

Compression

gzip filter

Some older browsers, such as IE 6, do not support compression. Check /modules/cache/config/compression/voters/userAgent/rejected@00.

Note where the gzip filter lives in the filter chain.

Items sent out from the cache are compressed. Out of the box, text content such as HTML, JS, and CSS is compressed, because text compresses very well: typically to about 20% of its original size before it is sent to browsers. Further gains come from streaming these responses directly from the JCR repository rather than storing them in memory first.

Generally, it is enough to compress text.  We don't need to cache binary content (think about why not!), and Magnolia will not cache big objects (> 500k) anyway (see "in-memory threshold"). 
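The point that text compresses well while binaries gain little can be checked with java.util.zip. The sketch below provides an in-memory gzip helper and a compression voter; the content-type list and the "MSIE 6" user-agent match are assumptions that mirror the defaults described above, not Magnolia's actual voter configuration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.zip.GZIPOutputStream;

class CompressionSketch {
    // Assumed list of text types worth compressing (HTML, CSS, JS)
    static final Set<String> COMPRESSIBLE = Set.of("text/html", "text/css", "application/javascript");

    /** Hypothetical voter: compress text types, but reject user agents such as IE 6. */
    static boolean shouldCompress(String contentType, String userAgent) {
        boolean rejectedAgent = userAgent != null && userAgent.contains("MSIE 6");
        return COMPRESSIBLE.contains(contentType) && !rejectedAgent;
    }

    /** Gzips a byte array in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return out.toByteArray();
    }
}
```

Applied to a few kilobytes of repetitive HTML, gzip shrinks it to a small fraction of its size; applied to random bytes (a stand-in for JPEG or PNG data, which is already compressed), the output is slightly larger than the input because of gzip's header and block overhead.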

In-Memory Threshold

Should we decide whether to cache something based on its size?

Testing shows that for resources larger than 500k, 98% are served as fast from the repository as from memory, because streaming that much data takes time either way. So, in addition to possibly not caching binary content (which, hopefully, doesn't change often), anything over this threshold gains nothing from being held in memory. You can set a different value for this threshold programmatically.
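The size rule amounts to a simple comparison. A minimal sketch, assuming the 500k default quoted above means 500 * 1024 bytes, and assuming that a response of unknown length (-1, as for chunked responses) is streamed rather than buffered; both assumptions, and the class itself, are hypothetical and should be checked against the cache module's configuration.

```java
class InMemoryThreshold {
    /** Assumed default threshold: 500 KB. */
    static final long DEFAULT_THRESHOLD_BYTES = 500L * 1024;

    enum Handling { CACHE_IN_MEMORY, STREAM_FROM_REPOSITORY }

    /**
     * Hypothetical rule: buffer small responses in memory; stream large or
     * unknown-length ones, since holding them in memory would gain little.
     */
    static Handling handle(long contentLength, long thresholdBytes) {
        if (contentLength < 0 || contentLength > thresholdBytes) {
            return Handling.STREAM_FROM_REPOSITORY;
        }
        return Handling.CACHE_IN_MEMORY;
    }
}
```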

Testing Compression

We can use tools like Web Sniffer to set the Accept-Encoding and User-Agent headers, submit pages to the sniffer, and analyze the response.

You can also just use curl or your web browser's developer tools:

```console
$ curl --head http://localhost:8080/test-1/test-2.html
# or: curl -H "Accept-Encoding: gzip" -I http://localhost:8080/test-1/test-2.html
HTTP/1.1 403
Set-Cookie: JSESSIONID=C1D92406907AE50551D035703A19D78C;path=/;HttpOnly
Set-Cookie: NEW_VISITOR=new;Max-Age=86400;HttpOnly
Set-Cookie: VISITOR=returning;path=/test-1/test-2.html;HttpOnly
X-Magnolia-Registration: Registered
WWW-Authenticate: FormBased
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Date: Sun, 03 Dec 2017 05:03:15 GMT
```

For more, see https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor
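The same check as the curl session can be scripted with the JDK's java.net.http client (Java 11+). This sketch builds a HEAD request that advertises gzip support and reports the Content-Encoding response header; it assumes a running instance at localhost:8080, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class CompressionCheck {
    /** HEAD request that advertises gzip support, like curl -I -H "Accept-Encoding: gzip". */
    static HttpRequest headWithGzip(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .header("Accept-Encoding", "gzip")
                .build();
    }

    /** Sends the request and returns Content-Encoding, or "(none)" if the response was not compressed. */
    static String contentEncoding(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<Void> response =
                client.send(headWithGzip(url), HttpResponse.BodyHandlers.discarding());
        System.out.println("HTTP " + response.statusCode());
        return response.headers().firstValue("Content-Encoding").orElse("(none)");
    }
}
```

Usage: CompressionCheck.contentEncoding("http://localhost:8080/test-1/test-2.html") prints the status code and returns "gzip" when the gzip filter compressed the response.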

Caching Strategies

We don't want to cache everything. Even if the data in the observed Magnolia workspaces has not changed, external data pulled into some components on the page may have, which means the computed page to stream to the requester is different. Since we don't want to serve that response with stale data, we keep it out of the cache. Likewise, when we are developing JS, CSS, or HTML on the author instance, we don't want it cached.

Note: have you ever seen this in magnolia.properties?

```properties
# Switch to false to enhance the performance of the javascript generation and similar
magnolia.develop=true
```

Cache Header Negotiation

Cache header negotiation is a mechanism that lets templates (components) influence whether content is cached, and for how long. It covers the case where a page should not be cached, but that only becomes known too late in the rendering process for the usual cache analysis to take it into account. Examples:

  • dynamic or live data: cache for 1 or 5 minutes
  • personalized data: maybe don't cache at all (this is the default; see Advanced Cache for Personalization)
  • error resolution: don't cache error messages! We don't want to redisplay a failure message after the user has fixed their mistake, or after a temporary failure to read dynamic external data.

You can set these headers in code or in templates, for example in a JSP template:

```jsp
<% response.setHeader("Cache-Control", "no-cache"); %>
```

Other

The strictest (most restrictive) caching policy wins. Consider a page that caches several things, say two components: one static and one dynamic. The dynamic one says "don't cache me", the static one says "cache me". Will this page be cached? No, because one of its components doesn't want to be cached.
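One way to model "strictest wins" is to let each component declare a maximum cache duration and combine them by taking the shortest, with "don't cache me" represented as zero. The durations in the usage example are the ones listed under Cache Header Negotiation (1 and 5 minutes); the class, the NO_CACHE constant, and the page default are hypothetical, not Magnolia's cache header negotiation API.

```java
import java.time.Duration;
import java.util.List;

class StrictestPolicy {
    /** Duration.ZERO stands for a component saying "don't cache me". */
    static final Duration NO_CACHE = Duration.ZERO;

    /**
     * Combines the durations requested by all components on a page:
     * the shortest one wins, so a single NO_CACHE vote makes the page uncacheable.
     */
    static Duration merge(List<Duration> componentMaxAges, Duration pageDefault) {
        Duration result = pageDefault;
        for (Duration d : componentMaxAges) {
            if (d.compareTo(result) < 0) {
                result = d;
            }
        }
        return result;
    }

    static String cacheControl(Duration maxAge) {
        return maxAge.isZero() ? "no-cache" : "max-age=" + maxAge.getSeconds();
    }
}
```

For a page with a one-hour default and components asking for 5 and 1 minutes, the merged header is max-age=60; add one component voting NO_CACHE and the header becomes no-cache.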

Cache Tools App

Cache Browser subapp: logs in from the author instance to all public instances and shows what is in each cache. It may help troubleshoot some common problems:

  • does the same content exist in cache for every public instance? (see memcache for a fix)
  • do we serve the same content (ex: an image) for every public instance? (see memcache for a fix)
  • is our cache evenly distributed? (see memcache for a fix)
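To illustrate the first check outside the app, here is a sketch that compares SHA-256 digests of the cached content for one cache key across public instances. How the per-instance entries are fetched is left out, and the class, method, and instance names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

class CacheConsistency {
    static String sha256(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /**
     * @param cachedByInstance instance name -> cached content for one cache key
     *                         (null value if that instance has no entry for the key)
     * @return true if every instance has an entry and all entries are identical
     */
    static boolean consistent(Map<String, String> cachedByInstance) {
        if (cachedByInstance.values().stream().anyMatch(Objects::isNull)) {
            return false; // at least one public instance has not cached this key
        }
        Set<String> digests = cachedByInstance.values().stream()
                .map(CacheConsistency::sha256)
                .collect(Collectors.toSet());
        return digests.size() <= 1;
    }
}
```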

...


...