Summary of findings and ideas for cache configuration for JR clustered installations.

Considering options for advaning cache implementation to behave correctly in clustered env.

Introduction

Probably enterprise only feature.

This concept is based on issue MGNLADVCACHE-10 - Getting issue details... STATUS  found while testing clustering and subsequent discussion.

Use case

  • page commenting
  • any other non activation based content updates

EHCache

EHcache supports clustering.
Considerations for using EHcache clustering support:

  • public instances would need to know each other (IP), making public instance configuration more complicated ... the author will have only one subscriber, but public that will be such subscriber would need to know about all other public instances ... what would happen if such public goes down ... would need to figure out which of the cache slaves will become master and change subscriber on the author to that master
  • the cache configuration would need to be different based on cluster configuration - full vs. partial clustering vs. partial clustering (specially if author itself participates in the cluster)
  • solves the issue only for EHcache, making initial barrier for implementing other underlying cache for Magnolia even more difficult.

Magnolia cache

Considerations for advancing current cache implementation:

  • instances participating in the cluster are already bound together by JR journalling and only events related to clustered workspaces are propagated across cluster, no extra config necessary
  • there is mechanism in place for instructing the cache (Flush policy) which works well in clustered environment
  • the Flush Policy is already implemented for flushing all the content (and used for activation)
  • new flush policy for flushing single only items would have to be implemented
  • The advanced caching techniques manipulating cache items directly would need to be rewritten to use Observation for all cache manipulation to ensure events are propagated across cluster
  • New method (since 4.1) for flushing single page from cache would need to be changed to use single page flush policy instead of retrieving cache instance and removing content directly

Implementation details

Cache updates should be realized only via observation to ensure distribution over the cluster.
Schema of the implementation for page commenting based on observation

First code changes have been committed.
In cache module MAGNOLIA-2616 - Getting issue details... STATUS :

  • New methods in CachePolicy API to allow exposing conversion of uuid to cache key, should the cache policy support this.
  • Implementation of the above for Default cache policy.

In commenting module MGNLCMNT-2 - Getting issue details... STATUS :

  • Implementation of the flush policy that is able to detect page that needs to be flushed based on new comments being added.
  • Registration of the above policy on the startup of the module

Class diagram of current implementation:

Pending

  • peer review
  • what might still change
    • is some method naming, and
    • probably final location for the code related to flushing the page in the policy. Since this is rather generic, it might be moved to either cache or advanced cache module to expose functionality to the others who might be interested in using it.
    • initialization of the flush policy ... still undecided whether it is really business of the commenting module to register this or whether cache module should take care.
    • configuration - is it enough that there is only this one policy that can be used with commenting, or is there a real chance that we or someone else might ever want to replace it with something else in some case.

5 Comments

  1. Maybe adding use cases would be a good idea...

  2. As a first attempt we could just bypass the problem by solving the issues in the modules. For instance by using observation to flush the cached page in the other instance. Following that approach we collect some experience with comments, PUR and the imaging back end we can use to solve the issue generally.

    More and more I have the feeling that we need an inter instance communication for this kind of scenarios where using observation is not enough. Similar to what we have done for the seat imaging back end. 

  3. When we talk about a cluster we talk about big installations, high availability etc..

    IMHO, we should step back and see what works best in such an environment, I don't see EhCache as a solution for big distributed/clustered installations. We have everything in place so why not use a central cache "memcached" where all public servers use this single cache source.

    Why we want caching/flushing overhead in each and every public instance?

    There are some Java clients available for memcached which could be used to implement Magnolia cache/flush policies.

    Just my 'few' cents.

    1. I don't see EhCache as a solution for big distributed/clustered installations.

      Why not ?

  4. IMO, there is no need for this overhead on each and every node in a cluster.

    Clustering does not mean that we need to cluster every tier of an application, optimum performance can be achieved in a much simpler architecture.