Note

It is an open question whether Dockerizing / containerizing Magnolia is a best practice at all.  You have been warned (smile)

However, using Docker together with Magnolia is clearly popular and still growing, so even though we do not advise Dockerizing Magnolia, we need to be able to provide a solution for those who must use it.  This section goes through some aspects of Dockerizing Magnolia.

First - What is Docker?

Docker containers are often described as lightweight type II virtual machines, although strictly speaking they are isolated processes that share the host's kernel rather than full virtual machines.  Another way to think of it: Docker containers are like apps that come bundled with all their dependencies.  That's really it.  You can drop a Docker container onto any system running Docker, and it should just work.  For example, you can run a Docker container on an EC2 instance, as long as that instance has Docker installed.  In fact, you can run N Docker containers on any 1 or N EC2 instances.  The key question here is: why would I want to do something like this?  Well, let's assume some business rules force you to use Docker.  Or, your customers ask for k8s because they have invested in this technology to leverage its out-of-the-box features and to have a consistent way of managing "services".
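
For a concrete illustration, this is roughly what running a stock Tomcat container looks like on any host with Docker installed (the image tag and port mapping below are just examples, not Magnolia-specific recommendations):

    # Pull a public Tomcat image from Docker Hub (the tag is an example)
    docker pull tomcat:9-jdk11

    # Run it detached, mapping container port 8080 to host port 8080
    docker run -d --name some-tomcat -p 8080:8080 tomcat:9-jdk11

    # The same two commands work unchanged on a laptop, an EC2 instance,
    # or any other machine with the Docker engine installed.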

Next - Why (Not) Docker?

Advantages

  • easy rollback of changes (docker history)
    • See explanation, below, and the rollback sketch after this list.
  • start-up speed
    • See explanation, below.
  • more CI-able?
    • Because Docker containers can be quickly created and destroyed, and because they behave the same way anywhere you run them, a container that works on your Dev machine will also work on your Prod (or any other) environment.  Since Docker integrates with Jenkins, Travis, or TeamCity and Git, a developer could push code to GitHub, automatically test and trigger a build, and then publish the resulting image to Nexus (see the CI sketch after this list).
  • self-contained
  • not tightly coupled to OS / tool (e.g., RHEL, AWS)
  • scalability
    • Every container started from a given image is identical and can run anywhere.  Therefore we should be able to quickly and easily create N such instances.
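
As a rough sketch of the rollback and CI points above (the image names, tags, registry and test entry point are hypothetical):

    # Rollback: inspect how an image was assembled, layer by layer ...
    docker history registry.example.com/magnolia-author:1.4.2

    # ... then "roll back" by simply re-deploying the previously built tag
    docker stop author && docker rm author
    docker run -d --name author -p 8080:8080 registry.example.com/magnolia-author:1.4.1

    # CI: commands a Jenkins / Travis / TeamCity job might run on each push
    docker build -t registry.example.com/magnolia-author:${GIT_COMMIT} .
    docker run --rm registry.example.com/magnolia-author:${GIT_COMMIT} ./run-tests.sh
    docker push registry.example.com/magnolia-author:${GIT_COMMIT}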

Disadvantages

  • disk usage could go up
    • See explanation, below.
  • if you're not running Linux, you need a VM layer underneath (e.g., VirtualBox or the Linux VM bundled with Docker Desktop)
  • running systemctl or systemd units inside a container seems impossible / very difficult
  • best practices lead to lots of framework
    • E.g., the suggested basic Magnolia setup (1 author, 2 publics, each with its own database) would require 6 different Docker containers.  Those 6 containers would have to be strung together with Docker Compose, and Docker Compose by itself does not seem to span multiple hosts, so you'd also need to set up a Docker swarm (see the compose sketch after this list).
  • to fully utilize CI/CD, might need an enterprise-level Docker license
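
As referenced above, a minimal, hypothetical docker-compose.yml for the 1-author / 2-publics setup could look roughly like this (image names, credentials and ports are placeholders; a real setup also needs persistent volumes, datasource configuration inside each Magnolia webapp, and swarm / k8s on top once containers span several hosts):

    version: "3"
    services:
      author-db:
        image: postgres:13
        environment:
          POSTGRES_DB: magnolia_author
          POSTGRES_PASSWORD: changeme
      public1-db:
        image: postgres:13
        environment:
          POSTGRES_DB: magnolia_public1
          POSTGRES_PASSWORD: changeme
      public2-db:
        image: postgres:13
        environment:
          POSTGRES_DB: magnolia_public2
          POSTGRES_PASSWORD: changeme
      author:
        image: example/magnolia-author:latest    # hypothetical image
        depends_on: [author-db]
        ports: ["8080:8080"]
      public1:
        image: example/magnolia-public:latest    # hypothetical image
        depends_on: [public1-db]
        ports: ["8081:8080"]
      public2:
        image: example/magnolia-public:latest    # hypothetical image
        depends_on: [public2-db]
        ports: ["8082:8080"]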

...

From a discussion with Nicolas B. in Pre-Sales room:

[9:48 AM] Nicolas Barbé: In short, there are two issues: 1. Magnolia is stateful 2. You cannot run two Magnolia instances on the same DB. Because of that, we can't leverage advanced cloud features which are provided out of the box by k8s or AWS such as auto-scalability or B/G deployment. Our customers have to implement a lot of glue code to make it work, which kills the argument of using such platforms. With container-based orchestration, the situation is even worse, because they can reallocate the containers dynamically to different hosts. This is how failover is implemented and how the whole cluster scales. Magnolia instances must be declared as "fixed" instances, which cannot be moved to a different host. Which again kills all the advantage of having k8s (or other container-based orchestrator).

Our customers ask for k8s because they have invested in this technology to leverage these otb features and to have a consistent way of managing "services".

[9:50 AM] Nicolas Barbé: To be more complete: JCR clustering is broken and k8s StatefulSet is not something you want to do with Magnolia

[10:13 AM] Nicolas Barbé: Sure, it's written in the Jackrabbit documentation itself https://wiki.apache.org/jackrabbit/Clustering
[10:15 AM] Nicolas Barbé: In short, JCR clustering uses a log.  This log is used to spin up new instances and sync them (index) relatively quickly. The log grows quickly; to clean up the log, you need to activate the janitor. If you do so, you can't spin up new instances since part of the history will be missing
[10:16 AM] Nicolas Barbé: Plus, even with this mechanism, creating the new instance is not something obvious.
[10:17 AM] Nicolas Barbé: Good news is that things are different with OAK, I don't know if other JCR implementations have the same issues

[10:18 AM] Jan Haderka:

    |  you can't spin up new instances since part of the history will be missing

Why?

[10:18 AM] Jan Haderka: that's just partially true.

[10:19 AM] Jan Haderka: you can still spin new instances, but from a snapshot taken from a synced cluster node after the last janitor run

[10:20 AM] Jan Haderka: and there's other workarounds.

[10:22 AM] Nicolas Barbé: yes true, that's what I meant with "glue code" earlier

[10:27 AM] Nicolas Barbé: k8s and similar tools come with a cost, customers expect to get an ROI out of that, mainly by not writing glue code anymore. Actually in k8s there is no good way to trigger the glue code (unless it has changed)

To summarize the discussion:

1. Magnolia is stateful - it is difficult to run it as a k8s StatefulSet.

2. JCR clustering is difficult - you cannot easily run two Magnolia instances on the same DB.  JCR clustering uses a log: this log is used to spin up new instances and sync (index) them relatively quickly.  But the log grows fast; to clean it, you need to activate the janitor (see the configuration sketch after this summary).  Once the janitor runs, you can't spin up new instances (unless you use a snapshot taken from a synced cluster node after the last janitor run, or some other workaround) because part of the history will be missing, and even with such a workaround, creating the new instance is not straightforward.

Because of those two things, we can't leverage the advanced cloud features provided out of the box by k8s or AWS, such as auto-scalability or B/G deployment.  You may have to implement a lot of glue code to make it work, which removes some of the benefits of using such platforms.  With container-based orchestration, the situation is even worse, because the orchestrator can dynamically reallocate containers to different hosts; that is how failover is implemented and how the whole cluster scales.  Magnolia instances must be declared as "fixed" instances that cannot be moved to a different host, again peeling away the advantages of having k8s (or another container-based orchestrator).
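
For reference, the journal janitor mentioned in point 2 is switched on in the cluster section of Jackrabbit's repository.xml.  The snippet below is only a sketch: the driver, URL and credentials are placeholders, and the parameter names should be double-checked against the Jackrabbit version you actually run.

    <Cluster id="node1" syncDelay="2000">
      <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
        <param name="driver" value="org.postgresql.Driver"/>
        <param name="url" value="jdbc:postgresql://db-host:5432/journal"/>
        <param name="user" value="jackrabbit"/>
        <param name="password" value="changeme"/>
        <param name="databaseType" value="postgresql"/>
        <!-- the janitor periodically trims the journal table -->
        <param name="janitorEnabled" value="true"/>
        <param name="janitorSleep" value="86400"/>
        <param name="janitorFirstRunHourOfDay" value="3"/>
      </Journal>
    </Cluster>

The trade-off from the chat applies here: once the janitor has trimmed the journal, a brand-new cluster node can no longer replay the full history and has to be seeded from a snapshot of an already-synced node.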

...

Explanation

How Docker works: there are many pre-fabricated Docker images available online (and you can build / contribute your own).  For any task / service you want to run (say, MySQL or Tomcat), there is probably already an image in place.  When you run an existing image, Docker uses a copy-on-write strategy, which is basically lazy copying: the container gets a thin writable layer on top of the image's read-only layers, and a file is only copied into that writable layer when the container actually modifies it.  Nothing is duplicated up front, which is why starting a container is fast; conversely, every image you build adds layers on disk, which is why disk usage can creep up.
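
To see the layering in practice, here is a hypothetical Dockerfile that reuses a public Tomcat image (the WAR file name is a placeholder); only the instructions added on top create new layers, while the Tomcat and OS layers underneath stay shared and read-only:

    FROM tomcat:9-jdk11
    COPY magnoliaAuthor.war /usr/local/tomcat/webapps/
    ENV JAVA_OPTS="-Xmx2g"

Building and inspecting the result shows the shared base layers plus the small layers added above:

    docker build -t example/magnolia-author:latest .
    docker history example/magnolia-author:latest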

...