The Extended Health Check module provides extensible, configurable endpoints for evaluating the "health" of a Magnolia instance. You can use the endpoint for monitoring a Magnolia instance, either manually or automatically, for example, for autoscaling.

You configure which HTTP status the health check returns and the conditions that are checked for each status.

The Extended Health Check module also provides a store for "health events".  Health events are significant events that indicate something about the health of the Magnolia instance and can be checked in the extended health check. 

You can collect health events from the Magnolia log with log4j configuration, and you can collect health events relating to Magnolia publication failures. 

Installation

Maven is the easiest way to install the module. Add the following dependency to your bundle:

<dependency>
  <groupId>info.magnolia</groupId>
  <artifactId>healthcheck</artifactId>
  <version>${version}</version>
</dependency>

Versions

1.0 - Magnolia 5.7.8 and later

Health outcomes

Health outcomes define the conditions for which a specific HTTP status is returned by the extended health check. 

A health outcome defines: 

  • A voter set including one or more health voters or boolean voter sets checking Magnolia health conditions.
  • Details returned if the conditions for the health outcome are met (HTTP status and description).

Health outcomes can be disabled (or enabled). A disabled health outcome won't be examined when an extended health check is requested. 

Health outcomes are defined through the module configuration at /modules/healthcheck/config/outcomes. You can add or modify the health outcomes defined there.

Node name                     Value
modules
  healthcheck
    config
      outcomes
        <health outcome 1>
        <health outcome 2>
        <health outcome N>

Health outcomes are checked in the order they are defined; the first health outcome whose health voters return true is returned as the result of a health check, and any remaining health outcomes are ignored. 
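The first-match rule above can be modeled in a few lines. This is only an illustrative Python sketch, not the module's code: the Outcome class, its field names, and the condition callable (standing in for an outcome's voter set) are assumptions made for the example. The outcome names, statuses, and descriptions are taken from the outcomes shipped with the module.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Outcome:
    """Hypothetical stand-in for a configured health outcome."""
    name: str
    status: int                     # HTTP status returned when the outcome matches
    description: str
    condition: Callable[[], bool]   # stands in for the outcome's voter set
    enabled: bool = True


def check_health(outcomes: List[Outcome]) -> Tuple[int, str]:
    """Return (status, description) of the first enabled outcome whose condition holds."""
    for outcome in outcomes:        # list order models the configuration order
        if not outcome.enabled:
            continue                # disabled outcomes are never examined
        if outcome.condition():
            return outcome.status, outcome.description  # later outcomes are ignored
    return 200, "Magnolia is healthy"


outcomes = [
    Outcome("error500", 500, "Magnolia has internal errors", lambda: False),
    Outcome("errorTest", 502, "Test health error (Magnolia is really OK)", lambda: True, enabled=False),
    Outcome("error503", 503, "Magnolia public instance has publishing failures", lambda: True),
    Outcome("error402", 402, "Magnolia license has expired!", lambda: True),
]
print(check_health(outcomes))
```

Here error402 is never reported even though its condition holds, because error503 is defined before it. Ordering the outcomes from most to least severe is therefore a reasonable convention.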

Here are the configurable properties of a health outcome:

Node name                                           Value
modules
  healthcheck
    config
      outcomes
        <health outcome name>                       A unique name identifying the health outcome
          class                                     Should be info.magnolia.health.HealthOutcome
          enabled                                   true (the default if not specified) or false. If false, the health outcome will not be checked during an extended health check.
          conditions
            class                                   The class name for a boolean voter set. If not specified, it will be info.magnolia.voting.voters.BoolVoterSet.
            voters
              <health voter or boolean voter set>   Configuration of health voters is described below.

Note: you can also define further boolean voter sets, along with boolean operations, to build up complex conditions. 
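To make the idea of nested conditions concrete, here is a small Python model of how a tree of voters and voter sets could be evaluated. It is a sketch under assumptions: the Voter and VoterSet classes, the negate field (modeling the not property), and in particular the op field with its "and" / "or" values are hypothetical names chosen for the example. Check the Magnolia voter documentation for the actual BoolVoterSet properties.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Voter:
    """Hypothetical model of a single health voter."""
    check: Callable[[], bool]   # the single condition this voter tests
    enabled: bool = True
    negate: bool = False        # models the "not" property


@dataclass
class VoterSet:
    """Hypothetical model of a boolean voter set."""
    voters: List[Union["Voter", "VoterSet"]] = field(default_factory=list)
    op: str = "and"             # assumption: "and" or "or"


def evaluate(node: Union[Voter, VoterSet]) -> bool:
    """Evaluate a voter, or recursively combine the voters of a voter set."""
    if isinstance(node, Voter):
        result = node.check()
        return (not result) if node.negate else result
    # Skip disabled voters; nested voter sets are always evaluated in this model.
    active = [v for v in node.voters if getattr(v, "enabled", True)]
    results = [evaluate(v) for v in active]
    return all(results) if node.op == "and" else any(results)


# "License expired" if either source reports it.
license_expired = VoterSet(op="or", voters=[
    Voter(check=lambda: False),  # e.g. a query that finds no message nodes
    Voter(check=lambda: True),   # e.g. a matching warning in the health log
])
print(evaluate(license_expired))
```

Negation is useful when a voter reports the healthy state: for example, an outcome that should fire when no Magnolia context is available could negate a voter that returns true when one is.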

Health voters

Health voters check a single, specific condition about the health of a Magnolia instance. They can be combined with other health voters and boolean voter sets to form complicated logical expressions for a particular health outcome. 

The Extended Health Check module includes several health voters: 

  • To check if a Magnolia context is available
  • To check whether specific health events exist
  • To check if Magnolia needs to be updated
  • To check whether publication failures have occurred
  • To check whether certain nodes or properties exist in Magnolia's JCR repository

ContextAvailableVoter - checks if a Magnolia context is available

The Magnolia context is fundamental to Magnolia operation; if one is not available, Magnolia has a serious problem.

ContextAvailableVoter has the following configuration: 

Node name         Value
<voter name>      Name of the voter
  class           Should be info.magnolia.health.voters.ContextAvailableVoter
  enabled         true (the default if not specified) or false. If false, the voter will not be evaluated.
  not             true or false (the default if not specified). If true, the result of the voter will be negated (e.g. !result).

HealthEventPropertyVoter - checks for specified health events 

The HealthEventPropertyVoter checks whether health events meeting the configured criteria exist. You can also specify a threshold for the number of health events found, as well as the expected value of a health event property.

HealthEventPropertyVoter has the following configuration:

Node name         Value
<voter name>      Name of the voter
  class           Should be info.magnolia.health.voters.HealthEventPropertyVoter
  enabled         true (the default if not specified) or false. If false, the voter will not be evaluated.
  not             true or false (the default if not specified). If true, the result of the voter will be negated (e.g. !result).
  identifier      The identifier of the health event. Health events have the following identifiers:
                    • loggedMessage - the health event was created from a log message
                    • publicationError - the health event was created from a publication error
                  If not specified, the identifier will be loggedMessage.
  propertyName    (required) The name of the health event property whose value will be checked
  propertyValue   (required) The expected value of the health event property
  predicate       Specifies how the actual value of propertyName will be compared to the expected propertyValue. The following comparisons are available:
                    • isDefined: property propertyName is defined in the health event
                    • equals: propertyValue equals the actual property value
                    • notEquals: propertyName is defined in the health event and propertyValue does not equal the actual property value
                    • matches: propertyValue is a regular expression that matches the actual property value
                    • doesNotMatch: propertyName is defined in the health event and the regular expression propertyValue does not match the actual property value
  threshold       The number of health events matching the identifier, propertyName, propertyValue and predicate that must be exceeded. If more health events are found, the voter will return true, otherwise false. If not specified, threshold will be 0.
  interval        Defines an interval, in milliseconds, counted back from the time the health voter is evaluated. Health events outside of the interval will not be checked. Use intervals to limit the health events considered (e.g. publication errors within the last 30 minutes). If interval is less than 0, all health events will be checked (the default if not specified).
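The interplay of identifier, predicate, threshold, and interval is easier to see in code. The following Python sketch models the voter's decision as described above. It is not the module's implementation: the event shape (a dict with identifier, timestamp, and properties keys), the function name vote, and the choice of full-string regular expression matching are all assumptions made for the example.

```python
import re

# One test per predicate; `actual` is None when the property is not defined.
PREDICATES = {
    "isDefined":    lambda actual, expected: actual is not None,
    "equals":       lambda actual, expected: actual is not None and actual == expected,
    "notEquals":    lambda actual, expected: actual is not None and actual != expected,
    # Assumption: the regular expression must match the whole value.
    "matches":      lambda actual, expected: actual is not None and re.fullmatch(expected, actual) is not None,
    "doesNotMatch": lambda actual, expected: actual is not None and re.fullmatch(expected, actual) is None,
}


def vote(events, property_name, property_value, predicate,
         identifier="loggedMessage", threshold=0, interval=-1, now=0):
    """Return True when more than `threshold` events satisfy the criteria."""
    test = PREDICATES[predicate]
    count = 0
    for event in events:
        if event["identifier"] != identifier:
            continue
        # A negative interval disables the time window entirely.
        if interval >= 0 and event["timestamp"] < now - interval:
            continue
        if test(event["properties"].get(property_name), property_value):
            count += 1
    return count > threshold


# Hypothetical events; timestamps are in milliseconds.
events = [
    {"identifier": "loggedMessage", "timestamp": 1_000,
     "properties": {"logLevel": "WARN", "logMessage": "License has expired"}},
    {"identifier": "loggedMessage", "timestamp": 9_000,
     "properties": {"logLevel": "WARN", "logMessage": "License has expired"}},
    {"identifier": "publicationError", "timestamp": 9_500, "properties": {}},
]
print(vote(events, "logMessage", ".*expired.*", "matches", now=10_000))
```

With threshold left at 0, a single matching event is enough for the voter to return true. An interval of 30 minutes would be written as 1800000 (30 × 60 × 1000 milliseconds).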

MagnoliaUpdatedNeededVoter - checks Magnolia modules needing updating

The MagnoliaUpdatedNeededVoter checks whether one or more Magnolia modules need updating.

MagnoliaUpdatedNeededVoter has the following configuration:

Node name         Value
<voter name>      Name of the voter
  class           Should be info.magnolia.health.voters.MagnoliaUpdatedNeededVoter
  enabled         true (the default if not specified) or false. If false, the voter will not be evaluated.
  not             true or false (the default if not specified). If true, the result of the voter will be negated (e.g. !result).

PublicationFailureVoter - checks for Magnolia publication failures

The PublicationFailureVoter checks whether a publication failure has occurred. 

PublicationFailureVoter has the following configuration:

Node name         Value
<voter name>      Name of the voter
  class           Should be info.magnolia.health.voters.PublicationFailureVoter
  enabled         true (the default if not specified) or false. If false, the voter will not be evaluated.
  not             true or false (the default if not specified). If true, the result of the voter will be negated (e.g. !result).
  interval        Defines an interval, in milliseconds, counted back from the time the health voter is evaluated. Publication failures outside of the interval will not be counted. Use the interval to limit the publication errors considered (e.g. publication errors within the last 30 minutes). If interval is less than 0, all publication failures will be checked (the default if not specified).
  threshold       The number of publication failures within the specified interval that must be exceeded. If more publication failures are found, the voter will return true, otherwise false. If not specified, threshold will be 0.

QueryVoter - checks for nodes defined in the JCR repository

The QueryVoter checks whether nodes matching a query exist in the JCR repository. This voter is useful for checking the messages workspace for system errors like the expiration of the Magnolia license.

QueryVoter has the following configuration:

Node name         Value
<voter name>      Name of the voter
  class           Should be info.magnolia.health.voters.QueryVoter
  enabled         true (the default if not specified) or false. If false, the voter will not be evaluated.
  not             true or false (the default if not specified). If true, the result of the voter will be negated (e.g. !result).
  workspace       (required) The workspace that will be searched
  query           A valid JCR-SQL2 query that will be evaluated in the workspace
  threshold       The number of nodes that the query result must exceed for the health voter to return true. If not specified, threshold will be 0 (the voter returns true if one or more nodes are found by the query).

Health events

Health events are collected while Magnolia is running and provide a record that can be checked by health voters. There are two health voters - PublicationFailureVoter and HealthEventPropertyVoter - that use health events; the other voters - ContextAvailableVoter, MagnoliaUpdatedNeededVoter and QueryVoter - all check the state of Magnolia at the time of execution. 

Health events are collected from two sources: 

  • The Magnolia log
  • The results of Magnolia publications

Both sources can provide valuable insight into what has happened in a Magnolia instance outside of the time Magnolia's health is being checked.

Health events have: 

  • an identifier to indicate where the health event came from: "loggedMessage" for health events from Magnolia logging and "publicationError" from errors occurring during a Magnolia publication
  • name / value properties that depend on where the health event was collected

Health Log

Health events are stored in a health log, and health voters can check the health log for events matching their configuration to assess Magnolia's health.

The health log can store a limited number of health events: 

  • up to 10,000 total health events
  • health events older than 6 hours are discarded 

Your health voters should not use intervals longer than 6 hours. 
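The two retention limits can be pictured with a bounded queue. The Python sketch below is only a model of the behavior described above, not the module's storage code. The HealthLog class and its methods are hypothetical, and two details are assumptions: that the oldest event is the one dropped when the log is full, and that events arrive in timestamp order.

```python
from collections import deque

MAX_EVENTS = 10_000
MAX_AGE_MS = 6 * 60 * 60 * 1000  # 6 hours in milliseconds


class HealthLog:
    """Hypothetical model of a size-bounded, age-bounded event store."""

    def __init__(self):
        # A deque with maxlen drops its oldest entry when a new one is
        # appended to a full deque (assumed eviction policy).
        self._events = deque(maxlen=MAX_EVENTS)

    def add(self, timestamp_ms, event):
        # Assumes events are added in non-decreasing timestamp order.
        self._events.append((timestamp_ms, event))

    def events(self, now_ms):
        # Discard events older than the retention window, then return the rest.
        while self._events and self._events[0][0] < now_ms - MAX_AGE_MS:
            self._events.popleft()
        return [event for _, event in self._events]


log = HealthLog()
for i in range(10_050):
    log.add(i, f"event-{i}")
print(len(log.events(now_ms=10_050)))
```

Because anything older than 6 hours is already gone, a voter interval longer than 21600000 milliseconds cannot see more events than a 6-hour interval would.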

Collecting health events from Magnolia logs

You can collect health events from Magnolia logs and save them in the health log through Magnolia's log4j configuration. 

You will need to set up two log4j elements:

  • A health log "Appender" to store any matching messages into the health log
  • One or more "Loggers" to select log messages to be saved by the health log appender

Note that you can filter events by both the health log appender (using the "Filters" attribute) and the loggers (using the "level" attribute). 

The health log appender is declared in the Extended Health Check module, so you can use it in your log4j configuration without further declarations.

Here's a sample health log appender:

Sample HealthLog appender
    <HealthMonitor name="license-monitor" messagePattern=".+">
      <PatternLayout pattern="%-5p %c %d{dd.MM.yyyy HH:mm:ss} -- %m%n"/>
    </HealthMonitor>

This HealthMonitor appender will save any log message directed toward it (messagePattern will match any non-empty message) with the specified layout pattern. 

HealthMonitor will save any matching log message to the health log with the following name / value properties:

  • logLevel: the log level of the message
  • logMessage: the log message
  • logThread: the thread where the message was logged
  • logName: the name of the Logger
  • logCallerFQCN: the fully qualified class name where the message was logged

Here are some sample loggers that select log messages and send them to the HealthMonitor appender above:

Sample loggers
    <Logger name="info.magnolia.multisite.sites.MultiSiteManager" level="WARN">
      <AppenderRef ref="license-monitor"/>
    </Logger>
    <Logger name="info.magnolia.sitemesh.config.MagnoliaConfigurableSiteMeshFilter" level="WARN">
      <AppenderRef ref="license-monitor"/>
    </Logger>

These loggers select WARN level messages from the Magnolia Multi-Site module (specifically info.magnolia.multisite.sites.MultiSiteManager) and the Magnolia SiteMesh caching module (specifically info.magnolia.sitemesh.config.MagnoliaConfigurableSiteMeshFilter) and send them to the HealthMonitor appender named "license-monitor". MultiSiteManager and MagnoliaConfigurableSiteMeshFilter both report expired licenses at WARN level.

Collecting health events from publications 

Errors during a Magnolia publication are not completely captured in the Magnolia logs; the specific error message returned by a Magnolia public instance to the Magnolia author is not recorded in the log of the public instance. Knowing why a publication failed is an important indication of the health of a Magnolia public instance: if the publication failed because of some failure of the JCR repository, the JCR repository of the Magnolia public instance may be corrupted and the instance should be replaced or repaired. On the other hand, some publication errors are recoverable. For example, publishing a child node whose parent has not been published will cause a publication error that can be remedied by publishing the parent node and republishing the child node.

Publication errors can be collected by a filter. The filter detects publication requests and saves the results of the publication into the health log. 

The Extended Health Check module will install a filter "publishingMonitor" before the publication filter "publishing" to collect the result of publications. 

If you change either the publishingMonitor filter or publishing filter, please note: 

  • the publishingMonitor filter must be located before the publishing filter in the filter chain to collect publication results
  • the publishingMonitor filter should have the same bypasses configuration as the publishing filter to identify publication requests

If you don't want to collect publication results in the health log, you can disable the publishingMonitor filter (set its enabled property to false) or delete the publishingMonitor filter.

Health outcomes provided

The Extended Health Check module includes a number of predefined health outcomes:

Name         HTTP status returned    Description returned
error500     500                     Magnolia has internal errors
                                     Couldn't get a Magnolia context
error501     501                     Magnolia has internal errors
                                     One or more Magnolia modules need to be updated
error503     503                     Magnolia public instance has publishing failures
                                     One or more publication errors were found in the health log
error402     402                     Magnolia license has expired!
                                     One or more license expired messages were found in the messages workspace or one or more license expired log messages were found in the health log
errorTest    502                     Test health error (Magnolia is really OK)
                                     A test outcome (will always be returned) for testing the health check endpoint.
                                     NOTE: this outcome is disabled on installation of the Extended Health Check module.

REST API

The Extended Health Check module comes with a REST endpoint:

GET

Health Check

Returns current health of a Magnolia instance according to its configured health outcomes.

Request URL

/.rest/health/v1/check

Returns the health check results with: 

  • healthy - true if the instance considers itself healthy (HTTP 200) or false if the conditions of some health outcome were met (HTTP status of the health outcome)
  • description - the description of the health outcome or "Magnolia is healthy" if healthy

Returns an HTTP status of: 

  • 200 - the instance is healthy
  • the HTTP status of the health outcome
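A monitoring script or autoscaling probe could call the check endpoint along these lines. This is a minimal Python sketch using only the standard library. It assumes the response body is a JSON object with healthy and description fields, which the list above suggests but does not state explicitly; the function names interpret and check are made up for the example, and any authentication the instance requires is left out.

```python
import json
import urllib.error
import urllib.request


def interpret(status, body):
    """Turn an HTTP status and response body into (healthy, description)."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        payload = {}                 # empty or non-JSON body
    if not isinstance(payload, dict):
        payload = {}
    # Healthy only when the status is 200 and the body does not say otherwise.
    healthy = status == 200 and payload.get("healthy", True) is True
    return healthy, payload.get("description", "")


def check(base_url, timeout=5.0):
    """Call the health check endpoint of the instance at base_url."""
    url = base_url.rstrip("/") + "/.rest/health/v1/check"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return interpret(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx statuses; the body still carries the outcome.
        return interpret(error.code, error.read().decode("utf-8"))


print(interpret(503, '{"healthy": false, "description": "Magnolia public instance has publishing failures"}'))
```

Connection failures (urllib.error.URLError) are deliberately not caught in check, so that a caller can distinguish "instance unreachable" from "instance reachable but unhealthy".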


GET

Reset Magnolia health events

Removes all Magnolia health events.

Note: reset may not change Magnolia health status if a health outcome uses health voters like ContextAvailableVoter, MagnoliaUpdatedNeededVoter or QueryVoter that do not use health events.

Request URL

/.rest/health/v1/reset

Returns an HTTP status of: 

  • 200 - all health events removed


GET

Retrieve Magnolia health events

Retrieves all health events currently in the health log.

Request URL

/.rest/health/v1/dump

Returns an HTTP status of: 

  • 200 - all health events returned in the response body

Warnings

  • This module is at INCUBATOR level.

Changelog

  • Version 1.0 - Initial release of the extensions version of the module.