Introduction

The Magnolia repositories are at the heart of the CMS and carry all the web pages as well as any other content except the JSP templates themselves. You want to make regular backups, and taking down the website isn't always an option. Be warned: I'm a Linux guy, and what I wrote below is from that perspective. I'm sure there are Windows versions of utilities like wget and cron. As I recall, the Windows NT equivalent of cron was 'at', so that's a place to start with a Google search.

Since Magnolia EE 3.6 there is a dedicated module which handles the whole backup process. For more information, see the Backup module documentation.

The Export servlet

Magnolia, starting at around version 2.1 (and maybe available in 2.0), incorporated an XML export function fully compatible with the bootstrap function used when you first installed the .war files. You can see it in action at http://localhost:8080/.magnolia/mgnl-export (or http://localhost:8080/magnoliaAuthor/.magnolia/mgnl-export). Obviously, change the server name and port number to match your particular installation.

Here are the options available:

  • mgnlRepository: The repository to back up. Available options are website, users, roles, and config.
  • mgnlPath: The path within the chosen repository to back up. Be aware that at least the early 2.1 versions didn't accept the root node on bootstrap, so be sure to export each of the first-level paths separately. A prime example (from the shipping sample site) is /features of the website repository.
  • mgnlKeepVersions: Should the export include all the previous versions of your repository data? The only valid value is 'true'; leave the parameter out to export only the most current version. You'll probably want only the most current version if you back up on a nightly basis, and the bootstrapping process will only take the most current version anyway.
  • mgnlFormat: Determines whether to add indent formatting to the XML. Set to true to turn this feature on. Be aware that while this makes the output easier for you to read, it may cause issues with handling new lines on import or bootstrapping. The default is to concatenate all the tags and text nodes together in one long stream of XML data.

All these parameters are handled in a GET form submission to the server, so URLs like this can be written to handle the whole request:

http://localhost:8080/magnoliaAuthor/.magnolia/pages/export.html?\
mgnlRepository=website&\
mgnlPath=/&\
mgnlKeepVersions=false&\
mgnlFormat=true&ext=.xml&\
command=exportxml

In versions older than Magnolia 3.0 you can use the following URL:

http://localhost:8080/.magnolia/mgnl-export?\
mgnlRepository=website&\
mgnlPath=/features&\
mgnlKeepVersions=false&\
mgnlFormat=false&\
exportxml=Export

Automating The Backups

For the purposes of this article, we're going to use wget from www.gnu.org. There are other options including curl. Use your favorite and adapt the examples below to fit your needs. Here's a wget example for exporting one part of the repository:

wget --user=superuser\
 --password=superuser\
 -O /path/to/backup/location/website/features.xml\
 "http://localhost:8080/.magnolia/mgnl-export?mgnlRepository=website&mgnlPath=/features&exportxml=Export"

In Magnolia 3.0 (starting with RC3) the URL and its parameters have changed, so the wget example looks something like this:

wget --user=superuser\
 --password=superuser\
 -O /path/to/backup/location/website.xml\
 "http://localhost:8080/magnoliaAuthor/.magnolia/pages/export.html?mgnlRepository=website&mgnlPath=/&mgnlKeepVersions=false&mgnlFormat=true&ext=.xml&command=exportxml"

In case you get an "Unknown authentication scheme. Authorization failed." error from wget, try the additional --auth-no-challenge parameter.

Writing the full shell script is a little beyond the scope of this article, but you'll essentially want a script that runs the command above for each top-level context in each repository. An example bash script will be attached later. Use cron (or Windows at) to make this happen every night at around 5:00, when web traffic is at its lowest.
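The loop at the core of such a script can be sketched as follows. This is a dry run that only prints the wget commands; the base URL, destination path, credentials, and repository list are assumptions to adjust for your installation (drop the echo, or use eval "$CMD", to actually run them):

```shell
#!/bin/bash
# Sketch of a nightly backup loop using the Magnolia 3.0-style export URL.
# BASE, DEST, the credentials, and the repository names are placeholders.
BASE="http://localhost:8080/magnoliaAuthor/.magnolia/pages/export.html"
DEST="/path/to/backup/location"
PARAMS="mgnlKeepVersions=false&mgnlFormat=true&ext=.xml&command=exportxml"

for REPO in website users config; do
  CMD="wget --user=superuser --password=superuser --auth-no-challenge \
    -O ${DEST}/${REPO}.xml \
    '${BASE}?mgnlRepository=${REPO}&mgnlPath=/&${PARAMS}'"
  echo "$CMD"   # dry run: print the command; replace with eval "$CMD" to execute
done
```

Schedule it with a crontab entry such as `0 5 * * * /path/to/backup-script.sh` to run nightly at 5:00.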

Creating an instant restore of last night's backup

Really simple. Well, almost. You do have backups of your templates, docroot files, and any Java classes you wrote, right? If you don't, go get backups now, before something very bad happens. The repository isn't worth much if you don't also have all the other stuff that makes your site work.

OK, now we're ready to revive a downed website. Do this a couple of times for practice on a test box before you have to do it for real. Take a copy of an original Magnolia war file and expand it. Delete all the XML files in WEB-INF/bootstrap, replacing them with your own backup set of .xml files. Put in your templates, docroot files, etc., where they belong, from your dev environment. Re-pack the war file and deploy.
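The bootstrap swap itself can be sketched on a throwaway directory tree (all paths and file names below are placeholders; against a real installation you would first expand the shipped war with jar -xf and re-pack it with jar -cf afterwards):

```shell
#!/bin/sh
# Demonstrates the bootstrap swap on a throwaway directory tree.
# With a real war you would first run: jar -xf magnolia.war
WORK=$(mktemp -d)
mkdir -p "$WORK/WEB-INF/bootstrap" "$WORK/backup/website"

# stand-ins for the shipped bootstrap files and last night's exports
echo '<sv:node/>' > "$WORK/WEB-INF/bootstrap/website.features.xml"
echo '<sv:node/>' > "$WORK/backup/website/features.xml"

# the actual swap: drop the shipped set, put your backup set in its place
rm -f "$WORK"/WEB-INF/bootstrap/*.xml
cp "$WORK"/backup/website/*.xml "$WORK/WEB-INF/bootstrap/"

ls "$WORK/WEB-INF/bootstrap"   # now contains only the restored exports
```

After the swap, copy in your templates, docroot files, and classes, then re-pack (jar -cf magnolia.war .) and deploy.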

Congrats! Your website is back up and exactly the way it was the very last time you did your backups.

By the way, the Magnolia Enterprise Edition has a nice deployment packager that makes backups (and all other forms of managing and moving the state of your Magnolia installation) very simple. It allows you to define a "package" that consists not only of the repository information of your choice but also files, so you can include JSP's etc directly in your backup (and restore) package.

Binary Large OBjects (BLOBs)

You can configure Jackrabbit to use so-called external BLOBs (externalBlobs=true), in which case it stores binary data in the filesystem (in the repositories folder) instead of in the database. Doing so means that you can't make an atomic backup from outside against a running instance; to ensure atomicity of the backup you would have to shut down the JCR repository first, and therefore shut down Magnolia as well. In contrast, if you store all the data in the database (externalBlobs=false), a database backup is enough to back up all your content. The atomicity of such a backup then depends only on the database's ability to exclude any uncommitted transactions. You can change this configuration flag in the Jackrabbit configuration file prior to installation of the instance; it is not possible to change the BLOB storage on an existing repository.

When using externalBlobs=true, the XML export or Magnolia's Backup module are your only ways to ensure an atomic backup without shutting down the instance. The XML export can back up each workspace independently and without versions, while the Backup module can only back up all the workspaces together, and keeps the versions as well. This limitation is due to the fact that Jackrabbit stores the versions for the content of all workspaces in one special workspace called "versions", making it impossible to easily separate the versions of different workspaces.

Apart from the binary data (when using externalBlobs=true), the only other information Jackrabbit stores in the filesystem is the Lucene indexes and the custom node type definitions. The Lucene indexes can be safely removed any time the instance is not running; they will be regenerated at startup. The custom node type configuration is one small (few KB) XML file.
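For reference, the flag lives on the persistence manager element of Jackrabbit's repository.xml (and workspace.xml for already-created workspaces). A minimal sketch, assuming the bundle Derby persistence manager (the class and parameter names here are from Jackrabbit's own configuration, but verify them against the config file your Magnolia version ships):

```xml
<PersistenceManager
    class="org.apache.jackrabbit.core.persistence.bundle.DerbyPersistenceManager">
  <!-- false: store binaries in the database; true: store them in the filesystem -->
  <param name="externalBLOBs" value="false"/>
</PersistenceManager>
```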

Also, if you have a lot of content and the export files get quite big, you might want to use info.magnolia.module.backup.ie.XMLFileSplitter to break the export files down into smaller chunks. This class is part of Magnolia's Backup module; it is a command-line utility that takes the path to the folder containing the export files as its only parameter.

If you run into an OutOfMemory exception, another possibility is to increase your JAVA_OPTS: the key attribute is -Xmx, the maximum heap size, and it may also help to raise the -XX:MaxPermSize parameter.
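For example (a sketch; the values are assumptions to size for your content volume, and -XX:MaxPermSize only applies to Java 7 and earlier):

```shell
# Enlarge the heap and the permanent generation before starting Magnolia.
export JAVA_OPTS="$JAVA_OPTS -Xmx1024m -XX:MaxPermSize=256m"
```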
