Concept - Branching

This page is about content branching: the ability to split content creation into a separate branch, create alternative content in parallel, and merge it back into the parent.

Introduction

There is a need to work on different branches of content. Why? In some cases the cause is a lack of rigor in business processes or simply the hectic nature of the publishing business. In others it is the result of closer cooperation between business entities in search of higher effectiveness and synergies.

What is the Magnolia answer to branching? Versioning traditionally belonged to the domain of version control systems (VCS). In DMS and WCMS, versioning was usually limited to archiving the previous version of content; advanced features such as concurrent development of different branches and merging of branches were not required. This limited type of versioning is supported by JCR out of the box.

Branching content and working on multiple versions in parallel was difficult in the past, even in specialized VCS solutions. However, the adoption of systems like Git and Mercurial, which make parallel multi-version development practical, shows the direction in which the versioning capabilities of content repositories need to develop.

Is there a quick way out of the problem? Can we somehow merge traditional content repositories and VCS? One way would certainly be to expose a VCS as a repository, the way ModeShape offers. Another way is to go beyond the specification and redevelop versioning as part of the repository. Each of those solutions has drawbacks that won't be explored here in depth. Instead we look at how branching should work from an editor's perspective and at the feasibility of implementation.

Epic

As an editor, I want to develop content for a future release of one or more live websites.

We will call this feature the FutureRelease branch.

Implications

Technical:

The required APIs to copy on reference are already available.
The concept can also be used for an advanced tree-based language management approach. A language matrix allows better management of large-scale multilingual content.

Business:

The feature puts Magnolia in a better position to compete for large multinational enterprise clients.

User stories

Create a future release

As an editor, I want to create a new FutureRelease. I want to name it properly and be able to pick the subtrees of my content tree that it should contain.
Additionally, I want to specify the update strategy. I expect the following options:
Update on change (full CRUD operations)
Update on activation
No update
This is important to me because most of the time I update content partially.

Use cases: Relaunch, Campaigns
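
The story above can be pictured as a small data model. This is only a minimal sketch: the names `UpdateStrategy`, `FutureRelease`, `subtree_roots` and `covers` are hypothetical and not part of any Magnolia API, and paths are plain strings rather than JCR nodes.

```python
from dataclasses import dataclass, field
from enum import Enum


class UpdateStrategy(Enum):
    """The three update options named in the user story (hypothetical names)."""
    ON_CHANGE = "update on change"          # mirror every create/update/delete
    ON_ACTIVATION = "update on activation"  # mirror only activated changes
    NONE = "no update"                      # never refresh from live content


@dataclass
class FutureRelease:
    name: str
    strategy: UpdateStrategy
    subtree_roots: list[str] = field(default_factory=list)

    def covers(self, path: str) -> bool:
        """Return True if the path lies inside one of the selected subtrees."""
        return any(path == root or path.startswith(root + "/")
                   for root in self.subtree_roots)


# Hypothetical example: a relaunch that branches only the product pages.
release = FutureRelease("relaunch", UpdateStrategy.ON_ACTIVATION, ["/site/products"])
```

Checking `path.startswith(root + "/")` rather than `path.startswith(root)` keeps a sibling such as `/site/products-archive` out of the release.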

Early release

As an editor, I want to promote selected edited content from my FutureRelease branch to live content. Therefore, I expect a Promote action in the action bar.

Batch release

As an editor, I want to release all my prepared content from my FutureRelease branch at once. The modified content should overwrite the live content. When I activate the new FutureRelease content, the new subtree replaces the old subtree completely. This also means that pages in the current live branch (trunk) are deleted if they don’t exist in the FutureRelease branch.
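
The replace semantics described in this story can be sketched as follows. The sketch assumes both trees are plain dictionaries mapping a path to page content; the function name `batch_release` and the sample paths are hypothetical.

```python
def batch_release(live: dict[str, str], branch: dict[str, str], root: str) -> dict[str, str]:
    """Return a new live tree in which the subtree under root comes from the branch."""
    def inside(path: str) -> bool:
        return path == root or path.startswith(root + "/")

    # Keep everything outside the released subtree untouched.
    result = {path: content for path, content in live.items() if not inside(path)}
    # Take the subtree from the branch only; live pages missing there disappear.
    result.update({path: content for path, content in branch.items() if inside(path)})
    return result


live = {"/site/home": "Home v1", "/site/news": "News v1", "/about": "About v1"}
branch = {"/site/home": "Home v2", "/site/campaign": "Campaign v1"}
released = batch_release(live, branch, "/site")
```

Here `/site/news` is dropped from `released` because it does not exist in the branch, while `/about` survives because it lies outside the released subtree.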

Content refresh

As an editor, I expect that unmodified content in my FutureRelease branch is refreshed if content on the live branch gets updated.
I expect that modified pages in the FutureRelease tree will not be overwritten if the source content changes.
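
One way to express this rule is a per-node "modified" flag that blocks refreshes. The following is a sketch under that assumption; `BranchNode` and `refresh` are hypothetical names, not existing APIs.

```python
from dataclasses import dataclass


@dataclass
class BranchNode:
    content: str
    modified: bool = False  # set once an editor changes the node in the branch


def refresh(branch: dict[str, BranchNode], path: str, new_live_content: str) -> bool:
    """Apply a live update to the branch; return True if the node was refreshed."""
    node = branch.get(path)
    if node is None or node.modified:
        return False  # unknown path, or edited in the branch: leave it alone
    node.content = new_live_content
    return True


branch = {
    "/site/home": BranchNode("Home v1"),
    "/site/news": BranchNode("News (reworked)", modified=True),
}
refreshed_home = refresh(branch, "/site/home", "Home v2")
refreshed_news = refresh(branch, "/site/news", "News v2")
```

In this example the untouched home page picks up the live change, while the reworked news page keeps the editor's version.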

Implementation scenarios

The following factors are considered while evaluating a solution:

Performance
Reuse potential
Error-proneness
Complexity
Maintenance burden
Testability

Solid tree copy: A full physical copy of the tree, kept in sync by a change observer.

As soon as a future release node is edited, it no longer receives updates from the source.

Pros	Cons
Easy to implement	May result in performance degradation in complex scenarios
Low complexity	Double data keeping
Low maintenance burden due to simple modules
Reuse in advanced language management

Reference copy: Copy the tree structure, with leaves holding references to the source content

Pros	Cons
Easy to implement	Deleted nodes may become a problem
Reuse in advanced language management	Observer is needed for update on activation
Limited double data keeping	Increased product complexity
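
The reference copy idea can be illustrated with a copy-on-write wrapper. This is a sketch only: `ReferenceCopy` is a hypothetical class, content is held in a plain dictionary, and a real implementation would work on repository nodes.

```python
class ReferenceCopy:
    """Branch whose nodes point at the source until they are edited."""

    def __init__(self, source: dict[str, str]):
        self._source = source
        self._paths = set(source)          # the copied tree structure
        self._local: dict[str, str] = {}   # content materialised on first edit

    def read(self, path: str) -> str:
        if path not in self._paths:
            raise KeyError(path)
        if path in self._local:
            return self._local[path]
        # Raises KeyError if the source node was deleted in the meantime.
        return self._source[path]

    def write(self, path: str, content: str) -> None:
        self._paths.add(path)
        self._local[path] = content


source = {"/site/home": "Home v1", "/site/news": "News v1"}
copy = ReferenceCopy(source)
copy.write("/site/news", "News (branch)")   # this node now holds its own content
source["/site/home"] = "Home v2"            # visible through the reference
```

Deleting `/site/home` from `source` at this point would leave the branch with a dangling reference, which is the "deleted nodes" problem listed in the table.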

Virtual copy: No physical copy at all; an agent manages a virtual tree

Pros	Cons
Dead nodes are probably easy to manage	Complex to implement
Reuse in advanced language management	Out of sync issues if the agent is failing
No double data keeping	Difficult to test
	May become a maintenance burden because of the expected complex codebase.
	Heavy memory footprint
	Heavy CPU usage

Using workspaces as branches

We could try to use different workspaces to implement branching.

What would multiple-workspace development look like? Using workspace management functionality, an editor with appropriate permissions can decide at any point to branch the workspace. For the sake of simplicity we will consider branching possible only in the website workspace, but the process would work with any workspace.

Branching happens as a one-time copy operation during which all the content of the original repository is copied over to a branch. From that point on, the editor can decide upon login which branch of content they want to work on, and this branch will be presented to them as the website workspace. They can continue to develop and even activate such content (if the branch was marked by its creator as publishable and is therefore replicated to all public instances).

At any point it would be possible to compare the tree of the master workspace to each branch. It would be possible to automatically merge non-conflicting changes. It would also be possible to present conflicts to the user initiating the merge for manual resolution.
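
The compare-and-merge step could follow a three-way merge, as sketched below. The sketch assumes that a snapshot of the workspace at branching time (`base`) is kept, which the text above does not state; the function name `merge` and the dictionary representation are likewise hypothetical.

```python
def merge(base: dict[str, str], master: dict[str, str], branch: dict[str, str]):
    """Three-way merge of path -> content maps; a missing path means 'deleted'."""
    merged: dict[str, str] = {}
    conflicts: list[str] = []
    for path in sorted(set(base) | set(master) | set(branch)):
        b, m, r = base.get(path), master.get(path), branch.get(path)
        if m == r:        # both sides agree (including both deleted)
            value = m
        elif m == b:      # only the branch changed this path
            value = r
        elif r == b:      # only master changed this path
            value = m
        else:             # both changed it differently: needs manual resolution
            conflicts.append(path)
            value = m     # keep master's state until the user decides
        if value is not None:
            merged[path] = value
    return merged, conflicts


base = {"/home": "H1", "/news": "N1", "/old": "O1"}
master = {"/home": "H2", "/news": "N1", "/old": "O1"}
branch = {"/home": "H3", "/news": "N2"}          # /old was deleted in the branch
merged, conflicts = merge(base, master, branch)
```

In this example `/news` and the deletion of `/old` merge automatically, and `/home` is reported as a conflict because both sides changed it.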

Other considerations:

Deleting

As long as deletions have not been activated, they could be merged as well. An activated deletion (which wipes the actual content handle) would need to be either persisted in a graveyard-style workspace, applied to all branches at activation time, or not handled at all. For comparison, a VCS treats branches as independent: in Git, for example, a three-way merge keeps a deletion when the other branch left the content unchanged and reports a conflict when the other branch modified it.

Data size

The initial branching of a full workspace by copy would be an expensive and time-consuming operation. We could avoid that by creating only stub references that point to the original branch and copying content only when it changes. However, this approach is complex and error-prone.

Another possibility would be to support subtree branching, in which case only the selected subtree is copied over to the branch and the rest of the content is still served (overlaid) from the original workspace. This again adds complexity to the workspace manager, which has to resolve such virtual content. Which approach is simpler remains to be seen. We could investigate whether the links introduced by JCR 2.0 would help. Either way, this feature can start as a full copy and possibly be improved in the future.
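
The overlay lookup for subtree branching might look like the sketch below. It assumes each workspace is a plain dictionary from path to content; `resolve` and its parameters are hypothetical names.

```python
from typing import Optional


def resolve(path: str, branch_root: str,
            branch: dict[str, str], original: dict[str, str]) -> Optional[str]:
    """Serve the branched subtree from the branch and everything else from the original."""
    if path == branch_root or path.startswith(branch_root + "/"):
        # Inside the branched subtree the branch is authoritative, so a page
        # deleted in the branch does not fall through to the original.
        return branch.get(path)
    return original.get(path)


original = {"/site/home": "Home v1", "/site/shop": "Shop v1", "/site/shop/cart": "Cart v1"}
branch = {"/site/shop": "Shop v2"}               # /site/shop/cart removed in the branch

home = resolve("/site/home", "/site/shop", branch, original)
shop = resolve("/site/shop", "/site/shop", branch, original)
cart = resolve("/site/shop/cart", "/site/shop", branch, original)
```

The design choice worth noting is that the subtree boundary, not the presence of a node, decides which workspace answers; otherwise deletions in the branch could never take effect.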
