Parallel Activation

Draft for 5.3

Ideas for improving the publishing experience for big sites by increasing activation speed of big trees.

Problem

  • Activating a piece of content takes anywhere from 0.5 to 3 seconds, depending on the size of the content.
  • When activating "including subpages", all the pages are processed sequentially.
  • This page is not concerned with activation to multiple public instances, which has already been parallelized.

Goal

  • increase the speed of activation for big trees (or potentially for sets of content should we ever support such)

Solution

Since the recent improvements in activation (MAGNOLIA-2427@jira, MGNLXAA-17@jira), most of the time during activation is spent on HTTP authentication and data transfer. Also, since MAGNOLIA-2489@jira was fixed, it is safe to publish multiple pieces of content in parallel even if they belong to the same parent. It should therefore be possible to parallelize activation to a single subscriber by sending multiple pieces of content from different threads at the same time. Manual tests show that with up to 3 activations running in parallel the speed-up is nearly linear: compared with running everything sequentially, total time goes down to 50% for 2 parallel activations and to 34% for 3 parallel activations.
The things to consider are:

  • how many execution threads to use (with too many, the performance of the public instance could suffer during publishing)
  • we should perhaps compute a distribution graph before starting activation and activate the deepest paths, and those with the most children at the same level, first to gain maximum speed.
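
To make the two considerations above concrete, here is a minimal sketch, not the actual activation code. It assumes the content has already been split into units that are safe to send independently, that a size estimate per unit is available, and that `sender` stands in for whatever performs the real send to the single subscriber. The class name, method names and the `sender` callback are all hypothetical. The thread limit is a plain parameter; the manual tests above only went up to 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class BoundedParallelActivation {

    /**
     * Orders independent units so that the largest ones are started first.
     * The sizes are an assumed input (e.g. node counts per subtree).
     */
    public static List<String> largestFirst(Map<String, Integer> subtreeSizes) {
        List<String> paths = new ArrayList<>(subtreeSizes.keySet());
        paths.sort((a, b) -> {
            int bySize = Integer.compare(subtreeSizes.get(b), subtreeSizes.get(a)); // descending
            return bySize != 0 ? bySize : a.compareTo(b); // stable tie-break by path
        });
        return paths;
    }

    /**
     * Sends every unit using at most maxThreads worker threads.
     * "sender" is a hypothetical stand-in for the real send to one subscriber.
     */
    public static void activateAll(List<String> independentPaths, int maxThreads,
                                   Consumer<String> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String path : independentPaths) {
                pending.add(pool.submit(() -> sender.accept(path)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Activation interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Activation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Starting the largest units first is the usual longest-job-first heuristic: it reduces the chance that one big subtree is left running alone at the end while the other threads sit idle.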

4 Comments

  1. This is a very interesting issue. A couple of remarks:

    1) I'm not sure why most methods in the ReceiveFilter have to be synchronized. As far as I can see, there is no mutable shared state that different threads could mess up. Anyway, this may not be a major problem.

    2) When activating content in parallel, the order in which the ReceiveFilter gets the content to import is of course not deterministic. Couldn't this be an issue for javax.jcr.Session.importXML(..), as the parent node of the content being imported may not be there yet, thus raising a PathNotFoundException?

  2. re 1) Yes, most of the methods do not need to be synchronized any more, as we have already removed most of the obstacles that called for synchronization in the past, such as non-unique names of the temp transport files, shared state, etc.

    re 2) This is indeed the whole reason why we can't just plug it in but need to come up with a conceptual solution. Apart from the issue of activating a child before its parent, the issue of activating siblings out of order also needs to be considered, as siblings are ordered relative to each other and no complete sibling ordering map is generated for each piece of content activated. So the final solution will probably be along the lines of parallelizing activation of independent children of siblings (at the very least, or of content that is related only at a much higher level) rather than parallelizing parent-child or sibling activation.

    1. Yes, ordering of siblings makes parallelizing activation even more tricky. As to parent-child activation, however, I got an idea (perhaps it's naive or plain wrong, anyway...): if the parent is missing, create a dummy one, to be replaced with the actual one as soon as it arrives from the author instance. Just an idea; of course it needs further investigation.

      1. It would be better to synchronize the structure first, sequentially and without content, and afterwards transfer the content in parallel. The parent-child issue will be gone then.
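
The structure-first idea from the last comment could be sketched roughly as follows. This is an illustration under assumptions, not Magnolia code: the class and method names are hypothetical, and a plain in-memory map stands in for the public instance. Phase 1 creates empty nodes sequentially, parents before children; phase 2 fills in the content in parallel, by which point every parent is guaranteed to exist.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StructureFirstActivation {

    /** Number of path segments, e.g. "/a/b" has depth 2. */
    static int depth(String path) {
        int d = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') d++;
        }
        return d;
    }

    /** Parent path, with "/" as the parent of top-level nodes. */
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /** Returns the in-memory stand-in for the target after both phases. */
    public static Map<String, String> activate(Map<String, String> contentByPath, int maxThreads) {
        Map<String, String> target = new ConcurrentHashMap<>();
        target.put("/", ""); // root always exists

        // Phase 1: sequential, structure only. The sort is stable, so nodes of
        // equal depth are created in the order the input map iterates them.
        List<String> paths = new ArrayList<>(contentByPath.keySet());
        paths.sort(Comparator.comparingInt(StructureFirstActivation::depth));
        for (String path : paths) {
            if (!target.containsKey(parentOf(path))) {
                throw new IllegalStateException("Missing parent for " + path);
            }
            target.put(path, ""); // empty placeholder node
        }

        // Phase 2: parallel, content only. Every node already exists.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Map.Entry<String, String> e : contentByPath.entrySet()) {
                pending.add(pool.submit(() -> { target.put(e.getKey(), e.getValue()); }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Content transfer interrupted", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("Content transfer failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
        return target;
    }
}
```

The sketch only shows why the parent-child race disappears. Whether the sequential structure pass is cheap enough in practice is not something it can answer.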