Rationale

In the past, we've grappled with the idea of splitting core several times: Repository dependencies split-up, Concept - Core & jaas split, Proposal - core packages, ... Core has grown to 800+ classes and has many test classes. As such, it is becoming increasingly difficult to package, test, and design features new and old. Some fairly old code is barely used anymore; most code is stable, but would benefit from better packaging and design. The first targets will be things that have no dependencies on our own APIs, e.g., JCR Utilities (predicates, wrappers, etc.). Other example cases would be content-i18n, security, virtual URIs, etc.

Isolating features in smaller modules would help with incrementally refactoring, rewriting, or throwing code away.

In the future, some of those split-out modules could even move out of magnolia_main, because they'll be very stable and have different life cycles (e.g., JCR Utilities).

With this and some modules being split out of magnolia_ui, we might even be able to "merge" main and ui into what it really is: Magnolia Platform :)

After some initial tests, what was a 3-step gradual implementation of the above is now a 4-step one: we added Step #1 to the original proposal. We'd do this in 4 steps to give users (and ourselves) time to upgrade their dependencies. Ideally, with Step #1 no change is needed for people who already use scope:import for magnolia-project or one of our webapps.

Proposal

4 steps:

Step #1 - 5.4 | Step #2 - 5.5 ? | Step #3 | Step #4+

Code cleanup.

We have a lot of semi-cyclic dependencies in our code; package dependencies are a mess.

We also have a lot of unused code, or deprecated code still in use.

As a first step, we'll address these problems, which will make step #2 easier (or possible at all).

We have scripted the split that could occur with 5.5; with CI and perhaps tools like Sonar, we gradually aim to make this possible.

We gradually add "splits" to the script in question and gradually improve the existing code base to make the split possible.

The Jenkins job is currently at https://jenkins.magnolia-cms.com/job/platform_core-split/

  • Split:
    • magnolia_main/magnolia-core becomes a reactor of its own (i.e., a sub-reactor of the main one). This should make it pretty self-explanatory and readable to see what ends up in magnolia-core-all.
    • it has a new groupId/artifactId
    • every new submodule shares the new groupId (open question)
    • one of these submodules uses the "old" info.magnolia:magnolia-core coordinates and is a relocation to info.magnolia.core:magnolia-core-all
    • info.magnolia.core:magnolia-core-all is an "empty" jar with dependencies on all the other submodules. Warning: this is safer than an uberjar from a classpath perspective (no duplicates), but pushes the real dependencies one level down, which might have implications for dependency resolution.
  • In this first step, we only split out non-module stuff: it is really about extracting "libraries", not creating new modules. We might extract, for example, the FreeMarker support classes, but keep the related configuration in the core-module. Module dependencies (META-INF/magnolia/*.xml) should not have to be modified.
  • Optionally, m-core-all has one class and/or resource that allows us to also determine that it IS the -all artifact at runtime (see Cargo's example), which could help with install/update warnings.
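As a rough illustration of the "empty" magnolia-core-all artifact described above, its POM might look something like the sketch below. The submodule artifactIds (magnolia-jcr-util, magnolia-freemarker) and their use of the shared info.magnolia.core groupId are assumptions for illustration only; the actual list of submodules is not decided here, and a real POM would presumably also declare a parent.

```xml
<!-- Sketch only: an "empty" jar whose sole job is to pull in the split-out submodules. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>info.magnolia.core</groupId>
  <artifactId>magnolia-core-all</artifactId>
  <version>5.4</version>
  <packaging>jar</packaging>

  <dependencies>
    <!-- Hypothetical submodules; the names are illustrative, not a decided list. -->
    <dependency>
      <groupId>info.magnolia.core</groupId>
      <artifactId>magnolia-jcr-util</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>info.magnolia.core</groupId>
      <artifactId>magnolia-freemarker</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</project>
```

Because the jar itself contains no classes, nothing is duplicated on the classpath; the trade-off is that every "real" dependency becomes transitive for anything that depends on this artifact.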

  • No package/class is moved/changed/renamed for the purpose of this task; we should be 100% compatible.
  • Inevitably, some code will be hard to split out, because we have packages such as info.magnolia.cms.core with shared concerns (e.g., the filters subpackage has a lot of filters ... which are completely unrelated), so:
    • The split can occur gradually
    • If it turns out to be a problem, we can even have a big "legacy" submodule. The goal here is to split out what can be split. It used to be hard because e.g. our Content API was used all over the place. It's not the case for newer code.
  • The relocation could carry a message along the lines of "Hey, we're in the process of splitting core, you might want to change your dependencies to xx:yy"
  • The split can occur gradually:
    • We don't need to do all the splits we want right in 5.4 (meaning also that Step #2 is not necessarily 5.5)
    • But we do need to work hard on the relocation message.
  • We don't need to change dependents, but we need to ensure bundles' dependencyManagement is correct and "importable".
  • Ultimately, users should not have to care.
  • We removed all package overlap (each submodule has its own package(s))
  • info.magnolia.core:magnolia-core-all is officially deprecated
  • The relocation message is more prominent
  • We change all dependents to point to the new separate artifacts
  • Open question: is magnolia-core flattened out into the root of magnolia_main? It might make sense to keep sub-reactors, but this is inconsistent with i18n, jaas, and templating* (but we could change these instead)
  • We remove info.magnolia.core:magnolia-core-all
  • Submodules are gradually extracted outside of magnolia_main; at that point, they might get a different groupId/artifactId and even version numbering. (in which case, we'll probably want to go through another relocation substep for each)
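The relocation discussed in the steps above could be sketched as a minimal POM published under the old coordinates. The coordinates come from this proposal; the message wording is only a placeholder assumption, not agreed text.

```xml
<!-- Sketch only: published under the "old" coordinates, redirects to core-all. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>info.magnolia</groupId>
  <artifactId>magnolia-core</artifactId>
  <version>5.4</version>

  <distributionManagement>
    <relocation>
      <groupId>info.magnolia.core</groupId>
      <artifactId>magnolia-core-all</artifactId>
      <!-- Placeholder wording; the real message still needs to be worked out. -->
      <message>Core is being split; consider depending directly on the info.magnolia.core artifacts you need.</message>
    </relocation>
  </distributionManagement>
</project>
```

Maven prints the relocation message as a warning when it resolves the old coordinates, which is what makes the wording worth working on.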

Two use-cases to take care of and verify:

  • Building a project/webapp: release notes should insist on people using scope:import to make sure they bring in the correct versions of all our artifacts.
  • Building a module: release notes should indicate the new coordinates. If a module depends on info.magnolia:magnolia-core:5.4, it will be redirected to core-all, with the "real" dependencies being one level lower than they were. If another dependency of the module-under-build has already been updated to use the new coordinates, the module might end up being built against the wrong version of core-x. Developers should be able to do one of 2 things: use scope:import as well (not recommended, since this will still bring in the relocation message, unless they also adapt their dependencies to bring in the "real" dependency they need from core), or simply change their dependencies. With Maven 3.2.3 and some careful testing, we can come up with a simple guide (e.g., use dependency:analyze to figure out which dependencies you need).
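To make the two use-cases above more concrete, here is a hedged sketch of a dependent's POM. It assumes that magnolia-project lives under the info.magnolia groupId and that its dependencyManagement covers the new artifacts; com.example:my-module and magnolia-jcr-util are hypothetical names used only for illustration.

```xml
<!-- Sketch only: a dependent that imports managed versions and uses new coordinates. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>        <!-- hypothetical -->
  <artifactId>my-module</artifactId>    <!-- hypothetical -->
  <version>1.0-SNAPSHOT</version>

  <dependencyManagement>
    <dependencies>
      <!-- scope:import pulls in the managed versions of all Magnolia artifacts. -->
      <dependency>
        <groupId>info.magnolia</groupId>
        <artifactId>magnolia-project</artifactId>
        <version>5.4</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <!-- "Simply change the dependency": point at the split-out artifact directly. -->
    <!-- No version here, assuming the imported POM manages it. -->
    <dependency>
      <groupId>info.magnolia.core</groupId>
      <artifactId>magnolia-jcr-util</artifactId>
    </dependency>
  </dependencies>
</project>
```

Running mvn dependency:analyze on such a module reports dependencies that are used but undeclared, and declared but unused, which can help decide which of the new artifacts to declare.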

We discarded the idea of using an uberjar (instead of core-all simply bringing in dependencies, the uberjar would physically contain them in its jar). While there are options to build an uberjar of modules and generate/deploy a pom that does NOT have dependencies on these modules (which would be needed, or we'd end up with the uberjar AND the "normal" jars in the webapps), it would not solve the cases where people depend on core AND, say, dam.

How

Scripts currently reside at https://git.magnolia-cms.com/user/gjoseph/core-split-scripts. Check the README.txt for usage.

Notes

  • Not 100% sure on which approach is best yet (relocation>uberjar or relocation>dependencies)
  • We did something very similar with DAM 2.0: https://jira.magnolia-cms.com/browse/MGNLDAM-403 - here we redirect to magnolia-dam-compatibility, which has "old" APIs we want to ultimately remove, but also has dependencies on all other new modules.
  • If considering uberjar, see the Maven Shade Plugin, and in particular these options: artifactSet (include only info.magnolia* artifacts, not transitives), createDependencyReducedPom, promoteTransitiveDependencies
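For reference, should the uberjar option be revisited, the Shade options listed above could be combined roughly as follows. This is a sketch under assumptions: the include pattern is one possible reading of "only info.magnolia* artifacts", and the plugin version is deliberately left out and would need to be pinned in a real build.

```xml
<!-- Sketch only: goes under <build><plugins> of the would-be uberjar module. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Bundle only our own artifacts, not third-party transitives. -->
            <include>info.magnolia*:*</include>
          </includes>
        </artifactSet>
        <!-- Generate a POM that no longer lists the bundled artifacts. -->
        <createDependencyReducedPom>true</createDependencyReducedPom>
        <!-- Keep the bundled artifacts' own dependencies as direct ones. -->
        <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
      </configuration>
    </execution>
  </executions>
</plugin>
```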

Attachment: CoreSplit.pdf (PDF file, modified 2015-03-20 by Magnolia International)



9 Comments

  1. In the above you mention the "why" and "how" of the split, but not what the outcome is. That's just between the lines and will ultimately be understood differently by different ppl. Can you pls clearly state the target for the split and what the structure will look like when it's finished? Examples would help. IMO this is important to judge the amount of work that has to go into the split, the amount of time it will take, and the total impact. We can have more initiatives that build upon this one later, but it would be good to know when exactly (upon achieving what) this one will end.

    1. The outcome to me is the same as the "why" ;) "Isolating features in smaller modules would help incrementally refactoring, rewriting or throwing code away."

      It makes little sense to start listing modules here, as that would imply doing a whole lot of upfront research that I'm not gonna do now. One of the key points is "gradual"; there's no need to do it all at once.

      I can give some examples that are pretty obvious.

      As for work estimation, the bulk of it should be done in <1d for step #1, for someone who knows Maven. The "hard" part will be testing and re-testing different scenarios. Once that's done, it's also about being aware of how dependencies work in Maven and constantly keeping an eye on it in bundles etc.

  2. I would also like to see a clear outline of how we tackle compatibility: whether we expect that everyone will rebuild their modules immediately, for the next major release, or after 2 major releases ... and whether such a rebuild will be more or less cosmetic (package names and dependencies), or whether we will end up doing more, with other naming changes, class splits, or more complicated-to-grasp-from-outside changes.

    1. That's what the 3 steps are for. In step#1, "dependents" of core do nothing. Ideally, we don't even have to move a single class, so runtime is 100% compatible.

      In step #2, we encourage changing the dependencies.

      In step #3, we enforce it.

    2. Ha and, as to "when", next major vs 2 major releases, etc, that's undefined, as far as I'm concerned. But it should be the goal to avoid having to rebuild anything, or even change any dependent for 5.4. But there's a bit of research that needs to go into this (see some of the comments in "notes").

      This doesn't even have to be in 5.4 btw; I'd like it to be, but only if we have a 100% solid confidence that it causes 0 problems.

  3. I like the gradual extraction of new sub-modules from core; how about an "explosion" of core rather than a sub-reactor?

    Nevertheless, as much as I like having an explicit groupId immediately, I had trouble figuring out the purpose of the uberjar:

    • stable for compatibility reasons (two-speed approach w/ new core)
    • VS. "repackaging" new artifacts as old groupId (staying edgy, gaining API changes, deprecations, but eventually getting deprecated itself over time)
    • especially because it's not eagerly deprecated; and because GAV needs to be updated anyway (is the relocation really not enough then?)
    1. The sub-reactor would help "visualizing" what ends up in the uberjar/relocation-of-what-was-core

      The purpose of the uberjar is that dependents still depend on it, without having to figure which new modules they are actually using; that's step#2. THAT SAID, as we just discussed, I'll update the above, because I'm less and less convinced by the uberjar idea.

  4. Some findings and thoughts

    The current status is reflected in Extractions.groovy where the following components have been defined so far:

    • magnolia-jcr-util
    • magnolia-hamcrest-matchers
    • magnolia-lang-util
    • magnolia-jcr-nodebuilder
    • magnolia-freemarker
    • magnolia-content-api
    • magnolia-versioning
    • magnolia-channel
    • magnolia-i18n-content
    • magnolia-audit

     

    For these components it is pretty clear that we are talking about libraries and not magnolia modules.

    None of these extractions work as is at the moment. The scripts are working, but they don’t do any magic. We have to put a lot of thought into this and I’m not sure about any concrete next steps.

    As Greg pointed out in step 1 we need a code clean up first, which we haven’t started. And while cleaning up code, we need to make sure, that everyone is on the same page, so we need to ‘educate’ people. And before we do that, we need to know pretty precisely what we want to do :)

    ‘magnolia-jcr-util’ is a good example IMO because there are just a few lines of code making the extraction fail. See MAGNOLIA-6321

    To fix these we need to discuss some fundamental topics right at the start, like usages (and future) of MgnlContext. The usages of NodeUtil#getNodeByIdentifier(workspace, identifier) will have to be updated, which is probably one of the most used methods in this class. We can’t deprecate it if we want to extract it, we will have to remove it. How to deal with compatibility? Deprecate it right away and remove it with magnolia 5.6+/6. And here we are talking about one of the easier extractions.

    One immediate step I could think of would be to define the dependencies properly. Currently we add all newly created components to all modules automatically, which creates cyclic dependencies (jcr-util <-> versioning). But for this we would need to define how this should look, so probably create some package/module diagram?

    In the end this is a huge project. It’s not something one person can pull off alone, it must be a team effort and it’s not something we can do in one major release.

    What about rewriting parts of core from scratch with some other topics on our mind as well, like OAK, versioning, clustering, activation? Seeing this as part of a bigger initiative. Even then these steps would make sense as a starter, but having a bigger perspective might help keep the focus on the right spots?

    1. Re step 1, I think you sum it up nicely - decide how to clean it, get everyone on the same page, do the team effort to clean. Everyone has to do the work, one (you) has to orchestrate the effort and prepare the educational part of it :)

      Re "Deprecate it right away and remove it with magnolia 5.6+/6." - it seems like the correct approach, giving everyone time to adjust without too much pain or pressure. Just a word of caution: we can't deprecate without introducing a replacement at the same time. Only then can we say xyz is deprecated in favour of abc. But so far I don't see an outline of what the replacement would be, so we need to put some thought in that direction too.