Implemented in 4.5

Introducing IoC in Magnolia. Implementation tracked in MAGNOLIA-2569@jira

Rationale

Inversion of Control (or Dependency Injection) has been around for a while now, and is recognized and accept as a design principle which helps improving testability (and thus quality), code readability, maintenance, understandability, and countless other advantages.
MAGNOLIA-2569@jira

Terminology

  • IoC: Inversion of Control. This is the "concept", the "programming model".
  • DI: Dependency Injection. This is an implementation of the concept. We say that a Magnolia component uses/relies on DI, whereas we say that Magnolia itself uses/implements IoC.

Background information

Mandatory read: Martin Fowler's article! Wikipedia's article is probably also a good read, although maybe a touch too the theoretical.

Benefits

  • Testability
  • Clearer APIs
  • Better dependency management (which component uses which others), less singletons
  • Less brittle code (singletons = global state)
  • Cleaner lifecycle management of components (let the container manage it, instead of having some components arbitrary "start" others)
  • More in the comments ! (smile)

Library or custom implementation ?

A basic custom implementation would have been possible, but we quickly come in "reinventing the wheel" stuff. Guice and PicoContainer were the main contenders at Magnolia. Guice for its popularity, Pico for its practicality, and the fact that we have a potential bigger influence on the code base.
See IoC Container decision table.

Relevant JSRs

Some requirements

One of the points that was preventing us from introducing IoC in Magnolia was our observed components, which could not be "updated": until Magnolia 4.3, the singletons were re-instantiated when observation kicked in. Since MAGNOLIA-2553@jira, they are proxied, thus allowing code to keep references to these components provides a patch for this.

Not all of Magnolia needs to be "converted" in one goal, but there's a good chance changes to one component will lead to changes in its dependencies.

Here's a rough list of what we want to be able to achieve:

  • Modules interdependency: a module class should be able to depend on another, i.e get the other's instance injected, as long as its module descriptor declares such descriptor. Use case: Forum's dependency on RSSAgg.
  • Filters should be constructed via IoC and thus be able to declare dependencies, typically on module classes.
  • Content2bean: any class loaded via c2b should ideally be instantiated via the IoC container: currently c2b uses ClassFactory instead of ComponentProvider, so this might be an issue.
  • Replace usage of SystemProperty by a @ConfigurationProperty("myProperty") annotation on a private field of the component user.
  • Scopes: components we currently use as "singletons" will probably be application-scoped. Scoped components are components whose lifecycle is tied to a different lifecycle than the "application", ie for webapps, typically the session and the request. This is usually implemented by nesting the containers, and storing them as webapp-, session- and request- attributes. Since they're (really) lightweight, there's virtually no added performance cost. However we'll need to think how we "mark" the scope of components.
  • We still need some level of control on instance creations: for ex, we want to instantiate and IoC' RenderingModels, and let them depend on a module configuration class.
  • Taglib: Guice.injectMe() - or something similar (smile)
  • Event/listener mechanism - nice to have - so components can be "ping"ed when others get reloaded, etc. See Concept - Event mechanism
  • AOP ?
  • Content2bean : init() - will that work with proxies/interceptor ?

Scoped components

Picocontainer introduced the notion of scoped containers (Guice has this too).

In the case of Magnolia, we might consider more custom tailored scopes (no specific idea yet, just throwing the thought out there)

Implementation

Containers

At startup time, we create a "root" container. This container is meant to contain only the components needed to start Magnolia up. (Classes and dependencies of ModuleManager, Content2Bean, MagnoliaConfigurationProperties - see below)

When then create a "main" container, composed of all registered components. This includes, amongst others, ConfigLoader.

Main changes

Context listener

info.magnolia.init.MagnoliaServletContextInitializer (replaces info.magnolia.cms.servlets.MgnlServletContextInitializer) is now solely responsible for instantiating the container(s), and starting it.
Certain components (@AtStartup) are started "eagerly", when the container is started.

Properties

MagnoliaConfigurationProperties is meant to replace SystemProperty. It aggregates all PropertySource. Each property source is "separate", one can identify where a property comes from.

MagnoliaPropertiesResolver is used by DefaultMagnoliaConfigurationProperties to determine which magnolia.properties files should be added to the other default sources.

MagnoliaInitPaths is a simpler wrapper around the 4 basic properties used to resolve the locations of magnolia.properties files, amongst others (also used to determine where Magnolia is deployed, in turn used to define paths to logs, temp, etc.)

It is possible to modify the magnolia.properties resolving mechanism by replacing the above component in the root container by custom implementations.

Filters

MgnlMainFilter is now "assisted" by FilterManager. FilterManager is a regular components (has dependencies, is retrieved from the IoC container), is holding the filter chain, and determines if the root filter is the configured filter chain, or the install filter chain.

MgnlMainFilter is the one filter configured in web.xml. It retrieves the FilterManager via Components.getComponent(), and merely delegates to the root filter provided by FilterManager. It is still responsible for push/popping the context for each request.

Content2Bean

Content2Bean now uses ComponentProvider.newInstance() when instantiating a bean/subbean. As such, any component configured in the repository can use DI

Various

Introduced a WebContextFactory, to simplify/clarify the way a WebContext is created, init'd, and set. By registering a different WebContextFactory, one can provide a different implementation of WebContext (hello WebLogic module), as well as an different implementation of AggregationState (hello STK)

Due to the changes regarding properties, introduced a clear() method on SystemProperty, to allow tests to clean up after themselves. Tests now need to use this method in their tearDown() methods, instead of SystemProperty.getProperties().clear().

Lifecycle

A component's lifecycle (start/stop/dispose) can be managed by the IoC container. With PicoContainer, the lifecycle is typically "lazy": the component is started when first "requested". This makes sense in most cases. An "eager" lifecycle can make sense for "cached" components (i.e singletons), and we need this for certain components, typically the log configuration, module manager, repository provider, etc. See PicoContainer-specific notes for implementation details.

Issues raised by this topic

  1. Content2Bean - currently modified the API to work around a circular dependency issue in Content2Bean - see MAGNOLIA-3525@jira.
  2. Naming conventions - We currently use a lot of "-Manager" classes. Candidate suffixes for renames: -Service, -Provider )
  3. Progressive conversion: maintaining a bastard codebase, where some components would be handled both by the IoC container and regular FactoryUtil calls.
  4. scopes vs context (we have a lot of components which would be candidates for app-scope, except at some point or other, somewhere down their dependency hierarchy, they'll use, for instance, MgnlContext.getInstance. And worse, might fallback on the system context, or change their behaviour one way or the other, if no Context is set. These behaviours will somehow have to be wrapped and retrofitted in MgnlContext, see #3

Request-scoped components

At first sight, components like WebContext, AggregationState, and even the ServletRequest object are good candidate for the request-scope. The next almost-natural step would be to think: hey, but then I could make the Renderer dependent on AggregationState instead of relying on a threadlocal context to give it to me. Well, that isn't so easy. This could be solved only in one of 2 ways:

  • Make your renderer also request-scoped. That would mean that the RenderingFilter (which depends on RenderingEngine) would also become request-scoped. Which would in turn mean all filters are request-scoped, and the whole chain get re-instantiated on every request (which could be optimized with caching/cloning of course, but still). And this breaks the filter api, since they're meant to "receive" the request in the doFilter() call instead.
  • Depend on "provider" type of objects. Those could be app-scoped, and would know "how to" fetch the current AggregationState, for example. In the case of Magnolia, this would probably turn out to be something like (Web)ContextProvider, which would be a simple wrapper around the current context. This is better than nothing, as it could be easily mocked in tests.

If we go for the second approach, it sounds like we're going to end up with a "duplicate" of MgnlContext (minus all the static fluff). Should we thus deprecate (parts of) MgnlContext ?

Additionally, a choice would have to be made between:

  1. a provider:

    public interface WebContextProvider {
        WebContext get();
    }
    
  2. a proxy

    public class (or intf) WebContextProxy implements WebContext {
        WebContext getDelegate();
        // .. implement all methods of WebContext and delegate to current threadlocal instance
    }
    

Further changes and conversion

In progress and TODO

  • Install container - the root and main containers currently hold components which are only used and needed during the install process. We could isolate those in a specific container.
  • Scoped containers (session and request, perhaps more granular scopes?) - I still need to figure out exactly how picocontainer-web does this. Probably based on ThreadLocal, but haven't found out the crucial bit: where the container is actually instantiated and populated. Probably will need to startup one of the sample projects and use a debugger.
  • Update task for MagnoliaServletContextInitializer (warning)
  • Improve life cycle usage
  • ConfigLoader isn't really needed anymore, merely just start 3/4 other components. (startup order?) - does it really need to happen with doInSystemContext ?
  • Move loading of module descriptors to ModuleRegistry - make it lifecycle, ModuleManager should not care
  • Modules lifecycle (currently started by MM)
  • Module descriptor additions
  • Check c2b instantiation for trees, pages, commands, ServletDispatchingFilter.
  • Provide better hook for replacing the root container ? Comma-or-newline-separated list of classes, or do we need key=value ? Given the low amount of components, I think it's ok to have to re-declare ALL components, not just those you want to replace (which also allows removing some altogether)

Components registration

We're considering enforcing components registration via the module descriptor. Possibilities:

  • Additional element in module descriptor: See MAGNOLIA-3517@jira and proposal below.
  • We currently still use the <properties> of module descriptors. 1) Do we want to keep this for backwards-compatibility ? 2) There are a few "legacy" components that were never registered (not in mgnl-beans.properties, not in a module descriptor), and these were retrievable as singletons, because they were concrete classes. This currently does not work, and registration has become mandatory. Thoughts ?
  • Scannotation: we could also scan the classpath for certain annotations to "magically" register components. This would probably increase startup time dramatically, which is not desired. However, in combination with a Maven plugin, this could be interesting: in dev mode (or whatever other optional way of enabling scannotation), we'd scan for components, and in production we'd use what's in the descriptor (i.e the Maven plugin generated a complete module descriptor based on said annotations)

Proposal for components registration in the module descriptor:

<component scope="..">info.magnolia.FooBar</component>
<component scope="..">/server/foo/bar</component>
<component scope="..">config:/server/foo/bar</component>
<component key="..">...</component>
<provider key?>...</provider>(~= factory)
<composer>....</composer>
  • scope: not mandatory - values: tbd.
  • key: to replace an existing component, otherwise defaults to value. Not specifying a key allows registering multiple implementations of a given interface.
  • provider - similar to our current ComponentFactory, more explicit ?
  • composer: if there is a need for a more complex component registration - this would then be container-specific

Guidelines for converting existing components

Quick check-list:

  • deprecate getInstance() type of methods
  • declare a dependency by:
    1. adding a private final FooBar foobar field to your class
    2. adding a FooBar foobar parameter to your constructor, and assign it to the foobar field you just added.

We should avoid using ClassFactory and Components as much as possible. Some components will still need to of course, for instance to instantiate rendering models. Commands, trees and other MVC pages could hopefully be entirely handled by Content2Bean.

Avoid field-injection if possible: field level injection (annotating a field with @Inject for example) has its use and is a pragmatic solution to some problems (we used it for example for AbstractMgnlFilter, to avoid having to have a constructor-dependency on WebContainerResources on all our filters. You should avoid it as much as possible, and favor constructor-dependency-injection instead:

  • the constructor can be your "init" method - all dependencies are there and ready to use. (when using @Inject, you'll often need @PostConstruct too)
  • dependency-fields can be marked final, making your code safer
  • you'll need some magic to inject the value of that field in your tests

Some components will not be "convertible" as-is. In some instances, we will maybe need to introduce "provider" components. Some components might need to be renamed, and we will have to come up with naming conventions (what is a component, what is a "value object" - for instance, the module configuration beans are probably the latter, although we'll want to inject them)

  • No labels

13 Comments

  1. Another option maybe http://tapestry.apache.org/tapestry5/tapestry-ioc/

    and for completeness' sake: Spring.

    1. Spring does everything and the coffee; pico and guice do one thing and do it well. I suspect tapestry-ioc to be the same (i.e targeted at usage within tapestry applications); both might still be worth a look, especially if we want to stay container agnostic.

  2. I would not drop a basic custom solution to fast. Such a basic solution could simply:

    • introduce an annotation on the setter/property (@inject)
    • the injection is done in the FactoryUtil (before calling the init method)
    • the to-be-injected object can be received from the FactoryUtil based on the property type

    This way would could introduce IOC with a small amount of changes.

    As far the setter/constructor discussion goes: the setter approach is simply more near the way magnolia handles object creation in today (FactoryUtil or c2b).

    To keep the injected objects up to date (in case an observation is triggered) the approach described in MAGNOLIA-2553@jira could be used.

    1. I replay to this old statment of mine:

      • a custom solution is not an option any more. It wouldn't lead to a good solution. Introducing a real container can very well be combined with the power of c2b.
      • the constructor based approach is preferable even if it differs more from todays approach. Especially when writing tests you know what the object needs to work properly.
  3. With todays content2bean I have two issues which are kind of related:

    • Injection of the parent object. When building an object tree it might be necessary that the child nows the parent object (the theme knows the site). Currently the wiring is normally done in the setChild(o) method of the parent.
    • Multiple module instances. Reading the section about contexts I was reminded about a very old challenge which is that we would like to instantiate multiple instances of the same module (take DMS as an example).

    This are not requirements but if this yields a solution I would be happy (wink)

      • injection of parent object - i am not convinced. In fact, I'm not even sure it's a "good idea" to let "child" objects be "IoC'd" at all... although I see the need. Injecting the parent, however, might lead to some weird cyclic object graphs. Let's see how this could be achieved.
      • multiple module instances: yes. We're talking "per site", right ? That would mean moving the notion of Site into core, and have the SiteFilter "select" to proper container to continue the request. Two challenges: 1) making sure earlier filters are not impacted (i.e anything before the SiteFilter will get the "generic" configuration) 2) designing the configuration itself, i.e what's configured where.
        • injection of parent object: I agree that this is more c2b related and not really an IoC use-case. But for object trees it makes absolutely sense to make the traversable if needed. So lets say this is off topic.
        • multiple module instances: I was not especially thinking of the multi site scenario even if this would make probably most sense. I thought about having two independent DMSs modules or so. But now that I think more about it, I am not sure if this makes a lot of sense anymore now that we have the multisite concept.
  4. guys, give me a business rationale on this. What is the impact, and why would Magnolia's users really benefit from this. You know I am all for nice tech, but not for tech's sake alone.

    To put this into perspective, better testability is always nice, but unless it saves us money compared to manual testing or provides us with new clients because Magnolia is better tested, this is not a business reason. I know it's tough but get some number behind your efforts - how many classes to rewrite, how much effort per class etc.

    I'd like you to convince me that this is the best way to spend our resources before we go out and do it just because its cool. Thanks!

    1. Manual testing does not happen. (i.e it happens once, when a feature is implemented, then we get lazy and assume "it still works", even though every other new change might have broken the feature).

      Better testability = simplified testability = developers are more likely to write tests = less time wasted in debugging, since bugs are discovered early by said tests = saved money. Look at the coverage of most of Magnolia nowadays. It is simply too low, and this is one of the ways I'm hoping it will climb up.

      Another advantage which leads up to exactly the same conclusion, is that the codebase is greatly simplified. There are a bunch of hacks we currently that can simply be dropped the day we have IoC (for example how we had to invent a listener mechanism to tie the cache filter to the cache configuration, and make the cache filter gets notified when the cache config changes. That can simply be dropped (minus 1 class, minus about 10lines of code in the filter and the cache config class) - result: simpler code, simpler tests.

      As for classes to rewrite, except the few that actually implement IoC in Magnolia, there should be ~0. Meaning, some can be "improved" to make use of IoC (that's usually a 2 lines change), but in 99% of the cases, they won't have to. So once core is ready, I'm off the game (except for "consulting"), and other modules/components can be gradually "IoC'd". (which is generally a two-minutes thing)

      Of course, this is not something that gets sold to non-technical people; the main audience here are developers. Anyone developing their own modules, or other piece of custom technology within Magnolia (template/paragraph models, form processors, ...) will benefit from this.

    2. IoC is a well accepted pattern and far the most developers see it as best practice for many reasons. In a sense you can always ask the question whether one can benefit from good software architecture. And it is always hard to find the good arguments. In my opinion the crucial point is what you might earn if you don't steadily invest in good software architecture. The costs for improvements and changes to the software become disproportionate high at some point. Its a bit like doing physical exercises. You can do quite well without but at some age you might realize that you would have better invested some time into. We have stepwise approached IoC so that now we are able to do the very last step without overpaying it. We walked all that path, why should we not take the last two steps.

      I quickly searched the web if someone provided a decent answer the the same question and found the following statement at http://forum.springsource.org/showthread.php?t=55015

      my experience, IoC using the Spring container brought the following advantages:

      • flexibility
        • changing the implementation class for a widely used interface is simpler (e.g. replace a mock web service by the production instance)
        • changing the retrieval strategy for a given class is simpler (e.g. moving a service from the classpath to the JNDI tree)
        • adding interceptors is easy and done in a single place (e.g. adding a caching interceptor to a JDBC-based DAO)
      • readability
        • the project has one unified and consistent component model and is not littered with factories (e.g. DAO factories)
        • the code is briefer and is not littered without dependency lookup code (e.g. calls to JNDI InitialContext)
      • testability
        • dependencies are easy to replace mocks when they're exposed through a constructor or setter
        • easier testing leads to more testing
        • more testing leads to better code quality, lower coupling, higher cohesion

      So intercept-ability (aspects) might be the point where one can trace business value. A good indicator for the "importance" of the topic might be that whenever I don't put IoC on the list of 5.0 features I get immediately a reminder from various developers out there (wink)

    3. Just my 2 cents. Apart from the very good reasons already summarized by Greg and Philipp, I would not underestimate the weight technical people have in driving or, at least, strongly influence some business decisions, such as choosing a certain product over its competitors. Maybe I'll be naïve, but in my experience, I have often seen this pattern at work: the business person tells the tech person: "We have such and such project and I understand we need a CMS. I googled a bit and found X, Y and Magnolia. Please evaluate them and tell me what's the best one to do the job." It's at that point that we will sell IoC, though indirectly, to the business and the equation

      Better testability = simplified testability = developers are more likely to write tests = less time wasted in debugging, since bugs are discovered early by said tests = saved money

      will produce its persuasive effects.

  5. The link "Official Documentation Available" at the top unfortunately is broken!

  6. Thanks Joerg. Fixed