Implemented in 4.5.4

New Magnolia JCR Search API

Rationale

The current search API, encapsulated and exposed by the info.magnolia.cms.util.QueryUtil class, is quite old and not very clean. It exposes too many methods in some areas and lacks necessary functionality in others. It also forces the use of the Collection API and the retrieval of all content returned by a query, which is not always good practice. The new API should make queries simpler to use, encourage good practices while searching, and take as much advantage as possible of the other JCR utilities available in Magnolia since 4.5.
...

Structure changes:

  • Query methods should be renamed to search
    • The methods not only create a query but also execute it, so their output is a result. "search" is therefore a more meaningful name than "query", which suggests only the creation of a query
  • Date/time related methods should be moved to the info.magnolia.cms.util.DateUtil class
    • These methods are unrelated to queries, and since we already have a DateUtil class it is logical to move them there. Anyone looking for such methods will probably search for Date*Util or something similar first; nobody will expect them in QueryUtil
    • Moreover, they are not even used by the current implementation of QueryUtil
    • We should keep only the methods that take a Calendar as a parameter
    • Remove/deprecate the rest
      • Calendar is the standard type for passing date/time parameters
      • Calling methods with integers might be confusing for some users, especially with so many formats
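
A minimal sketch of what Calendar-only date helpers could look like. The class name DateExpressionSketch, the method bodies and the xs:dateTime('...') output format are assumptions made for illustration; this page only specifies the method signatures. The sketch uses java.time for brevity, which may be newer than what the real implementation can rely on.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Calendar;
import java.util.TimeZone;

public class DateExpressionSketch {

    // ISO 8601 with milliseconds and zone offset ("Z" for UTC).
    private static final DateTimeFormatter ISO_8601 =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSXXX");

    /** Date and time in the calendar's own time zone (output format is an assumption). */
    public static String createDateTimeExpression(Calendar calendar) {
        return "xs:dateTime('" + ISO_8601.format(toZoned(calendar)) + "')";
    }

    /** Date only: the time part is truncated to midnight in the calendar's time zone. */
    public static String createDateExpression(Calendar calendar) {
        ZonedDateTime startOfDay = toZoned(calendar).truncatedTo(ChronoUnit.DAYS);
        return "xs:dateTime('" + ISO_8601.format(startOfDay) + "')";
    }

    private static ZonedDateTime toZoned(Calendar calendar) {
        return ZonedDateTime.ofInstant(calendar.toInstant(), calendar.getTimeZone().toZoneId());
    }

    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear(); // reset all fields, including milliseconds
        cal.set(2012, Calendar.MARCH, 15, 10, 30, 0);
        System.out.println(createDateTimeExpression(cal));
        System.out.println(createDateExpression(cal));
    }
}
```

For a calendar set to 15 March 2012, 10:30 UTC, the two methods yield xs:dateTime('2012-03-15T10:30:00.000Z') and xs:dateTime('2012-03-15T00:00:00.000Z'). Taking a Calendar means the caller never has to guess whether an integer month is zero-based.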

Functional changes:

  • JCR QueryManager
    • Since info.magnolia.cms.core.search.QueryManager is deprecated and should not be used anymore, we can use the JCR query manager, which is obtained from the JCR session for a specific workspace
    • That leaves the minimum parameters of the search method as before: the workspace name and the search statement itself
    • The workspace parameter is then used to obtain the session in which the query will be executed
    • By default the query is executed in the user context
    • To run it in the system context, the method can be called within MgnlContext.doInSystemContext()
  • Return type NodeIterator
    • It is not necessary to return a Collection of all the content found by the query every time
    • NodeIterator extends the standard iterator with some methods that make it easy to use
  • However, we should also provide a way to return a Collection of Nodes
    • Sometimes a Collection is more convenient than a NodeIterator
    • Since this is not implemented yet, it should be added to the info.magnolia.jcr.util.NodeUtil class for future use
  • Search method won't hide errors
    • This is common practice; hiding errors should not be the default
    • However, sometimes it is necessary to catch the exception so that template rendering does not break
  • Search method should be wrapped by TemplatingFunctions for those cases
    • There we can safely catch the exception and just log it
    • It is already done this way, but the wrapping methods currently also live in the QueryUtil class
    • Placing them in the TemplatingFunctions class makes their purpose clear
  • Node type selection by org.apache.jackrabbit.commons.iterator.FilteringNodeIterator
    • Combining the JCR filter functionality with the already implemented info.magnolia.jcr.predicate.NodeTypePredicate as a parameter avoids the need to write a selection method
    • The query result can be wrapped in it if the optional parameter (NodeType) is provided
    • If no type is provided, only nodes of the default type - mgnl page - are returned
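
The split between a search method that does not hide errors and a template-facing wrapper that catches and logs them can be sketched roughly as follows. Everything here is a stand-in chosen so that the example runs without a repository: SearchWrapperSketch, SearchFailedException and searchForTemplate are hypothetical names, results are plain path strings instead of JCR nodes, and the failure condition is simulated.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SearchWrapperSketch {

    private static final Logger LOG = Logger.getLogger(SearchWrapperSketch.class.getName());

    /** Hypothetical stand-in for javax.jcr.RepositoryException. */
    public static class SearchFailedException extends Exception {
        public SearchFailedException(String message) {
            super(message);
        }
    }

    /**
     * Low-level search: errors are NOT hidden and propagate to the caller.
     * A real implementation would obtain the JCR session for the workspace and run
     * session.getWorkspace().getQueryManager().createQuery(statement, language).execute().getNodes();
     * this stand-in returns fixed sample paths instead.
     */
    public static Iterator<String> search(String workspace, String statement)
            throws SearchFailedException {
        if (statement == null || statement.trim().isEmpty()) {
            throw new SearchFailedException("Empty query statement for workspace '" + workspace + "'");
        }
        return Arrays.asList("/news/2012/a", "/news/2012/b").iterator();
    }

    /** Template-facing wrapper: logs the failure and returns an empty result so rendering continues. */
    public static Iterator<String> searchForTemplate(String workspace, String statement) {
        try {
            return search(workspace, statement);
        } catch (SearchFailedException e) {
            LOG.log(Level.WARNING, "Search failed; returning no results", e);
            return Collections.emptyIterator();
        }
    }

    public static void main(String[] args) {
        Iterator<String> hits = searchForTemplate("website", "select * from [mgnl:page]");
        while (hits.hasNext()) {
            System.out.println(hits.next());
        }
        // A broken statement is logged, and the template simply sees no results.
        System.out.println(searchForTemplate("website", "").hasNext());
    }
}
```

Keeping the catch block in the wrapper only means that Java callers still see the exception, while templates degrade to an empty result.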

API changes

info.magnolia.cms.util.DateUtil
+ public static String createDateTimeExpressionIgnoreTimeZone(Calendar calendar)
+ public static String createDateTimeExpression(Calendar calendar)
+ public static String createDateExpression(Calendar calendar)

info.magnolia.cms.util.QueryUtil
+ public static NodeIterator search(String workspace, String statement, String language, String returnItemType, long maxResultSize)
+ public static NodeIterator search(String workspace, String statement, String language, String returnItemType)
+ public static NodeIterator search(String workspace, String statement, String language)
+ public static NodeIterator search(String workspace, String statement)

info.magnolia.jcr.util.NodeUtil
+ public static Collection<Node> getCollectionFromNodeIterator(NodeIterator iterator)
+ public static NodeIterator filterNodeType(NodeIterator iterator, String nodeType)
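
The two NodeUtil additions can be illustrated with plain java.util types. This is only a sketch under stated assumptions: FakeNode stands in for javax.jcr.Node, FilteringIterator mimics the idea behind Jackrabbit's FilteringNodeIterator without using it, and the class and method names (NodeUtilSketch, getCollectionFromIterator) as well as the sample node types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

public class NodeUtilSketch {

    /** Hypothetical stand-in for javax.jcr.Node: only a path and a primary node type. */
    public static final class FakeNode {
        public final String path;
        public final String primaryType;

        public FakeNode(String path, String primaryType) {
            this.path = path;
            this.primaryType = primaryType;
        }
    }

    /** Lazily skips elements that do not match the filter; nothing is copied up front. */
    public static final class FilteringIterator<T> implements Iterator<T> {
        private final Iterator<T> base;
        private final Predicate<? super T> filter;
        private T lookahead;
        private boolean ready;

        public FilteringIterator(Iterator<T> base, Predicate<? super T> filter) {
            this.base = base;
            this.filter = filter;
        }

        @Override
        public boolean hasNext() {
            // Advance the underlying iterator until a matching element is found.
            while (!ready && base.hasNext()) {
                T candidate = base.next();
                if (filter.test(candidate)) {
                    lookahead = candidate;
                    ready = true;
                }
            }
            return ready;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            ready = false;
            T result = lookahead;
            lookahead = null;
            return result;
        }
    }

    /** Keeps only nodes whose primary type equals the given node type. */
    public static Iterator<FakeNode> filterNodeType(Iterator<FakeNode> iterator, String nodeType) {
        return new FilteringIterator<FakeNode>(iterator, node -> nodeType.equals(node.primaryType));
    }

    /** Drains the iterator into a collection for callers that need random access or a size. */
    public static <T> Collection<T> getCollectionFromIterator(Iterator<T> iterator) {
        List<T> items = new ArrayList<>();
        while (iterator.hasNext()) {
            items.add(iterator.next());
        }
        return items;
    }

    public static void main(String[] args) {
        List<FakeNode> hits = Arrays.asList(
                new FakeNode("/news", "mgnl:page"),
                new FakeNode("/news/main/0", "mgnl:component"),
                new FakeNode("/about", "mgnl:page"));
        for (FakeNode node : getCollectionFromIterator(filterNodeType(hits.iterator(), "mgnl:page"))) {
            System.out.println(node.path);
        }
    }
}
```

Filtering stays lazy until the caller explicitly asks for a collection, which matches the goal of not loading all content by default.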

6 Comments

  1. Looks fine to me.

    There is one thing we still have to consider. The returnItemType was used, for example, to aggregate pages when the actual content is found in components or in the metadata sub nodes. Either we build an iterator that adds this mechanism so that it aggregates ad hoc, or we drop the parameter, assuming that one can formulate such queries now (I think it works, but we need some examples):

    • find all pages having the template news
    • find pages with a certain text and ordered by creation date (the date is part of the metadata sub node)
    1. We discussed this. It should be possible to write a FilteringNodeIterator that would provide such functionality. But if it indeed works out of the box it would be best. Let's try.

      One more thing that just occurred to me is that this API still doesn't allow the use of JCR-QOM, since it requires the input as a string, right? Maybe we should add something for that.

  2. > find pages with a certain text and ordered by creation date (the date is part of the metadata sub node)

    Not sure if you are already aware of this, but ordering on metadata properties (in general, on properties of subnodes) works fine if the property is aggregated in the indexing configuration. I don't remember if we ever discussed adding this configuration by default, but we usually add the following in our projects to make ordering work without doing nasty things (like searching for metadata nodes and manually getting the parent to return pages in query results):

    <aggregate primaryType="mgnl:content">
      <include>mgnl:creationdate</include>
      <include-property>MetaData/mgnl:creationdate</include-property>
    </aggregate>

    1. Cool, do you have any more such goodies? Can we create a concept page for changing the indexing configuration and collect there all the changes you would like to see in the indexing by default? We still have time to pull it into 5.0.

      1. Sure, I will have to investigate what could be "standardized", but there are several nice things that could be pulled in:

        • ordering on metadata properties
        • exclusion of magnolia metadata properties and standard jcr properties (e.g. jcr:author) from fulltext search and excerpts
        • indexing of paragraphs into page fulltext by default (not sure if this can be achieved using nodetypes)
        • configuring a default Analyzer which normalizes extended chars
        • a good default excerpt provider
        • a default spell checker configuration for search suggestions
        • a sample config for tags/categories which should not be tokenized by default

        This is all we usually configure that I can remember at the moment... We have worked a lot with indexing, so I can just collect some goodies from our projects. The only thing I should point out is that we recently had a few issues with Jackrabbit 2.4.x where some of the usual config stopped working :/

        1. Jan, can you link to the further changelog/concept page about these changes? Sounds interesting, but might "conflict" with MetaData as mixin?