Indexing Configuration
The configuration parameter indexingConfiguration is not set by default. This means all properties of a node are indexed.
If you wish to configure the indexing behaviour, you need to add a parameter to the SearchIndex element of either your repository configuration file or your workspace configuration file.
Any time you make changes to the indexing configuration do not forget to recreate the index from scratch.
See https://wiki.apache.org/jackrabbit/IndexingConfiguration
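As a rough sketch, the parameter might be added to the SearchIndex element as shown below. The class name and the path parameter follow common Jackrabbit workspace configurations. The file location in the value attribute is an assumption based on the info.magnolia.jackrabbit package and should be adjusted to your setup.

```xml
<!-- Sketch of a SearchIndex element in workspace.xml; values are illustrative. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Assumed location: a classpath resource in the info.magnolia.jackrabbit package. -->
  <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
</SearchIndex>
```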
Configuration files
Indexing configuration files should be located in the package info.magnolia.jackrabbit.
- indexing_configuration.xml
- indexing_configuration_default.xml
- indexing_configuration_website.xml
- indexing_configuration_dam.xml
- indexing_configuration_tasks.xml
To optimize the index size you can index only certain properties of a node type. Index rules are processed top down: the first matching rule is applied and all remaining rules are ignored.
As of Jackrabbit 2.0 you can also use the match-all regular expression (.*) for the namespace prefix part of a property name. This is currently the only regular expression supported for the prefix. Note that every namespace prefix you use throughout the XML file must be declared in the configuration element.
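A minimal sketch of a configuration element that declares the prefixes used in the examples on this page follows. The namespace URIs shown are the ones commonly registered for these prefixes; treat them as assumptions and verify them against your repository's namespace registry.

```xml
<?xml version="1.0"?>
<!-- Sketch: declare every prefix used in rules, aggregates, and analyzers. -->
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
               xmlns:mgnl="http://www.magnolia.info/jcr/mgnl">
  <!-- index-rule, aggregate, and analyzers elements go here -->
</configuration>
```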
With the nodeScopeIndex attribute set to false, the property will not be in the full-text index. This means it is available for all searches except those using contains(...) in sql and sql2 or jcr:contains(...) in xpath.
Here we apply an index rule to nodes of type nt:base. The rule also applies to nodes whose type extends nt:base. Since nt:base is the base node type of all primary node types, this rule applies everywhere.
<index-rule nodeType="nt:base">
  <!-- Exclude Magnolia metadata from the full-text index. -->
  <property isRegexp="true" nodeScopeIndex="false">mgnl:.*</property>
  <!-- Exclude JCR metadata from the full-text index. -->
  <property isRegexp="true" nodeScopeIndex="false">jcr:.*</property>
  <!-- Include all properties from any namespace, even the empty namespace. -->
  <property isRegexp="true">.*:.*</property>
</index-rule>
You may also add a condition to the index rule and have multiple rules with the same node type.
For example, let's say that we only want to boost page titles when the page has been marked with a priority property. Furthermore, let's assume we are required to provide three priority levels: low, medium, and high.
<!-- Since the default boost is 1.0 we don't need to specify it. Anything not medium or high will be considered low. -->
<index-rule nodeType="mgnl:page" condition="@priority = 'medium'">
  <property boost="3.0">title</property>
</index-rule>
<index-rule nodeType="mgnl:page" condition="@priority = 'high'">
  <property boost="5.0">title</property>
</index-rule>
Finally, add a radio button to your page dialog for controlling page priority levels.
You may also reference properties in the condition that are not on the current node and/or specify the type of a node in the condition.
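The following sketch illustrates both variants, modelled on the condition syntax described in the Jackrabbit indexing configuration documentation. The text property and the boost values are hypothetical, and the type test is assumed to match the named node type exactly, without considering subtypes.

```xml
<!-- Sketch: condition on a priority property found on any ancestor node. -->
<index-rule nodeType="mgnl:area"
            condition="ancestor::*/@priority = 'high'">
  <property boost="2.0">title</property>
</index-rule>

<!-- Sketch: same idea, but only ancestors of type mgnl:page are considered. -->
<index-rule nodeType="mgnl:component"
            condition="ancestor::element(*, mgnl:page)/@priority = 'high'">
  <!-- "text" is a hypothetical property name. -->
  <property boost="2.0">text</property>
</index-rule>
```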
A boost value can be configured on nodes, on properties, or on both when they match an index rule. The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) yield a higher score, so matching results appear more relevant.
Here we apply a boost value of 3.0 to the title property on nodes of type mgnl:page.
<index-rule nodeType="mgnl:page">
  <property boost="3.0">title</property>
</index-rule>
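For comparison, a boost can also be set on the node rather than on a single property by placing the attribute on the index rule itself. The sketch below assumes a node-level boost of 2.0; the value is illustrative only.

```xml
<!-- Sketch: boost the whole mgnl:page node instead of one property. -->
<index-rule nodeType="mgnl:page" boost="2.0">
  <!-- Only the properties listed in a rule are indexed for matching nodes. -->
  <property>title</property>
</index-rule>
```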
Sometimes it is useful to include the contents of descendant nodes in a single node, which makes it easier to search content that is scattered across multiple nodes.
Here we create an index aggregate on mgnl:page that includes the content of mgnl:area and mgnl:component. This makes it easier to search content on a page that is located in one of its area or component subnodes.
<aggregate primaryType="mgnl:page">
  <include primaryType="mgnl:area">*</include>
  <include primaryType="mgnl:component">*</include>
</aggregate>
With this configuration part, you define how a property should be analyzed.
For example, suppose you want to apply a German language analyzer to properties that you know store German language content.
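<!-- In a Jackrabbit indexing configuration, analyzer elements are nested inside an analyzers element. -->
<analyzers>
  <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
    <property>text_de</property>
  </analyzer>
</analyzers>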
Custom configuration file
You can create a custom indexing configuration for any workspace. Once created, the file is referenced in the workspace.xml file of the workspace you wish to target. Changes to this configuration require the workspace to be reindexed.
An example of this would be the website-specific example shown above or the dam-specific configuration here:
This shows an example of node data aggregation. Since the Magnolia metadata is stored on the mgnl:asset node and the image metadata and data are stored on a mgnl:resource subnode, we can aggregate both into one Lucene document.
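A minimal sketch of such an aggregate, following the same pattern as the mgnl:page example, could look like this. It is an illustration of the idea described above, not a copy of the configuration shipped with Magnolia, which may differ.

```xml
<!-- Sketch only: the dam configuration shipped with Magnolia may differ. -->
<aggregate primaryType="mgnl:asset">
  <!-- Pull every mgnl:resource child into the asset's Lucene document. -->
  <include primaryType="mgnl:resource">*</include>
</aggregate>
```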