The indexing configuration shipped with Magnolia has so far never been tuned: no indexing configuration file is included at all, and the SearchIndex section of the main Jackrabbit configuration file does not enable any of the more interesting features, such as a spell checker, custom analyzers, or excerpt handling.

See http://wiki.apache.org/jackrabbit/IndexingConfiguration for details on the Jackrabbit indexing configuration.

The following samples were discussed with Jan during the unconference at #mconf12. We should investigate which of these features could be included by default in Magnolia 5.

A few utility classes and sample configurations are included in the openmind criteria API.

 

Search index configuration

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">

<param name="path" value="${rep.home}/indexes/${wsp.name}" />
<param name="indexingConfiguration" value="/indexing_configuration.xml" />
<param name="excerptProviderClass" value="it.openmindonline.xxxx.HTMLExcerpt" />
<param name="analyzer" value="it.openmindonline.xxxxx.IndexAnalyzer" />
<param name="spellCheckerClass" value="it.openmindonline.xxxx.SpellChecker" />
<param name="supportHighlighting" value="true" />
<param name="useCompoundFile" value="true" />
<param name="cacheSize" value="10000" />
<param name="initializeHierarchyCache" value="false" />
<param name="enableConsistencyCheck" value="true" />
<param name="forceConsistencyCheck" value="false" />
<param name="autoRepair" value="true" />
<param name="textFilterClasses" value="" />

</SearchIndex>
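
For comparison, roughly the same features can be enabled without custom classes. The following is a minimal sketch, not a tested Magnolia setup: it assumes a Jackrabbit version that ships DefaultHTMLExcerpt and LuceneSpellChecker (the latter also needs the Lucene spellchecker library on the classpath), and these class names are not taken from the configuration above. The analyzer parameter is left out so that Jackrabbit falls back to its default analyzer.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${rep.home}/indexes/${wsp.name}" />
  <param name="indexingConfiguration" value="/indexing_configuration.xml" />
  <!-- assumption: stock Jackrabbit excerpt provider instead of a custom one -->
  <param name="excerptProviderClass" value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt" />
  <!-- assumption: stock Jackrabbit spell checker; requires the Lucene spellchecker jar -->
  <param name="spellCheckerClass" value="org.apache.jackrabbit.core.query.lucene.spell.LuceneSpellChecker" />
  <!-- needed so that excerpts can highlight the matching terms -->
  <param name="supportHighlighting" value="true" />
</SearchIndex>
```

With such a configuration, excerpts and spelling suggestions are requested through the query itself. In Jackrabbit's XPath dialect this would look something like //element(*, mgnl:content)[jcr:contains(., 'magnolia')]/rep:excerpt(.) for excerpts and /jcr:root[rep:spellcheck('magnolai')]/(rep:spellcheck()) for a suggestion; the search terms here are only illustrative.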

 

indexing_configuration.xml sample

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0" xmlns:mgnl="http://www.magnolia.info/jcr/mgnl"
xmlns:jcr="http://www.jcp.org/jcr/1.0">

   <!-- custom analyzers for specific properties. For example, don't tokenize tags/keywords -->
   <analyzers>
      <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
         <property>tags</property>
      </analyzer>
   </analyzers>


   <index-rule nodeType="nt:hierarchyNode">
      <property boost="10" useInExcerpt="false">title</property>
      <property boost="1.0" useInExcerpt="true">text</property>
      <!-- exclude jcr:* and mgnl:* properties -->
      <property isRegexp="true" nodeScopeIndex="false" useInExcerpt="false">.*:.*</property>
   </index-rule>
   <index-rule nodeType="mgnl:contentNode">
       <property boost="5" nodeScopeIndex="false" useInExcerpt="false">title</property>
       <property boost="2" nodeScopeIndex="false" useInExcerpt="true">text</property>
       <!-- exclude jcr:* and mgnl:* properties -->
       <property isRegexp="true" nodeScopeIndex="false" useInExcerpt="false">.*:.*</property>
   </index-rule>


  <!-- index text content on paragraphs. Can this be configured using nodetypes only? -->
  <aggregate primaryType="mgnl:content">
    <!-- aggregates content of the main column; replace nameoftheareanode with the actual area node name -->
    <include primaryType="mgnl:contentNode">nameoftheareanode/*</include>
  </aggregate>
   

  <!-- index metadata attributes inside the main node, to allow sorting! -->
  <aggregate primaryType="mgnl:content">
     <include>mgnl:creationdate</include>
     <include-property>MetaData/mgnl:creationdate</include-property>
  </aggregate>
  <aggregate primaryType="mgnl:content">
     <include>mgnl:lastmodified</include>
     <include-property>MetaData/mgnl:lastmodified</include-property>
  </aggregate>
  <aggregate primaryType="mgnl:content">
     <include>mgnl:template</include>
     <include-property>MetaData/mgnl:template</include-property>
  </aggregate>


</configuration>
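
On the question raised in the aggregate comment above (whether paragraph text can be aggregated using node types only): Jackrabbit's include element accepts * as a wildcard path segment in addition to the primaryType attribute, so a variant that does not name the area node might look like the sketch below. This has not been tested against Magnolia, and the two-level depth (area, then paragraph) is an assumption about the page structure.

```xml
<!-- Sketch: aggregate content nodes one and two levels below a page
     without naming the area node. Assumes that both areas and
     paragraphs are stored as mgnl:contentNode. -->
<aggregate primaryType="mgnl:content">
  <include primaryType="mgnl:contentNode">*</include>
  <include primaryType="mgnl:contentNode">*/*</include>
</aggregate>
```

The trade-off is that every matching content node under the page is pulled into the page's index entry, not only the main column, so unrelated areas (for example a sidebar) would also influence search results.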

 

 

 


1 Comment

  1. I did something similar (although less "advanced") with the forum: at the very least, there's a specific config file and excerpt provider.