Abstract

This concept page outlines some rough ideas for either refining search in the resources module using Solr, or building a more generic faceted search based solely on Solr's faceted search.

Choice 1: Enhancing the resources module

Goal

The goal is to provide a refined search within the selected items: the existing resources module would be enhanced with a customized search that looks only inside the selected content.

Benefits

Users can search for keywords inside the resources and are assisted while searching within them.

Implementation possibilities

Create a new search box that talks to Solr and filters on the URL of the page it sits on, used as a prefix. The box could either call itself to display the results or show them on another page outside the resources module page.
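As a rough illustration of the prefix filter, the sketch below builds a Solr select URL whose filter query restricts results to documents under the current page's URL. The endpoint `SOLR_SELECT` and the field name `url_exact` are assumptions made for the example: the prefix query parser matches the raw indexed value, so it presumes a non-tokenized copy of the URL exists in the index.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_scoped_query(keywords, page_url, rows=10):
    """Return a Solr select URL restricted to documents under page_url."""
    params = {
        "q": keywords,
        # {!prefix} matches the unanalyzed prefix of the field value.
        # "url_exact" is an assumed non-tokenized URL field.
        "fq": "{!prefix f=url_exact}" + page_url,
        "rows": rows,
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(build_scoped_query("magnolia cms", "http://example.com/resources/"))
```

The search box would pass its own page URL as `page_url`, so the same component can be dropped onto any resources page without configuration.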

Drawbacks/Difficulties

Solr's faceted search would not be used, and the implementation would not be very generic.

Choice 2: Using Solr's faceted search

Goal

The goal is to use the faceted search offered by Solr to implement a generic faceted search on all website content.

Benefits

A generic approach with different visualizations is possible, and faceting can be done on all content, not only on resources.

Implementation possibilities

The URL, split into path segments, could provide a useful categorization that already exists. For instance, a faceted search run today on the corporate website with a URL-based categorization gives the following results.

<lst name="facet_fields"><lst name="url"><int name="cms">529</int><int name="magnolia">529</int><int name="20011">524</int><int name="test">524</int><int name="community">168</int><int name="conference">137</int><int name="program">124</int><int name="company">97</int><int name="our">97</int><int name="news">93</int><int name="clients">91</int><int name="press">72</int><int name="day">68</int><int name="references">66</int><int name="www">64</int><int name="archive">62</int><int name="releases">62</int><int name="youtube">56</int><int name="embed">54</int><int name="2010">53</int><int name="speakers">53</int><int name="partner">50</int><int name="presentation">45</int><int name="old">43</int><int name="country">39</int><int name="partners">38</int><int name="amplify">35</int><int name="miami">35</int><int name="presentations">31</int><int name="4">30</int><int name="case">24</int><int name="studies">24</int><int name="dms">23</int><int name="landing">23</int><int name="features">22</int><int name="newsletter">22</int><int name="5">21</int><int name="de">21</int><int name="industry">21</int><int name="0">18</int><int name="pdf">17</int><int name="1">16</int><int name="2">16</int><int name="3">14</int><int name="level">13</int><int name="top">12</int><int name="8">11</int><int name="and">11</int><int name="briefs">11</int><int name="tech">11</int><int name="us">11</int><int name="coverage">10</int><int name="management">10</int><int name="directory">9</int><int name="presence">9</int><int name="products">9</int><int name="resource">9</int><int name="services">9</int><int name="virtual">9</int><int name="2009">8</int><int name="9">8</int><int name="mbc">8</int><int name="release">8</int><int name="spotlight">8</int><int name="support">8</int><int name="webinars">8</int><int name="contact">7</int><int name="industries">7</int><int name="location">7</int><int name="logos">7</int><int name="workshops">7</int><int name="7">6</int><int name="brief">6</int><int 
name="eps">6</int><int name="jobs">6</int><int name="navy">6</int><int name="t">6</int><int name="20">5</int><int name="a">5</int><int name="c">5</int><int name="development">5</int><int name="e">5</int><int name="enterprise">5</int><int name="evaluation">5</int><int name="open">5</int><int name="pr">5</int><int name="robots">5</int><int name="roles">5</int><int name="shirt">5</int><int name="static">5</int><int name="the">5</int><int name="travel">5</int><int name="txt">5</int><int name="venue">5</int><int name="visit">5</int><int name="workshop">5</int><int name="all">4</int></lst>
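The request behind this output is not shown on this page. As a sketch only, a facet request on the `url` field could be built as follows, using Solr's standard facet parameters; the endpoint and the chosen limits are assumptions.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, field="url", limit=100):
    """Return a Solr URL asking only for facet counts on one field."""
    params = [
        ("q", "*:*"),           # match every document
        ("rows", 0),            # no result rows, facet counts only
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
        ("facet.mincount", 1),  # drop terms that occur in no document
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
print(build_facet_query("http://localhost:8983/solr/select"))
```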

Filtering out the irrelevant tags could give a nice generic automatic categorization. Of course, this does not take into consideration the user-defined categories set through the category module.
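One possible way to filter the facet output is sketched below. It parses a facet response like the one above and drops purely numeric tokens, very short tokens, and a small stop list. The stop list, the length cutoff, and the minimum count are illustrative assumptions, not a proposed final rule set.

```python
import xml.etree.ElementTree as ET

# Illustrative stop list; a real one would need tuning per site.
STOP_TOKENS = {"www", "and", "the", "our", "all", "txt", "pdf"}


def relevant_facets(xml_text, field="url", min_count=5):
    """Return (token, count) pairs that look like usable categories."""
    root = ET.fromstring(xml_text)
    facets = []
    for node in root.iter("lst"):
        if node.get("name") != field:
            continue
        for item in node.findall("int"):
            token = item.get("name")
            count = int(item.text)
            # Skip numbers, one- or two-letter tokens, and stop words.
            if len(token) < 3 or token.isdigit() or token in STOP_TOKENS:
                continue
            if count >= min_count:
                facets.append((token, count))
    return facets


sample = (
    '<lst name="facet_fields"><lst name="url">'
    '<int name="cms">529</int><int name="www">64</int>'
    '<int name="2010">53</int><int name="de">21</int>'
    '<int name="community">168</int><int name="all">4</int>'
    '</lst></lst>'
)
print(relevant_facets(sample))  # [('cms', 529), ('community', 168)]
```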

Drawbacks/Difficulties

The Solr indexing is URL-based, so there needs to be a way either to add the user-selected categories to the URL, which would probably be difficult, or to derive the categories from the URL and send them to the Solr index.

This would be possible if there is a way to get the root node from the URL and browse the JCR to gather the associated categories.
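The lookup described above could work roughly as in the following sketch. The JCR is replaced by a plain dictionary (`FAKE_REPOSITORY`) so the example stays self-contained, and the `.html` suffix handling and the `category` field name are assumptions. The output uses Solr's XML update format; a real document would also need to carry the crawled fields.

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Stand-in for the JCR: page path -> categories per sub-content.
FAKE_REPOSITORY = {
    "/company/news": {"main": ["press"], "teaser": ["events"]},
}


def page_path(url):
    """Map a page URL to the (assumed) repository path of its root node."""
    path = urlparse(url).path
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return path.rstrip("/") or "/"


def categories_for(url, repository):
    """Collect the categories of all sub-contents, without duplicates."""
    node = repository.get(page_path(url), {})
    merged = []
    for cats in node.values():
        for cat in cats:
            if cat not in merged:
                merged.append(cat)
    return merged


def solr_add_doc(url, categories):
    """Build a Solr XML <add> message with one category field per value."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="url").text = url
    for cat in categories:
        ET.SubElement(doc, "field", name="category").text = cat
    return ET.tostring(add, encoding="unicode")


url = "http://example.com/company/news.html"
print(solr_add_doc(url, categories_for(url, FAKE_REPOSITORY)))
```

Note that merging the categories of all sub-contents into one list loses the information about which sub-content each category belongs to.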

This will work only if the page does not carry multiple categorizations on different sub-contents.

 

Choice 3: Tagging content inside the page (meta and micro tags)

Goal

The goal is to enhance the categorization module to tag content inside the page, either for the whole page (meta tag in the header) or for a part of it (tag in the div). The tags can then be picked up by an external parser or search engine, offering both SEO enhancement and in-house faceting.

Benefits

Standardized categorization and content tagging, easily exploitable by standard parser tools and search engines.

Micro tags could also be used to tell the custom Magnolia extractor not to index certain content; for a complete page, "robots.txt" can be used.
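To illustrate how a standard parser could exploit such tags, the sketch below reads page-level categories from meta tags and skips the text of any element marked as not to be indexed. The meta name `category` and the attribute `data-noindex` are hypothetical conventions chosen for the example, not existing Magnolia features.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and must not affect nesting depth.
VOID_TAGS = {"meta", "br", "img", "hr", "input", "link"}


class TagExtractor(HTMLParser):
    """Collect meta categories and indexable text from a page."""

    def __init__(self):
        super().__init__()
        self.categories = []
        self.text = []
        self._skip_depth = 0  # > 0 while inside a no-index element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "category":
            self.categories.append(attrs.get("content", ""))
        if tag in VOID_TAGS:
            return
        # "data-noindex" is an assumed marker attribute.
        if self._skip_depth or "data-noindex" in attrs:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


page = (
    '<html><head><meta name="category" content="press">'
    '<meta name="category" content="news"></head><body>'
    '<div>Indexed text</div>'
    '<div data-noindex><p>Hidden<br>menu</p></div>'
    '<p>More</p></body></html>'
)
extractor = TagExtractor()
extractor.feed(page)
print(extractor.categories)  # ['press', 'news']
print(extractor.text)        # ['Indexed text', 'More']
```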

Implementation possibilities

Drawbacks/Difficulties

Multiple categorizations per page.

 
