Implemented in 4.5.3

 

Official Documentation Available

This topic is now covered in Google Sitemap module.

Introduction

Migrate the sitemap module in order to support Magnolia 4.5.

Goal

  • Create a standard module with configuration, templates and services (ok).
  • Replace JSPs with FTL templates (ok).
  • Improve the configuration (ok).
  • Add specialized sitemaps (next module version):
    • Mobile
    • News
    • Video
    • Geo
    • Code Search
  • Add additional information (linked to a page) to the sitemaps (next module version):
    • Images
    • Video
  • Add JUnit tests (ok).
  • Add an AdminCentral Tools menu to access the Edit page (ko).

Global Requirements

Generate XML page(s) containing site information, following the Sitemap protocol.

Provide an Edit page that allows sites and virtual URIs to be added as components. The Add dialog should allow the editor to:

  • Add either a new Site or VirtualUri Component.
  • For VirtualUri Component: no additional configuration.
  • For Site Component: a new Dialog is shown:
    • Define the Site root input field.

Provide a configuration page (dialog) that makes it possible to define, per page:

  • Change frequency
  • Priority
  • Visibility of the page and children in the SiteMap xml.
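The per-page settings above could be modelled roughly as follows. This is only a sketch: the class and field names are hypothetical and are not taken from the module. The allowed change-frequency values and the 0.0 to 1.0 priority range come from the Sitemap protocol.

```java
import java.util.Locale;

/** Hypothetical holder for the per-page sitemap settings (not the module's actual API). */
final class PageSitemapSettings {

    /** Change-frequency values defined by the Sitemap protocol. */
    enum ChangeFrequency {
        ALWAYS, HOURLY, DAILY, WEEKLY, MONTHLY, YEARLY, NEVER;

        /** Lower-case form, as written in the changefreq element. */
        String xmlValue() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    final ChangeFrequency changeFrequency;
    final double priority;
    final boolean hideInSitemap;
    final boolean hideChildren;

    PageSitemapSettings(ChangeFrequency changeFrequency, double priority,
                        boolean hideInSitemap, boolean hideChildren) {
        if (changeFrequency == null) {
            throw new IllegalArgumentException("changeFrequency is required");
        }
        // The protocol restricts priority to 0.0..1.0; this form also rejects NaN.
        if (!(priority >= 0.0 && priority <= 1.0)) {
            throw new IllegalArgumentException("priority must be between 0.0 and 1.0");
        }
        this.changeFrequency = changeFrequency;
        this.priority = priority;
        this.hideInSitemap = hideInSitemap;
        this.hideChildren = hideChildren;
    }
}
```

Validating in the constructor keeps invalid values out of the generated XML, whichever dialog they were entered in.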

Google SiteMap requirements.

Requirements

Points not yet covered (next module version)?

  • A Sitemap file can contain no more than 50,000 URLs and must be no larger than 50MB when uncompressed. If your Sitemap is larger than this, break it into several smaller Sitemaps. These limits help ensure that your web server is not overloaded by serving large files to Google.
  • If you have more than one Sitemap, you can list them in a Sitemap index file and then submit the Sitemap index file to Google. You don't need to submit each Sitemap file individually.
  • As well as basic URL information, Sitemaps can contain detailed information about specific types of content on your site, including video, images, mobile, News, and software source code.
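The URL-count limit in the first point could be handled by splitting the collected URLs into chunks, one chunk per sitemap file, before the files are listed in a sitemap index. The sketch below is only an illustration: the class name is hypothetical, and it checks the number of URLs only, not the uncompressed file size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a URL list into sitemap-sized chunks. */
final class SitemapPartitioner {

    /** Maximum number of URLs per sitemap file, as quoted in the requirement. */
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    /** Returns consecutive chunks of at most maxPerFile elements, preserving order. */
    static <T> List<List<T>> partition(List<T> urls, int maxPerFile) {
        if (maxPerFile <= 0) {
            throw new IllegalArgumentException("maxPerFile must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += maxPerFile) {
            int end = Math.min(start + maxPerFile, urls.size());
            // Copy the sub-list so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(urls.subList(start, end)));
        }
        return chunks;
    }
}
```

Each resulting chunk would become one sitemap file, and the sitemap index would hold one entry per chunk.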

Magnolia SiteMap requirements.

Points not yet covered (should we)?

  • Site variations:
  • Multi domains:
    'No. Please list only one version of a URL in your Sitemaps. Including multiple versions of URLs may result in incomplete crawling of your site'...

Google SiteMap Configuration

We should be able to configure:

  • Sites and subsites to include in the SiteMap URL.
  • Whether to display virtual URIs.

Solutions

  • Create a configuration singleton used by the model and the service (configures the sites to be displayed, the date format, ...).
  • Create a service responsible for searching the nodes and converting them to beans used for rendering.
  • Create a new SiteMapModel that uses the service and the configuration singleton.
  • Create FTL templates for rendering.
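To illustrate the bean-to-XML step, the sketch below renders a list of beans as a Sitemap-protocol document. It is an assumption-laden illustration: the module renders through FTL templates, whereas this example uses plain Java string building, and the SitemapEntry bean and its fields are hypothetical. Only the element names and the namespace come from the Sitemap protocol.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical renderer; the real module uses FTL templates for this step. */
final class SitemapXmlRenderer {

    /** Hypothetical bean produced by the service for each page or virtual URI. */
    static final class SitemapEntry {
        final String loc;
        final String changefreq;
        final double priority;

        SitemapEntry(String loc, String changefreq, double priority) {
            this.loc = loc;
            this.changefreq = changefreq;
            this.priority = priority;
        }
    }

    /** Escapes the five characters that the Sitemap protocol requires to be entity-escaped. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Renders the entries as a urlset document. */
    static String render(List<SitemapEntry> entries) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (SitemapEntry e : entries) {
            xml.append("  <url>\n");
            xml.append("    <loc>").append(escapeXml(e.loc)).append("</loc>\n");
            xml.append("    <changefreq>").append(escapeXml(e.changefreq)).append("</changefreq>\n");
            xml.append("    <priority>")
               .append(String.format(Locale.ROOT, "%.1f", e.priority))
               .append("</priority>\n");
            xml.append("  </url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }
}
```

Keeping the beans free of JCR types means the rendering layer, whether FTL or Java, never has to touch the repository.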

HowTo

Create a new SiteMap page

From the Admin interface, create a new GoogleSiteMap page. This SiteMap page can be placed anywhere in the website tree.


Multiple SiteMap definitions are supported (for example, one SiteMap for DemoProject and one for DemoFeature, or a single one for both).

Create site and virtualUri components.

Edit the GoogleSiteMap page. Add a Site or VirtualUri component by clicking the add (plus) button on the component area.

SiteComponent

Selecting this component opens a new dialog in which you select the site(s), page(s), or subpage(s) to include in the sitemap.
Add one to n paths.

If no entries are selected, nothing will be added to the sitemap.

SiteComponent entries Edit Properties:

Hide in Sitemap. If this checkbox is selected:

  • This page is not included in the sitemap XML.
  • Subpages are included in the sitemap XML.

Hide all children... If this checkbox is selected:

  • This page is included in the sitemap XML.
  • Subpages are not included in the sitemap XML.

If both checkboxes are selected:

  • This page is not included in the sitemap XML.
  • Subpages are not included in the sitemap XML.
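The combination of the two checkboxes can be sketched as a simple tree walk. The Page class and its fields are hypothetical stand-ins for the content nodes, and the sketch assumes that "Hide all children" removes the whole subtree below the page, which the text above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the two visibility checkboxes. */
final class SitemapVisibility {

    /** Stand-in for a page node carrying the two checkbox values. */
    static final class Page {
        final String path;
        final boolean hideInSitemap;
        final boolean hideChildren;
        final List<Page> children = new ArrayList<>();

        Page(String path, boolean hideInSitemap, boolean hideChildren) {
            this.path = path;
            this.hideInSitemap = hideInSitemap;
            this.hideChildren = hideChildren;
        }

        Page add(Page child) {
            children.add(child);
            return this;
        }
    }

    /** Returns the paths that end up in the sitemap, in document order. */
    static List<String> visiblePaths(Page root) {
        List<String> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(Page page, List<String> out) {
        // "Hide in Sitemap" only affects the page itself.
        if (!page.hideInSitemap) {
            out.add(page.path);
        }
        // "Hide all children" stops the descent (assumed to cover the whole subtree).
        if (!page.hideChildren) {
            for (Page child : page.children) {
                collect(child, out);
            }
        }
    }
}
```

Because the two flags are checked independently, selecting both drops the page and everything beneath it, matching the third case above.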

VirtualUriComponent

No dialog is associated with this component. It directly renders all virtual URIs defined in this instance.

VirtualUriComponent entry Edit Properties

If Hide in Sitemap is selected, this virtual URI is not displayed in the sitemap.xml.

Access the sitemap xml

Just change the extension of the SiteMap page from .html to .xml.

Note that no duplicate URLs (loc) are displayed. A filter mechanism removes all duplicates.
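A duplicate filter of this kind could be as small as the sketch below. It is a hypothetical illustration, not the module's actual filter: it assumes that two entries are duplicates when their loc strings are exactly equal, and it keeps the first occurrence.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical duplicate filter working on loc strings. */
final class DuplicateLocFilter {

    /** Removes repeated loc values while preserving the order of first occurrence. */
    static List<String> removeDuplicates(List<String> locs) {
        // LinkedHashSet drops repeats and remembers insertion order.
        return new ArrayList<>(new LinkedHashSet<>(locs));
    }
}
```

Such duplicates can arise, for example, when a page is selected both directly and through one of its parent paths in a Site component.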

Improvements

New features to add

  • Support file creation
  • Support the size limit and the maximum number of URLs in a sitemap.xml
  • Support the sitemap index file
  • Support specific sitemaps such as mobile, image and video sitemaps
  • Support image and video information in a standard sitemap

Most of these features are supported by an external Java library (Apache License 2.0):

    <!-- For sitemap generation -->
    <dependency>
        <groupId>com.google.code</groupId>
        <artifactId>sitemapgen4j</artifactId>
        <version>1.0.1</version>
    </dependency>

Unfortunately, this library has to be forked in order to:

  • Redirect the XML stream to an OutputStream or PrintWriter instead of only creating XML files (extend the SitemapGenerator class)
  • Add tag support for images and videos in a standard sitemap (extend the Renderer of the WebSitemapGenerator class)

Tasks:

  • Create new page templates for the Mobile, News, Video.... sitemaps
  • Create a page dialog to define the output format (generate files, render XML, ...)
  • Modify the sitemap property dialog to support image and video information
  • Fork the external Java library.

References

Google SiteMap About

Google SiteMap Errors

Best Practices


4 Comments

  1. How do you want to handle the following problem?

    - We do not have site-aware virtual URI mapping:

    The sitemap module should only display virtual URIs that are relevant for the currently rendered site, and not all defined virtual URIs.

    In my expectation, a Google sitemap module should:

    - Display only relevant sites of a specific site in a multi-site scope.

    - Be mappable to a specific site.

    - Take only relevant virtual URIs into consideration.

    - Take all multi-site variations of the same site into account, such as domain shortenings etc.

  2. Eric, nice work! What I don't understand however is why one needs to create the page components. The site map page template should auto-create these or not?

  3. The dependencies between virtual URI's and their destinations need some thought. Google doesn't like it if we have two URI's pointing to the same content, and they may punish us by dropping such a site from their search index. In other words, a page that has a virtual URI mapping for it (let's call it vanity URL) should ensure that such a page if accessed under its "physical" URI renders a canonical meta tag that points to the vanity URI. This way, Google knows about it and no problems in terms of SEO.

  4. For further improvement, the site map page should provide direct access to the configurable fields. It is quite a pain in the behind to open all these dialogs if you have hundreds of pages & virtual URI's. Related, this info should be accessible directly through the main bar of each page, where the virtual URI mappings should also be listed. But I agree that this is a bit more than a migration of the module to 4.5.