Summary

A filter that cleans up malformed and faulty HTML using the jsoup library.

Requirements

Magnolia CE/EE 5.3+

Dependencies

<dependency>
	<groupId>org.jsoup</groupId>
	<artifactId>jsoup</artifactId>
	<version>1.7.2</version>
</dependency>

<dependency>
	<groupId>commons-lang</groupId>
	<artifactId>commons-lang</artifactId>
	<version>2.4</version>
</dependency>

This module will not work with Magnolia 5.4.x until you bypass DAM and imaging. Download the settings below or add them yourself. Versions for Magnolia 5.4.x will be available on the Magnolia Marketplace.

config.server.filters.tidy.bypasses.dam.xml
config.server.filters.tidy.bypasses.imaging.xml

or add them yourself:

What is getting installed

The module installs a new filter below the cache filter.

By default the filter is not enabled.

Screenshots

Without Tidy Filter

With Tidy Filter enabled

Download

Maven

<dependency>
	<groupId>de.lemonize.magnolia.tidyfilter</groupId>
	<artifactId>magnolia-tidyfilter</artifactId>
	<version>1.0.2</version>
</dependency>

Nexus

https://nexus.magnolia-cms.com/service/local/repositories/magnolia.forge.releases/content/de/lemonize/magnolia/tidyfilter/magnolia-tidyfilter/1.0.2/magnolia-tidyfilter-1.0.2.jar

Version History

1.0.0

  • First Release

1.0.1

  • Tested with Magnolia 5.3. The module no longer depends on Magnolia 5.3.6+.

1.0.2

Credits

  • Vivian Steller from lemonize: he originally programmed this filter
  • Gregory Joseph from Magnolia for helping with Maven and Forge

9 Comments

  1. @tomwespi thanks for your efforts releasing the module!

  2. Hi Vivian Steller, thanks for the module. I am trying to install it on Magnolia 5.5.5.

    So far the logs look fine. But the content type of the response is null, which means your filter does not do anything. The sequence of my filters looks the same as in your documentation. Any idea what may be going wrong? Was it ever tested with Magnolia 5.5?

    1. Hi

      Is the filter enabled? Not tested with 5.5.x

      1. Yes, filter is enabled. Problem is that response.getContentType() resolves to null.

        If I modify the code, the filter works fine and the HTML gets well formatted.

        1. Ah yes, there was a change. I also made some local changes, but they are not committed yet:


          @Override
          public void doFilter(HttpServletRequest request, HttpServletResponse response, FilterChain chain) throws IOException, ServletException {

              String extension = MgnlContext.getAggregationState().getExtension();
              // We assume it is an HTML page when no extension is set.
              // StringUtils.isEmpty (commons-lang) also covers a null extension.
              if (StringUtils.isEmpty(extension)) {
                  extension = DEFAULT_EXTENSION;
              }

              if (extension.equalsIgnoreCase(DEFAULT_EXTENSION)) {
                  // Buffer the response so the HTML can be cleaned up before it is written out.
                  BufferedHttpResponseWrapper wrappedResponse = new BufferedHttpResponseWrapper(response);
                  chain.doFilter(request, wrappedResponse);
                  cleanupHtml(wrappedResponse, response);
              } else {
                  // All other extensions pass through the chain untouched.
                  chain.doFilter(request, response);
              }
          }
  3. Thanks! At first glance it works. Can you please update the Maven/Nexus repo?

  4. Hi Tom Wespi, another point... We activated the filter; basically everything is fine and working.

    But then I noticed that special characters (e.g. äöü) become corrupted. We use normal UTF-8 settings. Without the filter all characters are fine. Any idea?

    1. It probably has something to do with your JVM or your system settings; check that UTF-8 is set everywhere. I will also ask my hosting provider. We had this problem but solved it quite some time ago.

  5. Hi Tom Wespi, is this module still active? Which Magnolia versions does it support? I'd like to add support badges to it, like in the other modules, if it is still active.
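
Regarding the corrupted special characters (äöü) reported in comment 4: one plausible cause, in line with the JVM/system setting hint given there, is that buffered response bytes are decoded with a charset other than UTF-8. The following is only a minimal sketch of that effect using the standard Java library; the class `CharsetRoundTrip` and the method `roundTrip` are hypothetical names for illustration and are not part of the module, and whether the filter actually decodes its buffer this way is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not code from the tidy filter module.
class CharsetRoundTrip {

    // Encodes the HTML as UTF-8 (as a UTF-8 configured page would be rendered)
    // and decodes the bytes again with the given charset, similar to what a
    // buffering filter might do before handing the markup to a parser.
    static String roundTrip(String html, Charset decodeWith) {
        byte[] buffered = html.getBytes(StandardCharsets.UTF_8);
        return new String(buffered, decodeWith);
    }

    public static void main(String[] args) {
        // Unicode escapes for "äöü" keep the example independent of the source file encoding.
        String html = "<p>\u00e4\u00f6\u00fc</p>";

        // Decoding with the matching charset preserves the umlauts.
        System.out.println(roundTrip(html, StandardCharsets.UTF_8).equals(html));

        // Decoding with a different charset corrupts them.
        System.out.println(roundTrip(html, StandardCharsets.ISO_8859_1).equals(html));

        // The charset the JVM falls back to when none is passed explicitly.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
    }
}
```

If the last line prints something other than UTF-8 on the affected server, starting the JVM with `-Dfile.encoding=UTF-8` or passing the charset explicitly wherever bytes are converted to strings may be worth trying.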