UTF-8 page names

Created by Fabrizio Giustina, last modified by Ruth Stocks on 2013-05-15

Implemented in 4.3

Official Documentation Available

This topic is now covered in i18n and l10n > Authoring.

If the property magnolia.utf8.enabled is set to true, UTF-8 page names are accepted.

UTF8 for page names

The goal is simple: to be able to use non-ASCII characters in page names (in fact, in the name of any node in the repository).

The current idea and status are described in the MAGNOLIA-3009 Jira issue, and we have more info on

JCR supports non-ASCII characters in node names, but we have to be sure that everything is encoded (or decoded) properly.

The task may be broken into two steps:

1. Review the reading/writing of nodes in the repository (server side), given that input values are properly encoded.
2. Review the decoding/handling of HTTP requests.

We will first approach item one, creating a set of unit tests that check the basic operations on nodes with extended characters (read, write, update, delete; from simple West European characters to Chinese). Magnolia can already read nodes with extended characters from the repository, but we will probably have to check carefully the escaping or removal of unwanted characters. At the moment everything is filtered by Path.getValidatedLabel(), which just drops everything.
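
As a rough illustration of the kind of label check discussed above, the following sketch validates a node name either in ASCII-only mode or in a mode that also keeps Unicode letters and digits. The class LabelSketch, the method validate, the utf8Enabled flag and the choice of replacing rejected characters with '-' are all assumptions made for this example; it is not the actual Path.getValidatedLabel() implementation.

```java
class LabelSketch {

    /**
     * Hypothetical validator (not Magnolia's code): keeps ASCII letters,
     * digits, '-' and '_'. When utf8Enabled is true, any Unicode letter or
     * digit is kept as well. Every other character is replaced by '-',
     * which is an assumption made for this sketch.
     */
    static String validate(String label, boolean utf8Enabled) {
        StringBuilder out = new StringBuilder();
        // Iterate by code point so characters outside the BMP stay in one piece.
        for (int i = 0; i < label.length(); ) {
            int cp = label.codePointAt(i);
            i += Character.charCount(cp);
            boolean asciiOk = (cp >= '0' && cp <= '9')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= 'a' && cp <= 'z')
                    || cp == '-' || cp == '_';
            boolean unicodeOk = utf8Enabled && Character.isLetterOrDigit(cp);
            if (asciiOk || unicodeOk) {
                out.appendCodePoint(cp);
            } else {
                out.append('-');
            }
        }
        return out.toString();
    }
}
```

A set of unit tests along the lines described above could start from calls such as LabelSketch.validate("caf\u00e9", true) and then extend to Chinese names and to characters that must still be rejected, such as '/'.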

Item 2 looks a lot more complex. It involves:

- properly decode requests. Note that in 4.2 the URL decoding of the request path was removed, but we will have to put it back, since some browsers need it (Firefox certainly does; IE does not). In any case, this should have nothing to do with UTF-8 normalization (URL decoding deals with escaped characters, which are not UTF-8)

- properly handle NFC/NFD strings, both in paths and in parameters

- review JavaScript calls in order to encode paths. Carefully check trees and dialogs with IE/Firefox/Chrome/Safari on Windows/Mac/Linux
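
To make the first two points more concrete, here is a minimal sketch using only standard JDK classes (java.net.URLDecoder and java.text.Normalizer). The class RequestPathSketch and its method names are hypothetical. It shows that the same visible name can reach the server in two different forms (composed, NFC, or decomposed, NFD), and that percent-decoding alone does not make them equal. Whether requests should be normalized to a fixed form at all is still an open question; this only shows what such a step would look like.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.text.Normalizer;

class RequestPathSketch {

    /**
     * Percent-decodes a raw request path as UTF-8.
     * Caveat: URLDecoder follows form-encoding rules and also turns '+' into a space.
     */
    static String decode(String rawPath) {
        try {
            return URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Normalizes to NFC so that composed and decomposed spellings compare equal. */
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = decode("/caf%C3%A9");    // e-acute as one code point (U+00E9)
        String decomposed = decode("/cafe%CC%81"); // 'e' followed by combining acute (U+0301)
        System.out.println(composed.equals(decomposed));
        System.out.println(toNfc(composed).equals(toNfc(decomposed)));
    }
}
```

Decoding and normalization are kept as separate steps here, in line with the note above that URL decoding should have nothing to do with normalization.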


1 Comment

Magnolia International
Regarding module "store":
- we have some code and p-o-c's (some partially implemented, some functional) that could help leveraging information found from the Maven repositories. For 4.3, however, I'd avoid any kind of server-side tooling for now. Grab the content off of the wiki or the documentation site and inline it.
- I've wanted for a while to add a "module description" kind of paragraph on the documentation site. This would feed itself from the module descriptor and/or pom to display module version information, basic description, dependencies, etc. (i.e. the only parameter in its dialog would be a link to a jar file or source files). This could in turn generate the table we have at http://documentation.magnolia-cms.com/modules.html instead of maintaining it manually. The data module could also be used (either to store data we don't/can't have in the module descriptor, or to store it all if that's more straightforward to start with). Such a page could be used for the above point (possibly with a different sub-template, for example).
- we can consider adding some of the information in the module descriptor, such as vendor/provider, license, although these are redundant with the pom. See Module Descriptor Generator (maven plugin).
  Also see Concept Module downloader updater for previous thoughts/research done about this.
Regarding utf-8 support:
- NFC/NFD might be a big mess to try and fix. Sticking to a fixed normalization seems like a fairly big overhead to fix just a couple of corner cases of browser misbehavior. In most cases, browsers use one form or the other for the content they "create" (i.e. if the user types something in a form) and keep whatever form is used when sending data back (i.e. if the data was already in the page). As far as I can remember, one of the issues was when Safari was changing the form.
- Keep in mind another, vaguely related issue, which has to do with the support of dots in node names. While this is also something JCR permits, and also something we'd like to allow, it generates a whole bunch of other issues with the current codebase and features (subtemplates or selectors come to mind).
Regarding cache:
- new module? Essentially, what I'd like to see instead, to solve the stream issue of large items, is replacing Object get(Object key); with (or adding) void stream(Object key) in the cache interface. One can imagine a Cache impl that will be smart enough to pick stuff up where it's been stored (fs or ehcache for instance).
In general:

Thanks for the proposals! It would help, however, to keep them on separate pages, for commenting, validating and keeping track.
- 2010-03-01
