Multi-Codepage Zope and Structured Text

I was recently looking for a solution to a problem with code pages in Zope, in conjunction with Structured Text (STX). The basic problem, which is discussed all over the web, is this: the correct code page has to be configured in the running Zope instance to make STX work correctly with characters outside US-ASCII. If this is not done, special formatting breaks, because STX can’t find word boundaries correctly.

Example: *zaczęło* should really appear as zaczęło in italics and **Frühstück** as Frühstück in bold, but instead they appear literally as *zaczęło* and **Frühstück**, respectively. Or one of them would work, but not the other, because these two words need different code pages: zaczęło needs ISO 8859-2 and Frühstück needs ISO 8859-1.
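The clash between the two code pages can be seen at the byte level. The following is only a minimal sketch, written in Python 3 for convenience (an assumption; it is not the Python version the Zope instance was running), and it uses nothing but the standard codecs:

```python
# Sketch: one and the same byte means different characters,
# depending on which code page is used to interpret it.
raw = b"\xea"

as_latin2 = raw.decode("iso8859-2")  # ISO 8859-2 (Central European)
as_latin1 = raw.decode("iso8859-1")  # ISO 8859-1 (Western European)

print(as_latin2)  # ę
print(as_latin1)  # ê

# The Polish word cannot be encoded in ISO 8859-1 at all.
try:
    "zaczęło".encode("iso8859-1")
    encodable_in_latin1 = True
except UnicodeEncodeError:
    encodable_in_latin1 = False

print(encodable_in_latin1)  # False
```

So any code that decides what counts as a "letter" based on one configured code page will misjudge text written for another.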

The problem is that STX is dependent on the code page configured for the running Zope instance. As there can obviously be only one code page configured at a time, this is a problem when there are pages that need one code page and others that need another one. Two solutions seemed useful, but didn’t prove to really work in the end:

  1. Not using STX. There are other variants of the Structured Text concept, which might not have the same problems. I didn’t really look into this because it wasn’t an option in my situation; too much text had already been written in STX format over the course of several years, and it had been hard enough to get users to understand the concept the first time.
  2. Using Unicode, just like I’m doing in this page, to be able to mix characters from different code pages. I actually tried going this way, but it didn’t work. While the Unicode encoding would allow the characters to be shown correctly by the end users’ browsers, the STX implementation still had the same problems as before when parsing strings that used formatting.

In the end, my workaround was simple: I changed the regular expressions in the STX implementation to work with whatever characters there might be between the start and end markers. The original expressions went out of their way to make sure there would only be specific characters, and that was the core of the problem, because that set of characters was defined by the configured code page.

For example, the line where the strong marker was searched for looked like this:

expr = re.compile(r'\*\*([%s%s%s\s]+?)\*\*' % (letters, digits, strongem_punc)).search

I used this expression instead:

expr = re.compile(r'\*\*([^*]+?)\*\*').search
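The difference between a restricted character class and the relaxed expression can be tried out in isolation. This is a sketch in Python 3, not code from DocumentClass.py: the ASCII-only letter set stands in for a letter set that doesn’t cover the text (an assumption to imitate a mismatched code page), and the punctuation string is a made-up placeholder rather than the real strongem_punc value.

```python
import re
from string import ascii_letters, digits

# Assumption: illustrative punctuation set, NOT the real STX value.
strongem_punc = re.escape(".,!?'\"-")

# Restricted pattern: only the listed characters may appear between the markers.
strict = re.compile(
    r"\*\*([%s%s%s\s]+?)\*\*" % (ascii_letters, digits, strongem_punc)
).search

# Relaxed pattern: anything except '*' may appear between the markers.
relaxed = re.compile(r"\*\*([^*]+?)\*\*").search

for text in ("**Breakfast**", "**Frühstück**", "**zaczęło**"):
    print(text, bool(strict(text)), bool(relaxed(text)))
# **Breakfast** True True
# **Frühstück** False True
# **zaczęło** False True
```

The restricted pattern gives up as soon as it meets a character outside its set, while the relaxed one no longer cares which letters are in use.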

I made similar changes in the places where the em and ul markers are checked. Of course, my changes allow a wider range of characters between the markers than would originally be allowed, which may have its own problems. But the workaround seems to work fine for me, as long as no real fix is available (and there probably won’t be because I don’t think anyone’s really still working on the old STX implementations).
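As a rough idea of what such relaxed expressions for the other markers could look like, and of the kind of side effect the wider character range can have, here is a sketch in Python 3. The two patterns are hypothetical analogues of the strong expression, assuming * for em and _ for ul; they are not copied from the patch.

```python
import re

# Hypothetical analogues of the relaxed strong expression (not from the patch).
em_expr = re.compile(r"\*([^*]+?)\*").search
ul_expr = re.compile(r"_([^_]+?)_").search

print(em_expr("*zaczęło*").group(1))    # zaczęło
print(ul_expr("_Frühstück_").group(1))  # Frühstück

# Side effect of allowing any character: unintended text is picked up too,
# for example the middle part of a name that contains underscores.
print(ul_expr("my_file_name.txt").group(1))  # file
```

Whether this matters depends on the kind of text the site holds; for ordinary prose it is unlikely to come up often.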

A patch file with all the changes I made to DocumentClass.py is available for download.

2 Comments on Multi-Codepage Zope and Structured Text

  1. I am working on updating stx. Please see http://mcelrath.org:9675/newstx/FrontPage. Unfortunately I’ve been very busy as of late, but I intend to get this merged with zwiki soon. P.S. How are you using wordpress and stx, or is wordpress just for your blog and you have another site? - Bob

  2. Sounds interesting. So this is going to be usable as a drop-in replacement for the old traditional STX, but with support for Unicode? And I won’t have to make people reformat all their texts? That would be nice! I’m using WordPress only for the blog. The site I’m referring to is a customer’s site, or rather there are several “national” sites running from the same Zope instance and from the same global templates, but with localised text.
