Re: CJK patch (was Re: pango)

From: Roland Kay <roland.kay_at_ox.compsoc.net>
Date: Mon Mar 21 2005 - 13:46:50 CET

Ok. Here it is. This hasn't been extensively tested but it seems to do "the
right thing" for simple Chinese and English documents. The code is much simpler
than before for three reasons:

        a, It treats character pairs rather than triplets.
        b, The duplication of information has been removed.
        c, It uses a generic Boolean function to make the decisions
           rather than a complicated set of nested "if" statements.

The rule table defining the Boolean function is a little opaque, but I'll add
an extensive comment if this gets integrated. One advantage over the
previous model is that any set of rules compatible with the current
categorisation of characters can be implemented without changing the code.
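
To illustrate the idea of a table-driven Boolean function over character pairs, here is a minimal sketch. It is not the code from the patch: the category names, the classify() helper, the canBreakBetween() name and the rule values themselves are all assumptions made up for this example. The point is only that changing the rules means editing the table, not the code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical character categories. The categorisation actually used
// by the patch is not shown in this message.
enum CharClass : std::size_t {
    CC_LATIN, CC_SPACE, CC_CJK, CC_OPEN_PUNCT, CC_CLOSE_PUNCT, CC_COUNT
};

// kBreakRule[left][right] is true when a line break is allowed between a
// character of class `left` and a following character of class `right`.
// The values are illustrative only, not the rules from the patch.
static const bool kBreakRule[CC_COUNT][CC_COUNT] = {
    /* left \ right    LATIN  SPACE  CJK    OPEN   CLOSE */
    /* LATIN       */ {false, false, true,  false, false},
    /* SPACE       */ {true,  false, true,  true,  false},
    /* CJK         */ {true,  false, true,  true,  false},
    /* OPEN_PUNCT  */ {false, false, false, false, false},
    /* CLOSE_PUNCT */ {true,  false, true,  true,  false},
};

// Very rough classifier, for illustration only: it knows a handful of
// punctuation marks and one CJK ideograph range; everything else is Latin.
static CharClass classify(char32_t c) {
    if (c == U' ') return CC_SPACE;
    if (c == U'(' || c == 0x300C || c == 0xFF08) return CC_OPEN_PUNCT;
    if (c == U')' || c == 0x300D || c == 0xFF09 ||
        c == 0x3002 || c == 0xFF0C) return CC_CLOSE_PUNCT;
    if (c >= 0x4E00 && c <= 0x9FFF) return CC_CJK;
    return CC_LATIN;
}

// The whole decision is a single table lookup on the pair of classes.
static bool canBreakBetween(char32_t left, char32_t right) {
    return kBreakRule[classify(left)][classify(right)];
}
```

With this table, canBreakBetween(U'o', U' ') is false and canBreakBetween(U' ', U'g') is true, so a break in "hello goodbye" is offered only after the space. Two adjacent ideographs may always be split, but a break is never offered before closing punctuation.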

Another advantage results from treating character pairs rather than asking
whether we can break before or after a character. Now, hopefully, the
responses from GR_Graphics::canBreak() will be consistent regardless of
how we move through the text. This wasn't the case in the original code:
in the string "hello goodbye", canBreak() would return true if asked about
breaking before the " " but false if asked about breaking after the "o",
even though both questions refer to the same position.
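
The consistency argument can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: canBreakBetween() here is a hypothetical stand-in rule (break only between a space and a following non-space), and breaksForward()/breaksBackward() are made-up helpers. Because the answer depends only on the pair of characters at a position, scanning the text in either direction finds the same break positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a pair rule: a break is allowed only between
// a space and a following non-space character.
static bool canBreakBetween(char left, char right) {
    return left == ' ' && right != ' ';
}

// Index i in the result means "a break is allowed between text[i-1] and
// text[i]". This version scans from the start of the text.
static std::vector<std::size_t> breaksForward(const std::string& text) {
    std::vector<std::size_t> out;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (canBreakBetween(text[i - 1], text[i])) out.push_back(i);
    }
    return out;
}

// The same question asked while scanning from the end of the text.
static std::vector<std::size_t> breaksBackward(const std::string& text) {
    std::vector<std::size_t> out;
    for (std::size_t i = text.size(); i-- > 1;) {
        if (canBreakBetween(text[i - 1], text[i])) out.push_back(i);
    }
    // Positions were collected back to front; restore ascending order.
    std::reverse(out.begin(), out.end());
    return out;
}
```

For "hello goodbye" both functions report the single position 6, the gap between the space and the "g".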

This patch causes the same asserts as the old patch on the document I
posted to the list before. This suggests that the asserts are indeed a
result of commenting out the calls to setUpperLimit(), since that is the
only thing the two patches have in common.

Please let me know what you think.

Best wishes,

R.

Received on Mon Mar 21 13:47:19 2005
