Re: CJK line breaking

From: Tomas Frydrych <tomasfrydrych_at_yahoo.co.uk>
Date: Tue Mar 01 2005 - 08:48:12 CET

Hi Roland,

> bool GR_Graphics::canBreak(GR_RenderInfo & ri, UT_sint32 &iNext, bool /* bAfter */, UT_UCS4Char c2)
> {
> bool retval;
> iNext = -1; // we do not bother with this
> UT_return_val_if_fail(ri.m_pText && ri.m_pText->getStatus() == UTIter_OK, false);
>
> *(ri.m_pText) += ri.m_iOffset;
> UT_return_val_if_fail(ri.m_pText->getStatus() == UTIter_OK, false);
>
> /*
> * For CJK we need to consider both this character and the next one.
> */
> UT_UCS4Char c = ri.m_pText->getChar();
> UT_uint32 iPos = ri.m_pText->getPosition();
> ri.m_pText->setPosition(iPos+1);

You could just use the ++ operator instead; the code will probably be
more efficient with it.

> UT_UCS4Char c2 = ri.m_pText->getChar();
> ri.m_pText->setPosition(iPos);

The last line is not needed, since you do not access the iterator anymore.
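
A rough sketch of the two points above, in case it helps. ToyIter,
CharPair and readPair are made-up names for illustration and are not the
real text iterator class; only the shape of the calls (getStatus, getChar,
getPosition, setPosition, += and ++) follows the quoted code, and the
sketch assumes nothing else reads the iterator after the peek.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a text iterator; NOT AbiWord's actual class.
class ToyIter {
public:
    enum Status { OK, OutOfBounds };

    explicit ToyIter(std::u32string text) : m_text(std::move(text)) {}

    Status getStatus() const { return m_pos < m_text.size() ? OK : OutOfBounds; }
    char32_t getChar() const { return getStatus() == OK ? m_text[m_pos] : 0; }
    std::size_t getPosition() const { return m_pos; }
    void setPosition(std::size_t pos) { m_pos = pos; }
    ToyIter& operator++() { ++m_pos; return *this; }
    ToyIter& operator+=(std::size_t n) { m_pos += n; return *this; }

private:
    std::u32string m_text;
    std::size_t m_pos = 0;
};

// The character at the requested offset, plus the one after it if readable.
struct CharPair {
    char32_t c;
    char32_t c2;
    bool haveNext;
};

// Advance to `offset`, read c, step forward with ++ and read c2.
// The position is deliberately not restored afterwards, on the assumption
// that the caller does not use the iterator again.
inline CharPair readPair(ToyIter& it, std::size_t offset) {
    it += offset;
    CharPair p{it.getChar(), 0, false};
    ++it;  // replaces getPosition() followed by setPosition(iPos + 1)
    if (it.getStatus() == ToyIter::OK) {
        p.c2 = it.getChar();
        p.haveNext = true;
    }
    return p;
}
```

Checking the status after the ++ also avoids comparing c2 against a
character that could not actually be read.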

> UT_DEBUGMSG(("canBreak: char1: %x, char2: %x\n",c,c2));
>
> // Is this a CJK character? (Note these values may be incorrect)
> if ((c>0x3400 && c<0x4dbf) || (c>0x4e00 && c<0x9faf) ||
> (c>0xf900 && c<0xfaff) || (c>0xfe30 && c<0xfe4f) ||
> (c>0x20000 && c<0x2a6df) || c==0xff0c)
> {
> if (c!=0xff0c && c2==0xff0c)
> return false;
> return true;
> }
>
>
> UT_return_val_if_fail(getApp(), false);
> return getApp()->getEncodingManager()->can_break_at(c);
> }
>

The iterator pointed to by ri.m_pText has its upper bound set at the end
of the current run of text, and the c2 character is probably in the next
run, so the iterator cannot reach it. I am not sure why I set the upper
bound; it would be possible to let the iterator run to the end of the
paragraph. All that should be required is to comment out the setting of
the upper limit in the three fp_TextRun functions that call
GR_Graphics::canBreak().
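
To illustrate the effect of the upper bound, here is a small
self-contained sketch. BoundedIter, setUpperLimit and canBreakAfter are
hypothetical names made up for this example and do not correspond to the
real iterator or to the fp_TextRun code; the break rule is only the
"no break before U+FF0C" part of the quoted function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical paragraph iterator with an optional upper limit; a stand-in
// for illustration only, not the real AbiWord text iterator.
class BoundedIter {
public:
    enum Status { OK, OutOfBounds };

    explicit BoundedIter(std::u32string paragraph)
        : m_text(std::move(paragraph)), m_end(m_text.size()) {}

    // Make `last` the final readable position (e.g. the end of one run).
    void setUpperLimit(std::size_t last) {
        m_end = std::min(last + 1, m_text.size());
    }

    void setPosition(std::size_t pos) { m_pos = pos; }
    BoundedIter& operator++() { ++m_pos; return *this; }
    Status getStatus() const { return m_pos < m_end ? OK : OutOfBounds; }
    char32_t getChar() const { return getStatus() == OK ? m_text[m_pos] : 0; }

private:
    std::u32string m_text;
    std::size_t m_end;      // one past the last readable position
    std::size_t m_pos = 0;
};

const char32_t FULLWIDTH_COMMA = 0xFF0C;

// Toy rule modelled on the quoted code: do not break before a fullwidth
// comma. If the next character cannot be read, the rule cannot fire and
// the break is allowed.
inline bool canBreakAfter(BoundedIter& it, std::size_t pos) {
    it.setPosition(pos);
    if (it.getStatus() != BoundedIter::OK)
        return false;
    const char32_t c = it.getChar();
    ++it;
    const bool haveNext = (it.getStatus() == BoundedIter::OK);
    if (c != FULLWIDTH_COMMA && haveNext && it.getChar() == FULLWIDTH_COMMA)
        return false;
    return true;
}
```

With a three-character paragraph whose first run ends at position 0 and
whose second run starts with U+FF0C, the limited iterator allows a break
after position 0 because it never sees the comma, while the unlimited one
refuses it.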

I have got to go now, but please ask if you need more information.

Tomas
Received on Tue Mar 1 08:53:34 2005
