Re: RTF importer - Asian font names (PATCHES)

From: Hubert Figuiere <hfiguiere_at_teaser.fr>
Date: Sun Jun 05 2005 - 17:17:53 CEST

Roland Kay wrote:
>
> OK. Things have got a bit complicated so I'm attaching all
> the necessary patches to this email. They should apply in
> any order but on my system I apply them in this order:
>
> 1, RTF-AsianFontNames.patch (new)
> 2, RTF-warnings-2.patch (same as previous post)
> 3, XML-Props.patch (same as previous post)
>
>
> No. 1 is the finalised RTF Asian font names patch. This
> reads the escaped hex multi-byte font names used in Asia and
> stores them as UTF-8 in the document. Thus, Chinese users get
> to see the font names in Chinese characters if this was how
> they were encoded in the document. Also, with appropriate
> font installation or font aliasing, all valid documents can
> be displayed without all the characters turning into
> circles.
>
> I've gone back to using UT_String and not UT_UTF8String
> because using UT_UTF8String was corrupting the
> Chinese font names. The reason is as follows:
>
> In China MSWord exports RTF with the font names encoded in
> the GB charset. I read these in one character at a time into
> a UT_String. Once the entire font name has been read I
> convert the string to UTF8 and hand it to
> RTFFontTableItem(). Thus, the UT_String never holds UTF8. In
> fact, it holds the font name in the native character set.
> Trying to append the (8 bit) GB characters to a UT_UTF8String
> causes them to become corrupted.
>
> It makes much more sense to me to read the entire string in
> and convert it in one go, rather than trying to convert it
> one character at a time.

Then that is a job for UT_ByteBuf, not for UT_String, because we are
just storing arbitrary bytes in a buffer.
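
To illustrate the point, here is a rough, self-contained sketch of
collecting the escaped bytes of a font name into a plain byte buffer
before any charset conversion happens. It is not AbiWord code:
std::vector<std::uint8_t> stands in for UT_ByteBuf, and hexValue() and
collectFontNameBytes() are hypothetical names made up for this example.
It assumes the font name uses RTF \'hh escapes and ends at the ';' that
closes a font table entry. The conversion to UTF-8 is not shown; it
would be done once, on the whole buffer, after this step.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: gather the raw bytes of an RTF font name.
// Each \'hh escape becomes the single byte 0xhh; any other character is
// copied unchanged. Nothing is interpreted as text here, so multi-byte
// sequences in the document's native charset (e.g. GB) stay intact.
std::vector<std::uint8_t> collectFontNameBytes(const std::string& rtf) {
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < rtf.size(); ++i) {
        if (rtf[i] == '\\' && i + 3 < rtf.size() && rtf[i + 1] == '\'') {
            const int hi = hexValue(rtf[i + 2]);
            const int lo = hexValue(rtf[i + 3]);
            if (hi >= 0 && lo >= 0) {
                bytes.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
                i += 3;  // skip the rest of the \'hh escape
                continue;
            }
        }
        if (rtf[i] == ';') break;  // assumed end of the font table entry
        bytes.push_back(static_cast<std::uint8_t>(rtf[i]));
    }
    return bytes;
}
```

For example, the input \'cb\'ce\'cc\'e5; yields the four bytes CB CE CC
E5, which can then be handed as a whole to a charset converter.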

Hub
Received on Sun Jun 5 17:18:11 2005
