Re: More RTF importer bug fixes and RFC (PATCH)

From: Robert Staudinger <robert.staudinger_at_gmail.com>
Date: Fri Jun 03 2005 - 14:09:10 CEST

Hi Roland,

this sounds very exciting!
Could you maybe create screenshots of editing Chinese documents? If
the other developers agree we could put them on our website to
emphasize abi's capabilities.

Thanks,
- Rob

On 6/3/05, Roland Kay <roland.kay@ox.compsoc.net> wrote:
>
>
> Hi Guys,
>
> Here are two more RTF importer patches. They should be
> applied in the following order:
>
> 1, RTF-AltFontName-ver2.patch
> 2, RTF-warnings-2.patch
>
> The second one is very simple. It just fixes some warnings
> generated by declared but unused variables. One of these was
> introduced by my earlier patch. The other two are in the
> code that processes the \*\abirevision keyword. The code
> looks correct to me with the two unnecessary declarations
> removed. However, it might be an idea for whoever wrote that
> bit of code just to check.
>
>
> The first patch is a bit more involved. In Asia Microsoft
> Word exports RTF font tables like this:
>
> {\fonttbl
> {\f0\froman\fcharset0\fprq2{\*\panose ...}Times New Roman;}
> {\f17\fnil\fcharset134\fprq2{\*\panose ...}FZSongTi;}
> {\f18\fnil\fcharset134\fprq2{\*\panose ...}\'cb\'ce\'cc\'e5{\*\falt SimSun};}
> {\f19\fnil\fcharset134\fprq2{\*\panose ...}\'cb\'ce\'cc\'e5;}
> }
>
> NB: I've abbreviated the panose numbers.
>
> The third entry refers to a font whose name is made entirely of
> Chinese characters (it's actually "SongTi") encoded in
> GB2312.
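
As a quick check of the encoding described above, the four escaped bytes can be decoded by hand. This is only a minimal Python sketch: it assumes \fcharset134 corresponds to Python's "gb2312" codec, and decode_hex_escapes is a made-up helper name, not anything in AbiWord.

```python
import re

def decode_hex_escapes(raw_name: str, codec: str = "gb2312") -> str:
    # Hypothetical helper: collect every \'hh escape in the raw RTF text
    # and decode the resulting byte string with the given codec.
    # Plain ASCII characters between escapes are ignored in this sketch.
    hex_pairs = re.findall(r"\\'([0-9a-fA-F]{2})", raw_name)
    return bytes(int(h, 16) for h in hex_pairs).decode(codec)

# The escaped font name from the \f18 and \f19 entries above.
song_ti = decode_hex_escapes(r"\'cb\'ce\'cc\'e5")
print(song_ti)
```

The result is a two-character string, the Chinese spelling of "SongTi".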
>
> Without the patch the importer ignores the first "\'" and
> then mistakes the first "cb" as the font name. It then
> ignores the rest. The result is that any Chinese font like
> this gets named with a two letter hex code, which looks
> pretty silly. Worse, since AbiWord subsequently finds no font
> "cb" on the system all the Chinese characters come out as
> circles. The only way to view the document is then to
> "Select All" and choose a sensible font, which mucks up the
> formatting of the document.
>
> With the patch in place, the importer skips escaped hex
> sequences in the font names. If an alternative font name is
> given and the main font name is blank (either because it was
> really blank, or else because it had no ASCII characters)
> then the importer substitutes the alternative fontname for
> the real one.
>
> The result is that if the exporting application bothers to
> give an alternative font name, Chinese fonts have sensible
> predictable names. Sadly, while MSWORD gives alternative font
> names for the most common CJK fonts, it doesn't do so for
> all. Thus, in the case of a font which only has a non-ASCII
> name the patch substitutes "UnknownUnicodeFontName".
>
>
> A fringe benefit is that the font table parser is more robust
> than before and will correctly handle strange, but
> apparently legal, entries like:
>
> {\f20\froman Times New {\*\unknowncommand Fibble!}Roman;}
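
The behaviour described above (skip \'xx escapes, ignore nested groups, fall back to \falt and then to the placeholder name) could be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the code in the patch: font_name_from_entry is a hypothetical helper, the input is the text between the outer braces of one font table entry, and groups are assumed to be nested at most one level deep.

```python
import re

# Placeholder name quoted in the message above.
FALLBACK_NAME = "UnknownUnicodeFontName"

def font_name_from_entry(entry: str) -> str:
    # Hypothetical helper, not AbiWord's importer.
    # 1. Remember the alternative name from {\*\falt Name}, if any.
    alt_match = re.search(r"\{\\\*\\falt\s+([^{}]*)\}", entry)
    alt_name = alt_match.group(1).strip() if alt_match else ""
    # 2. Drop every nested group ({\*\panose ...}, {\*\falt ...}, unknown ones).
    flat = re.sub(r"\{[^{}]*\}", "", entry)
    # 3. Drop escaped hex bytes such as \'cb.
    flat = re.sub(r"\\'[0-9a-fA-F]{2}", "", flat)
    # 4. Drop control words such as \f18 or \froman, plus one delimiting space.
    flat = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", flat)
    # 5. What remains is the ASCII part of the name, ending in ';'.
    name = flat.rstrip(";").strip()
    return name or alt_name or FALLBACK_NAME

with_alt = r"\f18\fnil\fcharset134\fprq2{\*\panose ...}\'cb\'ce\'cc\'e5{\*\falt SimSun};"
without_alt = r"\f19\fnil\fcharset134\fprq2{\*\panose ...}\'cb\'ce\'cc\'e5;"
odd_entry = r"\f20\froman Times New {\*\unknowncommand Fibble!}Roman;"

print(font_name_from_entry(with_alt))     # SimSun
print(font_name_from_entry(without_alt))  # UnknownUnicodeFontName
print(font_name_from_entry(odd_entry))    # Times New Roman
```

A regex pass like this is only meant to show the order of the fallbacks; a real parser would need to track brace depth properly.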
>
> If you're running AbiWord on Linux, this doesn't solve the
> problem of Chinese characters being represented as circles
> because the names of the Chinese fonts are different from
> Windows. Thus, abi can't find "SimSun", in the case of the
> above example, either. I guess this might not be a problem
> on a Windows machine. However, since the Chinese fonts now
> have sensible names we can create a font alias for SimSun.
> Once this is done Chinese documents can be opened and
> displayed immediately. By aliasing UnknownUnicodeFontName
> to the most common font for their region the user can then
> display most of the documents they receive even if the
> exporting app doesn't give an alternative name.
>
> On Suse, these aliases can be made by adding the following
> to /etc/fonts/local.conf and then running fonts-config.
>
> <!-- Alias SimSun to FZSongTi so that Chinese docs show up in AbiWord
> - R.Kay (02-06-05) -->
> <alias>
> <family>SimSun</family>
> <prefer>
> <family>FZSongTi</family>
> </prefer>
> </alias>
> <alias>
> <family>SimHei</family>
> <prefer>
> <family>FZHeiTi</family>
> </prefer>
> </alias>
> <alias>
> <family>UnknownUnicodeFontName</family>
> <prefer>
> <family>FZSongTi</family>
> </prefer>
> </alias>
>
>
> Issues outstanding:
> -------------------
>
> 1, It would be nice if abi could read the real Chinese
> font name rather than relying on the alternative
> name. Modifying the above patch to achieve this is
> trivial, and in fact I already have code to do this
> since that was my original intention.
>
> Unfortunately, it appears from the code that abi
> assumes that font names will only contain ASCII
> characters. For instance, when building lists of
> character properties the arrays seem to be of
> type XML_Char which is typedefed to char. I'm afraid
> that redefining XML_Char as UCS4 will cause problems
> throughout the program.
>
> Would allowing font names to contain arbitrary
> unicode characters be feasible in 2.6?
>
> 2, OpenOffice can identify the font as Chinese and
> automatically substitute an appropriate alternative
> without needing any font aliases. Does AbiWord have
> any such font substituting capability? Does anyone
> know how OO does this? I gather from some of the
> comments on bugzilla that this may not be abi's job.
>
>
> References:
> -----------
>
> These bug reports are related to these issues:
>
> http://bugzilla.abisource.com/show_bug.cgi?id=3312
> http://bugzilla.abisource.com/show_bug.cgi?id=3954
>
> Best wishes,
>
> R.
>
>
>
>
Received on Fri Jun 3 14:09:29 2005
