More RTF importer bug fixes and RFC (PATCH)

From: Roland Kay <roland.kay_at_ox.compsoc.net>
Date: Fri Jun 03 2005 - 06:04:32 CEST

Hi Guys,

Here are two more RTF importer patches. They should be
applied in the following order:

        1, RTF-AltFontName-ver2.patch
        2, RTF-warnings-2.patch

The second one is very simple. It just fixes some warnings
generated by declared but unused variables. One of these was
introduced by my earlier patch. The other two are in the
code that processes the \*\abirevision keyword. The code
looks correct to me with the two unnecessary declarations
removed. However, it might be an idea for whoever wrote that
bit of code just to check.

The first patch is a bit more involved. In Asia, Microsoft
Word exports RTF font tables like this:

{\fonttbl
  {\f0\froman\fcharset0\fprq2{\*\panose ...}Times New Roman;}
  {\f17\fnil\fcharset134\fprq2{\*\panose ...}FZSongTi;}
  {\f18\fnil\fcharset134\fprq2{\*\panose ...}\'cb\'ce\'cc\'e5{\*\falt SimSun};}
  {\f19\fnil\fcharset134\fprq2{\*\panose ...}\'cb\'ce\'cc\'e5;}
}

NB: I've abbreviated the panose numbers.

The third entry refers to a font whose name is made entirely
of Chinese characters (it's actually "SongTi") encoded in
GB2312.
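To make the encoding concrete, here is a minimal sketch in Python (not AbiWord code) that pulls the \'hh escapes out of that entry and decodes them as GB2312. The variable names are just for illustration.

```python
import re

# The escaped font name exactly as it appears in the \f18 entry above.
escaped = r"\'cb\'ce\'cc\'e5"

# Collect the two hex digits following each \' escape.
hex_pairs = re.findall(r"\\'([0-9A-Fa-f]{2})", escaped)

# Join them into raw bytes and decode as GB2312 (\fcharset134).
raw = bytes.fromhex("".join(hex_pairs))
name = raw.decode("gb2312")

print(hex_pairs)  # ['cb', 'ce', 'cc', 'e5']
print(name)       # the two Chinese characters of "SongTi"
```

Four escaped bytes give two characters, since each GB2312 Chinese character takes two bytes.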

Without the patch the importer ignores the first "\'" and
then mistakes the first "cb" as the font name. It then
ignores the rest. The result is that any Chinese font like
this gets named with a two-letter hex code, which looks
pretty silly. Worse, since AbiWord subsequently finds no font
"cb" on the system, all the Chinese characters come out as
circles. The only way to view the document is then to
"Select All" and choose a sensible font, which mucks up the
formatting of the document.

With the patch in place, the importer skips escaped hex
sequences in the font names. If an alternative font name is
given and the main font name is blank (either because it was
really blank, or else because it had no ASCII characters)
then the importer substitutes the alternative fontname for
the real one.

The result is that if the exporting application bothers to
give an alternative font name, Chinese fonts have sensible,
predictable names. Sadly, while MS Word gives alternative
font names for the most common CJK fonts, it doesn't do so
for all. Thus, in the case of a font which only has a
non-ASCII name, the patch substitutes
"UnknownUnicodeFontName".
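The name-selection rules described above can be sketched roughly as follows. This is an illustration in Python, not the patch itself (the importer is C++); the function names `find_group_end`, `scan` and `parse_font_entry` are hypothetical, and the sketch assumes each entry is a single well-formed group with no escaped braces.

```python
import re

FALLBACK_NAME = "UnknownUnicodeFontName"  # placeholder named in the text

HEX_ESCAPE = re.compile(r"\\'[0-9A-Fa-f]{2}")      # e.g. \'cb
CONTROL_WORD = re.compile(r"\\[A-Za-z]+-?\d* ?")   # e.g. \f18, \froman


def find_group_end(text, start):
    """Return the index of the '}' matching the '{' at text[start].

    Assumption: no escaped braces (\\{ or \\}) occur in the entry.
    """
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces in font entry")


def scan(text):
    """Return (name, alt): literal ASCII text and any \\*\\falt name."""
    name, alt, i = [], None, 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            # Nested group: keep it only if it is the alternative name.
            end = find_group_end(text, i)
            inner = text[i + 1:end]
            if inner.startswith("\\*\\falt"):
                alt = scan(inner[len("\\*\\falt"):])[0]
            i = end + 1
        elif ch == "\\":
            # Skip hex escapes and control words; otherwise assume a
            # two-character control symbol.
            m = HEX_ESCAPE.match(text, i) or CONTROL_WORD.match(text, i)
            i = m.end() if m else i + 2
        elif ch == ";":
            break  # end of the font name
        else:
            name.append(ch)
            i += 1
    return "".join(name).strip(), alt


def parse_font_entry(entry):
    """Pick the main name, else the alternative, else the placeholder."""
    body = entry.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    name, alt = scan(body)
    return name or alt or FALLBACK_NAME
```

Fed the entries from the font table above, this sketch yields "Times New Roman" for \f0, "SimSun" for \f18 and "UnknownUnicodeFontName" for \f19. Unknown nested groups are dropped rather than treated as part of the name.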

A fringe benefit is that the font table parser is more robust
than before and will correctly handle strange, but
apparently legal, entries like:

   {\f20\froman Times New {\*\unknowncommand Fibble!}Roman;}

If you're running AbiWord on Linux, this doesn't solve the
problem of Chinese characters being represented as circles,
because the names of the Chinese fonts differ from those on
Windows. Thus, abi can't find "SimSun", in the case of the
above example, either. I guess this might not be a problem
on a Windows machine. However, since the Chinese fonts now
have sensible names we can create a font alias for SimSun.
Once this is done Chinese documents can be opened and
displayed immediately. By aliasing UnknownUnicodeFontName
to the most common font for their region, the user can then
display most of the documents they receive even if the
exporting app doesn't give an alternative name.

On Suse, these aliases can be made by adding the following
to /etc/fonts/local.conf and then running fonts-config.

<!-- Alias SimSun to FZSongTi so that Chinese docs show up in AbiWord
     - R.Kay (02-06-05) -->
<alias>
 <family>SimSun</family>
 <prefer>
  <family>FZSongTi</family>
 </prefer>
</alias>
<alias>
 <family>SimHei</family>
 <prefer>
  <family>FZHeiTi</family>
 </prefer>
</alias>
<alias>
 <family>UnknownUnicodeFontName</family>
 <prefer>
  <family>FZSongTi</family>
 </prefer>
</alias>

Issues outstanding:
-------------------

1, It would be nice if abi could read the real Chinese
        font name rather than relying on the alternative
        name. Modifying the above patch to achieve this is
        trivial, and in fact I already have code to do this
        since that was my original intention.

        Unfortunately, it appears from the code that abi
        assumes that font names will only contain ASCII
        characters. For instance, when building lists of
        character properties the arrays seem to be of
        type XML_Char, which is typedef'd to char. I'm afraid
        that redefining XML_Char as UCS4 will cause problems
        throughout the program.

        Would allowing font names to contain arbitrary
        unicode characters be feasible in 2.6?

2, OpenOffice can identify the font as Chinese and
        automatically substitute an appropriate alternative
        without needing any font aliases. Does AbiWord have
        any such font substituting capability? Does anyone
        know how OO does this? I gather from some of the
        comments on bugzilla that this may not be abi's job.

References:
-----------

These bug reports are related to these issues:

http://bugzilla.abisource.com/show_bug.cgi?id=3312
http://bugzilla.abisource.com/show_bug.cgi?id=3954

Best wishes,

R.

Received on Fri Jun 3 06:07:33 2005
