Re: BUG 536


Subject: Re: BUG 536
From: Alan De Smet (chaos@highprogrammer.com)
Date: Tue May 02 2000 - 16:04:49 CDT


Harald Fernengel (harry@bnro.de) wrote:
> I looked at Bug #536 and found out that the reason of the crash lies
> in IE_Imp_RTF::ReadFontTable(). This function needs a rewrite to
> recognize information in subblocks and the font name itself
> properly.
>
> Alan De Smet posted a patch a while ago (04/19/00) which should
> solve the problem but it is only a quick workaround. My suggestion
> is to not apply Alan's patch but to rewrite the ReadFontTable()
> function (yes, I'll do it).

RTF is the least format-damaging format in which I can get files
from a friend who uses MS Word 5 on a Macintosh, so I'm very
interested in the quality of the RTF importer. :-) At the moment,
I know there are issues with reading the font table and with some
symbols (smart quotes seem to get thrown away).

When I went looking for the font table issue, I found that the
code for parsing it simply wasn't as robust as it could be. In
particular, different RTF exporters each have their own quirks
that should be handled gracefully by an importer. The patch I
submitted was a "just work, darn it" patch, by no means any sort
of correct solution. In my "Copious Free Time", I intended to
seriously compare IE_Imp_RTF's implementation with the official
spec and the unofficial spec (aka "what Word really spits out"),
and see what I could do to make it more robust. Obviously I
haven't gotten around to it yet. :-(
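
To illustrate the kind of leniency described above, here is a
minimal sketch of a tolerant font table reader. It is written in
Python for brevity and is not AbiWord's C++ code; the names
parse_font_table and TOKEN are hypothetical. It assumes the font
table group has already been isolated, accepts entries with or
without their own braces, and skips ignorable sub-groups such as
{\*\panose ...} or {\*\falt ...} instead of choking on them. It
does not decode \'xx hex escapes inside font names.

```python
import re

# One token is: '{', '}', a control word (optional numeric parameter and
# optional delimiting space), a control symbol, or a run of plain text.
TOKEN = re.compile(r"(\{)|(\})|\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([^\\{}]+)", re.S)


def parse_font_table(rtf):
    """Return {font_number: font_name} for a font table group.

    `rtf` is the text of the group, including its outer braces.
    """
    fonts = {}
    depth = 0
    skip_depth = None  # depth of the ignorable sub-group being skipped
    number = None      # font number of the entry being read
    name = []          # pieces of the font name seen so far
    for m in TOKEN.finditer(rtf):
        open_, close, word, param, symbol, text = m.groups()
        if open_:
            depth += 1
        elif close:
            if skip_depth == depth:
                skip_depth = None  # the ignorable sub-group ends here
            depth -= 1
        elif skip_depth is not None:
            continue               # inside a skipped sub-group
        elif symbol == "*":
            skip_depth = depth     # ignorable destination: skip its group
        elif word == "f" and param is not None:
            number = int(param)    # a new font entry starts
            name = []
        elif text:
            pieces = text.replace("\r", "").replace("\n", "").split(";")
            name.append(pieces[0])
            for rest in pieces[1:]:
                # Every ';' terminates the current entry.
                if number is not None:
                    fonts[number] = "".join(name).strip()
                number = None
                name = [rest]
    return fonts


sample = (r"{\fonttbl{\f0\froman\fcharset0{\*\panose 02020603050405020304}"
          r"Times New Roman;}{\f1\fswiss Arial{\*\falt Helvetica};}"
          r"\f2\fmodern Courier;}")
print(parse_font_table(sample))
```

Other control words (\froman, \fcharset0 and so on) are simply
ignored here; a real importer would record the ones it cares
about rather than drop them.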

A related problem is that the RTF importer simply gives up at the
first unexpected data. Given that it's certainly possible to
keep parsing and extract useful data from an RTF file, this
doesn't seem like a good idea. Would it be reasonable for AbiWord
to report "Hmmm, I got what I could out of this file, but there
was stuff I didn't recognize, so your document might be messed
up, sorry," and try to keep going?
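
The "keep going and warn" behaviour asked about above could look
roughly like the following. Again this is a Python sketch under
stated assumptions, not AbiWord code: import_lenient, KNOWN and
the warning texts are all made up for illustration, and the set
of recognized control words is deliberately tiny. Unknown control
words and stray braces are recorded as warnings rather than
aborting the import, and ignorable {\* ...} groups are skipped
silently.

```python
import re

# One token is: '{', '}', a control word (optional numeric parameter and
# optional delimiting space), a control symbol, or a run of plain text.
TOKEN = re.compile(r"(\{)|(\})|\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([^\\{}]+)", re.S)

# Hypothetical: the only control words this sketch claims to understand.
KNOWN = {"rtf", "ansi", "par", "b", "i", "plain"}


def import_lenient(rtf):
    """Return (plain_text, warnings) without stopping at unexpected data."""
    out = []
    warnings = []
    depth = 0
    skip_depth = None  # depth of the ignorable group being skipped
    for m in TOKEN.finditer(rtf):
        open_, close, word, param, symbol, text = m.groups()
        if open_:
            depth += 1
        elif close:
            if depth == 0:
                warnings.append("unmatched '}' ignored")
                continue
            if skip_depth == depth:
                skip_depth = None
            depth -= 1
        elif skip_depth is not None:
            continue                       # inside a skipped group
        elif symbol == "*":
            skip_depth = depth             # ignorable destination
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)             # escaped literal character
        elif symbol:
            warnings.append("unrecognized control symbol \\" + symbol)
        elif word == "par":
            out.append("\n")
        elif word and word not in KNOWN:
            warnings.append("unrecognized control word \\" + word)
        elif text:
            out.append(text.replace("\r", "").replace("\n", ""))
    if depth:
        warnings.append("%d group(s) never closed" % depth)
    return "".join(out), warnings


text, warnings = import_lenient(
    r"{\rtf1\ansi Hello\par {\*\generator Foo;}\zzz Bye}}")
print(repr(text))
for w in warnings:
    print("warning:", w)
```

One caveat with this approach: an unknown control word that
introduces a destination without the \* marker would let that
group's text leak into the document, so a real importer would
need a list of such destinations to skip.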

-- 
Alan De Smet  -  chaos@highprogrammer.com  -  http://highprogrammer.com/alan


