NextToken() in RTF importer - code review


Subject: NextToken() in RTF importer - code review
From: Matti Picus (matti@picus.org)
Date: Thu Sep 20 2001 - 05:54:07 CDT


This is pretty esoteric, I am looking for comments from those who understand.

Last week I modified NextToken to eat leading spaces. I did this to deal
with the following font definition:
{\f3\fnil \fcharset0\fprq0 Sans Serif 10cpi;}

However, now importing unknown fields is broken, as importing
{\field\fldedit{\*\fldinst {\rtlch \af28 \ltrch title}}{\fldrslt {\rtlch
\af28 \ltrch A Multiword Value For A Title Field}}}
results in AbiWord displaying
AMultiwordValueForATitleField
since NextToken is used to parse out the fldrslt value.

I went back to the RTF standard, and I quote:
-----------------------------------------
A control word takes the following form:
\LetterSequence<Delimiter>
Note that a backslash begins each control word.
The LetterSequence is made up of lowercase alphabetic characters between
"a" and "z" inclusive. RTF is case sensitive, and all RTF control words
must be lowercase.
The delimiter marks the end of an RTF control word, and can be one of the
following:
· A space. In this case, the space is part of the control word.
· A digit or a hyphen (-), which indicates that a numeric parameter
follows. ...
· Any character other than a letter or a digit. In this case, the
delimiting character terminates the control word but is not actually part
of the control word.
If a space delimits the control word, the space does not appear in the
document. Any characters following the delimiter, including spaces, will
appear in the document. For this reason, you should use spaces only where
necessary; do not use spaces merely to break up RTF code.
--------------------------------------------

According to this quote, the correct behavior is for NextToken to swallow
the delimiter if it is a space. Removing leading spaces is a big nono, and
the application that created bug 1211 (document 1207.rtf in the test/bugs
directory) is creating non-valid RTF.

I propose we stick to the standard, and closing bug 1207 by stating that
the RTF file is non-compliant. This requires backing out the change I made
to ie_imp_RTF.cpp, specifically function NextToken where:
-----------------------
        while( pKeyword[0] == ' ')
        {
        if (!ReadCharFromFile(pKeyword))
        {
                tokenType = RTF_TOKEN_ERROR;
        }
        }
-----------------------
should be returned to:
----------------------
        if (!ReadCharFromFile(pKeyword))
        {
                tokenType = RTF_TOKEN_ERROR;
        }
---------------------

What are your thoughts?

Matti



This archive was generated by hypermail 2b25 : Thu Sep 20 2001 - 04:54:48 CDT