Re: smart quotes, et al (was: Re: The 1.0 Jobs List ( :-) )


Subject: Re: smart quotes, et al (was: Re: The 1.0 Jobs List ( :-) )
From: Justin Bradford (justin@ukans.edu)
Date: Fri Jun 16 2000 - 13:43:35 CDT


> I agree with the latter characterization. After import of an MSWord
> smart quoted document, one sees these character codes (I haven't
> verified that those are the appropriate Unicode values):
>
> 0x2019 apostrophe
> 0x201c open double quotes
> 0x201d close double quotes

I just checked, and yes, those are the correct Unicode values.
wv converts everything out of the MS-encoding (or whatever the
Word document is in) to Unicode for us. We should always be
getting pure Unicode back from the Word imports.

> I think the fonts that AbiWord uses don't have glyphs in those
> positions, so GDK just draws a zero-width character there. If you use
> cursor keys to navigate, you see that it takes an extra step to get
> past where you know there is an invisible smart quote character.

Yes. The fonts used by the people seeing this problem don't support
these characters. We should be converting from Unicode to whatever
encoding the current font uses. I don't believe that we currently
do this on Unix machines.

However, I'm not sure how such a conversion will deal with "missing"
characters. Since most encodings have no close quote, I'm not sure
whether it would actually be translated to anything. We can test this
pretty easily, though.
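
One quick way to run that test, as a rough sketch only: the snippet below
uses Python's built-in codecs as a stand-in for iconv, so it shows which
target encodings have slots for these characters, not how any particular
iconv implementation reacts when they are missing. The names SMART_QUOTES,
encodable and survey are made up for this illustration.

```python
# Code points reported earlier in this thread for Word's smart quotes.
SMART_QUOTES = {
    "\u2019": "apostrophe",
    "\u201c": "open double quotes",
    "\u201d": "close double quotes",
}


def encodable(char, encoding):
    """Return True if `char` has a representation in `encoding`."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def survey(encodings):
    """Map each encoding name to the smart quotes it cannot represent."""
    return {
        enc: [name for ch, name in SMART_QUOTES.items()
              if not encodable(ch, enc)]
        for enc in encodings
    }


if __name__ == "__main__":
    # latin-1 and cp1252 are just two sample targets; add whatever
    # encodings the fonts in question actually use.
    for enc, missing in survey(["latin-1", "cp1252"]).items():
        print(enc, "missing:", missing or "none")
```

With those two sample targets, latin-1 lacks all three characters while
cp1252 has all of them, so the answer depends heavily on the font's encoding.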

> For me, the problem decomposes into two related problems: First, there
> is always the possibility of characters that don't have glyphs in the
> current font. Second, there is the algorithmic stuff affiliated with
> the particular case of smart quotes. (Well, OK, three problems: the
> interface to ispell doesn't recognize the smart quote values as
> punctuation, so any smart-quoted words show as spelling errors.)
> Obviously, my experiment only examined the first problem. Some
> approaches suggest themselves to me:

Well, ispell has many, many issues. I'm not really surprised that it fails
with this.

But, more generally, we should try to map using iconv (or the
corresponding system libraries) first. If they don't deal well with cases
like these quotes, then we'll probably need to implement a special-case
mapping system.
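
To make the "converter first, special cases second" idea concrete, here is
a minimal sketch. It is not AbiWord code: Python's codecs stand in for
iconv, the FALLBACKS table and the to_font_encoding name are hypothetical,
and the ASCII substitutes are just one plausible choice. It also assumes a
single-byte, ASCII-compatible target encoding.

```python
# Hypothetical special-case table: plain ASCII stand-ins for the smart
# quotes discussed in this thread.
FALLBACKS = {
    "\u2019": "'",   # apostrophe
    "\u201c": '"',   # open double quotes
    "\u201d": '"',   # close double quotes
}


def to_font_encoding(text, encoding):
    """Convert `text` to `encoding`, character by character.

    The regular converter is tried first. Only when it cannot represent
    a character do we consult FALLBACKS, and as a last resort emit "?".
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            out += FALLBACKS.get(ch, "?").encode(encoding)
    return bytes(out)


if __name__ == "__main__":
    sample = "\u201cdon\u2019t\u201d"
    print(to_font_encoding(sample, "latin-1"))  # falls back to ASCII quotes
    print(to_font_encoding(sample, "cp1252"))   # keeps the real smart quotes
```

The point of the ordering is that a font whose encoding does have the
smart quotes keeps them, and only fonts that lack them get the plain
substitutes.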

Justin


