Re: smart quote algorithm


Subject: Re: smart quote algorithm
From: Karl Ove Hufthammer (huftis@bigfoot.com)
Date: Thu Jul 20 2000 - 03:13:43 CDT


----- Original Message -----
From: "WJCarpenter" <bill-abisource@carpenter.ORG>
To: "AbiWord Mailing List" <abiword-dev@abisource.com>
Sent: Thursday, July 20, 2000 8:55 AM
Subject: smart quote algorithm

| PUNCT A subset of layman's "punctuation". I include only things that
| can normally occur after a quote mark with no intervening white
| space. Includes period, exclamation point, question mark,
| semi-colon, colon, comma, parentheses, square and curly brackets.
| There may be a few others that aren't on the kinds of keyboards I
| use, and there are certainly Latin1 and other locale-specific
| variants, but the point is that there are lots of random
| non-alphanumerics which aren't included in PUNCT for this algorithm.

We can probably use all characters defined as punctuation in the Unicode
standard. In UnicodeData.txt most of these have the general category 'Po'
(the brackets are 'Ps' and 'Pe'), e.g.:

0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
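
As a rough illustration of that idea, here is a minimal sketch using the
unicodedata module from Python's standard library. The function names
is_punct and is_other_punct are made up for this example; they are not
AbiWord functions.

```python
import unicodedata


def is_other_punct(ch: str) -> bool:
    """True if the character's Unicode general category is exactly 'Po'."""
    return unicodedata.category(ch) == "Po"


def is_punct(ch: str) -> bool:
    """True for any punctuation category: Pc, Pd, Ps, Pe, Pi, Pf or Po."""
    return unicodedata.category(ch).startswith("P")


if __name__ == "__main__":
    # Print the category of each character named in the quoted PUNCT list.
    for ch in ".!?;:,()[]{}":
        print(repr(ch), unicodedata.category(ch))
```

Note that 'Po' is broader than the quoted PUNCT description: it also covers
characters such as '#', '%', '&', '*' and '@', so the algorithm might still
want an explicit exclusion list.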

| ALPHA Alphabetic characters in the C isalpha() sense, but there are
| certainly some non-ASCII letter characters which belong in this
| bucket, too.

Almost 50,000 in Unicode ...
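
One possible way to get an ALPHA test that goes beyond ASCII isalpha() is to
check for the Unicode letter categories. This is only a sketch; is_alpha is a
hypothetical name, and whether every letter category belongs in ALPHA is an
assumption.

```python
import unicodedata


def is_alpha(ch: str) -> bool:
    """True if the general category is a letter one (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    # ASCII, Latin-1, Cyrillic and CJK letters, then a digit and a mark.
    for ch in ["a", "\u00e9", "\u0416", "\u4e2d", "1", "!"]:
        print(repr(ch), is_alpha(ch))
```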

| The algorithm doesn't make a special case of using ASCII double quote
| as an inches indicator (there are other uses, like lat/long minutes;

For minutes and feet, U+2032 (PRIME) should really be used. (Yes, nobody will
use it unless AbiWord makes it easy to insert.)

BTW, I've found that " shouldn't be used for inches after all; U+2033 (DOUBLE
PRIME) should be used (for inches and seconds).
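
To make the two code points concrete, here is a small sketch of an explicit,
user-invoked conversion (not something to run on the fly). The helper name
to_prime_marks and the "quote directly after a digit" rule are assumptions
made for this example only.

```python
import re

PRIME = "\u2032"         # U+2032 PRIME: feet, minutes
DOUBLE_PRIME = "\u2033"  # U+2033 DOUBLE PRIME: inches, seconds


def to_prime_marks(text: str) -> str:
    """Replace ASCII quotes that directly follow a digit with prime marks."""
    text = re.sub(r'(?<=\d)"', DOUBLE_PRIME, text)
    text = re.sub(r"(?<=\d)'", PRIME, text)
    return text


if __name__ == "__main__":
    print(to_prime_marks("6' 2\""))
```

The same rule also rewrites the closing quote in a quotation that happens to
end in a number, which is exactly the ambiguity discussed in this thread, so
it should only be applied where the user asks for it.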

| ditto for the ASCII quote) because it is tough to tell if some numbers
| with an ASCII double quote after them are intended to be one of those
| "other things" or is just the end of a very long quote.

Yes.

| So, the
| algorithm will be wrong sometimes in those cases.

| It is otherwise sort of conservative, preferring to not convert things
| it doesn't feel confident about. The reason for that is that there is
| a contemplated on-the-fly conversion to smart quotes, but there is no
| contemplated on-the-fly conversion to ASCII QUOTEs.

Well, you can turn off smart quotes. In Word, you can use 'Undo' to undo the
conversion of an individual quote mark.

| What about the occasions when this algorithm (or any alternative
| algorithm) makes a mistake and converts a QUOTE to the curly form when
| it really isn't wanted, in a particular case, by the user?

So, if your inch mark is converted to a '99', you only have to press 'Ctrl+Z'.
This should work in AbiWord too.

-- 
Karl Ove Hufthammer


