Re: Bug 1691


Subject: Re: Bug 1691
From: Karl Ove Hufthammer (huftis@bigfoot.com)
Date: Mon Jul 16 2001 - 13:22:39 CDT


Mon 16 Jul 2001 06:05:11, abiword@linuxfreemail.com wrote:

> Bug 1569 has a pretty detailed discussion of this, seems to
> get more complicated the more you look at it! After reading
> that, I begin to think "our way" really *is* better. Have
> UPPER CASE and want Title Case? Just click lowercase and then
> sentence case.

My opinion:

I think we should do this *better* than MS Word does. Example:

The correct title case (GB English) should be:

'Taming of the Shrew'

*not*:

'Taming Of The Shrew'
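
To make the difference concrete, here is a minimal sketch of English-style title casing. The `MINOR_WORDS` list and the `english_title_case` name are my own illustrative assumptions, not taken from any standard or from the AbiWord source; a real implementation would need a per-language word list.

```python
# Hypothetical list of English "minor" words that stay lowercase in titles.
# This list is an assumption for illustration, not an authoritative rule set.
MINOR_WORDS = {
    "a", "an", "and", "as", "at", "but", "by", "for",
    "in", "nor", "of", "on", "or", "the", "to",
}


def english_title_case(text):
    """Capitalize every word except minor words in the middle of the title."""
    # split() collapses runs of whitespace; good enough for a sketch.
    words = text.lower().split()
    last = len(words) - 1
    result = []
    for index, word in enumerate(words):
        # The first and last words are capitalized even if they are minor.
        if index == 0 or index == last or word not in MINOR_WORDS:
            result.append(word[:1].upper() + word[1:])
        else:
            result.append(word)
    return " ".join(result)


print(english_title_case("TAMING OF THE SHREW"))
```

This handles the UPPER CASE to title case conversion in one step, without going through lowercase and sentence case first.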

Reference: <URL: http://www.unicode.org/unicode/reports/tr21/ >:

   The choice of which words to titlecase is language-dependent.
   For example, "Taming of the Shrew" would be the appropriate
   capitalization in English, not "Taming Of The Shrew".
   Moreover, the determination of what actually constitutes a
   word is also language-dependent. For example, l'arbre might
   be considered two words in French, while can't is
   considered one word in English.

   Note that while the archaic Georgian script contained
   upper- and lowercase pairs, they are rarely used in modern
   Georgian.

   The case mappings in the Unicode Character Database (UCD) are
   informative, default mappings. Case itself, on the other
   hand, has normative status. Thus, for example, 0041 "A" is
   normatively uppercase, but its lowercase mapping to 0061 "a"
   is informative. The reason for this is that case can be
   considered to be an inherent property of a particular
   character, but case mappings between characters are
   occasionally influenced by local conventions.

   There are a number of complications to case mappings that
   occur once the repertoire of characters is expanded beyond
   ASCII.
     * In most cases, the titlecase is the same as the uppercase,
       but not always. For example, the titlecase of U+01F1 "Ǳ"
       capital dz is U+01F2 "ǲ" capital d with small z.
     * Case mappings may produce strings of different length than
       the original.
          + For example, the German character U+00DF "ß" small
            letter sharp s expands when uppercased to the sequence
            of two characters "SS". This also occurs where there is
            no precomposed character corresponding to a case
            mapping, such as with U+0149 "ŉ" latin small letter
            n preceded by apostrophe.
     * There are some characters that require special handling,
       such as U+0345 combining iota subscript.
     * Characters may also have different case mappings, depending
       on the context.
          + For example, U+03A3 "Σ" capital sigma lowercases to
            U+03C3 "σ" small sigma if it is followed by another
            letter, but lowercases to U+03C2 "ς" small final sigma
            if it is not.
     * Characters may have case mappings that depend on the locale.
          + For example, in Turkish the letter U+0049 "I" capital
            letter i lowercases to U+0131 "ı" small dotless i.
     * Since many characters are really caseless (most of the IPA
       block, for example) and have no matching uppercase, the
       process of uppercasing a string does not mean that it will
       no longer contain any lowercase letters.
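
Several of the complications listed above can be observed with Python 3's built-in string methods, which I am using here only as a convenient way to poke at the Unicode mappings. The `lower_turkish` helper is a hypothetical, deliberately crude stand-in for real locale-aware lowercasing, since Python's `str.lower()` takes no locale.

```python
# Titlecase is not always the same as uppercase: the DZ digraph.
dz_upper = "\u01f1".upper()   # stays U+01F1 (capital DZ)
dz_title = "\u01f1".title()   # becomes U+01F2 (capital D with small z)

# Case mappings can change the length of the string.
sharp_s_upper = "\u00df".upper()        # sharp s expands to "SS"
n_apostrophe_upper = "\u0149".upper()   # no precomposed capital exists

# Context-dependent mapping: capital sigma at the end of a word
# lowercases to final sigma, elsewhere to the ordinary small sigma.
word_lower = "\u039f\u0394\u039f\u03a3".lower()
initial_lower = "\u03a3\u0391".lower()


def lower_turkish(text):
    """Hypothetical helper: apply the Turkish I/i rules, then lowercase.

    Assumption: handling only U+0049 and U+0130 is enough for a sketch.
    """
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


# A lowercase letter with no uppercase counterpart (U+0138, kra)
# survives uppercasing unchanged.
kra_upper = "\u0138".upper()

print(len(sharp_s_upper), len(n_apostrophe_upper), lower_turkish("ISPARTA"))
```

The point is that a per-character lookup table with one-to-one mappings cannot cover all of these cases.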

[...]

   Converting to Titlecase

   Map each character x based on the preceding character. If
   that character is cased, use UCD_lower(x), otherwise
   UCD_title(x).

   Remember to use the context-dependent mappings above, and
   consider the titlecase caveats.
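
The rule quoted above can be sketched roughly as follows. This is my own approximation, not the TR21 reference code: `is_cased` and `simple_titlecase` are hypothetical names, Python's per-character `lower()` and `title()` stand in for UCD_lower and UCD_title, and the context-dependent mappings (such as final sigma) are not applied because each character is mapped in isolation.

```python
def is_cased(ch):
    """Approximate 'cased': the character is upper-, lower-, or titlecase."""
    return ch.isupper() or ch.islower() or ch.istitle()


def simple_titlecase(text):
    """Titlecase text by looking only at the preceding character."""
    pieces = []
    previous_cased = False
    for ch in text:
        if previous_cased:
            # Inside a run of cased letters: lowercase mapping.
            pieces.append(ch.lower())
        else:
            # Start of a run: titlecase mapping (a no-op for non-letters).
            pieces.append(ch.title())
        previous_cased = is_cased(ch)
    return "".join(pieces)


print(simple_titlecase("taming OF the shrew"))
print(simple_titlecase("can't"))
```

Note that this capitalizes every word, and it treats the apostrophe in "can't" as a word break, which is exactly the kind of language-dependent caveat the report warns about.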

-- 
Karl Ove Hufthammer


