Subject: Re: Bug 1691
From: Karl Ove Hufthammer (huftis@bigfoot.com)
Date: Mon Jul 16 2001 - 13:22:39 CDT
Mon 16 Jul 2001 06:05:11, abiword@linuxfreemail.com:
> Bug 1569 has a pretty detailed discussion of this, seems to
> get more complicated the more you look at it! After reading
> that, I begin to think "our way" really *is* better. Have
> UPPER CASE and want Title Case? Just click lowercase and then
> sentence case.
My opinion:
I think we should do this *better* than MS Word does. Example:
The correct title case (GB English) should be:
'Taming of the Shrew'
*not*:
'Taming Of The Shrew'
Reference: <URL: http://www.unicode.org/unicode/reports/tr21/ >:
The choice of which words to titlecase is language-dependent.
For example, "Taming of the Shrew" would be the appropriate
capitalization in English, not "Taming Of The Shrew".
Moreover, the determination of what actually constitutes a
word is also language-dependent. For example, l'arbre might
be considered two words in French, while can't is
considered one word in English.
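To make the word-choice point concrete, here is a rough sketch of my own (not from TR21 or from AbiWord). The MINOR_WORDS set and the title_case_en function are hypothetical names, the word list is only an assumption about common English practice, and a real implementation would need one list per language plus proper word splitting.

```python
# Hypothetical list of English words that stay lowercase inside a title.
MINOR_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to"}

def title_case_en(text):
    """Capitalize every word except minor words; the first word is always capitalized."""
    words = text.lower().split()
    result = []
    for index, word in enumerate(words):
        if index > 0 and word in MINOR_WORDS:
            result.append(word)                          # keep "of", "the", ... lowercase
        else:
            result.append(word[:1].upper() + word[1:])   # capitalize the first letter only
    return " ".join(result)

print(title_case_en("TAMING OF THE SHREW"))  # Taming of the Shrew
```

Note that this splits on whitespace only, so it sidesteps the "l'arbre" question entirely.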
Note that while the archaic Georgian script contained upper- and
lowercase pairs, they are rarely used in modern Georgian.
The case mappings in the Unicode Character Database (UCD) are
informative, default mappings. Case itself, on the other
hand, has normative status. Thus, for example, 0041 "A" is
normatively uppercase, but its lowercase mapping to 0061 "a"
is informative. The reason for this is that case can be
considered to be an inherent property of a particular
character, but case mappings between characters are
occasionally influenced by local conventions.
There are a number of complications to case mappings that
occur once the repertoire of characters is expanded beyond
ASCII.
* In most cases, the titlecase is the same as the uppercase, but
  not always. For example, the titlecase of U+01F1 "DZ" capital
  dz is U+01F2 "Dz" capital d with small z.
* Case mappings may produce strings of different length than the
  original.
  + For example, the German character U+00DF "ß" small letter
    sharp s expands when uppercased to the sequence of two
    characters "SS". This also occurs where there is no
    precomposed character corresponding to a case mapping, such
    as with U+0149 "ŉ" latin small letter n preceded by
    apostrophe.
* There are some characters that require special handling, such
  as U+0345 combining iota subscript.
* Characters may also have different case mappings, depending on
  the context.
  + For example, U+03A3 "Σ" capital sigma lowercases to U+03C3
    "σ" small sigma if it is followed by another letter, but
    lowercases to U+03C2 "ς" small final sigma if it is not.
* Characters may have case mappings that depend on the locale.
  + For example, in Turkish the letter U+0049 "I" capital letter
    i lowercases to U+0131 "ı" small dotless i.
* Since many characters are really caseless (most of the IPA
  block, for example) and have no matching uppercase, the process
  of uppercasing a string does not mean that it will no longer
  contain any lowercase letters.
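Several of the complications in the list above can be observed with the built-in string methods of Python 3, which is used here purely as a convenient test bed (this is my illustration, not part of TR21, and it says nothing about how AbiWord handles these cases). The variable names are arbitrary. U+0138 (kra) stands in for the IPA letters as an example of a lowercase letter with no uppercase mapping.

```python
# Titlecase is not always the same as uppercase: the dz digraph.
dz_small = "\u01f3"              # U+01F3, small dz
dz_upper = dz_small.upper()      # U+01F1, capital DZ
dz_title = dz_small.title()      # U+01F2, capital D with small z

# Case mapping can change the length of a string.
sharp_s_upper = "stra\u00dfe".upper()   # U+00DF expands to "SS"

# Context-dependent mapping: capital sigma at the end of a word
# lowercases to the final form U+03C2 instead of U+03C3.
sigma_lower = "\u039f\u0394\u039f\u03a3".lower()

# Python's str methods are locale-independent, so the Turkish
# mapping of "I" to dotless i (U+0131) is NOT applied here.
plain_i = "I".lower()

# A lowercase letter without an uppercase mapping survives upper().
kra_upper = "\u0138".upper()

for name, value in [("dz_upper", dz_upper), ("dz_title", dz_title),
                    ("sharp_s_upper", sharp_s_upper), ("sigma_lower", sigma_lower),
                    ("plain_i", plain_i), ("kra_upper", kra_upper)]:
    # Print both the string and its code points for easy checking.
    print(name, value, [f"U+{ord(ch):04X}" for ch in value])
```

The Turkish case is the one that needs extra machinery: a locale-aware library or an explicit special case for the Turkish locale is required, since the default mappings cannot know the language of the text.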
[...]
Converting to Titlecase
Map each character x based on the preceding character. If
that character is cased, use UCD_lower(x), otherwise
UCD_title(x).
Remember to use the context-dependent mappings above, and
consider the titlecase caveats.
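The rule quoted above is short enough to sketch directly. This is a minimal, assumption-laden version in Python: is_cased and to_titlecase are hypothetical helper names, Python's per-character lower() and title() stand in for UCD_lower and UCD_title, and "cased" is approximated as upper-, lower-, or titlecase. Because it works one character at a time, it ignores the context-dependent mappings (final sigma, for instance) that the text says to remember.

```python
def is_cased(ch):
    """Approximate 'cased': an upper-, lower-, or titlecase letter."""
    return ch.isupper() or ch.islower() or ch.istitle()

def to_titlecase(text):
    """Lowercase a character if the one before it is cased, else titlecase it."""
    out = []
    previous_cased = False
    for ch in text:
        out.append(ch.lower() if previous_cased else ch.title())
        previous_cased = is_cased(ch)
    return "".join(out)

print(to_titlecase("TAMING OF THE SHREW"))  # Taming Of The Shrew
print(to_titlecase("can't"))                # Can'T
```

Both outputs show why the caveats matter: the plain rule capitalizes "Of" and "The", and it treats the apostrophe in "can't" as a word break. So the character-level mapping is only the bottom layer; the choice of which words to titlecase has to be made on top of it, per language.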
-- Karl Ove Hufthammer