Re: undo and combining characters

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Mon Apr 22 2002 - 23:29:23 EDT

    --- Karl Ove Hufthammer <huftis@bigfoot.com> wrote:
    > Paul Rohr <paul@abisource.com> wrote in
    > news:3.0.5.32.20020422085428.034194f0@mail.abisource.com:
    >
    > > How should undo work for combining characters?
    >
    > Well, combining characters may be input in several ways. On my
    > Norwegian keyboard, I write é by pressing the Alt Gr + 'the ´
    > deadkey', followed by an e. (BTW, note that the decomposed form of
    > é in Unicode is e´, not ´e.) On French keyboards, I believe there
    > is a separate é key. But exactly how the keypress --> character
    > sequence is generated should be done by the OS.

    The concept of input and the concept of internal
    representation are really quite distinct. It just
    happens that "dead keys" and "combining characters"
    are similar concepts, but reversed: a dead key is
    pressed before the base letter, while a combining
    character follows it. In reality they are not
    related. Both the Norwegian and French keymaps on
    all OSes return only a single, precomposed character
    on entering an é: U+00E9.
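
    To make the distinction concrete, here is a small Python sketch
    (Python is only my choice for illustration, nothing to do with
    the AbiWord code) comparing the precomposed é with the decomposed
    sequence, and the spacing acute accent with the combining one.
    The variable names are made up for the example.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point, what the keymaps return
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings look the same on screen but differ in length.
print(len(precomposed), len(decomposed))

# A non-zero canonical combining class marks a combining character.
# The spacing acute accent U+00B4 has class 0; U+0301 does not.
for ch in ("\u00b4", "\u0301"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
```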

    Combining characters are now possible with Unicode for
    western languages but nobody is using them yet.

    Combining characters are currently used by Vietnamese
    to add "tone marks" to roman characters.

    > As for undoing a decomposed character (e.g. e´), I think it's safe
    > to undo all characters back to (and including) the last
    > non-combining character. For example if you write e´

    Don't confuse your precomposed é above with combining
    characters.

    On Vietnamese Windows, you would type "e" and "e"
    would be displayed. You would then type a "tone mark",
    which would be displayed above the "e", and the
    cursor would not move to the right. This is a
    combining character.

    > (where ´ is not actually ´, but the combining ´) and press undo,
    > both characters (which are probably displayed as one glyph) should
    > be deleted. (In practice é would/should be written as the
    > pre-composed é character, as per Normalization Form C <URL:
    > http://www.unicode.org/unicode/reports/tr15/ >. I only use it here
    > as an example.)
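
    The rule proposed in the quoted text (undo back to, and
    including, the last non-combining character) could be sketched
    roughly as below. This is only an illustration in Python, not
    AbiWord code; undo_last_cluster is a hypothetical name, and it
    assumes a "combining character" is anything with a non-zero
    canonical combining class, which is a simplification (some
    marks have class 0, and full grapheme-cluster rules are more
    involved).

```python
import unicodedata

def undo_last_cluster(text: str) -> str:
    """Drop trailing combining marks plus the base character before them.

    Hypothetical helper: "combining" here means a non-zero canonical
    combining class, which is an assumption made for this sketch.
    """
    end = len(text)
    # Step back over any combining marks at the end of the text.
    while end > 0 and unicodedata.combining(text[end - 1]) != 0:
        end -= 1
    # Then remove the non-combining (base) character itself, if any.
    if end > 0:
        end -= 1
    return text[:end]

# "caf" + e + combining acute: one undo removes both code points.
print(repr(undo_last_cluster("cafe\u0301")))
```

    With this rule, a base letter carrying two stacked marks (for
    instance e + U+0302 + U+0301) is also removed in a single undo
    step, whereas plain "ab" just loses the "b".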

    Normalization is a different subject which mostly
    comes into play with searching and sorting - it's
    probably only going to be confusing to mention it
    here.
    Though maybe we do need to discuss whether AbiWord
    should normalize all characters in its internal
    representation...
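
    For anyone unfamiliar with what normalizing would mean in
    practice, here is a rough Python illustration using the
    standard unicodedata module. It only shows Normalization
    Forms C and D on the é example; whether AbiWord should do
    anything like this internally is exactly the open question.

```python
import unicodedata

decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # single code point é

# Form C composes where a precomposed character exists.
nfc = unicodedata.normalize("NFC", decomposed)
# Form D decomposes into base character plus combining marks.
nfd = unicodedata.normalize("NFD", precomposed)

print(nfc == precomposed)   # both forms of é compare equal after NFC
print(nfd == decomposed)    # and after NFD
```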

    > > What would a native speaker want to happen when you "undo" the
    > > entry of a single "on-screen" character?[1] I suspect that
    > > creating such an entity may take more than one step (in the
    > > input method editor), but should they always be undone
    > > individually?
    >
    > In cases similar to my example above, yes. But not always. See for
    > example the romaji input example at <URL:
    > http://www.w3.org/TR/charmod/#sec-CharExamples >. How this should
    > be handled is dependent on the actual input method used.

    See my earlier post.

    Andrew Dunbar.

    > --
    > Karl Ove Hufthammer

    =====
    http://linguaphile.sourceforge.net http://www.abisource.com
