Re: undo and combining characters

From: Paul Rohr (paul@abisource.com)
Date: Mon Apr 22 2002 - 17:02:00 EDT

    Again, thanks for the reference, Karl. Having specific examples like this
    really helps make the design discussion more concrete.

    CAVEAT: I'm no expert on these issues, but I'm trying to synthesize a
    design principle which can be easily explained, so the resulting behavior
    will fall somewhere between the "Just Works" ideal and a less ambitious "not
    surprising" standard.

    At 03:28 PM 4/22/02 -0400, Karl Ove Hufthammer wrote:
    >Well, combining characters may be input in several ways. On my
    >Norwegian keyboard, I write é by pressing the Alt Gr + 'the ´
    >deadkey', followed by an e. (BTW, note that the decomposed form of
    >é in Unicode is e´, not ´e.) On French keyboards, I believe there
    >is a separate é key. But exactly how the keypress --> character
    >sequence is generated should be done by the OS.

    Agreed.

    >As for undoing a decomposed character (e.g. e´), I think it's safe
    >to undo all characters back to (and including) the last non-
    >combining character. For example if you write e´ (where ´ is not
    >actually ´, but the combining ´) and press undo, both characters
    >(which are probably displayed as one glyph) should be deleted. (In
    >practice é would/should be written as the pre-composed é character,
    >as per Normalization Form C <URL:
    >http://www.unicode.org/unicode/reports/tr15/ >. I only use it here
    >as an example.)
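
    [Editor's sketch] The undo rule Karl describes above could look roughly
    like the following. This is only an illustration, not AbiWord code: the
    function name undo_last_cluster is hypothetical, and it approximates
    "combining character" by a non-zero canonical combining class, which does
    not cover every mark that renders attached to its base.

```python
import unicodedata


def undo_last_cluster(text: str) -> str:
    """Remove trailing combining marks and the base character before them.

    Hypothetical helper: "combining" here means a non-zero canonical
    combining class, which is an approximation of a full grapheme cluster.
    """
    i = len(text)
    # Walk left over any trailing combining marks.
    while i > 0 and unicodedata.combining(text[i - 1]) != 0:
        i -= 1
    # Also remove the last non-combining (base) character, if there is one.
    if i > 0:
        i -= 1
    return text[:i]


# "caf" followed by a decomposed e-acute (e + U+0301 COMBINING ACUTE ACCENT).
decomposed = "caf" + "e\u0301"
after_undo = undo_last_cluster(decomposed)
```

    With this rule a single undo removes both the e and its combining accent,
    so the user never sees a bare base letter left behind.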

    Given what you've said so far, here's a possible design:

      Let the IME do as much composition work as it can.
      As far as AbiWord is concerned, the pre-composed character is atomic.
      That's what we store, render, select, format, and undo.
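
    [Editor's sketch] One way to picture "the pre-composed character is
    atomic" is to normalize whatever the input method hands over to
    Normalization Form C before storing it. This is an assumption made for
    illustration; the message does not say AbiWord normalizes anything, and
    to_stored_form is a hypothetical name.

```python
import unicodedata


def to_stored_form(typed: str) -> str:
    """Return the NFC form of the input (hypothetical storage step)."""
    return unicodedata.normalize("NFC", typed)


# e + U+0301 COMBINING ACUTE ACCENT composes to the single code point U+00E9.
stored = to_stored_form("e\u0301")
code_points = len(stored)
```

    NFC only helps where Unicode defines a pre-composed character. A base
    plus mark sequence with no pre-composed equivalent stays as several code
    points after normalization, which is where the harder cases begin.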

    So far, so good. That allows us to handle examples A.1 and A.2, and happens
    to be more-or-less what we've already got implemented.

    >> What would a native speaker want to happen when you "undo" the
    >> entry of a single "on-screen" character?[1] I suspect that
    >> creating such an entity may take more than one step (in the
    >> input method editor), but should they always be undone
    >> individually?
    >
    >In cases similar to my example above, yes. But not always. See for
    >example the romaji input example at <URL:
    >http://www.w3.org/TR/charmod/#sec-CharExamples >. How this should
    >be handled is dependent on the actual input method used.

    Do you have a specific proposal here?

    According to example A.3, as far as AbiWord is concerned, all those Latin
    characters are never seen. The IME intercepts and translates them, handing
    us 3 kanji characters at once.

    Thus, the question becomes:

      - Are *those* characters atomic (for selection or deletion purposes)?
      - Should we glob them for undo purposes?

    Again, I'm not a native speaker, but I'd guess that the answer to the first
    question is yes. The second is less clear to me.
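
    [Editor's sketch] The two undo policies in question can be compared with
    a toy undo stack. Everything here is hypothetical (the UndoStack class,
    its method names, and the three kanji, which merely stand in for whatever
    the IME commits); it is not how AbiWord's piece table records changes.

```python
from dataclasses import dataclass, field


@dataclass
class UndoStack:
    """Toy model: each entry in `steps` is the text removed by one undo."""

    glob_commits: bool
    steps: list = field(default_factory=list)

    def record_commit(self, committed: str) -> None:
        if self.glob_commits:
            # One undo step for everything the IME handed over at once.
            self.steps.append(committed)
        else:
            # One undo step per character (code point) in the commit.
            self.steps.extend(committed)

    def undo(self, text: str) -> str:
        if not self.steps:
            return text
        last = self.steps.pop()
        return text[: len(text) - len(last)]


commit = "\u65e5\u672c\u8a9e"  # three placeholder kanji from one IME commit

globbed = UndoStack(glob_commits=True)
globbed.record_commit(commit)

per_char = UndoStack(glob_commits=False)
per_char.record_commit(commit)
```

    With globbing, one undo removes the whole commit; without it, the same
    commit takes three undos, one per character.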

    but wait, there's more
    ----------------------
    Now we get closer to the screw cases I was worried about.

    For example, consider example A.4. Since we're letting the IME do the
    necessary composition (or decomposition), we have no way to differentiate
    the keystrokes used to create the two lam-alef ligatures here. Thus, should
    undo:

      - glob the "first" one (for a total of 4 steps), or
      - decompose the second one (for a total of 6 steps)?

    ( For anyone tempted to be tricky, yes we could theoretically jigger the
    undo records to differentiate them when originally typed, but not after the
    file's been stored and reloaded. Both behaviors should be consistent, no? )

    Even worse, how many undo steps should there be after typing the Tamil word
    in example A.5?

      - six?
      - five?
      - four?
      - one? (i.e., punt and just don't allow character-level undo)

    Don't ask me, I don't speak or type either language. :-)

    bottom line
    -----------
    Like Martin, I've been hoping that Unicode was monstrous enough that we
    could always expect to encounter fully-composed characters in the piece
    table. That way, undo and selections would create a user experience that
    would Just Work like our trivial Latin cases.

    Evidently, it's not that simple, which is why we need answers to these
    questions.

    If my suggestions in the selection case make sense, then is there any reason
    to allow undo granularity which is *finer* than the selection granularity?
    If so, when and how? What kind of user experience will Just Work for the
    cases Andrew is most worried about?

    OK folks, have at it!

    Paul


