Re: commit: abi: UTF8String class

From: Scott Rushfeldt (sirushfe@unity.ncsu.edu)
Date: Sun Apr 21 2002 - 18:14:54 EDT


    ----- Original Message -----
    From: "Karl Ove Hufthammer" <huftis@bigfoot.com>
    To: <abiword-dev@abisource.com>
    Sent: Sunday, April 21, 2002 11:19 AM
    Subject: Re: commit: abi: UTF8String class

    > Andrew Dunbar <hippietrail@yahoo.com> wrote in
    > news:20020421150105.97083.qmail@web9608.mail.yahoo.com:
    >
    > > *May* map to a different glyph - but glyph is not the
    > > correct term, I believe. You could have a c with an
    > > acute accent and a cedilla, for instance, which would
    > > need three codepoints but appear on the screen to be
    > > one character. I don't have the proper definition for
    > > glyph handy, sorry.
    >
    > Neither do I, but I can try: A glyph is a graphical presentation
    > form. In Unicode, there is no one-to-one mapping from characters
    > to glyphs, or the other way. One character can be displayed as
    > several glyphs, and one glyph can represent several characters.
    > E.g. the Greek letter pi and the mathematical symbol (usually)
    > use the same glyph (graphical presentation), but they're
    > different characters. Sometimes a character is displayed in
    > different ways depending on which language it is used in (e.g.
    > Japanese vs. Chinese).
    >
    > But we also have combining characters in Unicode. For example, to
    > write an é, you write an e, followed by a combining ´. This may be
    > rendered as an e with ´ superimposed (usually looks bad), but
    > usually a separate é glyph is used. Note that all three characters
    > (é, e and the combining ´) are defined in Unicode. The pre-composed
    > é exists mainly for backwards compatibility with older character
    > sets (e.g. ISO-8859-1). Future versions of Unicode will likely not
    > add any new pre-composed characters.
    >
    > Lastly, none of this has anything to do with surrogate characters,
    > which complicate matters even more! :)
    >
    > --
    > Karl Ove Hufthammer
    >
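
    To make the quoted point about combining characters concrete, here is a
    small sketch in Python (chosen only for brevity; it is not AbiWord code).
    It uses the standard unicodedata module to show that the pre-composed é
    and the sequence "e + combining acute" are different code point
    sequences that normalization maps onto each other. The variable names
    are illustrative only.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point, U+00E9
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# Different code point sequences, and different UTF-8 byte lengths.
code_points = (len(precomposed), len(decomposed))
utf8_bytes = (len(precomposed.encode("utf-8")),
              len(decomposed.encode("utf-8")))

# Normalization converts between the two spellings.
composed = unicodedata.normalize("NFC", decomposed)   # compose
split = unicodedata.normalize("NFD", precomposed)     # decompose

# Andrew's example: c + combining cedilla + combining acute.
c_cedilla_acute = "c\u0327\u0301"

print(code_points, utf8_bytes)                          # (1, 2) (2, 3)
print(composed == precomposed, split == decomposed)     # True True
print(len(c_cedilla_acute))                             # 3
```

    So one character on screen can be one, two or three code points, and
    even more bytes once encoded as UTF-8.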

        I don't claim to be an expert on Unicode, but from what I've read,
    might it work to store strings in the piece table as arrays of glyph
    objects, with each glyph object containing all the UTF-8 characters
    necessary to define the glyph to be displayed? This (from what I
    understand of Unicode) would allow the piece table to keep its random
    access to glyphs and still allow UTF-8 character combination. I don't
    know how this would affect processing speed, but it should be possible to
    eliminate calculations within the glyph class at all times except when
    the glyph is being edited. This could be done by storing, in each glyph
    object, which glyph should be displayed (based on the UTF-8 characters
    contained in the object). This would allow the glyphs to be accessed even
    faster by the piece table, and the stored glyph would only be recomputed
    when a particular glyph object is edited (I am not sure how often this
    would happen). I hope this makes some sense, since I am still very new to
    AbiWord and am not yet comfortable enough with the code, IMO, to do
    debugging. If this does seem like an interesting idea but my thinking on
    some point seems confusing, please email me or the list (I am subscribed)
    and I would be happy to discuss it.
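
    A rough sketch of the idea, in Python for brevity rather than in
    AbiWord's C++. Everything here is hypothetical: GlyphCell and
    split_cells are made-up names, not part of the piece table, and NFC
    composition merely stands in for a real glyph lookup. Grouping by
    unicodedata.combining() is also a simplification of proper cluster
    segmentation.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class GlyphCell:
    """Hypothetical 'glyph object': a base character plus combining marks."""
    text: str                           # code points stored in this cell
    display: str = field(init=False)    # cached presentation form

    def __post_init__(self):
        self._refresh()

    def _refresh(self):
        # Stand-in for a glyph lookup; recomputed only when the cell changes.
        self.display = unicodedata.normalize("NFC", self.text)

    def append_mark(self, mark):
        # Editing the cell is the only operation that triggers recomputation.
        self.text += mark
        self._refresh()


def split_cells(s):
    """Group each base character with the combining marks that follow it."""
    cells = []
    for ch in s:
        if cells and unicodedata.combining(ch):
            cells[-1].append_mark(ch)    # non-zero combining class: attach
        else:
            cells.append(GlyphCell(ch))  # start a new cell
    return cells


cells = split_cells("ce\u0301c\u0327\u0301")
print(len(cells))                      # 3 cells from 6 code points
print(cells[1].display == "\u00e9")    # True: cached pre-composed é
```

    Indexing cells[i] is then random access by displayed unit, and reading
    .display costs nothing extra until a cell is edited.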

    Happy Coding,

    Scott Rushfeldt
    sirushfe@unity.ncsu.edu


