Re: commit: abi: UTF8String class

From: Scott Rushfeldt (sirushfe@unity.ncsu.edu)
Date: Sun Apr 21 2002 - 18:14:54 EDT

Next message: Kenneth J.Davis: "Re: plugins, build fails with unresolved external symbols"

Previous message: Leonard Rosenthol: "Re: Next Generation Containers."
In reply to: Karl Ove Hufthammer: "Re: commit: abi: UTF8String class"
Next in thread: Andrew Dunbar: "Re: commit: abi: UTF8String class"
Next in thread: Joaquin Cuenca Abela: "Re: commit: abi: UTF8String class"
Reply: Andrew Dunbar: "Re: commit: abi: UTF8String class"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

----- Original Message -----
From: "Karl Ove Hufthammer" <huftis@bigfoot.com>
To: <abiword-dev@abisource.com>
Sent: Sunday, April 21, 2002 11:19 AM
Subject: Re: commit: abi: UTF8String class

> Andrew Dunbar <hippietrail@yahoo.com> wrote in
> news:20020421150105.97083.qmail@web9608.mail.yahoo.com:
>
> > *May* map to a different glyph - but glyph is not the
> > correct term, I believe. You could have a c with an
> > acute accent and a cedilla, for instance, which would
> > need three codepoints but appear on the screen to be
> > one character. I don't have the proper definition for
> > glyph handy sorry.
>
> Neither do I, but I can try: A glyph is a graphical presentation
> form. In Unicode, there is no one-to-one mapping from characters
> to glyphs, nor the other way around. One character can be displayed
> as several glyphs, and one glyph can represent several
> characters. E.g. the Greek letter pi and the mathematical symbol
> (usually) use the same glyph (graphical presentation), but they're
> different characters. Sometimes a character is displayed in
> different ways depending on which language it is used in (e.g.
> Japanese vs. Chinese).
>
> But we also have combining characters in Unicode. For example, to
> write an é, you write an e, followed by a combining ´. This may be
> rendered as an e with ´ superimposed (usually looks bad), but
> usually a separate é glyph is used. Note that é, e and the
> combining ´ are all defined as characters in Unicode. The
> pre-composed é exists mainly for backwards compatibility with
> older character sets (e.g. ISO-8859-1). Future versions will
> likely not add any new pre-composed characters.
>
> Lastly, none of this has anything to do with surrogate characters,
> which complicate matters even more! :)
>
> --
> Karl Ove Hufthammer
>

I don't claim to be an expert on Unicode, but from what I've read, might
it work to store strings in the piece table as arrays of glyph objects, with
each glyph object containing all the UTF-8 characters necessary to define
the glyph to be displayed? This (from what I understand of Unicode) would
allow the piece table to keep its random access to glyphs and still
allow UTF-8 character combination. I don't know how this would affect
processing speed, but it should be possible to avoid any calculation
within the glyph class except when the glyph is being edited.
This could be done by storing, in each glyph object, which glyph should be
displayed (based on the UTF-8 characters contained in the object). This would
allow the glyphs to be accessed even faster by the piece table, and the
stored glyph would only be recomputed when that particular glyph object is
edited (I'm not sure how often this would happen). I hope this makes some
sense, since I am still very new to AbiWord and am not yet comfortable enough
with the code, IMO, to do debugging. If this seems like an interesting idea
but my thinking on some point is confusing, please email me or the list (I am
subscribed) and I would be happy to discuss it.
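
To make the idea a bit more concrete, here is a rough sketch in C++. It is
not AbiWord code: GlyphCluster, glyphId and splitClusters are names made up
for this example, and it only treats U+0300..U+036F as combining marks,
which is a big simplification of the real Unicode rules. It assumes the
input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical type: one displayed unit = a base character plus any
// combining marks that follow it, kept together as raw UTF-8 bytes.
struct GlyphCluster {
    std::string utf8;       // all bytes that make up this displayed unit
    std::uint32_t glyphId;  // cached glyph to display; 0 = not resolved yet
};

// Byte length of a UTF-8 sequence, judged from its lead byte.
// Assumes valid UTF-8 (no stray continuation bytes).
static std::size_t sequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

// Decode the code point occupying text[pos .. pos+len).
static std::uint32_t decode(const std::string& text, std::size_t pos,
                            std::size_t len) {
    assert(pos + len <= text.size());
    unsigned char lead = static_cast<unsigned char>(text[pos]);
    if (len == 1) return lead;
    // Keep only the payload bits of the lead byte (5, 4 or 3 of them).
    std::uint32_t cp = lead & (0xFFu >> (len + 1));
    for (std::size_t i = 1; i < len; ++i) {
        unsigned char cont = static_cast<unsigned char>(text[pos + i]);
        cp = (cp << 6) | (cont & 0x3Fu);
    }
    return cp;
}

// Simplifying assumption: only the Combining Diacritical Marks block.
static bool isCombining(std::uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Split UTF-8 text into an array of clusters, so that element i is the
// i-th displayed unit no matter how many bytes or code points it holds.
std::vector<GlyphCluster> splitClusters(const std::string& text) {
    std::vector<GlyphCluster> clusters;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len =
            sequenceLength(static_cast<unsigned char>(text[pos]));
        if (len > text.size() - pos) break;  // drop a truncated tail
        std::uint32_t cp = decode(text, pos, len);
        if (isCombining(cp) && !clusters.empty()) {
            // A combining mark joins the cluster in front of it.
            clusters.back().utf8.append(text, pos, len);
        } else {
            clusters.push_back(GlyphCluster{text.substr(pos, len), 0u});
        }
        pos += len;
    }
    return clusters;
}
```

With this layout, a c followed by a combining acute and a combining cedilla
ends up as a single array element holding five bytes, and the cached glyphId
only has to be recomputed when that element's bytes change.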

Happy Coding,

Scott Rushfeldt
sirushfe@unity.ncsu.edu

This archive was generated by hypermail 2.1.4 : Sun Apr 21 2002 - 18:07:55 EDT