Re: wv: windows codepages, changes to allow conversion to

Paul Rohr (paul@abisource.com)
Tue, 09 Nov 1999 11:05:39 -0800


At 03:48 PM 11/9/99 -0000, Caolan McNamara wrote:
>Anyway back to what I am doing, if the character is a 16 bit unicode
>character there is no problem. If the character is 8 bit then in my
>code I will take the 8 bit character and figure out what windows
>codepage is being used for it and convert the character to unicode
>through one of the available mapping tables, and then we can handle the
>resulting unicode character as normal.

Cool. Have you given any thought to packaging those mapping tables
separately? I suspect they'd be quite useful for other importers too (such
as RTF or WordPerfect).

>Now abiword uses the charhandler so it will have to add another variable
>to match the wv definition. After that, abiword will have to decide what
>it wants to do when it is given an 8 bit character and a lid, the character
>will still be given to the charhandler exactly as it was found in the word
>document, but there will be utility functions available to convert the
>character to unicode, something that works like this maybe
>
>(function pointer) = wvGetCodePageConverter(lid)
>unicodechar = (function pointer)(8 bit codepage character);

OK. If you and Justin agree that the API should really be this low-level,
that's fine by me. AFAIK, we'll never want to see the character in its
Windows codepage form; we'd just go directly to the full Unicode equivalent.
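
To make the two API levels concrete, here is a rough sketch in C, not the
actual wv code. Every name in it (sketch_converter, sketch_get_converter,
sketch_char_to_unicode) is hypothetical, the lid-to-codepage choices cover
only two English lids, and only Windows-1252 is filled in. Unassigned bytes
and unknown lids map to U+FFFD, which is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter type: one 8-bit codepage character in,
 * one 16-bit Unicode character out. */
typedef uint16_t (*sketch_converter)(uint8_t);

/* Windows-1252 differs from Latin-1 only in 0x80..0x9F.
 * 0xFFFD marks the five bytes that have no assignment. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

uint16_t sketch_cp1252_to_unicode(uint8_t c)
{
    if (c >= 0x80 && c <= 0x9F)
        return cp1252_high[c - 0x80];
    return c; /* 0x00..0x7F and 0xA0..0xFF are identical to Unicode */
}

/* Low-level form, in the spirit of the proposed
 * wvGetCodePageConverter(lid): pick a converter from the language id.
 * Returns NULL when this sketch has no table for the lid. */
sketch_converter sketch_get_converter(uint16_t lid)
{
    switch (lid) {
    case 0x0409: /* English (United States) */
    case 0x0809: /* English (United Kingdom) */
        return sketch_cp1252_to_unicode;
    default:
        return NULL;
    }
}

/* Higher-level form: the caller never sees the codepage variant,
 * only the Unicode result. */
uint16_t sketch_char_to_unicode(uint16_t lid, uint8_t c)
{
    sketch_converter conv = sketch_get_converter(lid);
    return conv ? conv(c) : 0xFFFD;
}
```

With something shaped like the last function, a charhandler could call
sketch_char_to_unicode(lid, ch) for every 8-bit character and treat the
result like any other 16-bit character.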

>This change will be in the next cvs commit I do tomorrow or later on, so
>I'm afraid that there will probably be some failed builds because of it.

Justin, is this something you'll be able to get to? If not, let us know.
It sounds like a very quick fix, so it shouldn't be that bad for someone
else to paste in the relevant code snippet.

Paul


