Re: New charset support ?


Subject: Re: New charset support ?
From: Vlad Harchev (hvv@hippo.ru)
Date: Tue Apr 03 2001 - 06:18:24 CDT


On Tue, 3 Apr 2001, Hubert Figuiere wrote:

> Vlad Harchev wrote:
>
> > AbiWord already supports, in theory, any charset when importing RTFs - see
> > the RTF importer. It uses XAP_EncodingManager::charsetFromCodepage to get the
> > name of the encoding for the codepage number the RTF text is in, and converts
> > all characters to Unicode using iconv. If iconv() doesn't know an encoding
> > under a name of the form CPXXXX for some codepage XXXX, then appropriate
> > entries should be added to
> > XAP_EncodingManager.cpp:MSCodepagename_to_charset_name_map.
> >
> > Importing RTFs whose encoding differs from the locale's encoding works fine,
> > at least for Russian and Chinese.
>
> What should I do for MacRoman, as it does not have a codepage number but does
> have an iconv() name?
> I haven't found the method that does this.

 Hmm, what does the RTF header for files in MacRoman look like? Does it include
\ansicpgXXXX or something else? What is the corresponding charset name that
iconv should use? If \ansicpgXXXX is not included in the header of the RTF
file, then we should add special support for whatever keyword is used instead
(a new one, I assume). It should be something like this, without the need to
touch XAP_EncodingManager.cpp:MSCodepagename_to_charset_name_map:

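                if (strcmp((char*)pKeyword, "SOMEKEYWORD") == 0)
                {
                        // Placeholders: substitute the real RTF keyword and
                        // the charset name that iconv knows for MacRoman.
                        m_mbtowc.setInCharset("CharsetNameForMacRoman");
                }
                break;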
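
 For comparison, the codepage-based lookup described in the quoted text above
could be sketched roughly as follows. This is only an illustration, not the
actual AbiWord code: the function name charsetNameForCodepage and the
overrides parameter are hypothetical stand-ins for
XAP_EncodingManager::charsetFromCodepage and
MSCodepagename_to_charset_name_map, and the fallback simply builds the
"CPXXXX" name mentioned above.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of a codepage -> iconv charset name lookup.
// `overrides` plays the role of a table of codepages whose iconv name
// is not of the form "CPXXXX"; its contents are up to the caller.
std::string charsetNameForCodepage(int codepage,
                                   const std::map<int, std::string>& overrides)
{
    // Prefer an explicit entry when one exists for this codepage.
    auto it = overrides.find(codepage);
    if (it != overrides.end())
        return it->second;

    // Otherwise fall back to the conventional "CP" + number form.
    return "CP" + std::to_string(codepage);
}
```

 A charset such as MacRoman that has no codepage number never reaches a lookup
like this, which is why a separate keyword check as above would be needed.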

 
> BTW how do we ensure that we have support for this conversion?

 Do you mean how do we ensure that iconv() has support for this conversion, or
something else?
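
 If the question is about iconv() itself, one possible way to probe for
support at run time is to try opening a conversion descriptor. A minimal
sketch, assuming the standard POSIX iconv_open()/iconv_close() calls; the
helper name iconvSupports is made up for illustration and is not an existing
AbiWord function:

```cpp
#include <iconv.h>

// Hypothetical helper: returns true if the system's iconv can open a
// conversion from `fromCharset` to `toCharset`.
bool iconvSupports(const char* fromCharset, const char* toCharset)
{
    // Note the argument order: iconv_open(tocode, fromcode).
    iconv_t cd = iconv_open(toCharset, fromCharset);
    if (cd == (iconv_t)-1)
        return false;   // this pair is not supported under these names

    iconv_close(cd);
    return true;
}
```

 The importer could call such a check with the candidate charset name before
passing it to setInCharset(). The name accepted for a given charset can differ
between iconv implementations, so a failed check may only mean that the
charset is known under a different name.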

>
> Hub
>

 Best regards,
  -Vlad


