Word document import and cyrillic encoding


Subject: Word document import and cyrillic encoding
sam@stl.ru
Date: Fri Mar 31 2000 - 03:22:14 CST


        Hello!

I'm new to AbiWord development, so for now I'm writing more from a
user's point of view. I find AbiWord a good editor for my needs, but I
have to import and export Russian texts produced by WinWord. The 0.7.8
release lets me import and read RTF files, but not DOC files.
When I tried to import DOC files, produced by both Word 6 and
Word 7/8/9, I saw garbage made up of the symbols "<" and ">" and some
letters.

I downloaded the source for 0.7.8 and made AbiWord understand WinWord 6
documents in Russian (Microsoft codepage 1251). The patch for this is:

File ~~~/src/wp/impexp/xp/ie_imp_MsWord_97.cpp , function CharProc(),
line 119,
        replace "if (chartype)"
        with "if (!chartype)"
(As far as I can see, the sample documents shipped with the
distribution are still readable after this patch.)
        
This lets Word 6 documents (MS CP1251) pass conversion and be displayed
correctly, but Word 7/8/9 DOCs (Unicode?) are still unreadable after
import.
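
To illustrate the distinction the patch touches, here is a minimal
sketch, assuming the flag passed to CharProc() separates 8-bit code-page
characters from 16-bit Unicode ones (which value means which in wv 0.7.8
is not confirmed here). The names cp1251ToUnicode and normalizeChar are
hypothetical and are not part of AbiWord or wv. Only ASCII and the
Russian letters of CP1251 are mapped; the rest of the code page is left
out of this sketch.

```cpp
#include <cstdint>

// Hypothetical helper (not AbiWord or wv code): map one CP1251 byte to a
// Unicode code point. Covers ASCII and the Russian alphabet only; every
// other byte becomes U+FFFD (replacement character).
std::uint16_t cp1251ToUnicode(std::uint8_t b)
{
    if (b < 0x80) return b;                                   // ASCII is unchanged
    if (b >= 0xC0)                                            // 0xC0..0xFF map to U+0410..U+044F
        return static_cast<std::uint16_t>(0x0410 + (b - 0xC0));
    if (b == 0xA8) return 0x0401;                             // capital letter IO
    if (b == 0xB8) return 0x0451;                             // small letter IO
    return 0xFFFD;                                            // not handled in this sketch
}

// Hypothetical dispatcher: 8-bit characters go through the code-page
// table, 16-bit characters are assumed to be Unicode already.
std::uint16_t normalizeChar(std::uint16_t eachchar, bool isEightBit)
{
    if (isEightBit)
        return cp1251ToUnicode(static_cast<std::uint8_t>(eachchar & 0xFF));
    return eachchar;
}
```

If the two branches are swapped, 8-bit Russian text is passed through
unconverted and shows up as Latin-1 accented letters, which would be
consistent with the garbage described above.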

Of more practical interest is the fact that the 'wvHtml' program, built
from the same 'wv-0.7.8.tar.gz' as AbiWord, handles both Word 6 and
Word 7/8/9 documents correctly, regardless of the document's language
and charset. I tried to move its conversion code into the AbiWord
importer, but got either no success or segfaults :(
BTW, when I try to open these documents as UTF-8 text, it causes a
segfault too, whereas English texts are displayed correctly.

Finally, I have two hypotheses:
1. I've done something wrong. Please give me advice on what it is and
(maybe:) how to fix it.
2. The AbiWord code for the UTF8->CP1251 conversion, needed to import
Russian documents written by Word 7/8/9, fails to work.
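
The second hypothesis can be checked in isolation with a small
stand-alone converter. The following is only a sketch under stated
assumptions, not AbiWord's actual conversion code: unicodeToCp1251 and
utf8ToCp1251 are hypothetical names, only 1- and 2-byte UTF-8 sequences
are decoded, and only ASCII plus the Russian alphabet is mapped. Every
continuation byte is bounds-checked, so truncated or malformed input
yields '?' rather than a read past the end of the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: map a Unicode code point to a CP1251 byte.
// Anything outside ASCII and the Russian alphabet becomes '?'.
unsigned char unicodeToCp1251(std::uint32_t cp)
{
    if (cp < 0x80) return static_cast<unsigned char>(cp);
    if (cp >= 0x0410 && cp <= 0x044F)
        return static_cast<unsigned char>(0xC0 + (cp - 0x0410));
    if (cp == 0x0401) return 0xA8;   // capital letter IO
    if (cp == 0x0451) return 0xB8;   // small letter IO
    return '?';
}

// Hypothetical converter: UTF-8 in, CP1251 out. Bytes that do not start
// a valid 1- or 2-byte sequence are replaced by '?' one byte at a time.
std::string utf8ToCp1251(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {
            out += static_cast<char>(b0);              // plain ASCII
            continue;
        }
        const bool hasNext = (i + 1 < in.size());
        const unsigned char b1 =
            hasNext ? static_cast<unsigned char>(in[i + 1]) : 0;
        if ((b0 & 0xE0) == 0xC0 && hasNext && (b1 & 0xC0) == 0x80) {
            const std::uint32_t cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
            // Reject overlong encodings of ASCII.
            out += (cp < 0x80) ? '?' : static_cast<char>(unicodeToCp1251(cp));
            ++i;                                       // consume the continuation byte
        } else {
            out += '?';                                // truncated, malformed, or 3+ byte sequence
        }
    }
    return out;
}
```

Feeding it the UTF-8 text that AbiWord chokes on and comparing the
result with what wvHtml produces should show whether the conversion
step or something earlier in the import is at fault.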

Unfortunately, I can't spend a lot of time tracking down the cause of
this strange behaviour (I'm a student and have to graduate from the
Institute this June, so I'm very busy :(( ). Please advise me where to
look for the solution. Again, I want to stress the fact that the
"wvHtml" converter program works fine. Do you have any ideas on how to
integrate its algorithm into the AbiWord source?

SY, Serge Musorin.


