Re: MSWord DOC (fwd)


Subject: Re: MSWord DOC (fwd)
From: Vlad Harchev (hvv@hippo.ru)
Date: Sat Nov 11 2000 - 10:10:31 CST


On Sat, 11 Nov 2000, Sam TH wrote:

> On Sat, Nov 11, 2000 at 07:55:05PM +0400, Vlad Harchev wrote:
> >
> > Just to add more information - here is the file that wv fails to handle
> > correctly (the data in it doesn't get converted to Unicode at all). Most
> > probably CJK in Word 6.0 format is not yet supported by wv. All that is needed
> > is to extract the language code (or whatever) so that an iconv descriptor
> > can be set up for the right input charset.
> >
> > I should say that wv imports Russian in Word 6.0 format just fine, so this is
> > only a CJK problem.
>
> Vlad, what exactly is this document supposed to look like? When I run
> it through wv here, I get a document with about 7 CJK characters. Is
> that anything like correct? What's the desired result?

 To tell the truth, I don't know - I don't know any CJK language.
 On my box (with my patch), running with LANG=zh_CH.BIG5, I see "??word??".
 Of course, I don't have CJK fonts. But when I save this file as .abw, I get
 the following characters inside (with the usual XML tags around them, of course):
        ꒤ꓥwordꓩꓥ
 These are the Big5 codes of the text in the .doc file (but written as Unicode
chars). This means that wv doesn't know the encoding of the .doc file, so the
chars from the file were not converted to Unicode.
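
 To illustrate the effect described above, here is a minimal Python sketch
(not wv code). It assumes each two-byte Big5 code ended up stored as a single
Unicode code point, as in the .abw output, splits those code points back into
bytes and decodes them as Big5. The function name recover_big5 is made up for
this example.

```python
def recover_big5(garbled: str) -> str:
    """Undo 'Big5 code written as a Unicode char' for a short string."""
    raw = bytearray()
    for ch in garbled:
        cp = ord(ch)
        if cp > 0xFF:
            # Assumption: a code point above 0xFF is really a two-byte
            # Big5 code (lead byte, trail byte) that was never converted.
            raw += cp.to_bytes(2, "big")
        else:
            # Plain single-byte characters such as "word" pass through.
            raw.append(cp)
    return raw.decode("big5")


# The characters seen in the .abw file, written as escapes.
garbled = "\ua4a4\ua4e5word\ua4e9\ua4e5"
fixed = recover_big5(garbled)
print(fixed)
```

 This only repairs text after the fact; the real fix is for the importer to
know the input charset before converting.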
  I suspect wv doesn't support CJK in Word 6.0 format yet (but it handles CJK in
Word 8.0 format fine).
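
 As a rough sketch of the "extract the language code and pick the input
charset" idea, the Python below maps a few Windows language IDs to the usual
Windows code pages and decodes raw bytes accordingly. How (or whether) a
Word 6.0 file exposes a usable language ID through wv is an assumption here,
and LID_TO_CODEC and decode_with_lid are hypothetical names, not wv API.

```python
# Windows language ID -> Python codec for the matching Windows code page.
# Only a handful of entries, for illustration.
LID_TO_CODEC = {
    0x0404: "cp950",   # Chinese (Traditional), Big5-based
    0x0804: "cp936",   # Chinese (Simplified), GBK
    0x0411: "cp932",   # Japanese, Shift_JIS-based
    0x0412: "cp949",   # Korean
    0x0419: "cp1251",  # Russian
}


def decode_with_lid(raw: bytes, lid: int) -> str:
    """Decode 8-bit document text using the charset implied by lid."""
    try:
        codec = LID_TO_CODEC[lid]
    except KeyError:
        raise ValueError(f"no charset known for language ID {lid:#06x}")
    return raw.decode(codec)


# Hypothetical usage: bytes as they might sit in a Word 6.0 text stream.
text = decode_with_lid(b"\xa4\xa4\xa4\xe5word", 0x0404)
print(text)
```

 In C the same table would supply the "from" charset name passed to
iconv_open().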

>
> sam th
> sam@uchicago.edu
> http://www.abisource.com/~sam/
> GnuPG Key:
> http://www.abisource.com/~sam/key
>

 Best regards,
  -Vlad


