Re: MSWord DOC (fwd)


Subject: Re: MSWord DOC (fwd)
From: Sam TH (sam@uchicago.edu)
Date: Sat Nov 11 2000 - 10:41:32 CST


On Sat, Nov 11, 2000 at 08:10:31PM +0400, Vlad Harchev wrote:
> On Sat, 11 Nov 2000, Sam TH wrote:
>
> > On Sat, Nov 11, 2000 at 07:55:05PM +0400, Vlad Harchev wrote:
> > >
> > > Just to add more information - here is the file that wv fails to treat
> > > correctly (the data in it doesn't get converted to Unicode at all). Most
> > > probably CJK in Word 6.0 format is not yet supported by wv. All that is needed
> > > is to extract the language code (or something similar) so that the iconv
> > > descriptor can be set up for the right input charset.
> > >
> > > I should say that wv imports Russian in Word 6.0 format just fine, so this is
> > > only a CJK problem.
> >
> > Vlad, what exactly is this document supposed to look like? When I run
> > it through wv here, I get a document with about 7 CJK characters. Is
> > that anything like correct? What's the desired result?
>
> To tell the truth, I don't know - I don't know any CJK language.
> On my box (with my patch) and running with LANG=zh_CH.BIG5 I see "??word??".
> Of course I don't have CJK fonts. But when I save this file as .abw, I get
> the following characters inside (with the usual XML tags around them):
> ꒤ꓥwordꓩꓥ
> These are the Big5 codes of the text in the .doc file (but written as Unicode
> chars). This means that wv doesn't know the encoding of the .doc file, so the
> chars from the file were not converted to Unicode.
> I suspect wv doesn't support CJK in Word 6.0 format yet (but handles CJK in
> Word 8.0 format fine).

I'll see if I can check this out on a real copy of Word in the computer labs
later today, unless one of our CJK hackers lets me know before then.
           
        sam th
        sam@uchicago.edu
        http://www.abisource.com/~sam/
        GnuPG Key:
        http://www.abisource.com/~sam/key




This archive was generated by hypermail 2b25 : Sat Nov 11 2000 - 10:41:34 CST