Re: MSWord DOC (fwd)


Subject: Re: MSWord DOC (fwd)
From: Vlad Harchev (hvv@hippo.ru)
Date: Sat Nov 11 2000 - 10:33:40 CST


On Sat, 11 Nov 2000, Sam TH wrote:

> On Sat, Nov 11, 2000 at 08:10:31PM +0400, Vlad Harchev wrote:
> > On Sat, 11 Nov 2000, Sam TH wrote:
> >
> > > On Sat, Nov 11, 2000 at 07:55:05PM +0400, Vlad Harchev wrote:
> > > >
> > > > Just to add more information - here is the file that wv fails to treat
> > > > correctly (the data in it doesn't get converted to Unicode at all). Most
> > > > probably CJK in Word6.0 format is not yet supported by wv. All that is needed
> > > > is to extract the language code (or similar) so that an iconv descriptor can
> > > > be set up for the right input charset.
> > > >
> > > > I should say that wv imports Russian in Word6.0 format just fine, so this is
> > > > only a CJK problem.
> > >
> > > Vlad, what exactly is this document supposed to look like? When I run
> > > it through wv here, I get a document with about 7 CJK characters. Is
> > > that anything like correct? What's the desired result?
> >
> > To tell the truth, I don't know - I don't know any CJK language.
> > On my box (with my patch) and running with LANG=zh_CH.BIG5 I see "??word??".
> > Of course I don't have CJK fonts. But when I save this file as .abw, I get
> > the following characters inside (with the usual XML tags around them):
> > &#xA4A4;&#xA4E5;word&#xA4E9;&#xA4E5;
> > These are the Big5 codes of the text in the .doc file (but written as Unicode
> > chars). This all means that wv doesn't know the encoding of the .doc file, so
> > the chars from the file were not converted to Unicode.
> > I suspect wv doesn't support CJK in word6.0 format yet (but handles CJK in
> > word8.0 format fine).
>
> I'll see if I can check this out on a real copy of word in the computer labs
> later today. Unless one of our CJK hackers lets me know before then.

 I don't think you should bother. Chi-Wei Huang confirmed that those &#xHHHH
chars are Big5 - no need to confirm this again. Let's see what Dom will say.
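
 For illustration only, here is a minimal Python sketch (not wv code) of what
"Big5 codes written as unicode chars" means and how the text could be recovered.
The function name recover_big5 is hypothetical, and the sketch assumes every
code point above 0xFF is really a two-byte Big5 sequence that was never passed
through a charset conversion:

```python
def recover_big5(misdecoded: str) -> str:
    """Reinterpret raw Big5 codes stored as Unicode code points.

    Assumption: each code point above 0xFF holds one two-byte Big5
    character (lead byte in the high 8 bits), and everything else is
    a single byte such as plain ASCII.
    """
    raw = bytearray()
    for ch in misdecoded:
        cp = ord(ch)
        if cp > 0xFF:
            # Split the 16-bit value back into its two original bytes.
            raw += cp.to_bytes(2, "big")
        else:
            raw.append(cp)
    # Decode the reconstructed byte stream with the correct input charset.
    return raw.decode("big5")


# The characters found in the saved .abw file: U+A4A4 U+A4E5 "word" U+A4E9 U+A4E5
garbled = "\ua4a4\ua4e5word\ua4e9\ua4e5"
print(recover_big5(garbled))
```

 This is roughly the conversion that would have to happen at import time once
the input charset of the .doc file is known.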

 Best regards,
  -Vlad


