Re: fix for wv


Subject: Re: fix for wv
From: Vlad Harchev (hvv@hippo.ru)
Date: Sat Nov 18 2000 - 02:05:43 CST


On Sat, 18 Nov 2000, ha shao wrote:

> On Fri, Nov 17, 2000 at 09:24:05PM +0400, hvv@hippo.ru wrote:
> >
> > wv was really terribly broken for word6 format files. Here is a patch that
> > fixes this.
> >
> > To CJK guys:
> >
> > * Now word6.doc from _Belcon_ gets imported properly too (and the Word2k
> > document from Chih-Wei Huang is also OK)
> >
> > * word6.doc from Chih-Wei Huang's mail, which I've forwarded here, doesn't
> > import properly (chars are not converted to Unicode) because wv thinks it's
> > in Word7 format (!): wvQuerySupported(&ps->fib,NULL) returns WORD7. So it
> > seems there is no clean workaround/hack for importing it. (Maybe WordPad is
> > that broken. Is Word able to read this file? And which version of Windows
> > is that WordPad from: Win2k, NT or Win9x?)
> > IMO the only hack that can be used is to check whether the arriving
> > character's code is below, above, or within a range of some constant(s)
> > for the given charset; if the value doesn't satisfy those constraints, its
> > character type is set to '1' to force conversion to Unicode.
>
> I only see Word6 and Word8/Word97 documentation at
> http://www.wotsit.org/search.asp?s=text
> So it may well be that WORD7 does not use Unicode either. It looks like
> Word6 has more features in common with Word95 than Word8 has with
> Word95.

 Yes, I have the same impression now.
 
> I only see one place in the Word97 document that mentions Unicode in
> connection with Word95; it states:
> =====
> XCHAR( eXtended CHARacter set):
>
> A data type which defines a "character". Each XCHAR corresponds to a character in the document, where "character" is defined as
> a glyph, regardless of whether it is a single-byte or double-byte character. With Word6/FE, Word95/FE, Word97/all and future
> versions of Word, this is defined as a 16-bit integer corresponding to the Unicode character code of the glyph.
> ======
> where /FE means Far East.
> If setting chartype to 1 for the WORD7 format also gives proper results for
> .doc import in other languages (Russian?), we can assume Word7 behaves
> like Word6 in this respect. What do you think?

 I should say I didn't have problems importing Russian Word95 documents (I
believe so; I don't remember exactly whether I tested Word95 docs). At least I
didn't have problems with docs generated by WordPad (chartype was set
correctly). So I conclude that Word6/7/WordPad fail to set chartype properly
only for CJK docs. So we can change "<= WORD6" to "<= WORD7" in the hack
you've added and check the results.
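A minimal C sketch of the two workarounds discussed in this thread, assuming chartype 1 means "codepage character, convert it to Unicode" as described above. The enum is only a stand-in for wv's version constants (the names mirror WORD6/WORD7 from the thread, but the ordering is an assumption), and code_range, chartype_by_version and chartype_by_range are hypothetical helpers, not part of wv; the actual patch is not reproduced here.

```c
/* Stand-in for wv's Word version constants. Assumption: older versions
 * compare lower, so "ver <= WORD7" means "WORD7 or older". */
typedef enum { WORD2, WORD5, WORD6, WORD7, WORD8 } word_version;

/* Hypothetical per-charset bounds for the range heuristic. */
typedef struct {
    unsigned lo;
    unsigned hi;
} code_range;

/* chartype 1: the value is a codepage character that must be converted
 * to Unicode; chartype 0: the value is used as-is. */

/* Version-gated hack: force conversion for every format up to WORD7
 * (the cut-off previously discussed was WORD6). */
int chartype_by_version(int chartype, word_version ver)
{
    return ver <= WORD7 ? 1 : chartype;
}

/* Range heuristic: if the code falls outside the range expected for the
 * document's charset, force conversion to Unicode; otherwise keep the
 * chartype the file reported. */
int chartype_by_range(int chartype, unsigned code, code_range r)
{
    if (code < r.lo || code > r.hi)
        return 1;
    return chartype;
}
```

For instance, chartype_by_version(0, WORD7) returns 1 while chartype_by_version(0, WORD8) returns 0. With placeholder bounds {0x0000, 0x9FFF} (illustrative only, not real charset limits), chartype_by_range(0, 0xA4A4, r) returns 1 and chartype_by_range(0, 0x4E2D, r) returns 0. The version gate is the change proposed for testing; the range check is only the fallback idea from the quoted mail, and real per-charset bounds would still have to be worked out.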

> --
> Best regards
> ha_shao
>

 Best regards,
  -Vlad


