Some wv news: version 0.5.39 and fixed Word styles

Caolan McNamara (Caolan.McNamara@ul.ie)
Thu, 28 Oct 1999 12:43:17 +0100 (IST)


Word styles are now generated correctly when a style is based on another
style, even if that base style has a higher index number. My original
mechanism couldn't handle that much complexity :-), so the "ISTD out of
sequence" message is a thing of the past.
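
To illustrate the general idea, here is a minimal C sketch of resolving
based-on chains regardless of index order: each style resolves its base
first (recursively), whether the base index is higher or lower, and a
cycle is reported instead of looping. The Style struct, NPROPS,
resolve_style() and resolve_all() are hypothetical simplifications, not
wv's real STD/STSH types or functions. The 0x0FFF "no base" marker is
assumed to mirror Word's istdNil convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified style record: NOT wv's real STD layout. */
#define ISTD_NIL 0x0FFF   /* assumed "based on nothing" marker */
#define NPROPS 4          /* toy number of properties per style */

typedef struct {
    unsigned istdBase;    /* index of the style this one is based on */
    int has[NPROPS];      /* 1 if this style sets property i itself */
    int val[NPROPS];      /* value of property i when has[i] is set */
    int resolved[NPROPS]; /* fully resolved values (output) */
    int state;            /* 0 = not started, 1 = in progress/failed, 2 = done */
} Style;

/* Resolve style istd, resolving its base first no matter whether the
   base has a higher or lower index. Returns 0 on success, -1 if the
   based-on chain is cyclic or points at an out-of-range index. */
int resolve_style(Style *styles, size_t n, unsigned istd)
{
    assert(istd < n);
    Style *s = &styles[istd];
    if (s->state == 2)
        return 0;          /* already resolved */
    if (s->state == 1)
        return -1;         /* re-entered: a cycle or an earlier failure */
    s->state = 1;

    if (s->istdBase == ISTD_NIL) {
        memset(s->resolved, 0, sizeof s->resolved);   /* toy defaults */
    } else {
        if (s->istdBase >= n || resolve_style(styles, n, s->istdBase) != 0)
            return -1;     /* leave state at 1 so later calls also fail */
        memcpy(s->resolved, styles[s->istdBase].resolved, sizeof s->resolved);
    }
    /* Properties set by this style override those inherited from the base. */
    for (int i = 0; i < NPROPS; i++)
        if (s->has[i])
            s->resolved[i] = s->val[i];
    s->state = 2;
    return 0;
}

/* Resolve every style in index order; out-of-order bases are handled
   by the recursion in resolve_style(). */
int resolve_all(Style *styles, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (resolve_style(styles, n, (unsigned)i) != 0)
            return -1;
    return 0;
}
```

With this approach, style 0 can be based on style 2 without any
"out of sequence" complaint: style 2 is simply resolved first when
style 0 asks for it.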

This means that, for the AbiWord people, the Word style code should be
complete: the wvParseStruct contains a STSH which holds all the styles,
with their names and properties all correct (hopefully). And when a PAP
and CHP arrive from the element handler, the dirty flag tells you whether
the paragraph conforms exactly to the style it is based on. So that should
be the end of that.

I finished a test run last night on 4747 documents (556 MB) that have
accumulated through the online conversion site, and there were no crashes.
This includes stacks of Word 97, 95, 6 and 9(!!) documents. So I am now
particularly interested in files that can crash wv; there are certainly
showstopper bugs in there that haven't been unearthed yet. (This was, of
course, before today's stylesheet change, which might be buggy.)

I have also been looking into Word 5 and Word 2 support recently, and I
made a few small modifications that will allow for it in the near future.

Non-Western-European Word 95 and 6 documents are not stored in Unicode;
they probably use the Windows codepages, with some identifier to specify
which is which. I haven't figured out the full story with them yet, so
the text returned when converting these documents might be quite wrong.
Consider this a known bug for the moment.

The new version will often complain a lot about "invalid lists" and
"character or paragraph runs not being open". These warnings come from
two workarounds which I believe are fully correct, but I'm leaving the
warnings in for now to highlight that this is a possible danger spot.

The next task is to implement graphic extraction cleanly. The current
generation of wvHtml puts a picture placeholder where each graphic should
be, and also attempts to set the correct size. All graphics are embedded
in another file format known as "Escher", so I need to rewrite my Escher
parser. Even after the graphics are extracted in usable form, some of them
are WMF and EMF files. I previously wrote a library to convert these into
GIF, but I need to rewrite it to use the new gd library, which only
supports PNG. So there is some work to be done on that front as well.

The AbiWord people should be aware that adding graphic support to libwv
will add a dependency on libz, as the WMF files are stored compressed
inside Word documents. I am open to suggestions for any required graphic
extraction API. The first couple of attempts will just hand off a FILE *
for a temporary file, so that I can examine each stage separately.
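
As a rough illustration of the "hand off a FILE * of a temporary file"
idea, here is a minimal C sketch using only standard stdio: an extracted
graphic blob is written to an anonymous tmpfile() and the stream is
rewound before being returned. The function name
extract_graphic_to_tmpfile() is hypothetical and not part of libwv; how
the blob itself is pulled out of the Escher data is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, NOT a libwv function: copy an already-extracted
   graphic blob into an anonymous temporary file and return a FILE *
   positioned at the start, so each stage can be inspected on its own.
   The caller owns the stream; fclose() removes the temporary file. */
FILE *extract_graphic_to_tmpfile(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;

    /* Write the whole blob; a short write means the hand-off failed. */
    if (len > 0 && fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return NULL;
    }

    /* Rewind so the consumer reads from the first byte. */
    if (fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

A FILE * keeps the interface trivial for a first attempt: the consumer can
read the data back, or dump it to disk for inspection, without the library
committing yet to an in-memory representation.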

This version is in AbiWord's CVS and at
http://www.csn.ul.ie/~caolan/publink/mswordview/development/
Version number is 0.5.39

I have a sneak preview of my new wv site and online converter at
http://www.csn.ul.ie/~caolan/wvWare/
Give the new online converter a whirl if you're interested in wvHtml but
don't want the hassle of compiling it (or if the damn thing doesn't
compile for you, you can see what you're missing). So upload any files
that crash wv for you, and you can submit bug reports as well. Maybe I
should use Bugzilla, but that seems like a bit too much work :-).

C.

Real Life: Caolan McNamara * Doing: MSc in HCI
Work: Caolan.McNamara@ul.ie * Phone: +353-86-8790257
URL: http://www.csn.ul.ie/~caolan * Sig: an oblique strategy
Listen to the quiet voice


