Re: word importer (was Re: commit -- Patch for HTML export (bug#


From: Caolan McNamara (Caolan.McNamara@ul.ie)
Date: Tue Dec 07 1999 - 02:44:35 CST


On 06-Dec-99 Justin Bradford wrote:
>
>Does anybody have a good .doc with an example of orphan/widow stuff?
>I'm looking through the paragraph/section properties stored in Word files,
>and I'm not sure which describes this behavior. I need to experiment.

PAP.fWidowControl, maybe? Its description reads:
/* when 1, Word will prevent widowed lines in this paragraph from being placed at
the beginning of a page */
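
As a rough sketch of how an importer or exporter might act on that flag, the snippet below turns a widow-control bit into CSS `widows`/`orphans` declarations, the way an HTML back end could. Only the fWidowControl name comes from the spec; the function name and the choice of two lines are assumptions made for illustration, not something wv is known to do.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write CSS page-breaking hints for one paragraph.
 * f_widow_control mirrors PAP.fWidowControl (1 = prevent widowed lines).
 * Returns the number of characters written, as snprintf does. */
int widow_control_css(int f_widow_control, char *buf, size_t size)
{
    /* Assumption: keep at least 2 lines together when the flag is set;
     * a value of 1 places no restriction on where the page may break. */
    int lines = f_widow_control ? 2 : 1;

    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "widows: %d; orphans: %d;", lines, lines);
}
```

Calling widow_control_css(1, buf, sizeof buf) fills buf with a declaration that could be attached to the paragraph's style.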

>Like the orphan/widow stuff, I'm not sure what describes this property in
>Word, so if someone has an example (ideally, with a very large
>space-after, which you can give me the exact numerical value of, as well),
>then that would be helpful in identifying it.

In Word, each paragraph has a BRC for its top, bottom, left and right sides.
The BRC is the border code, which describes the border type, such as a dashed
line, a dotted line, or none. It also contains a distance, dptSpace, which is
measured in twips. In feature-examples/supported-paragraph-properties.doc there
are paragraphs that exercise this feature. The HTML output attempts to retain
this distance using stylesheets, so you should be able to see the space in
action with this. There is also brcbetween, a separate border code that is
placed between adjacent paragraphs which are of the same type as each other
(isPAPConform(PAP *current,PAP *previous)).
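
To make the spacing idea concrete, here is a minimal sketch of converting a border distance to a stylesheet value and deciding when a between-border applies. SketchBrc is a simplified stand-in, not the real wv BRC struct, and all three function names are hypothetical. The sketch takes dptSpace to be in twips as described above (20 twips to the point); it is worth checking the unit against the spec before relying on it.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-in for a border code; NOT the real wv BRC layout. */
typedef struct {
    int brc_type;   /* assumed: 0 = no border, anything else = a line style */
    int dpt_space;  /* gap between border and text, taken to be in twips */
} SketchBrc;

/* A twip is 1/20 of a point (1440 twips per inch). */
double twips_to_points(int twips)
{
    return twips / 20.0;
}

/* Write a CSS padding declaration for one side ("top", "left", ...).
 * Returns the number of characters written, as snprintf does. */
int brc_padding_css(const SketchBrc *brc, const char *side,
                    char *buf, size_t size)
{
    assert(brc != NULL && side != NULL && buf != NULL);
    return snprintf(buf, size, "padding-%s: %.2fpt;",
                    side, twips_to_points(brc->dpt_space));
}

/* The between-border is drawn only when the two neighbouring paragraphs
 * match and a border type is actually set. paragraphs_conform stands in
 * for the result of isPAPConform(current, previous). */
int use_between_border(const SketchBrc *between, int paragraphs_conform)
{
    assert(between != NULL);
    return paragraphs_conform && between->brc_type != 0;
}
```

With a dpt_space of 100, brc_padding_css writes a 5.00pt padding for the requested side.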

>
>> Yep. To do a good job of importing fields, you'll need to add more field
>> types, though. Our current set is quite anemic.
>
>Yeah. I'll go through what Word supports and post a list here for new
>types/extra features we might want to add.
>
>> Exactly. Just mimic what happens when abi/test/wp/Styles.abw gets imported,
>> and you should be fine. I'm pretty sure the existing APIs should be wide
>> enough for you, but if you've got more info to pass, let me know.
>>
>> The one caveat is that style lookups will fail if they're referenced before
>> they're defined. Just to be safe, the .abw format is laid out so we can
>> load all user-defined styles before any document content.
>
>Caolan, is there a function I can call to get styles?
>I can either make AbiWord styles as I find exception text in the Word
>file which uses a specific style (possibly reducing the number of styles
>I have to generate), or I could import all of the styles in a Word file
>at the beginning (slightly easier to code), as the user might expect some
>of their custom styles to import as well (despite not having used them in
>the document yet [is this even possible?]). I imagine most can be mapped
>directly to AbiWord's default styles...

Yes, the wvParseStruct contains a STSH member, which is filled with the details
of all the styles. Look at the header file definition.
There are stsh.Stshi.cstd styles in it, and each style is a STD, whose name
is xstzName (in Unicode) and whose CHP and optional PAP are to be found as
union members of the UPE array in each STD. As mentioned much earlier, character
styles have only one upe while paragraph styles have two. Character styles don't
fill a chp or pap, but instead fill a chpx, so for the moment I would recommend
that you avoid using character styles like the plague, but the paragraph styles
are OK. You might need to read stylesheet.c to see how to identify which is
which, and to see what the elements are and what they will contain.
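
A sketch of the "import all styles up front" approach might look like the following. The structs are deliberately simplified stand-ins for STSH and STD (the real definitions are in the wv header and differ), and the function names, the callback type, and the idea that an unused slot has a NULL name are all assumptions. It walks every style slot, skips character styles (one upe), and hands each paragraph style (two upes) to a callback, so every style is defined before any document content refers to it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; NOT the real wv STD/STSH definitions. */
typedef struct {
    const char *name;  /* stands in for xstzName (Unicode in the real struct) */
    int upe_count;     /* 2 for a paragraph style, 1 for a character style */
} SketchStd;

typedef struct {
    size_t cstd;           /* number of style slots, like stsh.Stshi.cstd */
    const SketchStd *std;  /* array of cstd entries */
} SketchStsh;

/* Hypothetical hook that would create the matching AbiWord style. */
typedef void (*StyleCallback)(const SketchStd *style, void *user);

/* Pass every paragraph style to define_style; return how many were passed. */
size_t import_paragraph_styles(const SketchStsh *stsh,
                               StyleCallback define_style, void *user)
{
    size_t i, imported = 0;

    assert(stsh != NULL && define_style != NULL);
    for (i = 0; i < stsh->cstd; i++) {
        const SketchStd *s = &stsh->std[i];

        if (s->name == NULL)    /* assumption: unused slots carry no name */
            continue;
        if (s->upe_count != 2)  /* character style: skipped for now */
            continue;
        define_style(s, user);
        imported++;
    }
    return imported;
}

/* Example callback: just count the styles it is handed. */
void count_style(const SketchStd *style, void *user)
{
    (void)style;
    ++*(size_t *)user;
}
```

Running this once before touching the document text avoids the failed-lookup problem mentioned above, at the cost of creating some styles the document never uses.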

C.

Real Life: Caolan McNamara * Doing: MSc in HCI
Work: Caolan.McNamara@ul.ie * Phone: +353-86-8790257
URL: http://www.csn.ul.ie/~caolan * Sig: an oblique strategy
Turn it upsidedown


