Re: word importer (was Re: commit -- Patch for HTML export (bug#461?))


Subject: Re: word importer (was Re: commit -- Patch for HTML export (bug#461?))
From: Justin Bradford (justin@ukans.edu)
Date: Mon Dec 06 1999 - 15:03:10 CST


> >On the Doc importer front:
> >1. text-position will be coming very soon.
>
> Cool. Aside from the lack of toolbar icons, this is the only thing needed
> to make that whole row green. (Kudos again to Luke for getting everything
> else in his initial patch.)

Ok, I've committed the code to handle super/subscript from Word files.
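
For anyone following along, the mapping involved is small. Here is a minimal sketch, not the code I committed. It assumes Word's character property `iss` uses 0 for normal, 1 for superscript and 2 for subscript, and that the AbiWord side takes a `text-position` property with the values `normal`, `superscript` and `subscript`. The function name is made up for illustration.

```python
# Hypothetical sketch; not the code committed to the importer.
# Assumption: Word's character property `iss` is 0 (normal),
# 1 (superscript) or 2 (subscript).
ISS_TO_TEXT_POSITION = {0: "normal", 1: "superscript", 2: "subscript"}


def text_position_prop(iss):
    """Return a 'text-position:<value>' property string for a Word iss value."""
    # Unknown values fall back to "normal" so odd input never breaks the import.
    return "text-position:" + ISS_TO_TEXT_POSITION.get(iss, "normal")
```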

> >2. I'm not sure what orphans and widows refers to, exactly.
>
> Dumb formatting algorithms break paragraphs at the last line which happens
> to fit on the page (or in the column). However, this can sometimes leave
> only a few lines on either side of the page break -- one case is called a
> widow, the other is an orphan. Here are examples of the two cases:

Does anybody have a good .doc with an example of orphan/widow stuff?
I'm looking through the paragraph/section properties stored in Word files,
and I'm not sure which describes this behavior. I need to experiment.
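
To make sure we mean the same thing, here is a rough sketch of the behavior as I understand it from the description above. It assumes the usual convention that "orphans" is the minimum number of lines left at the bottom of a page and "widows" is the minimum number carried to the top of the next one. The function name and the defaults of 2 are my own assumptions, not anything taken from Word or AbiWord.

```python
def lines_on_current_page(total_lines, lines_that_fit, orphans=2, widows=2):
    """Return how many lines of a paragraph to keep on the current page.

    A "dumb" algorithm would simply return min(total_lines, lines_that_fit).
    """
    # The whole paragraph fits, so no page break is needed.
    if total_lines <= lines_that_fit:
        return total_lines

    keep = lines_that_fit

    # Widow control: make sure at least `widows` lines reach the next page.
    if total_lines - keep < widows:
        keep = total_lines - widows

    # Orphan control: if too few lines would stay behind, move the
    # whole paragraph to the next page instead.
    if keep < orphans:
        return 0
    return keep
```

For example, a 5-line paragraph with room for 4 lines keeps 3 lines and carries 2, rather than stranding a single line on the next page.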

> >3. tabstops means custom tab settings, right? If so, that's actually a
> >"bug" as I have code to generate the tabs in the ruler.
>
> Precisely. As the Tabs POW mentioned, we've already specified the syntax
> for left, center, right, bar, and decimal tabs, and all but bar tabs work
> properly on the ruler. This should be enough for you to confirm whether
> you've imported them properly.

Ok. The importer builds a series of tab descriptions, just like the
AbiWord format importer does. So, once the actual underlying tab code is
complete, these should show up.
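
Roughly, the conversion looks like the sketch below. It is illustrative only: it assumes Word stores tab positions in twips (1/1440 inch) with a justification code of 0 to 4 for left, center, right, decimal and bar, and it assumes the AbiWord side takes a comma-separated `tabstops` list of `position/type-letter` entries. The exact syntax is whatever the Tabs POW specified; the letters and names here are my guesses.

```python
TWIPS_PER_INCH = 1440

# Assumption: Word justification codes for tab stops.
JC_TO_LETTER = {0: "L", 1: "C", 2: "R", 3: "D", 4: "B"}


def tabstops_prop(tabs):
    """Build a 'tabstops:...' property from (position_in_twips, jc) pairs."""
    parts = []
    # Sort by position so the ruler sees the tabs left to right.
    for pos_twips, jc in sorted(tabs):
        inches = pos_twips / TWIPS_PER_INCH
        # Unknown justification codes fall back to a left tab.
        parts.append("%.4fin/%s" % (inches, JC_TO_LETTER.get(jc, "L")))
    if not parts:
        return ""
    return "tabstops:" + ",".join(parts)
```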

Does anyone have a .doc with custom tabs that do not show up in AbiWord?

> >5. columns is just support for multicolumn sections, right? I believe that
> >works.
>
> That's what it should be. Argue with Bob about whether it works or not. I
> haven't seen a test case either way. :-)

I believe that I have tested this with some of Caolan's example docs.

> >6. I'm not quite sure what section-space-after is.
>
> That property controls how much vertical white space should be put between
> those two sections on the same page.

As with the orphan/widow stuff, I'm not sure which Word property describes
this. If someone has an example (ideally with a very large space-after
whose exact numerical value you can give me), that would help me
identify it.

> Yep. To do a good job of importing fields, you'll need to add more field
> types, though. Our current set is quite anemic.

Yeah. I'll go through what Word supports and post a list here for new
types/extra features we might want to add.

> Exactly. Just mimic what happens when abi/test/wp/Styles.abw gets imported,
> and you should be fine. I'm pretty sure the existing APIs should be wide
> enough for you, but if you've got more info to pass, let me know.
>
> The one caveat is that style lookups will fail if they're referenced before
> they're defined. Just to be safe, the .abw format is laid out so we can
> load all user-defined styles before any document content.

Caolan, is there a function I can call to get styles?
I see two options. I can make AbiWord styles on demand, as I find exception
text in the Word file which uses a specific style (possibly reducing the
number of styles I have to generate). Or I can import all of the styles in
a Word file at the beginning, which is slightly easier to code and closer
to what users might expect, since their custom styles would import even if
they have not been used in the document yet [is this even possible?]. I
imagine most can be mapped directly to AbiWord's default styles...
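
The second option also sidesteps the caveat about lookups failing before a style is defined. Here is a toy sketch of that ordering, with a made-up `StyleTable` class standing in for whatever the real style API is. None of these names come from AbiWord or wv.

```python
class StyleTable:
    """Hypothetical stand-in for the document's style storage."""

    def __init__(self):
        self._styles = {}

    def define(self, name, props):
        self._styles[name] = dict(props)

    def lookup(self, name):
        # Raises KeyError when a style is referenced before it is defined.
        return self._styles[name]


def import_document(word_styles, paragraphs):
    """Load every style first, then resolve each paragraph's style.

    word_styles: dict mapping style name to a dict of properties.
    paragraphs: list of (style_name, text) pairs.
    """
    table = StyleTable()
    # Pass 1: define all styles before any content is seen.
    for name, props in word_styles.items():
        table.define(name, props)
    # Pass 2: content can now reference any style safely.
    return [(text, table.lookup(style_name)) for style_name, text in paragraphs]
```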

Justin


