Subject: word importer (was Re: commit -- Patch for HTML export (bug#461?))
From: Paul Rohr (paul@abisource.com)
Date: Sat Dec 04 1999 - 16:44:38 CST


At 01:32 PM 12/4/99 -0600, Justin Bradford wrote:
>On the Doc importer front:
>1. text-position will be coming very soon.

Cool. Aside from the lack of toolbar icons, this is the only thing needed
to make that whole row green. (Kudos again to Luke for getting everything
else in his initial patch.)

>2. I'm not sure what orphans and widows refers to, exactly.

A naive formatting algorithm breaks a paragraph after the last line that
happens to fit on the page (or in the column). However, this can strand a
single line, or just a few, on one side of the page break -- one case is
called a widow, the other is an orphan. Here are examples of the two cases:

      xxxxxxx xxxxx x xxxxx xxxx
  xxxxxx xxxxxx xxxxxxxx xxxxxx
  xxx xxxxxx xxxxxxxx xxxxxxx xxx
  xxxx xxxxxxx xxx xxxx xxxxxxx
  
  -- calculated page break --

  xxxxxx xx

      xxxxxxx xxxxx x xxxxx xxxx
  xxxxxx xxxxxx xxxxxxxx xxxxxx
  xxx xxxxxx xxxxxxxx xxxxxxx xxx
  xxxx xxxxxxx xxx xxxx xxxxxxx
  xxxxxx xx

      xxxxxxx xxxxx x xxxxx xxxx

  -- calculated page break --

  xxxxxx xxxxxx xxxxxxxx xxxxxx
  xxx xxxxxx xxxxxxxx xxxxxxx xxx
  xxxx xxxxxxx xxx xxxx xxxxxxx
  xxxxxx xx

These block-level properties tell the formatter to *not* leave widows or
orphans when breaking a specific block. (In this example, the first page
would get broken a line earlier, and the second would be broken either a
line earlier or later.)
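
To make the adjustment concrete, here is a rough sketch in Python of how a
formatter might pick the break point. It is not AbiWord's code. It assumes
the CSS-style reading of the two properties (orphans = minimum lines left at
the bottom of the page, widows = minimum lines carried to the top of the
next page), and the function name and parameters are made up for
illustration.

```python
def lines_before_break(total_lines, lines_that_fit, orphans=2, widows=2):
    """Return how many lines of a paragraph stay on the current page.

    total_lines:    number of lines in the paragraph
    lines_that_fit: lines the naive algorithm would place before the break
    orphans:        assumed minimum lines at the bottom of the current page
    widows:         assumed minimum lines at the top of the next page
    """
    if lines_that_fit >= total_lines:
        return total_lines  # the whole paragraph fits, so no break is needed

    # Leave at least `widows` lines for the next page.
    keep = max(0, min(lines_that_fit, total_lines - widows))

    # If too few lines would stay behind, move the whole paragraph instead.
    if keep < orphans:
        keep = 0
    return keep
```

With the five-line paragraphs from the examples, `lines_before_break(5, 4)`
keeps three lines (the break moves a line earlier), and
`lines_before_break(5, 1)` keeps none (the whole paragraph moves to the next
page).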

Ask Eric for more details on the specific semantics, or just grep through
the code to see how they're used.

>3. tabstops means custom tab settings, right? If so, that's actually a
>"bug" as I have code to generate the tabs in the ruler.

Precisely. As the Tabs POW mentioned, we've already specified the syntax
for left, center, right, bar, and decimal tabs, and all but bar tabs work
properly on the ruler. This should be enough for you to confirm whether
you've imported them properly.

After that, the primary bugs remaining are:

  - syntax for tab leaders, and
  - a bunch of formatter support.
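
For checking imported tab stops against the ruler, a small helper along
these lines could be handy. Note that this message does not spell out the
actual property syntax, so the comma-separated `position/type` form and the
one-letter type codes below are placeholders, not the real format. Only the
five tab types come from the discussion above.

```python
# Hypothetical one-letter codes for the five tab types named above.
TAB_TYPES = {
    "L": "left",
    "C": "center",
    "R": "right",
    "B": "bar",
    "D": "decimal",
}


def parse_tabstops(value):
    """Parse a made-up 'position/type' list into (position, type) pairs.

    Example input (assumed syntax): "1in/L, 2.5in/C, 4in/D"
    A stop without a type code is treated as a left tab.
    """
    stops = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        position, _, code = item.partition("/")
        kind = TAB_TYPES[code.strip().upper() or "L"]
        stops.append((position.strip(), kind))
    return stops
```

An unknown type code raises `KeyError`, which is probably what you want
while verifying an importer.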

>5. columns is just support for multicolumn sections, right? I believe that
>works.

That's what it should be. Argue with Bob about whether it works or not. I
haven't seen a test case either way. :-)

>6. I'm not quite sure what section-space-after is.

You can currently insert a "continuous" section break, which lets the next
section start on the same page. This is useful if you want to change the
number of columns partway down a page. For example, you could have a page
which looks like this:

      xxxxxxx xxxxx x xxxxx xxxx
  xxxxxx xxxxxx xxxxxxxx xxxxxx
  xxx xxxxxx xxxxxxxx xxxxxxx xxx
  xxxx xxxxxxx xxx xxxx xxxxxxx
  xxxxxx xx
  
  -- explicit section break --

    xx xxx xxx xxxx xxx xxx
  xxx xxx xxxxx xxx xxxx xxxxx
  xxxxx xxx xxx xxxxxx xxx xx
  xxx xxx xxxx
  xxxx xxx xxxxx xxx xxx xxx
  xxxxx xxx xxx xxxx xxxx xxx

That property controls how much vertical white space should be put between
those two sections on the same page.
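
As a rough illustration of the arithmetic (not the formatter's actual code),
the sketch below stacks sections down one page and adds each section's
space-after before the next one starts. The function name and the
(height, space_after) tuples are assumptions made for the example; all
values share one unit, say inches.

```python
def section_tops(sections, page_top=0.0):
    """Return the vertical position where each section starts.

    sections: list of (height, space_after) pairs for consecutive
              sections separated by continuous breaks on the same page.
    """
    tops = []
    y = page_top
    for height, space_after in sections:
        tops.append(y)
        # The next section starts below this one, plus the extra gap.
        y += height + space_after
    return tops
```

So a 2.0in single-column section with 0.25in of space after it puts the
following two-column section at 2.25in from the top of the page.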

>Images, breaks, and fields are pretty easy (well, once you get an image
>buffer from wv). Mostly just requires implementing the special character
>handler.

Yep. To do a good job of importing fields, you'll need to add more field
types, though. Our current set is quite anemic.

>Styles are straight-forward, but require a bunch of annoyingly mundane
>code changes. Although, I guess I'm not sure what to do with styles from
>Word which do not have an AbiWord equivalent (ie. custom user styles). I
>can create new styles as I'm importing, right?

Exactly. Just mimic what happens when abi/test/wp/Styles.abw gets imported,
and you should be fine. I'm pretty sure the existing APIs should be wide
enough for you, but if you've got more info to pass, let me know.

The one caveat is that style lookups will fail if they're referenced before
they're defined. Just to be safe, the .abw format is laid out so we can
load all user-defined styles before any document content.
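
In importer terms, that suggests collecting every style up front and only
then emitting content. Here is a minimal sketch of that ordering in Python.
The `StyleTable` class and `import_document` function are invented for
illustration and are not the AbiWord APIs.

```python
class StyleTable:
    """Toy style registry: lookups fail for styles not yet defined."""

    def __init__(self):
        self._styles = {}

    def define(self, name, props):
        self._styles[name] = dict(props)

    def lookup(self, name):
        return self._styles[name]  # raises KeyError if never defined


def import_document(styles, paragraphs):
    """Define all styles first, then resolve each paragraph's style.

    styles:     list of (name, properties) pairs, including custom ones
    paragraphs: list of (style_name, text) pairs in document order
    """
    table = StyleTable()
    for name, props in styles:
        table.define(name, props)

    # Only now touch content, so every reference can be resolved.
    return [(text, table.lookup(style_name))
            for style_name, text in paragraphs]
```

A paragraph that names a style missing from the first pass fails with
`KeyError`, which mirrors the lookup failure described above.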

Paul


