Re: A Proposal (why we should have setBold(true))


Subject: Re: A Proposal (why we should have setBold(true))
From: sam th (sam@bur-jud-118-039.rh.uchicago.edu)
Date: Wed May 31 2000 - 23:24:37 CDT


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 31 May 2000, Paul Rohr wrote:

> At 10:09 PM 5/31/00 -0500, sam th wrote:
> >The way it looks to me from the Abi side of the word importer is that wv
> >provides us with all the properties for a span at once. However, in HTML,
> >you have to deal with lots of messy inheritance. I don't think we should
> >assume that every file format we deal with will be as nice to our system
> >as wv is currently. But then again, maybe HTML is an exception.
>
> Bingo. Sounds like we've found the core issue.
>
> AFAICT, for most word processing formats (not just Word) you can easily
> determine all the properties of a span at once. Since they share this
> characteristic with AbiWord's internal format, doing the required mappings
> is tedious, but not usually that hard.
>
> By contrast, classic HTML definitely has "lots of messy inheritance" --
> which is what makes that importer somewhat harder to write. You have to
> keep track of all the goofy nesting situations.
>
> The necessary state machines to flatten nested markup really aren't that
> bad, though. For well-formed XHTML, essentially the transformation you're
> doing is just a tree-walking exercise:
>
> <B>one        --> font-weight:bold
> <I>two</I>    --> font-weight:bold; font-style:italic
> three</B>     --> font-weight:bold
>
> However, you'll need a different state machine for classic HTML so that you
> can properly interpret format-toggling "messes" like this:
>
> <B>one        --> font-weight:bold
> <I>two</B>    --> font-weight:bold; font-style:italic
> three</I>     --> font-style:italic
>
> Sounds to me like those importers are *exactly* where such code belongs, no?

Well, I had never planned to implement HTML, as opposed to XHTML. I have
no desire to write that parser (and it would belong in the importer). But
the problem is that with your first example, the importer has to keep
track of the state of the formatting at any given point, which is
information that is available in the PT. This is why I think that a
controller class, as suggested by Jeff among others, is the best way to
handle this.
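
To make the bookkeeping concrete, here is a minimal sketch of the tree walk for the well-formed case, written in Python rather than against any AbiWord code. The TAG_PROPS table and the flatten() function are made up for illustration. The point is only that the importer has to carry the inherited properties down the tree itself in order to hand over a complete property set for each span.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag-to-property table, for illustration only.
TAG_PROPS = {"b": ("font-weight", "bold"), "i": ("font-style", "italic")}

def flatten(elem, inherited=None):
    """Yield (text, props) spans, each with its complete property set."""
    # Copy the inherited properties so siblings do not see our changes.
    props = dict(inherited or {})
    entry = TAG_PROPS.get(elem.tag.lower())
    if entry:
        props[entry[0]] = entry[1]
    if elem.text:
        yield elem.text, props
    for child in elem:
        # The child sees our properties plus its own.
        yield from flatten(child, props)
        # Text after the child's close tag belongs to us again.
        if child.tail:
            yield child.tail, props

spans = list(flatten(ET.fromstring("<B>one <I>two</I> three</B>")))
for text, props in spans:
    print(repr(text), props)
```

This only works because the input is well-formed. The recursion is the state tracking, and it is exactly the state that already exists on the document side once the text has been appended.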

Now, if I had an insertion point while I was doing the importing, the PT
would be much easier to use (I could just use the sort of code FV_View
uses). I could then make changes to formatting, instead of just appending
further formatting. But the insertion point is maintained by the view, not
by the PT.

So the options I see are as follows:

1 - the importer could duplicate lots of FV_View functionality.

2 - the importer could have access to that sort of functionality (via a
controller class).

Option 1 would probably happen sooner. But I think Option 2 is the Right
Thing To Do (TM).
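
Roughly what I have in mind for Option 2, as a sketch in Python and not a proposal for the real interface. FormatController, set_bold(), set_italic() and insert_text() are all hypothetical names, and the spans list is just a stand-in for the PT. The controller owns the formatting state at the insertion point, so the importer only reports changes as it meets them.

```python
class FormatController:
    """Hypothetical controller that tracks formatting at the insertion point."""

    def __init__(self):
        self._props = {}   # formatting currently in effect
        self.spans = []    # appended (text, props) runs; stand-in for the PT

    def set_bold(self, on):
        self._set("font-weight", "bold", on)

    def set_italic(self, on):
        self._set("font-style", "italic", on)

    def _set(self, name, value, on):
        if on:
            self._props[name] = value
        else:
            self._props.pop(name, None)

    def insert_text(self, text):
        # Snapshot the properties so later changes do not alter this span.
        self.spans.append((text, dict(self._props)))

# The importer just forwards what it sees, in document order.
ctl = FormatController()
ctl.set_bold(True)
ctl.insert_text("one")
ctl.set_italic(True)
ctl.insert_text("two")
ctl.set_bold(False)
ctl.insert_text("three")
```

With something like this the importer keeps no formatting stack of its own. As a side effect, the same sequence of calls covers the overlapping case from the second example above, since turning bold off does not depend on which tag was opened last.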

And lest you think that this is unique to XHTML, you should look at a
KWord document sometime. Importing that would take yet more heavy lifting
(which would be easier with a controller).

Just my 2c
           
                                     sam th
                                     sam@uchicago.edu
                                http://sam.rh.uchicago.edu
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE5NeWHt+kM0Mq9M/wRAnNSAKDG2v4UHfcuP/8PNmR0K1E5THaHdQCeOFIK
fCcgOyuOFBTDj0LCby/ogOA=
=RdMD
-----END PGP SIGNATURE-----