previews for embedded objects


Subject: previews for embedded objects
From: Paul Rohr (paul@abisource.com)
Date: Wed Mar 15 2000 - 03:39:40 CST


At 08:11 AM 3/14/00 GMT, Caolan McNamara wrote:
>One thing that Word also does for the insertion of OLE objects or
>external entities such as equations and clipart is to have the result
>stored as a graphic which is a snapshot of the last graphical layout
>of the active object. i.e. insert an equation editor object and the
>result of the field is stored as a WMF picture of what the object
>looked like, again allowing previewers and importers that do not
>understand OLE2 objects to at least display what it looked like. In
>this case the field data itself contains the actual information as to
>where the active object's data is stored.

Oh, are those objects done as *fields* too? I didn't realize that.

I guess if I squint hard enough I can see the analogy. To achieve the
desired effect, you'd need to somehow store the following three hunks of
information:

  - an inline placeholder at the appropriate spot in the text
  - the blob of opaque data to be embedded
  - another blob of rasterized data which shows what it last looked like

However, in our file format I'd rather not implement that using field
markup. Instead, we could easily move both blobs out of line using data
constructs more-or-less as follows:

  <p>
  The following four examples should all really have a props attribute
  as well (to capture height and width), but I've omitted it to save space.
  </p>
 
  <p>
  This is a raster image: <image dataid="pic.png"/>
  This is a vector image: <image dataid="clip.svg"/>
  This is an unreadable OLE object: <object dataid="name1" preview="name1.svg"/>
  This is an equation we can handle: <object dataid="eq1" preview="eq1.svg"/>
  </p>

  ...

  <data>
  <d name="pic.png" type="png"> ...(base64-encoded PNG)... </d>
  <d name="clip.svg" type="xml"> <svg>...</svg> </d>
  <d name="name1" type="ole2"> ...(unreadable base64-encoded junk)... </d>
  <d name="name1.svg" type="xml"> <svg>...</svg> </d>
  <d name="eq1" type="xml"> <mathml>...</mathml> </d>
  <d name="eq1.svg" type="xml"> <svg>...</svg> </d>
  </data>
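
To make the lookup concrete, here is a minimal Python sketch of how a reader might resolve the placeholders against the data section. It assumes a well-formed rendering of the markup (quoted attributes, a made-up <doc> root element) and a hypothetical list of types the reader can render; none of this is actual AbiWord code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed rendering of the proposal. The <doc> root and
# the short base64 strings are placeholders, not real file contents.
DOC = """
<doc>
  <p>
    This is a raster image: <image dataid="pic.png"/>
    This is an unreadable OLE object: <object dataid="name1" preview="name1.svg"/>
  </p>
  <data>
    <d name="pic.png" type="png">iVBORw0KGgo=</d>
    <d name="name1" type="ole2">0M8R4KGxGuE=</d>
    <d name="name1.svg" type="xml"><svg/></d>
  </data>
</doc>
"""

def index_data_items(root):
    """Map each data item's name to its <d> element."""
    return {d.get("name"): d for d in root.iter("d")}

def what_to_draw(el, items, readable_types=("png", "xml")):
    """Return the name of the data item to render for an <image> or <object>.

    Falls back to the preview when the primary blob has a type we cannot read.
    """
    primary = items[el.get("dataid")]
    if primary.get("type") in readable_types:
        return primary.get("name")
    return el.get("preview")

root = ET.fromstring(DOC)
items = index_data_items(root)
for el in root.iter():
    if el.tag in ("image", "object"):
        print(el.tag, "->", what_to_draw(el, items))
```

The point of the split is that the inline placeholder stays tiny, and the reader only needs the preview when it cannot interpret the primary blob.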

Notes on this proposal:

1. Part of the idea here is to allow us to preserve round-trip fidelity for
embedded content from other file formats that we can't handle natively in
AbiWord.

For example, say we're opening a Word2000 document on BeOS which has an
embedded Visio document as part of its OLE stream. Even though we're not
likely to find an appropriate handler to invoke which is capable of editing
that data on the current platform, there's still no reason why we should
lose track of it entirely. By "parking" the unreadable gunk in our file
format, our BeOS user could edit other parts of the document and then save
it back out, so that the embedded Visio content remained there when the
updated document got transferred back to a Windows box.

This problem is most likely to arise when moving documents across platforms,
but even on the same platform, not everyone will have the same set of apps
installed. For example, even another Word2000 user isn't guaranteed to have
a copy of Visio, too.
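
As a rough illustration of the "parking" idea, the Python sketch below imports an opaque blob, edits unrelated text, saves, reloads, and checks that the blob comes back byte for byte. The helper names park and unpark, the <doc> root, and the stand-in payload are all assumptions made for the example.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for embedded data we cannot interpret on this platform.
OLE_BYTES = bytes([0xD0, 0xCF, 0x11, 0xE0]) + b"opaque Visio payload"

def park(name, blob, kind):
    """Wrap bytes we cannot interpret in a base64-encoded <d> data item."""
    d = ET.Element("d", {"name": name, "type": kind})
    d.text = base64.b64encode(blob).decode("ascii")
    return d

def unpark(d):
    """Recover the original bytes from a parked data item."""
    return base64.b64decode(d.text)

# "Import": one paragraph with a placeholder, plus the parked blob.
doc = ET.Element("doc")
p = ET.SubElement(doc, "p")
p.text = "Quarterly plan: "
ET.SubElement(p, "object", {"dataid": "name1", "preview": "name1.svg"})
data = ET.SubElement(doc, "data")
data.append(park("name1", OLE_BYTES, "ole2"))

# Edit an unrelated part of the document, then save and reload it.
p.text = "Revised quarterly plan: "
saved = ET.tostring(doc, encoding="unicode")
reloaded = ET.fromstring(saved)

# The blob we never understood survives the round trip unchanged.
blob = unpark(reloaded.find("./data/d[@name='name1']"))
assert blob == OLE_BYTES
```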

2. As always, I've put minimal thought into the attribute names here. (For
example, "dataid" could just as easily have been "blob" and "preview" could
have been "last".) Better suggestions are welcome.

3. Binary data items are all base64-encoded. XML data items aren't.

4. Note the addition of a simplistic "type" field to data items.
Currently, they're all PNGs, but we knew all along that this would have to
change. I'm assuming that there's no need to explicitly tag various XML
types with an attribute, since the container tags inside (and/or their
namespaces) are more definitive anyhow.

Also, I debated how much metadata to put there for stuff we don't know how
to read -- type=opaque or type=blob seemed like giving up too easily, so I
compromised and just labelled it as OLE2 data for anyone who really, really
wants to crack it open and wade through it. (Likewise, any other
platform-specific opaque content which can only be read via, say, Bonobo or
OpenDoc APIs should probably also be base64-encoded and labelled with the
name of the embedding mechanism.)
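
Notes 3 and 4 together suggest a simple reading rule, sketched below in Python: XML items are stored inline and identified by their container tag, and everything else is treated as base64-encoded bytes labelled by the type attribute. The function name read_item and the sample payloads are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

def read_item(d):
    """Return (type, payload) for a <d> data item.

    XML items are stored inline, so the payload is the child element.
    Anything else is assumed to be base64-encoded bytes.
    """
    kind = d.get("type")
    if kind == "xml":
        return kind, d[0]  # first child, e.g. <svg> or <mathml>
    text = "".join(d.itertext()).strip()
    return kind, base64.b64decode(text)

ole_b64 = base64.b64encode(b"\xd0\xcf\x11\xe0").decode("ascii")
data = ET.fromstring(
    f'<data>'
    f'<d name="eq1" type="xml"><mathml/></d>'
    f'<d name="name1" type="ole2"> {ole_b64} </d>'
    f'</data>'
)

for d in data:
    kind, payload = read_item(d)
    # For XML items the container tag says what the content really is.
    detail = payload.tag if kind == "xml" else f"{len(payload)} bytes"
    print(d.get("name"), kind, detail)
```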

5. An argument could be made that the content should be properly labelled
by MIME/content types instead. I'm somewhat sympathetic to this position,
but I question whether we could accurately determine the necessary content
types to allow accurate de-enveloping.

How much information *does* OLE2 provide about the contents of one of these
blobs? How far up the usefulness curve do they go?

  - just a GUID?
  - a user-readable name for the application or file format?
  - a reliable mime type?

Somehow, I'm not holding my breath that enough of that work is already done
for us, which is why the current proposal doesn't even attempt to generate
more specific content types.

6. Finally, before we go too far down this path, we should certainly add a
refcounting mechanism to data items so that unused ones don't continue to
hang around in the file format indefinitely. (This should be a known bug,
but I'm not sure it ever got logged in Bugzilla.)
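
One way to approximate this, sketched in Python below, is to count references at save time and drop any data item nothing points to, rather than maintaining live refcounts. The function names and the sample document are hypothetical; note that a preview attribute counts as a reference too.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_references(root):
    """Count how often each data item name is referenced in the document."""
    refs = Counter()
    for el in root.iter():
        for attr in ("dataid", "preview"):
            name = el.get(attr)
            if name is not None:
                refs[name] += 1
    return refs

def drop_unused(root):
    """Remove <d> items that nothing references; return the removed names."""
    refs = count_references(root)
    removed = []
    for data in list(root.iter("data")):
        for d in list(data):  # copy, since we mutate while looping
            if refs[d.get("name")] == 0:
                data.remove(d)
                removed.append(d.get("name"))
    return removed

doc = ET.fromstring(
    '<doc>'
    '<p><object dataid="eq1" preview="eq1.svg"/></p>'
    '<data>'
    '<d name="eq1" type="xml"><mathml/></d>'
    '<d name="eq1.svg" type="xml"><svg/></d>'
    '<d name="old.png" type="png">AAAA</d>'
    '</data>'
    '</doc>'
)

removed = drop_unused(doc)
print("removed:", removed)
```

Sweeping at save time is simpler than true refcounting, but it does nothing for items orphaned in memory during a long editing session.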

Paul


