Re: Using AbiWord for HTML


Subject: Re: Using AbiWord for HTML
From: Martin Sevior (msevior@mccubbin.ph.unimelb.edu.au)
Date: Thu Feb 07 2002 - 09:23:41 CST


On Thu, 7 Feb 2002, F J Franklin wrote:

> [copying to abi-dev]
>
> > > AbiWord cannot open C:\Temp\junk.htm. It appears to be a bogus or invalid
> > > document.
> >
> > I have the impression that AbiWord is very intolerant of anything it
> > tries to read. This is where I think the design of file import is
> > wrong; instead of giving up, AbiWord should make the best of a bad
> > job.
>
> Sounds like a good idea. If I understand correctly, importers currently
> return UT_OK for success, and anything else means failure. Why not add a
> third, "fuzzy" option (e.g., UT_IE_INTERRUPTED)? In that case, the
> "bogus or invalid document" dialog is still displayed, but a document
> exists and is processed normally.

This might be a good long-term goal, but I think we need to be very careful
about this. If a document is labelled bogus, it is often because there is
some internal inconsistency, and this inevitably leads to segfaults. There
are examples of these types of documents in Bugzilla already;
I think they were caused by bugs in the RTF importer.

If some brave soul wants to write a "document fixer" routine that fixes
these internal inconsistencies then it should not be too hard to import a
document up to the error.
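
To make the three-way result discussed above concrete, here is a minimal
sketch. Everything in it is an assumption for illustration: the enum, the
Document struct, importLines() and decide() are hypothetical stand-ins, not
AbiWord's real types or importer interface, and only the names UT_OK and
UT_IE_INTERRUPTED are taken from the thread. It also assumes the partial
content is self-consistent, which is exactly the part a "document fixer"
would have to guarantee.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not AbiWord's real definitions.
enum ImportStatus {
    UT_OK = 0,          // the whole input was imported
    UT_IE_INTERRUPTED,  // proposed "fuzzy" result: a partial document exists
    UT_IE_BOGUS         // nothing usable was read
};

struct Document {
    std::vector<std::string> paragraphs;
};

// Toy importer: every input line becomes a paragraph. A line starting with
// '!' stands in for whatever corruption a real importer would detect.
ImportStatus importLines(const std::vector<std::string>& lines, Document& doc) {
    for (const std::string& line : lines) {
        if (!line.empty() && line[0] == '!') {
            // Stop at the first error and keep whatever was read before it.
            return doc.paragraphs.empty() ? UT_IE_BOGUS : UT_IE_INTERRUPTED;
        }
        doc.paragraphs.push_back(line);
    }
    return UT_OK;
}

struct OpenDecision {
    bool showDocument;  // open the (possibly partial) document
    bool showWarning;   // display the "bogus or invalid document" dialog
};

// Caller-side policy: an interrupted import still opens, but with a warning.
OpenDecision decide(ImportStatus status, const Document& doc) {
    switch (status) {
    case UT_OK:
        return {true, false};
    case UT_IE_INTERRUPTED:
        // An interrupted import must have kept at least some content.
        assert(!doc.paragraphs.empty());
        return {true, true};
    default:
        return {false, true};
    }
}
```

The point of keeping the decision in the caller is that the importer only
reports how far it got; whether a partial document is safe to display
remains a separate question.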

>
> Or maybe this should be up to individual importers and AbiWord doesn't
> care? For example, the (X)HTML importer could append
>
> <p><b>Transfer Interrupted!</b></p>
>
> to what has been read successfully, and return UT_OK - AbiWord is none the
> wiser.
>
> One quick note on HTML import: AbiWord's XHTML importer needs the input to
> be valid XML, and most HTML documents in the wild fail this requirement
> very quickly...
>
> Regards, Frank
>
> Francis James Franklin
> F.J.Franklin@shef.ac.uk
>
> It's getting them wrong that is living, getting them wrong and wrong and
> wrong and then, on careful reconsideration, getting them wrong again.
> That's how we know we're alive: we're wrong.
> --- Philip Roth


