Re: Importing HTML


Subject: Re: Importing HTML
From: Sam TH (sam@uchicago.edu)
Date: Tue Apr 24 2001 - 13:05:03 CDT


On Tue, Apr 24, 2001 at 12:24:29PM -0400, Dom Lachowicz wrote:
> >Any chance of a partial/lossy import, ignore all unknown tags, dump all
> >unmatched tags ...???
>
> We already ignore unknown tags (such as frames or tables). I have *no* idea
> how to get expat or libxml2 to not choke on unmatched tags. I'm not sure
> that we would even want to do this. We'd need a parsing engine that's a lot
> more complex like Gecko to do this "correctly."
>

You cannot do this with an XML parser. A well-formedness error is a
fatal error in XML: once a conforming processor hits one, it must not
continue normal processing.
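
To make that concrete, here is a minimal sketch using Python's
bindings to expat (xml.parsers.expat). The helper name is_well_formed
and the two sample strings are made up for illustration; they are not
from AbiWord's importer.

```python
import xml.parsers.expat


def is_well_formed(document):
    """Return (ok, message). message describes the first fatal error, if any."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as err:
        # expat stops at the first well-formedness error and raises.
        return False, str(err)
    return True, ""


# Every tag is matched, so this parses.
good = "<html><body><p>Hello</p></body></html>"
# Typical HTML: <br> is never closed, so </p> does not match the open element.
bad = "<html><body><p>Hello<br></p></body></html>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

The second call should come back with ok set to False and a message
pointing at the mismatched tag. The parser does not deliver the rest
of the document after that point.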

Also, the real solution is to recommend HTML Tidy (do a Google
search) to clean up the HTML before importing it.
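
For a rough idea of what that kind of cleanup involves, here is a
sketch that uses Python's lenient html.parser to close unmatched tags
before handing the result to expat. This is not HTML Tidy and does far
less than it does. The class name SoupToXml, the VOID set, and the
sample input are all assumptions made for illustration, and the sketch
assumes the input has a single root element and sane attribute names.

```python
import xml.parsers.expat
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have a closing tag in HTML (a partial, assumed list).
VOID = {"br", "hr", "img", "meta", "link", "input"}


class SoupToXml(HTMLParser):
    """Re-emit loosely written HTML with every element explicitly closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the repaired document
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(value or "")) for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s/>" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching open tag: drop it
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def result(self):
        self.close()  # flush any buffered text
        while self.stack:  # close whatever is still open at end of input
            self.out.append("</%s>" % self.stack.pop())
        return "".join(self.out)


soup = "<html><body><p>Hello<br><b>world</p></i></body></html>"
fixer = SoupToXml()
fixer.feed(soup)
repaired = fixer.result()
print(repaired)

# The repaired text should now get through expat without an error.
xml.parsers.expat.ParserCreate().Parse(repaired, True)
```

The point of the design is that the guessing about unmatched tags
happens in a separate, tolerant pass, so the XML parser itself never
has to keep going after an error.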
           
sam th --- sam@uchicago.edu --- http://www.abisource.com/~sam/
OpenPGP Key: CABD33FC --- http://samth.dyndns.org/key
DeCSS: http://samth.dyndns.org/decss



