Re: Importing HTML


Subject: Re: Importing HTML
From: Dom Lachowicz (cinamod@hotmail.com)
Date: Tue Apr 24 2001 - 11:24:29 CDT


>Any chance of a partial/lossy import, ignore all unknown tags, dump all
>unmatched tags ...???

We already ignore unknown tags (such as frames or tables). I have *no* idea
how to get expat or libxml2 to not choke on unmatched tags, and I'm not sure
that we would even want to. To do this "correctly" we'd need a much more
complex parsing engine, something like Gecko.
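
To illustrate the problem, here is a minimal sketch using Python's standard
bindings to expat. It is not the AbiWord importer code, and the helper name
is_well_formed is made up for this example. The point is only that expat
raises an error at the first mismatched tag, so there is no partial result
left to salvage:

```python
import xml.parsers.expat


def is_well_formed(markup: str) -> bool:
    """Return True if expat accepts the markup, False if it chokes.

    Hypothetical helper for illustration; not part of any importer.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Second argument marks this as the final chunk of input.
        parser.Parse(markup, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


good = "<html><body><p>Hello</p></body></html>"
# Typical real-world HTML: <br> and <p> are never closed.
bad = "<html><body><p>Hello<br></body></html>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A strict XML parser treats the second string as a fatal error, even though
any browser would render it without complaint.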

>More simply, what I'm trying to suggest is, "Error: this document contains
>invalid HTML. Would you like to import it as plain text with line breaks?"

This'd be ok, but you can already import HTML as text (choose "open as
text"). Are you suggesting that we remove the markup tags on import?
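
If stripping the tags is what's meant, a rough sketch of the idea might look
like the following. It uses Python's lenient html.parser module rather than
anything in our tree; the names TagStripper, BREAK_TAGS and html_to_text are
invented for the example, and the choice of which tags become line breaks is
an assumption:

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect text content and drop every tag, matched or not."""

    # Assumed set of tags that should start a new line in the output.
    BREAK_TAGS = {"br", "p", "div", "li", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags fall through here and are simply ignored.
        if tag in self.BREAK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(markup: str) -> str:
    """Return the text of `markup` with tags removed and line breaks kept."""
    stripper = TagStripper()
    stripper.feed(markup)
    stripper.close()
    return "".join(stripper.chunks).strip()


# Unclosed <p>, bare <br>, and a stray </b> are all tolerated.
print(html_to_text("<p>Hello<br>world</b>"))
```

This is lossy by design: all formatting is thrown away, and only the text
and a few line breaks survive.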

>Please :)
>
>or as a temporary measure we could recommend an HTML validator like:
>http://validator.w3.org/

This is a reasonable suggestion.

Dom



