Re: HTML importer plans...

michael@surfnetcity.com.au
Thu, 25 Mar 1999 14:58:08 +1100


On Thu, Mar 25, 1999 at 05:29:03AM +0100, Drazen Kacar wrote:

> > Headers: (Everything within the <HEAD> tag)
> >
> > Basically, I'll leave the headers alone. I might save them into memory for a
> > HTML export, but other than that, they're not needed for AbiWord.
>
> Wrong. When AbiWord gets a little I18N support, you'll need to look for
>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
>
> if the document was written in Latin 2. That's the only way to declare
> code page within the document. It would also be nice to look for LANG
> attribute(s), such as
>
> <HTML LANG=en>

Thanks for pointing this out. Before I step off the edge, I'll read the
HTML specs again and re-assess what we need from the headers.
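
A minimal sketch of the header scan suggested above, in Python purely for
illustration. It is not AbiWord code, and the names HeadScanner and scan_head
are hypothetical. It assumes the document has already been decoded to text
and looks only for the two items mentioned: the charset parameter of the
Content-Type META tag and the LANG attribute on the HTML element.

```python
from html.parser import HTMLParser


class HeadScanner(HTMLParser):
    """Collects the declared charset and LANG attribute, if present."""

    def __init__(self):
        super().__init__()
        self.charset = None
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "html" and self.lang is None:
            self.lang = attrs.get("lang")
        elif tag == "meta" and self.charset is None:
            if (attrs.get("http-equiv") or "").lower() == "content-type":
                content = attrs.get("content") or ""
                # Parameters follow the media type, separated by ';'.
                for param in content.split(";")[1:]:
                    name, _, value = param.partition("=")
                    if name.strip().lower() == "charset":
                        self.charset = value.strip().strip("\"'") or None


def scan_head(html_text):
    """Return (charset, lang); either may be None if not declared."""
    scanner = HeadScanner()
    scanner.feed(html_text)
    return scanner.charset, scanner.lang
```

Since the charset is only known after the scan, a real importer would
probably have to read the bytes in an ASCII-compatible way first and re-decode
once the declaration is found; that step is left out here.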

> > Comments?
>
> Good luck. You're gonna need it.

Thanks ;-)

Seriously, though, I'm not going into this trying to guess what the user
intended, unless the intent just falls out of the markup. If there's bad
HTML, don't expect a fuzzy logic mechanism that works out what the author
meant. That's not my style ;-)

-- 
-- Michael Samuel <michael@surfnetcity.com.au>

