Re: bogus document


Subject: Re: bogus document
From: WJCarpenter (bill-abisource@carpenter.ORG)
Date: Mon May 14 2001 - 14:14:19 CDT


[[Copying to the dev list, where such details are probably of more
interest. Respondents should drop the user list off replies to
this.]]

sam> The most likely funny character is the ampersand, which we don't
sam> properly escape in some places.

Nothing simple like that. It's some 3-byte sequence for opening and
closing smart quotes. Probably something that isn't legit in UTF-8.
Here is an isolated dump of the bad and good cases:

:; od -xac bad.xml
0000000    3fe2    6d3f    6e65    7375    3fe2    0a3f
          b   ?   ?   m   e   n   u   s   b   ?   ?  nl
        342   ?   ?   m   e   n   u   s 342   ?   ?  \n

:; od -xac good.xml
0000000    80e2    6d9c    6e65    7375    80e2    0a9d
          b nul  fs   m   e   n   u   s   b nul  gs  nl
        342 200 234   m   e   n   u   s 342 200 235  \n
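
For anyone who wants to poke at this without the files, here is a rough
Python sketch that feeds both byte sequences to a strict UTF-8 decoder.
The bytes are transcribed by hand from the dumps above, in the order the
octal line shows them (the -x words are byte-swapped, so 3fe2 is e2 3f).
The names BAD, GOOD and check_utf8 are made up for illustration and are
not anything from the AbiWord source.

```python
# Bytes transcribed from the od dumps, in file order (taken from the
# octal/char line, since od -x prints byte-swapped 16-bit words here).
BAD = bytes.fromhex("e23f3f6d656e7573e23f3f0a")
GOOD = bytes.fromhex("e2809c6d656e7573e2809d0a")


def check_utf8(data):
    """Return (True, decoded text) or (False, description of the first error).

    Hypothetical helper for illustration only.
    """
    try:
        return True, data.decode("utf-8")
    except UnicodeDecodeError as err:
        detail = "byte 0x%02x at offset %d: %s" % (
            data[err.start], err.start, err.reason)
        return False, detail


if __name__ == "__main__":
    for name, data in (("bad.xml", BAD), ("good.xml", GOOD)):
        ok, detail = check_utf8(data)
        print(name, "valid UTF-8" if ok else "NOT valid UTF-8", repr(detail))
```

With a strict decoder the good bytes should come out as U+201C and
U+201D around "menus", while the bad bytes should be rejected at the
first 0xe2, since a literal '?' (0x3f) is not a continuation byte.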

-- 
bill@carpenter.ORG (WJCarpenter)    PGP 0x91865119
38 95 1B 69 C9 C6 3D 25    73 46 32 04 69 D6 ED F3


