using charset fallbacks in wvLIDToCodePageConverter


Subject: using charset fallbacks in wvLIDToCodePageConverter
From: Vlad Harchev (hvv@hippo.ru)
Date: Mon Nov 13 2000 - 00:07:35 CST


 Hi,

 wvLIDToCodePageConverter returns cp950 and cp936, which are unknown to the
majority of iconv implementations. As I understand it, Big5 and GB2312 are very
good approximations of them. So this patch should make it possible to import
word6 format files containing CJK text (once 'chartype' is returned as non-zero
for CJK word6 docs). It first tries iconv_open("cp*","cp*") and, if that fails,
uses the fallback charset. The result is stored in a static variable, so the
check is performed only once and then cached.
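
 For illustration, here is roughly the same probe-and-cache idea written as a
plain function instead of a macro. This is only a sketch: the names cp_choice
and cp_resolve are hypothetical and do not exist in wv, and it assumes that
iconv_open() fails with (iconv_t)-1 for a charset name the local iconv does
not know.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of wv: one cache slot per code page. */
struct cp_choice {
    const char *name;      /* preferred charset name, e.g. "CP950" */
    const char *fallback;  /* approximation to use if iconv lacks it */
    const char *resolved;  /* NULL until the first probe has run */
};

/* Return c->name if the local iconv knows it, otherwise c->fallback.
 * The probe runs once per cp_choice; later calls reuse the cached answer. */
static const char *cp_resolve(struct cp_choice *c)
{
    if (c->resolved == NULL) {
        /* Probe only: open a name-to-name converter and close it again. */
        iconv_t cd = iconv_open(c->name, c->name);
        if (cd == (iconv_t)-1) {
            c->resolved = c->fallback;
        } else {
            c->resolved = c->name;
            iconv_close(cd);
        }
    }
    return c->resolved;
}
```

 A caller would keep one static struct per code page, e.g.
static struct cp_choice big5 = { "CP950", "BIG5", NULL }; and return
cp_resolve(&big5) from the matching case. Like the macro, this is not
thread-safe on the first call.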

 Not tested.

 This should be committed to upstream wv too, IMO.

 Best regards,
  -Vlad

--- text.c-was Mon Nov 13 09:44:00 2000
+++ text.c Mon Nov 13 09:54:41 2000
@@ -51,6 +51,25 @@
         }
 
 
+#define CPNAME_OR_FALLBACK(name,fallbackname) \
+{ \
+        static char* cpname = NULL; \
+        if (!cpname) \
+        { \
+                iconv_t cd = iconv_open(name,name); \
+                if (cd==(iconv_t)-1) \
+                { \
+                        cpname = fallbackname; \
+                } \
+                else \
+                { \
+                        cpname = name; \
+                        iconv_close(cd); \
+                } \
+        }; \
+        return cpname; \
+}
+
 char *wvLIDToCodePageConverter(U16 lid)
         {
         switch(lid)
@@ -62,9 +81,9 @@
                 case 0x0403: /*Catalan*/
                         return("CP1252");
                 case 0x0404: /*Traditional Chinese*/
-                        return("CP950");
+                        CPNAME_OR_FALLBACK("CP950","BIG5")
                 case 0x0804: /*Simplified Chinese*/
-                        return("CP936");
+                        CPNAME_OR_FALLBACK("CP936","GB2312")
                 case 0x0405: /*Czech*/
                         return("CP1250");
                 case 0x0406: /*Danish*/


