Subject: [courier-users] The Possibility to Substitute GNU Libiconv for Your Unicode Library
From: ma...@intron.ac (ma@intron.ac)
Date: May 21, 2006 11:37:38 pm
List: net.sourceforge.lists.courier-users

Hi, Mr. Sam,

I think GNU libiconv is a better choice than maintaining a Unicode library yourself. Libiconv's maintainers are better placed to keep track of the Unicode Consortium's changes.

Actually, it is East Asian users, whose languages need large character sets, who have a much more pressing need for Unicode support than Western users, most of whose languages can be expressed in 256-glyph character sets. At the same time, maintaining large character sets such as Chinese (GB18030, BIG5-HKSCS) and Japanese (Shift-JIS) is tiring work. The bodies that define these encodings, the Chinese/Japanese/Korean governments and other organizations, keep revising the standards to follow the way Chinese/Japanese/Korean people actually write.

You told me that GNU libiconv cannot provide the metadata that Courier requires. But I think there is a workaround with GNU libiconv:

Assume a byte string [b1 b2 b3 b4 b5 ... bn] (ended with a CR/LF):

1. Initialize GNU libiconv: iconv_open("UCS-4BE", "SOME ENCODING");
2. Try iconv() against [b1]. If it succeeds, the current character is [b1]; skip [b1] and continue from step 2.
3. Try iconv() against [b1 b2]. If it succeeds, the current character is [b1 b2]; skip [b1 b2] and continue from step 2.
4. Try iconv() against [b1 b2 b3]. If it succeeds, the current character is [b1 b2 b3]; skip [b1 b2 b3] and continue from step 2.
5. Try iconv() against [b1 b2 b3 b4]. If it succeeds, the current character is [b1 b2 b3 b4]; skip [b1 b2 b3 b4] and continue from step 2.
6. Try iconv() against [b1 b2 b3 b4 b5]. If it succeeds, the current character is [b1 b2 b3 b4 b5]; skip [b1 b2 b3 b4 b5] and continue from step 2.
7. Try iconv() against [b1 b2 b3 b4 b5 b6]. If it succeeds, the current character is [b1 b2 b3 b4 b5 b6]; skip [b1 b2 b3 b4 b5 b6] and continue from step 2.
8. Output "?" as a dummy substitution, skip [b1], and continue from step 2.
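
The trial loop above could be sketched in C roughly as follows. This is only an illustration under assumptions, not Courier code: the names seg_char, try_prefix and segment are hypothetical, the conversion state is reset before every trial (so stateful encodings such as ISO-2022-JP would not fit this scheme), and a trial only counts as a success when it consumes the whole prefix and yields exactly one UCS-4 code point.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRIAL 6            /* longest prefix tried (steps 2 to 7) */

/* Hypothetical result record: one entry per input character. */
typedef struct {
    size_t   len;              /* bytes consumed from the input */
    uint32_t cp;               /* decoded code point, or '?' on failure */
} seg_char;

/* Try to convert exactly `len` bytes at `p` into one UCS-4BE code point.
 * Returns 1 and stores the code point on success, 0 otherwise. */
static int try_prefix(iconv_t cd, const char *p, size_t len, uint32_t *cp)
{
    char in[MAX_TRIAL];
    unsigned char out[16];
    char *inp = in;
    char *outp = (char *)out;
    size_t inleft = len, outleft = sizeof out;

    assert(len >= 1 && len <= MAX_TRIAL);
    memcpy(in, p, len);
    iconv(cd, NULL, NULL, NULL, NULL);   /* reset to the initial state */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return 0;                        /* invalid or incomplete sequence */
    if (inleft != 0 || outleft != sizeof out - 4)
        return 0;                        /* not exactly one character */
    *cp = ((uint32_t)out[0] << 24) | ((uint32_t)out[1] << 16) |
          ((uint32_t)out[2] << 8)  |  (uint32_t)out[3];
    return 1;
}

/* Split buf[0..n) into characters, shortest matching prefix first.
 * Writes at most `max` entries to `res` and returns how many were written. */
size_t segment(iconv_t cd, const char *buf, size_t n,
               seg_char *res, size_t max)
{
    size_t i = 0, count = 0;

    while (i < n && count < max) {
        size_t len, hit = 0;
        uint32_t cp = '?';               /* step 8: dummy substitution */

        for (len = 1; len <= MAX_TRIAL && i + len <= n; len++) {
            if (try_prefix(cd, buf + i, len, &cp)) {
                hit = len;
                break;
            }
        }
        if (hit == 0)
            hit = 1;                     /* step 8: skip only [b1] */
        res[count].len = hit;
        res[count].cp = cp;
        count++;
        i += hit;
    }
    return count;
}
```

With a descriptor from iconv_open("UCS-4BE", "UTF-8"), the bytes 41 C3 A9 FF 7A should come back as four entries of lengths 1, 2, 1, 1, with the invalid FF replaced by "?". The cost is up to six iconv() calls per character, which is what the optimization below is meant to reduce.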

Of course, some optimizations can be applied to the above workaround. Only trials of [b1] and [b1 b2] are needed for GB2312, GBK, BIG5, BIG5-HKSCS and Shift-JIS. EUC-JP additionally needs [b1 b2 b3] for its three-byte JIS X 0212 characters. GB18030 requires [b1], [b1 b2] and [b1 b2 b3 b4]. UTF-8 requires [b1], [b1 b2], [b1 b2 b3], [b1 b2 b3 b4], [b1 b2 b3 b4 b5] and [b1 b2 b3 b4 b5 b6].
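
One way to express this optimization is a small lookup table of the prefix lengths worth trying for each encoding. The sketch below is hypothetical (trial_plan, PLANS and find_plan are made-up names), and it matches encoding names by exact string comparison only. Note one assumption beyond a plain two-byte reading of EUC-JP: its JIS X 0212 characters take three bytes, so the table lists 1, 2 and 3 for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: which prefix lengths to try for one encoding. */
typedef struct {
    const char *encoding;
    size_t      lens[6];       /* trial lengths, shortest first */
    size_t      count;         /* number of valid entries in lens */
} trial_plan;

static const trial_plan PLANS[] = {
    { "GB2312",     {1, 2},             2 },
    { "GBK",        {1, 2},             2 },
    { "BIG5",       {1, 2},             2 },
    { "BIG5-HKSCS", {1, 2},             2 },
    { "SHIFT-JIS",  {1, 2},             2 },
    /* Assumption: three-byte JIS X 0212 sequences are included. */
    { "EUC-JP",     {1, 2, 3},          3 },
    { "GB18030",    {1, 2, 4},          3 },
    { "UTF-8",      {1, 2, 3, 4, 5, 6}, 6 },
};

/* Returns the plan for `enc`, or NULL when the encoding is not listed,
 * in which case a caller could fall back to trying lengths 1 to 6. */
const trial_plan *find_plan(const char *enc)
{
    size_t i;

    assert(enc != NULL);
    for (i = 0; i < sizeof PLANS / sizeof PLANS[0]; i++) {
        if (strcmp(PLANS[i].encoding, enc) == 0)
            return &PLANS[i];
    }
    return NULL;
}
```

A trial loop would then iterate over plan->lens[0 .. count) instead of every length from 1 to 6, so a GB18030 stream never pays for the three-byte trial.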

------------------------------------------------------------------------ From Beijing, China