19 messages in com.mysql.lists.dotnet: Re: Charsets, collation and connections.

| From | Sent On | Attachments |
|---|---|---|
| Jorge Bastos | 26 Feb 2005 04:55 | |
| Brandon Schenz | 26 Feb 2005 05:34 | |
| Jorge Bastos | 26 Feb 2005 05:37 | |
| Brandon Schenz | 26 Feb 2005 05:47 | |
| Jorge Bastos | 26 Feb 2005 05:56 | |
| Frank | 26 Feb 2005 06:03 | |
| James Moore | 26 Feb 2005 07:35 | |
| James Moore | 26 Feb 2005 07:47 | |
| Jordan Sparks | 26 Feb 2005 07:59 | |
| James Moore | 26 Feb 2005 08:40 | |
| Daniel Fisla | 26 Feb 2005 12:08 | |
| Jorge Bastos | 26 Feb 2005 12:30 | |
| Guy Platt | 28 Feb 2005 03:46 | |
| Kevin Turner | 28 Feb 2005 04:45 | |
| Reggie Burnett | 28 Feb 2005 06:37 | |
| mike...@mygenerationsoftware.com | 28 Feb 2005 07:05 | |
| Daniel Fisla | 01 Mar 2005 15:21 | |
| Reggie Burnett | 02 Mar 2005 04:56 | |
| Daniel Fisla | 02 Mar 2005 13:44 | |

| Subject: | Re: Charsets, collation and connections. |
|---|---|
| From: | Reggie Burnett (reg...@mysql.com) |
| Date: | 03/02/2005 04:56:44 AM |
| List: | com.mysql.lists.dotnet |
Daniel
Are you saying you think this is a problem with the connector or with the server?
Daniel Fisla wrote:
After doing much research, I am at a loss as to how utf8_bin and utf8_general_ci really differ, besides the obvious one: sort order.
Here is what I inferred from the MySQL docs and some online articles.
utf8_bin is pretty much what its name suggests: utf8 strings are treated as plain bytes, and the MySQL server compares and sorts them byte by byte, so the value of each byte determines the sort order. That makes sense to me, as UTF-8 is a multi-byte encoding anyway.
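The byte-by-byte comparison described above can be illustrated outside MySQL. The Python sketch below is only a model, not MySQL code: `bin_key` compares raw UTF-8 bytes the way a binary collation does, while `ci_key` is a hypothetical, rough stand-in for a case- and accent-insensitive collation (it is not MySQL's actual utf8_general_ci weight table). Because UTF-8 byte order matches code point order, sorting by the bytes gives the same result as sorting by code point.

```python
import unicodedata


def bin_key(s: str) -> bytes:
    """utf8_bin-style key: the raw UTF-8 bytes, compared byte by byte."""
    return s.encode("utf-8")


def ci_key(s: str) -> str:
    """Hypothetical, rough stand-in for a case- and accent-insensitive collation.

    NOT MySQL's real utf8_general_ci weights: it decomposes to NFD,
    drops combining marks (accents) and upper-cases what is left.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.upper()


words = ["apple", "Apple", "äpfel", "Zebra", "日本"]

# UTF-8 byte order equals code point order, so this matches sorted(words).
print(sorted(words, key=bin_key))  # ['Apple', 'Zebra', 'apple', 'äpfel', '日本']

# Binary comparison is case-sensitive; the rough _ci key is not.
print(ci_key("apple") == ci_key("Apple"), bin_key("apple") == bin_key("Apple"))  # True False
```

Note how a binary ordering puts every upper-case ASCII word before every lower-case one, which is usually the first visible difference from a `_ci` collation.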
So what is the utf8_general_ci collation all about, given that it is the DEFAULT collation for the utf8 character set? But here is the kicker: it does not implement all utf8 characters, especially for languages like Japanese and Korean, as well as Arabic and other right-to-left languages.
I know for a fact that ucs2_general_ci (the default collation for the ucs2 Unicode encoding) has only partial Unicode support, so my guess is that the same may be true of the utf8_general_ci collation.
The only way I can get full Unicode support is to use the utf8_bin collation, as utf8_general_ci garbles some characters (they are returned from the database as "??").
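A collation only governs how strings are compared and sorted, so one possibility worth ruling out for the "??" symptom is a lossy character set conversion somewhere between the client and the column: when MySQL converts text into a character set that cannot represent some characters, it typically substitutes `?` for them. The Python sketch below models that round trip; `through_connection` is a hypothetical helper, and Python's `errors="replace"` encoding mode stands in for the server's conversion.

```python
def through_connection(text: str, connection_charset: str) -> str:
    """Model a round trip through a (possibly too narrow) character set.

    Hypothetical helper: characters the charset cannot represent are
    replaced with '?' on encoding, mimicking a lossy server conversion.
    """
    return text.encode(connection_charset, errors="replace").decode(connection_charset)


sample = "日本語 and café"

# latin-1 has no CJK characters, so each one becomes '?'; 'é' survives.
print(through_connection(sample, "latin-1"))  # ??? and café

# UTF-8 can represent everything, so the text comes back unchanged.
print(through_connection(sample, "utf-8"))    # 日本語 and café
```

If the question marks disappear once every hop (client, connection, results, column) is utf8, the collation was probably not the culprit.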
My question is: has anyone else run into similar issues with collations?
In particular, how can utf8_general_ci not support all characters and yet be the default collation for utf8?
The only way I can get things to work is to set the character set to utf8 and the collation to utf8_bin for everything (connection, database, and tables).
I had to modify MySQL Connector/NET 1.0.4 to issue:
SET collation_connection = @@collation_database;
# database collation is utf8_bin
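Instead of patching the connector, the character set and collation can also be set per session with `SET NAMES ... COLLATE ...`, which sets character_set_client, character_set_connection and character_set_results to the given character set and collation_connection to the given collation. The Python sketch below only builds the statement strings. `session_init_sql` is a hypothetical helper, and running its output right after the connection opens (through whatever command object the application already uses) is an assumption about how it would be wired in. The `SHOW VARIABLES` queries report what the session actually ended up with.

```python
def session_init_sql(charset: str = "utf8", collation: str = "utf8_bin") -> list[str]:
    """Return SQL to run right after a connection is opened.

    Hypothetical helper, not part of Connector/NET. The SET NAMES statement
    sets the client, connection and results character sets to `charset`
    and collation_connection to `collation`; the SHOW queries list the
    resulting session settings so they can be checked.
    """
    # Naming-convention check only: MySQL collation names start with
    # their character set name followed by '_', e.g. utf8_bin.
    if not collation.startswith(charset + "_"):
        raise ValueError(f"{collation!r} is not a collation of {charset!r}")
    return [
        f"SET NAMES {charset} COLLATE {collation}",
        "SHOW VARIABLES LIKE 'character_set%'",
        "SHOW VARIABLES LIKE 'collation%'",
    ]


for statement in session_init_sql():
    print(statement)
```

Setting all three session character sets at once avoids the case where only collation_connection is changed while the client or results character set is still something narrower than utf8.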
Cheers,
-Daniel.
-- Reggie Burnett, Software Developer, MySQL Inc., http://www.mysql.com