Subject: Re: Charsets, collation and connections.
From: Reggie Burnett (reg@mysql.com)
Date: 03/02/2005 04:56:44 AM
List: com.mysql.lists.dotnet

Daniel,

Are you saying you think this is a problem with the connector or with the server?

Daniel Fisla wrote:

After much research, I am still at a loss as to how utf8_bin and utf8_general_ci really differ, beyond the obvious: sort order.

What I inferred from MySQL docs and some online articles is the following.

utf8_bin is pretty much what its name suggests: utf8 strings are stored as bytes, and the MySQL server compares and sorts them byte by byte, with the value of each byte determining the sort order. That makes sense to me, since utf8 is a multi-byte encoding anyway.
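
To illustrate the difference, here is a minimal sketch, assuming a MySQL 4.1 or later server with both collations available. It runs the same comparison under each collation; the column aliases are made up for the example.

```sql
-- The _utf8 introducer marks each literal as a utf8 string;
-- COLLATE selects the comparison rules used for the expression.
SELECT _utf8'a' = _utf8'A' COLLATE utf8_general_ci AS equal_general_ci;  -- 1: case-insensitive match
SELECT _utf8'a' = _utf8'A' COLLATE utf8_bin        AS equal_bin;         -- 0: the code values differ
```

In other words, the collation affects equality tests (and therefore WHERE clauses and unique keys), not just ORDER BY.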

So what is the utf8_general_ci collation all about, given that it is the DEFAULT collation for the utf8 character set? Here is the kicker: it does not implement all utf8 characters, especially for languages like Japanese, Korean, and Arabic, as well as other right-to-left languages.

I know for a fact that ucs2_general_uca (the default collation for the ucs2 Unicode encoding) has only partial Unicode support, so my guess is that the same may be true for the utf8_general_ci collation.

The only way I can get full Unicode support is to use the utf8_bin collation, because utf8_general_ci messes up some characters (they come back from the db as ??).

My question is, has anyone else run into similar issues with collations?

In particular, how can utf8_general_ci not support all characters and yet be the default collation for utf8?

The only way to get things to work is to set the charset to utf8 and the collation to utf8_bin for everything (connection, db, tables).
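
For reference, a setup along those lines might look like the sketch below. The database name `mydb` and the table `notes` are hypothetical placeholders, and the statements assume MySQL 4.1 or later; this is not necessarily how the original poster's schema was configured.

```sql
-- Hypothetical names: mydb, notes.

-- Database default: new tables inherit this charset and collation.
ALTER DATABASE mydb CHARACTER SET utf8 COLLATE utf8_bin;

-- Table default: character columns inherit it unless overridden.
CREATE TABLE notes (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body VARCHAR(255)
) CHARACTER SET utf8 COLLATE utf8_bin;

-- Connection: sets character_set_client, character_set_results,
-- character_set_connection and collation_connection in one statement.
SET NAMES utf8 COLLATE utf8_bin;
```

Note that changing the database or table default does not convert columns that already exist; those keep the charset and collation they were created with.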

I had to modify the MySQL connector 1.0.4 to include:

SET collation_connection = @@collation_database;

# database collation is utf8_bin
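
One way to check what a session actually ended up with, sketched here on the assumption of a MySQL 4.1 or later server, is to list the charset and collation variables from the same connection:

```sql
-- Run these on the connection under test, after the connector has opened it.
SHOW VARIABLES LIKE 'character_set%';  -- client, connection, results, database, server
SHOW VARIABLES LIKE 'collation%';      -- collation_connection, collation_database, collation_server
```

If the SET statement above took effect, collation_connection and collation_database should show the same value.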

Cheers,

-Daniel.
