10 messages in com.mysql.lists.bugs
Re: Wrong sorting order in croat.conf...

| From | Sent On | Attachments |
|---|---|---|
| Dubravko Penezic | 25 Mar 2004 04:35 | .conf |
| Sinisa Milivojevic | 25 Mar 2004 05:58 | |
| Sinisa Milivojevic | 25 Mar 2004 07:18 | |
| Dubravko Penezic | 26 Mar 2004 00:15 | |
| Sinisa Milivojevic | 26 Mar 2004 05:25 | |
| Alexander Barkov | 30 Mar 2004 06:06 | |
| Dubravko Penezic | 30 Mar 2004 11:43 | |
| Alexander Barkov | 31 Mar 2004 02:37 | .txt |
| Alexander Barkov | 31 Mar 2004 02:44 | |
| Dubravko Penezic | 01 Apr 2004 02:13 | |

| Subject: | Re: Wrong sorting order in croat.conf (spouse all version) |
|---|---|
| From: | Dubravko Penezic (dpen...@srce.hr) |
| Date: | 04/01/2004 02:13:01 AM |
| List: | com.mysql.lists.bugs |
Hello!
I completely checked both sorting orders for Croatian characters, according to ISO 8859-2 and CP 1250, and recreated new tables:
* Croatian sorting order according to Babic's "Hrvatski pravopis"
- characters with an ASCII code less than 41 (hex) are sorted according to their ASCII value
- upper-case and lower-case characters are treated as the same
- characters with additional elements that are not part of the Croatian alphabet come after the same character without those elements
- every additional character after 41 (hex) that is not part of the alphabet (special characters, signs) is sorted by its ASCII position and comes after the last alphabet character
- ISO-8859-2 is mostly used on Unix/Linux platforms
- CP-1250 is implemented on the Microsoft Windows platform
- according to current Croatian law, ISO 8859-2 is the recommendation
# sort_order array (must have 256 elements) Croatian ISO-8859-2
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 43 44 48 4B 4D 4E 4F 50 52 53 54 56 57 59
5B 5C 5D 5F 62 64 66 67 68 69 6B 6E 6F 70 71 72
73 41 43 44 48 4B 4D 4E 4F 50 52 53 54 56 57 59
5B 5C 5D 5F 62 64 66 67 68 69 6B 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 42 A2 55 A4 55 60 A7 A8 61 60 63 6C AD 6D 6C
B0 42 B2 55 B4 55 60 B7 B8 61 60 63 6C BD 6D 6C
5E 42 42 42 42 55 47 45 46 4C 4C 4C 4C 51 51 49
4A 58 58 5A 5A 5A 5A D7 5E 65 65 65 65 6A 63 60
5E 42 42 42 42 55 47 45 46 4C 4C 4C 4C 51 51 49
4A 58 58 5A 5A 5A 5A F7 5E 65 65 65 65 6A 63 FF
# sort_order array (must have 256 elements) Croatian CP-1250
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 43 44 48 4B 4D 4E 4F 50 52 53 54 56 57 59
5B 5C 5D 5F 62 64 66 67 68 69 6B 6E 6F 70 71 72
73 41 43 44 48 4B 4D 4E 4F 50 52 53 54 56 57 59
5B 5C 5D 5F 62 64 66 67 68 69 6B 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 61 8B 60 63 6D 6C
90 91 92 93 94 95 96 97 98 99 61 9B 60 63 6D 6C
A0 42 A2 55 A4 42 60 A7 A8 61 60 63 6C AD 6D 6C
B0 42 B2 55 B4 55 60 B7 B8 42 60 63 55 BD 55 6C
5E 42 42 42 42 55 47 45 46 4C 4C 4C 4C 51 51 49
4A 58 58 5A 5A 5A 5A D7 5E 65 65 65 65 6A 63 60
5E 42 42 42 42 55 47 45 46 4C 4C 4C 4C 51 51 49
4A 58 58 5A 5A 5A 5A F7 5E 65 65 65 65 6A 63 FF
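For readers who want to try the ISO-8859-2 table without rebuilding MySQL, here is a minimal Python sketch. It assumes the array is used as a plain byte-to-weight lookup, which is an assumption about the table's intent and not MySQL's own code. The helper names `sort_key` and `croatian_sorted` and the sample words are made up for illustration.

```python
# Sketch only: treats the ISO-8859-2 sort_order array from this message as a
# byte-to-weight translation table. This is an assumed reading of the table,
# not MySQL's implementation.

SORT_ORDER_HEX = """
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 43 44 48 4B 4D 4E 4F 50 52 53 54 56 57 59
5B 5C 5D 5F 62 64 66 67 68 69 6B 6E 6F 70 71 72
73 41 43 44 48 4B 4D 4E 4F 50 52 53 54 56 57 59
5B 5C 5D 5F 62 64 66 67 68 69 6B 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 42 A2 55 A4 55 60 A7 A8 61 60 63 6C AD 6D 6C
B0 42 B2 55 B4 55 60 B7 B8 61 60 63 6C BD 6D 6C
5E 42 42 42 42 55 47 45 46 4C 4C 4C 4C 51 51 49
4A 58 58 5A 5A 5A 5A D7 5E 65 65 65 65 6A 63 60
5E 42 42 42 42 55 47 45 46 4C 4C 4C 4C 51 51 49
4A 58 58 5A 5A 5A 5A F7 5E 65 65 65 65 6A 63 FF
"""

# Parse the hex dump into a 256-byte table and enforce the length rule.
SORT_ORDER = bytes(int(token, 16) for token in SORT_ORDER_HEX.split())
if len(SORT_ORDER) != 256:
    raise ValueError(f"sort_order must have 256 elements, got {len(SORT_ORDER)}")


def sort_key(text: str) -> bytes:
    """Encode text as ISO-8859-2, then replace each byte with its weight."""
    return text.encode("iso8859_2").translate(SORT_ORDER)


def croatian_sorted(words):
    """Sort words by their weight strings (hypothetical helper)."""
    return sorted(words, key=sort_key)


# Sample words starting with z, s-caron, s, c-acute, c-caron, c,
# d-stroke, d and z-caron.
words = ["zid", "\u0161uma", "suma", "\u0107up", "\u010daj",
         "cesta", "\u0111ak", "dan", "\u017eaba"]
print(croatian_sorted(words))
```

With this table the sample words come out in the order c, č, ć, d, đ, s, š, z, ž, and upper and lower case compare as equal.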
Where can I find the *_ci files? Are they in the 4.1.x or 5.0.x source?
Dubravko Penezic ISA, SRCE University Computing Center Zagreb, Croatia
On Wed, 31 Mar 2004, Alexander Barkov wrote:
Hello!
Dubravko Penezic wrote:
Hi!
What you said is completely wrong, because Ss (S with caron) in win-1250 is at positions 9A and 8A, and in ISO-8859-2 it is at positions B9 and A9.
Yes, I agree. The above pages state this too. And this is what I wrote in my previous letter to Sinisa:
latin2:
0xA9 0x0160 #LATIN CAPITAL LETTER S WITH CARON
0xB9 0x0161 #LATIN SMALL LETTER S WITH CARON
cp1250:
0x8A 0x0160 #LATIN CAPITAL LETTER S WITH CARON
0x9A 0x0161 #LATIN SMALL LETTER S WITH CARON
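These positions can be cross-checked with any tool that ships both code pages. Here is a minimal sketch using Python's standard `iso8859_2` and `cp1250` codecs. The helper name `byte_position` is made up for illustration.

```python
def byte_position(char: str, encoding: str) -> int:
    """Return the single-byte position of `char` in an 8-bit code page."""
    encoded = char.encode(encoding)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {encoding}")
    return encoded[0]


# U+0160 / U+0161 are S WITH CARON, U+017D / U+017E are Z WITH CARON.
LETTERS = {
    "CAPITAL S WITH CARON": "\u0160",
    "SMALL S WITH CARON": "\u0161",
    "CAPITAL Z WITH CARON": "\u017d",
    "SMALL Z WITH CARON": "\u017e",
}

for name, char in LETTERS.items():
    latin2 = byte_position(char, "iso8859_2")
    cp1250 = byte_position(char, "cp1250")
    print(f"{name}: latin2=0x{latin2:02X} cp1250=0x{cp1250:02X}")
```

The same check works for any other letter that the two code pages place differently.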
I'm 100% sure about these three facts:
1. "croat" was created for cp1250 character set, and it provides correct sort order for Croatian language for cp1250.
Wrong ... check the code page repository you have in the source dir ./sql/share/charsets/
Did you try croat.conf on a Windows machine? Did it produce wrong Croatian sort order?
croat.conf win1250.conf
What you are talking about is win1250, in some cases called cp1250.
Yes, it is fine for Croatian too (and for some other languages). That means we had two cp1250+Croatian compatible configurations, and didn't have a single one for latin2+Croatian.
I'm attaching a new sort order array for latin2+Croatian configuration. Can you please replace the old one by this new array, and test if sort order is fine.
There is also an HTML file attached; it demonstrates the sort order in a clear manner.
Thank you!
The Croatian language has only one code page standard that is recommended by law, and that is ISO Latin 2, under ISO code 8859-2.
CP1250 or win1250, or whatever Micro$oft would like to call their "standard", is only a forced standard, but the situation has changed radically in the last 5 years.
Also feel free to check the Linux/Unix implementations of Croatian characters; you will find only latin2 under the ISO-8859-2 code scheme.
Also, a very simple ordering test will show you that croat.conf sorts under the ISO-8859-2 code page, except for the wrong Ss position; inserting characters with win1250/cp1250 codes will destroy that order.
2. "croat" was renamed into "latin2_croatian_ci" by mistake in 4.1, it should have been "cp1250_croatian_ci" instead.
Once again wrong, see above.
3. We didn't have Croatian sort order for latin2 character set in MySQL so far, and it should be just added now.
Partially true: you have one, but with the wrong Ss position :)
Your version could be used as a template for latin2 Croatian, but in my opinion the patch fixes only half of the problem, and an additional fix is required: to put CAPITAL LETTER Z WITH CARON on the same position as SMALL LETTER Z WITH CARON. Now they are sorted differently.
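One way to catch this kind of upper/lower mismatch is to scan a sort_order table for letters whose two cases receive different weights. This is a rough sketch, assuming a 256-byte table and Python's `iso8859_2` codec. `case_mismatches` is a hypothetical helper, and the identity table below is only a stand-in, not the old croat.conf data.

```python
def case_mismatches(sort_order: bytes, encoding: str) -> list:
    """List lowercase letters whose weight differs from their uppercase form."""
    if len(sort_order) != 256:
        raise ValueError("sort_order must have 256 elements")
    mismatches = []
    for byte in range(256):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page
        upper = char.upper()
        if upper == char or len(upper) != 1:
            continue  # not a lowercase letter with a one-character uppercase
        try:
            upper_byte = upper.encode(encoding)[0]
        except UnicodeEncodeError:
            continue  # uppercase form does not exist in this code page
        if sort_order[byte] != sort_order[upper_byte]:
            mismatches.append(char)
    return mismatches


# Stand-in table: every byte is its own weight, so all cased letters mismatch.
identity = bytes(range(256))
print("\u017e" in case_mismatches(identity, "iso8859_2"))

# Give small z-caron (0xBE) the weight of capital Z-caron (0xAE) in latin2.
patched = bytearray(identity)
patched[0xBE] = patched[0xAE]
print("\u017e" in case_mismatches(bytes(patched), "iso8859_2"))
```

Running the same scan over a real table would show whether any remaining letter pairs, not just Z with caron, still sort differently by case.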
I will check tomorrow morning what I can do with the other part; maybe I will need some help with the meaning of the other parts of the table.
Please confirm this. Or am I mistaken? Please send the complete table for testing in this case.
I will send the table for testing tomorrow.
Thank you too. I understand that people outside Croatia, and many inside, don't understand what is going on, and the government also does nothing to bring order to the code page standards.
Dubravko Penezic
P.S.: I will also try to make a test table for testing order and lower/upper case.
P.P.S.: We use one table which may help in understanding the code pages which are in use in Croatia: http://www.open.hr/hiz/kodsus/primjena.html
--
For technical support contracts, visit https://order.mysql.com/
Mr. Alexander Barkov <ba...@mysql.com>
MySQL AB, Full-Time Developer
Izhevsk, Russia
www.mysql.com +7-912-856-80-21




