10 messages in net.java.dev.jna.users: Re: [jna-users] String related questions

From                   Sent On                 Attachments
Bartko Zoltan          Jul 12, 2007 4:49 am    .java, .java
Bartko Zoltan          Jul 12, 2007 6:08 am
Wayne Meissner         Jul 12, 2007 6:09 am
Timothy Wall           Jul 12, 2007 8:03 am
Timothy Wall           Jul 12, 2007 8:38 am
Bartko Zoltan          Jul 13, 2007 12:10 am
Bartko Zoltan          Jul 13, 2007 1:05 am
Timothy Wall           Jul 13, 2007 7:11 am
Oleksandr Maksymchuk   Oct 22, 2007 4:25 am
Timothy Wall           Oct 22, 2007 6:34 am
Subject: Re: [jna-users] String related questions
From: Bartko Zoltan (bart@bartkozoltan.com)
Date: Jul 13, 2007 12:10:37 am
List: net.java.dev.jna.users

As soon as the String I was trying to send to the library contained non-ASCII characters, I had to append a NUL terminator. Example:

Sending "apué", "emberibbé" and "anyué" resulted in "apué��wqTA ", "emberibbé" and "anyuébbé" arriving in the library.

From this I concluded that the é (e acute) in the first word was not handled correctly, because some garbage followed it. I don't know why, but the second word passed through the conversion without any problem. The third word, however, picked up the ending of the second word. Therefore I added a NUL to the end of each string, and it worked.

I also tried repeating the words (without the terminating NULs), e.g. "apué", "apué", "emberibbé", "emberibbé", "anyué", "anyué". The first occurrence of each word (nos. 1, 3, 5) came through as described above, but the repeats (nos. 2, 4, 6) passed through correctly.
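The "anyuébbé" result is consistent with a terminator that went missing while the native buffer still held the previous word. In UTF-8, é takes two bytes, so "anyué" is 6 bytes and "emberibbé" is 10; writing the first over the second without a NUL leaves exactly "bbé" visible. The sketch below simulates that in plain Java under exactly this assumption (UTF-8 and a reused, zero-filled buffer). It is not JNA's actual code, and the class and method names are made up for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MissingTerminator {
    // Decode bytes the way a C function would see them: up to the first NUL (0) byte.
    static String readCString(byte[] buf, Charset cs) {
        int n = 0;
        while (n < buf.length && buf[n] != 0) {
            n++;
        }
        return new String(buf, 0, n, cs);
    }

    // Copy the encoded word into the start of buf, optionally followed by a NUL.
    static void write(byte[] buf, String word, Charset cs, boolean terminate) {
        byte[] bytes = word.getBytes(cs);
        System.arraycopy(bytes, 0, buf, 0, bytes.length);
        if (terminate) {
            buf[bytes.length] = 0;
        }
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        byte[] buf = new byte[32]; // zero-filled, standing in for a reused native buffer

        write(buf, "emberibb\u00e9", utf8, true); // 10 bytes of text + NUL
        write(buf, "anyu\u00e9", utf8, false);    // 6 bytes, no NUL: the old tail survives
        System.out.println(readCString(buf, utf8).equals("anyu\u00e9bb\u00e9")); // true

        write(buf, "anyu\u00e9", utf8, true);     // with a NUL the old tail is cut off
        System.out.println(readCString(buf, utf8).equals("anyu\u00e9")); // true
    }
}
```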

As to the conversion: when using the Hunspell spell-checking library, the following procedure should be observed:

1. Initialize the spell checker with the correct dictionary.
2. Get the encoding of the dictionary.
3. Check the spelling of the words sent to the library, in the correct encoding.
4. Do some final cleanup.

Now, how do I do step 3, i.e. convert the text I want to check to the dictionary's encoding?
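For step 3, the first practical hurdle is turning the encoding name obtained in step 2 into something Java understands. Assuming the dictionary reports names in the style of "ISO8859-2" or "microsoft-cp1251", a small lookup like the one below could normalise them before calling Charset.forName. The helper and its name mappings are assumptions for illustration; extend them for whatever names your dictionaries actually use.

```java
import java.nio.charset.Charset;

public class DictionaryCharset {
    // Hypothetical helper: map a dictionary-style encoding name to a Java Charset.
    static Charset forDictionaryEncoding(String name) {
        String n = name.trim();
        if (n.regionMatches(true, 0, "microsoft-cp", 0, 12)) {
            n = "windows-" + n.substring(12); // microsoft-cp1251 -> windows-1251
        } else if (n.regionMatches(true, 0, "ISO8859", 0, 7)) {
            n = "ISO-8859" + n.substring(7);  // ISO8859-2 -> ISO-8859-2
        }
        // Throws UnsupportedCharsetException (unchecked) if Java does not know the name.
        return Charset.forName(n);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "ISO8859-2", "microsoft-cp1251"}) {
            System.out.println(name + " -> " + forDictionaryEncoding(name).name());
        }
    }
}
```

Once you have the Charset, the word to check can be encoded with String.getBytes(charset) and NUL-terminated before it is handed to the library.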

Thanks

Zoltan

On Thursday 12 July 2007 17:38:56 Timothy Wall wrote:

Automatic string conversion always appends a NUL terminator to native strings (or at least it should). Is your library perhaps expecting a double terminator? Where exactly did you need to add the explicit NUL?

On Jul 12, 2007, at 11:16 AM, Bartko Zoltan wrote:

In the meantime I found that appending \u0000 to the end of the string solves the problem; I hope this doesn't introduce a memory leak.
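Appending \u0000 works because the NUL becomes part of the encoded data itself: in UTF-8 and in single-byte charsets such as ISO-8859-2, U+0000 is encoded as a single zero byte, so a C function reading the result stops right after the last real character. If the automatic conversion also appends its own terminator, the native side simply sees two NULs, which is harmless for functions such as strlen that stop at the first one. The sketch below shows the effect in plain Java; the helper names are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AppendedNul {
    // The workaround from the thread: append an explicit NUL char, then encode.
    static byte[] encodeWithNul(String s, Charset cs) {
        return (s + "\u0000").getBytes(cs);
    }

    // Index of the first zero byte, i.e. where a C string function would stop reading.
    static int firstNul(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] bytes = encodeWithNul("anyu\u00e9", StandardCharsets.UTF_8);
        System.out.println(bytes.length);    // 7: six bytes of text plus the NUL
        System.out.println(firstNul(bytes)); // 6
    }
}
```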

I will look into what you suggested.

Thanks for the help.

Zoltan

On Thursday 12 July 2007 17:04:21 Timothy Wall wrote:

On Jul 12, 2007, at 9:08 AM, Bartko Zoltan wrote:

Another thing I have found out:

If I do not use accented characters, the library receives the strings correctly. However, if I use accented characters, weird things happen:

If you look at how native strings are generated (in dispatch.c:newCString), you will likely find a difference between the encoding actually used and the encoding you want.

The native code uses String.getBytes to generate a C char array; this returns the bytes in the platform's native character set, not UTF-8. This code should probably be refactored to move some of the conversion into the java library and make the conversion dependent on some external setting (either a system property or an option set on the library).
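The refactoring proposed here (taking the encoding from an external setting instead of always using the platform default) can be sketched in plain Java. The property name example.native.encoding is hypothetical, invented for this illustration and not a real JNA setting; the sketch only shows how such a choice and an encoding mismatch behave, not how JNA implements string conversion.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NativeEncoding {
    // Hypothetical property name, used only for this illustration.
    static final String PROPERTY = "example.native.encoding";

    // Use the charset named by the property if it is set; otherwise fall back to
    // the platform default, which is what the no-argument String.getBytes() uses.
    static Charset nativeCharset() {
        String name = System.getProperty(PROPERTY);
        return (name != null) ? Charset.forName(name) : Charset.defaultCharset();
    }

    // Encode with one charset and decode with another, as happens when the Java
    // side and the native library disagree about the encoding.
    static String roundTrip(String s, Charset writer, Charset reader) {
        return new String(s.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String word = "apu\u00e9";
        // UTF-8 bytes read as ISO-8859-1: e-acute (0xC3 0xA9) turns into two characters.
        String garbled = roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 5

        System.setProperty(PROPERTY, "ISO-8859-2");
        System.out.println(nativeCharset().name()); // ISO-8859-2
    }
}
```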

With the current library you can always call String.getBytes(String encoding) and pass in the resulting byte array (don't forget to NUL-terminate it) instead of a String.
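A minimal sketch of that workaround, using String.getBytes(String encoding) as suggested and adding the NUL explicitly. The helper name toCString is hypothetical; on the Java side, the mapped native function's parameter would then be declared as byte[] rather than String, so the bytes reach the library exactly as encoded here.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class NativeBytes {
    // Encode s in the named encoding and append one NUL byte, giving the
    // layout that a C function expecting a const char* will read.
    static byte[] toCString(String s, String encoding) {
        try {
            byte[] encoded = s.getBytes(encoding);
            // Arrays.copyOf zero-fills the extra slot, so the last byte is the NUL terminator.
            return Arrays.copyOf(encoded, encoded.length + 1);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        byte[] latin2 = toCString("apu\u00e9", "ISO-8859-2");
        System.out.println(Arrays.toString(latin2)); // [97, 112, 117, -23, 0]
    }
}
```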