5 messages in com.googlegroups.google-desktop-developer: Re: Google UTF-8 URL encoding

| From | Sent On |
|---|---|
| Mohamed El Wakil | 24 May 2005 01:21 |
| mart...@gmail.com | 26 May 2005 17:39 |
| Mohamed El Wakil | 27 May 2005 03:14 |
| mart...@gmail.com | 29 May 2005 14:13 |
| Mohamed El Wakil | 30 May 2005 13:57 |

| Subject: | Re: Google UTF-8 URL encoding |
|---|---|
| From: | mart...@gmail.com (mart...@gmail.com) |
| Date: | 05/26/2005 05:39:11 PM |
| List: | com.googlegroups.google-desktop-developer |
I don't know if this will help, but this is the type of method I use to convert XML files from UTF-8 to UTF-16 via an ADO Stream. It will work for other character sets too, such as "arabic" or "iso-8859-6".

There are probably lower-level Windows API calls to do this, such as MultiByteToWideChar, which converts ANSI (multibyte) text to Unicode provided you supply the code page (or derive it from the LCID). Michael Kaplan, who wrote a book on internationalizing Windows code, has some sample VB code that does this:
http://www.trigeminal.com/code/InCodePage.bas
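
A minimal sketch of the MultiByteToWideChar route, assuming 32-bit VB6/VBA (64-bit Office needs `PtrSafe` and `LongPtr` in the `Declare`). `BytesToString` is a hypothetical helper, not code from Kaplan's module. It calls the API twice: first with a null output buffer to get the required length, then to fill a preallocated string. The declarations go at the top of a standard module.

```vb
' Hypothetical helper (not from InCodePage.bas).
' Assumes 32-bit VB6/VBA; 64-bit Office needs PtrSafe and LongPtr.
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 As Long = 65001

' Decode a byte array in the given code page into a VB (UTF-16) string.
Public Function BytesToString(bytes() As Byte, ByVal codePage As Long) As String
    Dim nBytes As Long, nChars As Long
    Dim result As String

    nBytes = UBound(bytes) - LBound(bytes) + 1
    If nBytes <= 0 Then Exit Function

    ' First call with a null output buffer returns the number of
    ' UTF-16 code units needed.
    nChars = MultiByteToWideChar(codePage, 0, VarPtr(bytes(LBound(bytes))), nBytes, 0, 0)
    If nChars = 0 Then Exit Function ' conversion failed (see Err.LastDllError)

    ' Second call writes directly into a preallocated string buffer.
    result = String$(nChars, vbNullChar)
    MultiByteToWideChar codePage, 0, VarPtr(bytes(LBound(bytes))), nBytes, StrPtr(result), nChars
    BytesToString = result
End Function
```

For example, `BytesToString(b, CP_UTF8)` decodes a UTF-8 byte array `b`; passing 28596 (the Windows code page for iso-8859-6) instead would decode Arabic text in that encoding.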

    Sub testutfMethod()
        Dim strm As Object
        Dim strXML As String

        Set strm = CreateObject("ADODB.Stream")
        strm.Type = 2            ' 2 is type text (adTypeText)
        strm.Charset = "UTF-8"   ' pick UTF-8, or e.g. strm.Charset = "iso-8859-2"

        strXML = "<?xml version=""1.0"" encoding=""UTF-8""?><root>You have to pay" & _
                 " 2.50 if you use German umlauts ä, ö, ü.</root>"

        ' This second sample overwrites the first; keep whichever one you want to test.
        ' ű (U+0171) and ő (U+0151) are not in the Western ANSI code page
        ' the VB editor uses, so they are built with ChrW.
        strXML = "<Language Text=""Árvízt" & ChrW(&H171) & "r" & ChrW(&H151) & _
                 " tükörfúrógép"" Translation=""flood-proof mirror-drilling machine," & _
                 " only all non-ASCII letters"">Hungarian (hu)</Language>"

        strm.Open
        strm.WriteText strXML
        strm.SaveToFile "c:\temp\test2005061.xml", 2   ' 2 = adSaveCreateOverWrite
        strm.Close
    End Sub

    Sub ReadUTF8SaveFileInUTF16(strFileIn As String, strFileOut As String)
        '1/2 ReadToFile / SaveToFile snippet
        'http://www.codeproject.com/soap/XMJFileStreaming.asp?msg=841289&mode=all&userid=903408#xx767979xx
        ' Early bound: needs a project reference to Microsoft ActiveX Data Objects (used ADO 2.7)
        Dim stm As ADODB.Stream
        Dim strData As String

        ' The character set names known to the machine are in the registry:
        ' see the subkeys of HKEY_CLASSES_ROOT\MIME\Database\Charset.

        Set stm = New ADODB.Stream
        stm.Open
        stm.Type = adTypeText
        stm.Charset = "UTF-8"        ' input file character set
        stm.LoadFromFile strFileIn
        ' If you just dump out the stream without reading and
        ' writing it, you get a double BOM.

        stm.Position = 0             ' reset to beginning of stream
        strData = stm.ReadText()

        ' Change the XML encoding declaration to
        ' <?xml version="1.0" encoding="UTF-16" ?>
        ' (vbTextCompare matches "utf-8" and "UTF-8" alike)
        strData = Replace(strData, "utf-8", "UTF-16", 1, 1, vbTextCompare)
        Debug.Print strData

        ' Go back to the start before switching to the output character set.
        stm.Position = 0
        stm.Charset = "UTF-16"       ' or "Unicode", "iso-8859-1", "ascii", "Big5", "hebrew"
        stm.WriteText strData
        stm.SetEOS                   ' drop leftover input bytes if the new text is shorter
        stm.SaveToFile strFileOut, adSaveCreateOverWrite
        stm.Close
        Set stm = Nothing
    End Sub




