10 messages in com.googlegroups.bloggerdev
Re: Problems with character encoding ...

| From | Sent On | Attachments |
|---|---|---|
| Iúri Chaer | 05 Apr 2005 14:03 | |
| Toru Marumoto | 05 Apr 2005 14:45 | |
| Iúri Chaer | 05 Apr 2005 16:43 | |
| Toru Marumoto | 07 Apr 2005 08:48 | |
| Volabulary | 28 Apr 2005 06:46 | |
| Steve Jenson | 28 Apr 2005 10:50 | |
| Volabulary | 29 Apr 2005 04:55 | |
| ehet...@gmail.com | 01 May 2005 00:01 | |
| David Stewart | 01 May 2005 13:56 | |
| ehet...@gmail.com | 02 May 2005 16:02 | |

| Subject: | Re: Problems with character encoding through Atom |
|---|---|
| From: | ehet...@gmail.com (ehet...@gmail.com) |
| Date: | 05/01/2005 12:01:26 AM |
| List: | com.googlegroups.bloggerdev |
Steve,
The following entry doesn't work for me.
***
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<entry xmlns="http://purl.org/atom/ns#">
  <generator url="http://purl.org/net/emacs-atom-api/">Elisp Atom API</generator>
  <author>
    <name>Erik</name>
  </author>
  <issued>2005-04-30T23:34:38-07:00</issued>
  <title mode="escaped" type="text/html">Test entry</title>
  <content type="application/xhtml+xml" xml:space="preserve">
    <div xmlns="http://www.w3.org/1999/xhtml">‘Tést êñtrÿ’</div>
  </content>
</entry>
***
A hexdump is included at the end of this message so that you can verify that the data is valid UTF-8. If I post this using:
curl --data-binary @test-entry.xml -H "Content-type: application/xml; charset=utf-8" -# -v -u ehetzner:XXXXXX http://www.blogger.com/atom/12265347
We get the following session:
* About to connect() to www.blogger.com port 80
* Trying 66.102.15.100...
* connected
* Connected to www.blogger.com (66.102.15.100) port 80
* Server auth using Basic with user 'ehetzner'
POST /atom/12265347 HTTP/1.1
Authorization: Basic ZWhldHpuZXI6Zm9ydHky
User-Agent: curl/7.13.2 (i386-pc-linux-gnu) libcurl/7.13.2 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.13
Host: www.blogger.com
Pragma: no-cache
Accept: */*
Content-type: application/xml; charset=utf-8
Content-Length: 485

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<entry xmlns="http://purl.org/atom/ns#">
  <generator url="http://purl.org/net/emacs-atom-api/">Elisp Atom API</generator>
  <author>
    <name>Erik</name>
  </author>
  <issued>2005-04-30T23:34:38-07:00</issued>
  <title mode="escaped" type="text/html">Test entry</title>
  <content type="application/xhtml+xml" xml:space="preserve">
    <div xmlns="http://www.w3.org/1999/xhtml">?<80><98>Tést êñtrÿ?<80><99></div>
  </content>
</entry>
< HTTP/1.1 200
< Date: Sun, 01 May 2005 06:59:12 GMT
< Server: Apache
< Set-Cookie: JSESSIONID=233362E8BC88A692F49BC28F4B1CA285; Path=/
< Transfer-Encoding: chunked
< Content-Type: application/xml; charset=utf-8
< Set-Cookie: NSC_cmphhfs-fyu=0a1401860050;Version=1;Max-Age=120;path=/
^M######################################################################## 100.0%^M######################################################################## 100.0%
* Connection #0 to host www.blogger.com left intact
* Closing connection #0
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<entry xmlns="http://purl.org/atom/ns#">
  <link href="http://www.blogger.com/atom/12265347/111493075270349241" rel="service.edit" title="Test entry" type="application/atom+xml"/>
  <author>
    <name>E.</name>
  </author>
  <issued>2005-04-30T23:34:38-07:00</issued>
  <modified>2005-05-01T06:59:12Z</modified>
  <created>2005-05-01T06:59:12Z</created>
  <link href="http://ehetzner.blogspot.com/2005/04/test-entry_111493075270349241.html" rel="alternate" title="Test entry" type="text/html"/>
  <id>tag:blogger.com,1999:blog-12265347.post-111493075270349241</id>
  <title mode="escaped" type="text/html">Test entry</title>
  <content type="application/xhtml+xml" xml:base="http://ehetzner.blogspot.com" xml:space="preserve">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">‘T?<83>©st ?<83>ª?<83>±tr?<83>¿â€™</div>
    </div>
  </content>
</entry>
I get mangled data on the weblog (i.e., the content is interpreted as latin-1). You can see this in the reply from blogger.
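For anyone who wants to reproduce the symptom locally, here is a minimal Python sketch of the double encoding that the reply appears to show. It is only a guess at what happens on the server side. It assumes the UTF-8 bytes are decoded with a single-byte codec and then re-encoded as UTF-8, and it uses Windows-1252 (a close relative of latin-1) because the "â€™" in the reply matches that codec; the variable names are made up for illustration.

```python
# The test string from the entry, with the curly quotes seen in the hexdump.
original = "\u2018Tést êñtrÿ\u2019"

# What the client actually sends: valid UTF-8.
utf8_bytes = original.encode("utf-8")

# Assumption: the server decodes those bytes with a single-byte codec.
# cp1252 is chosen because the reply contains the "â€™" sequence; strict
# latin-1 would turn 0x80 and 0x99 into control characters instead.
mangled = utf8_bytes.decode("cp1252")
print(mangled)

# Re-encoding the mangled text as UTF-8 yields double-encoded bytes,
# e.g. "é" becomes c3 83 c2 a9 instead of c3 a9.
double_encoded = mangled.encode("utf-8")
print(double_encoded.hex(" "))

# The damage is reversible as long as every byte maps to a cp1252 character.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")
assert repaired == original
```

If the round trip at the end succeeds on text copied from the weblog, that would support the theory that the bytes arrive intact and are only misinterpreted after receipt.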
Here's a hexdump:
00000000  3c 3f 78 6d 6c 20 76 65  72 73 69 6f 6e 3d 22 31  |<?xml version="1|
00000010  2e 30 22 20 65 6e 63 6f  64 69 6e 67 3d 22 75 74  |.0" encoding="ut|
00000020  66 2d 38 22 20 73 74 61  6e 64 61 6c 6f 6e 65 3d  |f-8" standalone=|
00000030  22 79 65 73 22 3f 3e 0a  3c 65 6e 74 72 79 20 78  |"yes"?>.<entry x|
00000040  6d 6c 6e 73 3d 22 68 74  74 70 3a 2f 2f 70 75 72  |mlns="http://pur|
00000050  6c 2e 6f 72 67 2f 61 74  6f 6d 2f 6e 73 23 22 3e  |l.org/atom/ns#">|
00000060  0a 20 20 3c 67 65 6e 65  72 61 74 6f 72 20 75 72  |.  <generator ur|
00000070  6c 3d 22 68 74 74 70 3a  2f 2f 70 75 72 6c 2e 6f  |l="http://purl.o|
00000080  72 67 2f 6e 65 74 2f 65  6d 61 63 73 2d 61 74 6f  |rg/net/emacs-ato|
00000090  6d 2d 61 70 69 2f 22 3e  45 6c 69 73 70 20 41 74  |m-api/">Elisp At|
000000a0  6f 6d 20 41 50 49 3c 2f  67 65 6e 65 72 61 74 6f  |om API</generato|
000000b0  72 3e 0a 20 20 3c 61 75  74 68 6f 72 3e 0a 20 20  |r>.  <author>.  |
000000c0  20 20 3c 6e 61 6d 65 3e  45 72 69 6b 3c 2f 6e 61  |  <name>Erik</na|
000000d0  6d 65 3e 0a 20 20 3c 2f  61 75 74 68 6f 72 3e 0a  |me>.  </author>.|
000000e0  20 20 3c 69 73 73 75 65  64 3e 32 30 30 35 2d 30  |  <issued>2005-0|
000000f0  34 2d 33 30 54 32 33 3a  33 34 3a 33 38 2d 30 37  |4-30T23:34:38-07|
00000100  3a 30 30 3c 2f 69 73 73  75 65 64 3e 0a 20 20 3c  |:00</issued>.  <|
00000110  74 69 74 6c 65 20 6d 6f  64 65 3d 22 65 73 63 61  |title mode="esca|
00000120  70 65 64 22 20 74 79 70  65 3d 22 74 65 78 74 2f  |ped" type="text/|
00000130  68 74 6d 6c 22 3e 54 65  73 74 20 65 6e 74 72 79  |html">Test entry|
00000140  3c 2f 74 69 74 6c 65 3e  0a 20 20 3c 63 6f 6e 74  |</title>.  <cont|
00000150  65 6e 74 20 74 79 70 65  3d 22 61 70 70 6c 69 63  |ent type="applic|
00000160  61 74 69 6f 6e 2f 78 68  74 6d 6c 2b 78 6d 6c 22  |ation/xhtml+xml"|
00000170  20 78 6d 6c 3a 73 70 61  63 65 3d 22 70 72 65 73  | xml:space="pres|
00000180  65 72 76 65 22 3e 0a 20  20 20 20 3c 64 69 76 20  |erve">.    <div |
00000190  78 6d 6c 6e 73 3d 22 68  74 74 70 3a 2f 2f 77 77  |xmlns="http://ww|
000001a0  77 2e 77 33 2e 6f 72 67  2f 31 39 39 39 2f 78 68  |w.w3.org/1999/xh|
000001b0  74 6d 6c 22 3e e2 80 98  54 c3 a9 73 74 20 c3 aa  |tml">?..Tést ê|
000001c0  c3 b1 74 72 c3 bf e2 80  99 3c 2f 64 69 76 3e 0a  |ñtrÿ?..</div>.|
000001d0  20 20 3c 2f 63 6f 6e 74  65 6e 74 3e 0a 3c 2f 65  |  </content>.</e|
000001e0  6e 74 72 79 3e                                    |ntry>|
000001e5
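To check the claim that the posted data is proper UTF-8 without reading the dump by eye, the bytes of the text inside the div can be decoded strictly. This is just a sketch: the hex string is copied by hand from offsets 0x1b5 through 0x1c8 of the hexdump, and the variable names are illustrative.

```python
# Bytes of the <div> text, copied from offsets 0x1b5-0x1c8 of the hexdump.
content_hex = "e2 80 98 54 c3 a9 73 74 20 c3 aa c3 b1 74 72 c3 bf e2 80 99"
content_bytes = bytes.fromhex(content_hex)

# Strict decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = content_bytes.decode("utf-8")

# Show each character with its code point to confirm the intended text.
for ch in text:
    print(f"U+{ord(ch):04X} {ch!r}")
```

The decode succeeds and yields 12 characters from 20 bytes, which is consistent with the request body being correctly encoded before it reaches the server.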




