Subject: Re: International characters problem
From: Kulvinder Singh (kulv@yahoo.com)
Date: 03/29/2007 07:53:57 AM
List: com.googlegroups.google-calendar-help-dataapi

Hi Kyle,

Thanks for the reply.

I think I need to convert the string into a byte array in UTF-8 encoding, remove
the unwanted bytes, convert that back to a string, and then HTML-encode it.

Please tell me if I am wrong.

Kyle Marvin <kmar@google.com> wrote:

On 3/29/07, Kulvinder Singh <kulv@yahoo.com> wrote:

Hi Kyle,

Thanks, thanks, thanks for your prompt reply. I got it.

The reason I can't just send the "Content" to Google as-is is that some of my
Outlook Calendar events contain printer form-feed characters such as page
breaks, and the Google server is not able to handle them. That's why I was
getting Google server errors for Unicode 0xc characters. Also, & and < are not
accepted by the Google server, so I need to HTML-encode them.

Consider a situation where the Content for a Google Event contains a page
break, & and < (all three of them). In that case, the errors I am getting are:

org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xc) was
found in the element content of the document.

org.xml.sax.SAXParseException: The content of elements must consist of
well-formed character data or markup.

org.xml.sax.SAXParseException: The entity name must immediately follow the
'&' in the entity reference.

Please tell me how to handle event content containing page break, & and <
characters, which Google can't handle.

OK, I see. There are really two separate things I'd recommend:

- I'd recommend you filter the control characters (like PageBreak) out of
the content. You can do this by looking for characters in certain ranges (ex.
0-31 are control chars in most character sets)

- Because you are generating XML, you need to _XML escape_ characters that have
special meaning in XML documents. Here are the characters to look out for and
the encoded equivalents:

less than (<) : &lt;
greater than (>) : &gt;
ampersand (&) : &amp;
apostrophe (') : &apos;
quotation (") : &quot;
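
The two recommendations above can be combined into one helper. What follows is a minimal sketch in Python, not code from this thread: the function name clean_for_xml is hypothetical. It assumes the XML 1.0 rule that tab (0x09), line feed (0x0A) and carriage return (0x0D) are the only characters in the 0-31 range that are legal in a document, so those three are kept and the rest (including the 0xc form feed) are dropped before escaping.

```python
import re
from xml.sax.saxutils import escape

# C0 control characters that XML 1.0 forbids in content. Tab (0x09),
# line feed (0x0A) and carriage return (0x0D) are legal and are kept.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_for_xml(text: str) -> str:
    """Strip forbidden control characters, then XML-escape what is left.

    Hypothetical helper for illustration only.
    """
    stripped = _INVALID_XML_CHARS.sub("", text)
    # escape() replaces &, < and > by default; the extra mapping adds
    # the two quote characters from the list above.
    return escape(stripped, {'"': "&quot;", "'": "&apos;"})


# A form feed (0x0c) followed by text containing & and <.
print(clean_for_xml("Agenda\x0c A & B < C"))  # Agenda A &amp; B &lt; C
```

If an XML library builds the request, it will usually do the escaping step itself, in which case only the control-character filter is needed and escaping by hand would double-encode the text.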

Hope this helps,

-- Kyle

Thanks in advance.

Kyle Marvin <kmar@google.com> wrote:

On 3/28/07, Kulvinder Singh <kulv@yahoo.com> wrote:

Hi Kyle,

Thanks a lot for your reply. Please help me further with this.

I want to insert an Event in Google Calendar with Description/Content as
"æ ¾æœ¬çœŸå ¸ &&&&". Now, how should I decide programmatically whether this
is HTML or text?

If you aren't using any HTML formatting in the content (as in the example
string), then I'd recommend just using plain text content. If you encode it
as UTF-8 characters, you shouldn't have to worry about doing anything else with
the HTTP Content-Type header or XML encoding attribute.

Should I set the "type" of the Content element to "text/html" every time and
HTML-encode it by default, or is there another way?

See above. Just use "text" or leave it off since that is the default.
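
As an illustration of the plain-text approach, here is a minimal Python sketch that builds an Atom entry with type="text" and serializes it as UTF-8. It uses only the standard library rather than any Google client library; build_entry is a hypothetical name, and a real Calendar event needs more elements than the two shown. The serializer XML-escapes & and < itself, so the text is passed in unencoded.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_entry(title: str, content: str) -> bytes:
    """Return a UTF-8 encoded Atom entry with plain-text title and content.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # type="text" is the Atom default; it is spelled out here for clarity.
    title_el = ET.SubElement(entry, f"{{{ATOM_NS}}}title", {"type": "text"})
    title_el.text = title
    content_el = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "text"})
    content_el.text = content
    # Serializing escapes & and < and writes non-ASCII text as UTF-8 bytes.
    return ET.tostring(entry, encoding="utf-8")


body = build_entry("Planeringsmöte för ansökan", "A & B < C")
print(body.decode("utf-8"))
```

Note that this serializer does not reject control characters such as 0xc, so they still have to be filtered out of the text beforehand.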

Kyle Marvin <kmar@google.com> wrote:

Hi Kulvinder,

For data in the API, you should not HTML encode the input data unless the "type"
attribute of the title and content is "html". If you haven't set type, the
default value is "text", per the Atom syntax spec (RFC4287). The gd:where
element attributes don't have a type and are always considered to be text.

For GData to be able to properly interpret encoded characters in your data, you
need to provide appropriate signals about the encoding. The character set
encoding of data sent via the API is determined by one of two things: the value
of the charset attribute of the HTTP Content-Type header or the value of the
"encoding" attribute in the XML declaration (<?xml ... ?>). If both are present,
the HTTP header is considered canonical. If neither is present, the default is
"utf-8".
See [1] for more details and some examples.

Hope this is helpful,

-- Kyle

[1] - http://diveintomark.org/archives/2004/02/13/xml-media-types
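
The precedence rule described in the message above (HTTP charset first, then the XML declaration, then "utf-8") can be sketched as follows. This is an illustration in Python of the rule as stated, not the actual GData server logic; resolve_charset is a hypothetical name, and the regular expression assumes the document starts with an ASCII-compatible XML declaration and no byte order mark.

```python
import re
from email.message import Message
from typing import Optional

# Matches the encoding pseudo-attribute of a leading XML declaration.
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def resolve_charset(content_type: Optional[str], body: bytes) -> str:
    """Pick the charset: HTTP header, then XML declaration, then utf-8.

    Hypothetical helper for illustration only.
    """
    if content_type:
        # Reuse the standard library's header parser for the charset param.
        msg = Message()
        msg["Content-Type"] = content_type
        header_charset = msg.get_content_charset()
        if header_charset:
            return header_charset
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><entry/>'
print(resolve_charset("application/atom+xml; charset=UTF-8", doc))  # utf-8
print(resolve_charset("application/atom+xml", doc))  # iso-8859-1
print(resolve_charset(None, b"<entry/>"))  # utf-8
```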

On 3/21/07, Kulvinder Singh <kulv@yahoo.com> wrote:

Hi,

I am facing a problem with international characters in the Google API.

I created a Google Event with Title = "Planeringsmöte för ansökan VR 20/3 kl 18.00 ca" and Where = "Planeringsmöte för ansökan VR 20/3 kl 18.00 ca" and Content = "Planeringsmöte för ansökan VR 20/3 kl 18.00 ca "

I did an HTMLEncode on these values of Title, Where and Content before
inserting them into Google.

Now the event is successfully created at Google, but with a problem. When I
look at the agenda or any Week/Month view etc., these characters are rendered as:

" Planeringsm&#246;te f&#246;r ans&#246;kan VR 20/3 kl 18.00 ca"

but when I go to edit this event, the Title and Where appear corrected to
"Planeringsmöte för ansökan VR 20/3 kl 18.00 ca". Whenever I switch to any
other view, I see them encoded again.

Please help.
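
The symptom in this first message is consistent with the text being encoded twice. The Python sketch below is an assumption about the mechanism, not a trace of what the client library or Google's servers did: it stands in for HTMLEncode by turning non-ASCII characters into numeric references, then XML-escapes and parses the result the way a type="text" element would be handled. The function name as_seen_by_server is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Planeringsmöte för ansökan"

# Stand-in for an HTMLEncode call: non-ASCII characters become
# numeric character references such as &#246;.
html_encoded = title.encode("ascii", "xmlcharrefreplace").decode("ascii")


def as_seen_by_server(text: str) -> str:
    """Serialize text as XML element content, then parse it back.

    Hypothetical helper for illustration only.
    """
    xml_doc = "<title>" + escape(text) + "</title>"
    return ET.fromstring(xml_doc.encode("utf-8")).text


# Pre-encoded text arrives as the literal characters "&#246;" ...
print(as_seen_by_server(html_encoded))  # Planeringsm&#246;te f&#246;r ans&#246;kan
# ... while the unencoded text arrives intact.
print(as_seen_by_server(title))  # Planeringsmöte för ansökan
```

Under that assumption, a receiver treating the value as plain text would store and display the reference literally, which matches the agenda view described above.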