Thread: Re: Character Encoding (11 messages in org.mozilla.lists.support-thunderbird)
From, Sent On
Bob Henson, Jan 13, 2007 2:48 am
Nir, Jan 13, 2007 3:14 am
Bob Henson, Jan 13, 2007 3:48 am
Brian Heinrich, Jan 13, 2007 11:41 am
Bob Henson, Jan 13, 2007 12:07 pm
Tony Mechelynck, Jan 14, 2007 4:23 am
Bob Henson, Jan 14, 2007 7:32 am
Brian Heinrich, Jan 14, 2007 2:38 pm
Tony Mechelynck, Jan 14, 2007 8:43 pm
Tony Mechelynck, Jan 14, 2007 9:10 pm
Bob Henson, Jan 15, 2007 12:07 pm
Subject:Re: Character Encoding
From:Bob Henson (ne@galenx.org.uk)
Date:Jan 15, 2007 12:07:13 pm
List:org.mozilla.lists.support-thunderbird

Tony Mechelynck wrote

Bob Henson wrote:

Tony Mechelynck wrote

Bob Henson wrote:

Nir wrote

Bob Henson wrote:

From years of reading this newsgroup, I've seen many messages concerning character encoding, but I have to confess I still haven't got the faintest idea what it's all about - certainly not when related to Thunderbird, anyway. I've recently been forced to take an interest by a spate of messages where the characters display incorrectly and the character encoding settings don't appear to behave as I would expect (from my position of ignorance, that is :-) ). The question I'm rambling my way round to is this - is there a good reference source which explains the whys and wherefores of character encoding with Thunderbird in mind? I've had a look round the knowledge base etc., but it doesn't seem to explain things in a (simple?) manner that I can understand.

Regards,

This article may be an easy place to start: http://en.wikipedia.org/wiki/Character_encodings

Thanks, I've bookmarked that for future reference too. It certainly explains the coding in detail, which will help - now I've got to find out how that applies to Thunderbird in actual practice.

Regards,

It applies to Thunderbird inasmuch as Thunderbird needs to be able to represent in a readable fashion incoming messages that may be encoded using any character encoding in use on the Internet, and also to encode outgoing messages (written in any language) in a fashion suitable for transport over the Internet and display on the addressee's computer.

When writing a message (which could be in English, in French, in Arabic, in Chinese, or in any other language), you can select an encoding in "Options => Character encoding" (in the Compose window). If, when you click Send, the encoding chosen is not appropriate for the text of the message, Thunderbird will prompt you, asking if you want to transmit the message in UTF-8 (a "Unicode" or "universal" encoding which can represent the characters of any language known to man). You can either accept the suggestion, or "Cancel" and select a different encoding manually in the menu.
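The send-time check described above can be sketched in Python. This is only an illustration of the idea, not Thunderbird's actual code: the function name `pick_outgoing_charset` and the default of ISO-8859-1 are assumptions made for the example, and Thunderbird prompts the user rather than switching silently.

```python
def pick_outgoing_charset(text: str, preferred: str = "iso-8859-1") -> str:
    """Return `preferred` if it can represent every character in `text`,
    otherwise fall back to UTF-8 (hypothetical helper, for illustration)."""
    try:
        text.encode(preferred)
    except UnicodeEncodeError:
        # At least one character has no byte value in the chosen encoding.
        return "utf-8"
    return preferred


# Plain French fits in ISO-8859-1, so the preferred encoding is kept.
print(pick_outgoing_charset("Bonjour, \u00e7a va ?"))

# U+2019 (typographic apostrophe) is not in ISO-8859-1, so UTF-8 is suggested.
print(pick_outgoing_charset("It\u2019s \u00a35"))
```

The pound sign alone would not trigger the fallback, since it exists in ISO-8859-1; it is the typographic apostrophe that does.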

When reading a message sent to you by someone else, it may happen (if the charset is not properly described in the message headers) that some or all of the text appears as gibberish in your mailer. In that case you can try to select a more appropriate encoding by trial-and-error, using "View => Character encoding" (in the Message window).
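For readers who want to see what "described in the message headers" means in practice, here is a small Python sketch using the standard `email` module. The sample message is made up for the example, and the ISO-8859-1 fallback for a missing charset is an assumption, not a statement about what Thunderbird does.

```python
from email import message_from_bytes

# A made-up message whose header declares UTF-8; the body contains
# a typographic apostrophe encoded as the three bytes E2 80 99.
raw = (
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"It\xe2\x80\x99s due\r\n"
)

msg = message_from_bytes(raw)

# The charset parameter of Content-Type, lowercased, or None if absent.
declared = msg.get_content_charset()

# The body as raw bytes, before any character decoding.
body = msg.get_payload(decode=True)

# Trust the header when present; the fallback here is an assumed default.
text = body.decode(declared or "iso-8859-1")

print(declared)
print(text.strip())
```

If the header named the wrong charset, or none at all, the same bytes would have to be decoded by guesswork, which is what the View menu lets you do by hand.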

This is the side of things that is of immediate concern. Yesterday, for example, I had a message from a utility company which had a Content-Type UTF-8 - however all the apostrophes displayed as ’ . I played around with the Thunderbird options and it made no difference whether or not I set UTF-8 or Western ISO-8859-1 as the default, nor whether I set Thunderbird to force display in the selected code or not. The only way to read it correctly was to use View > Character Encoding to force UTF-8. If I set UTF-8 as the INBOUND default and force its use, then the UK £ sign doesn't display correctly. My interest in the settings is MAINLY to find out how I can avoid having to reselect the encoding manually every time. To determine whether Thunderbird has some non-standard or buggy ways of displaying the characters (it had never happened with other mail clients) I felt I ought to know a bit more about what is going on. If I try to send this message now, Thunderbird throws up a message saying some characters may not display correctly if I send it in this character encoding, and that I should use UTF-8 for the best chance of it displaying correctly (that's what I'll do, so apologies if it doesn't display well). I am coming to the conclusion that Thunderbird's handling of character encoding is flawed or unnecessarily complicated somewhere - I've never had these problems with Outlook, OE6 or Agent.
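One possible explanation for the £ symptom, sketched in Python: in ISO-8859-1 the pound sign is the single byte A3, which is not valid UTF-8 on its own, so forcing UTF-8 onto such a message cannot display it. This is an illustration of the encodings only, not a diagnosis of any particular Thunderbird setup.

```python
# The same character has different byte representations.
pound_latin1 = "\u00a3".encode("iso-8859-1")   # one byte: A3
pound_utf8 = "\u00a3".encode("utf-8")          # two bytes: C2 A3

# Forcing UTF-8 onto ISO-8859-1 bytes: A3 alone is an invalid sequence,
# so the decoder substitutes U+FFFD REPLACEMENT CHARACTER.
forced_utf8 = pound_latin1.decode("utf-8", errors="replace")

# The opposite mistake: UTF-8 bytes read as ISO-8859-1 give two characters.
forced_latin1 = pound_utf8.decode("iso-8859-1")

print(pound_latin1, pound_utf8)
print(repr(forced_utf8))
print(forced_latin1)
```

Either mismatch garbles the sign, which is why a single forced default cannot suit mail arriving in both encodings.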

Regards,

Normally, you should set Thunderbird to trust the headers and not force any character encoding on inbound mail (i.e. [Preferences] => Display => Fonts... => untick "Apply the default character encoding to all incoming messages". Depending on your version of Thunderbird, the name of the setting, and how to reach it, may be slightly different.) This will work OK for (in my experience) about 99% of the messages. The remaining few have (again, in my experience) their headers mis-set, and View => Character Encoding... will work for them, though in some cases you may have to go through several guesses.

Similarly, I have found it "usually better" not to force any encoding on outbound mail, but to reply in the original encoding by default, to write new mail in the default encoding (such as ISO-8859-1 or maybe ISO-8859-15), and to handle exceptions through the Options => Character Encoding... menu.

Your message seems to be correctly formatted: I see "apostrophes displayed as" a-circumflex euro trade-mark (which I suppose is what you intended) and I see the pound sign as a pound (sterling) sign. A few experiments show that the trademark sign is compatible with neither ISO-8859-1 (Latin1) nor ISO-8859-15 (Latin9) but exists in Windows-1252 (the proprietary encoding often used on Windows to represent text in "Western" languages). Writing ’ in that encoding then reading it back as if it were UTF-8, gives the codepoint U+2019 RIGHT SINGLE QUOTATION MARK which displays as a left-concave apostrophe. Thus it seems that for some reason your Thunderbird was trying to read a UTF-8 closing quote as if it had been in Windows-1252.
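The experiment described above can be reproduced with a short Python sketch. The helper `representable` is a hypothetical name introduced for the example; `cp1252` is Python's codec name for Windows-1252.

```python
def representable(ch: str, encoding: str) -> bool:
    """True if `ch` can be encoded in `encoding` (hypothetical helper)."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The trademark sign exists in Windows-1252 but not in Latin1 or Latin9.
for enc in ("iso-8859-1", "iso-8859-15", "cp1252"):
    print(enc, representable("\u2122", enc))

# U+2019 RIGHT SINGLE QUOTATION MARK as UTF-8 is the bytes E2 80 99.
utf8_bytes = "\u2019".encode("utf-8")

# Reading those bytes as Windows-1252 gives a-circumflex, euro, trademark.
misread = utf8_bytes.decode("cp1252")

# Going the other way recovers the original quotation mark.
repaired = misread.encode("cp1252").decode("utf-8")

print(utf8_bytes, misread, repaired)
```

The round trip is lossless here, which is why switching the view to UTF-8 by hand restores the apostrophes.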

Best regards, Tony.

It would seem from what you say that my settings here should give maximum compatibility. The problems I have been getting are most likely down to messages with the headers mis-set, as you say. I wouldn't know how to recognise that, but it would also seem that I can't rule out Thunderbird having a problem too. The message we mentioned specifically above definitely had the UTF-8 designator in the headers, and Thunderbird was also set correctly, so I still can't say where the problem lies.

The answer, I suppose, is to use the "View" settings if the distortion warrants it, but it most likely won't merit it - it rarely stops me reading the sense of the message. So long as I know I'm not doing anything silly here I'm not too worried anyway. If there is a problem with Thunderbird I'll probably have to live with that too - if it takes as long as the PGP formatting problems to get fixed (over three years and still not done) I wouldn't like to hold my breath for a fix :-)

Thanks for your (and everyone else's) help. "A little help", as my Grandma used to say, "is worth a lot of pity"!

Regards,

Bob