Unicode explained
Posted on October 16, 2003
Filed Under /dev/null/
The members of the RbNUG are on a roll today. Just posted: a pointer to the absolutely excellent “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)” by Joel “Joel on Software” Spolsky, explaining Unicode, text encoding, why there is no “high ASCII”, and why you should care:
I’ve been dismayed to discover just how many software developers aren’t really completely up to speed on the mysterious world of character sets, encodings, Unicode, all that stuff. A couple of years ago, a beta tester for FogBUGZ was wondering whether it could handle incoming email in Japanese. Japanese? They have email in Japanese? I had no idea. When I looked closely at the commercial ActiveX control we were using to parse MIME email messages, we discovered it was doing exactly the wrong thing with character sets, so we actually had to write heroic code to undo the wrong conversion it had done and redo it correctly. When I looked into another commercial library, it, too, had a completely broken character code implementation. I corresponded with the developer of that package and he sort of thought they “couldn’t do anything about it.” Like many programmers, he just wished it would all blow over somehow.
…
In this article I’ll fill you in on exactly what every working programmer should know. All that stuff about “plain text = ascii = characters are 8 bits” is not only wrong, it’s hopelessly wrong, and if you’re still programming that way, you’re not much better than a medical doctor who doesn’t believe in germs. Please do not write another line of code until you finish reading this article.
REALbasic does an excellent job of making text encoding handling fairly trivial for the developer (once you’ve dug your way through the docs and given it a try or two). I’ve come to realize in reading this article that while I knew how to implement text encoding properly, I didn’t really know why. The why is very important.
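To make the “characters are 8 bits” point from the quote a little more concrete, here is a minimal sketch. It is written in Python rather than REALbasic, purely for illustration, and the sample string is my own choice, not something from Joel’s article. It shows that the same text has a different byte length in each encoding, and that decoding bytes with the wrong encoding quietly produces garbage instead of an error.

```python
# The same four-character string, encoded three different ways.
text = "café"

utf8_bytes = text.encode("utf-8")        # 'é' becomes two bytes (0xC3 0xA9)
latin1_bytes = text.encode("latin-1")    # 'é' becomes one byte (0xE9)
utf16_bytes = text.encode("utf-16-le")   # each of these characters becomes two bytes

# One string, three different byte counts: characters are not bytes.
print(len(text), len(utf8_bytes), len(latin1_bytes), len(utf16_bytes))

# Decoding UTF-8 bytes as Latin-1 does not raise an error; it silently
# turns 'é' into two unrelated characters.
misread = utf8_bytes.decode("latin-1")
print(misread)

# Plain ASCII has no code for 'é' at all, so encoding to it fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```

The takeaway is the same as the article’s: a run of bytes only means something once you know which encoding produced it.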