February 10, 2006

A Slice of Life

An in-office discussion of Java’s Unicode support.

Jeremy: How many characters does Unicode support anyway?

Charles: It’s effectively unlimited. Although because Java only supports two-byte characters, you can only use the first 65,000 or so natively.

(A short discussion of the difference between UCS-2 and UTF-16 encodings ensued)

Matt: So if you wanted to translate Confluence into Klingon, you’d be out of luck.

Charles: Yes, but Klingon isn’t officially a part of Unicode, so you’d have to come up with your own encoding anyway.

Charles: ISO-8859-GARKH!

Everyone: …

Charles: I don’t think I’ve ever felt more like a nerd than I did in that moment.
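
For anyone wondering what the "two-byte" limit looks like in practice, here is a minimal Java sketch. It is not from the actual conversation. It assumes Java 5 or later, where a char is a 16-bit UTF-16 code unit and any character above U+FFFF is stored as a surrogate pair of two chars. The class name Utf16Sketch and its helper methods are made up for illustration. U+F8D0 is the start of the Private Use range for pIqaD quoted in the comments below.

```java
// Hypothetical illustration class; the names here are not from any library.
class Utf16Sketch {
    // Build a String holding exactly one Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of 16-bit chars (UTF-16 code units) the string occupies.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting a surrogate pair as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if the code point falls in a Unicode Private Use Area.
    static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    public static void main(String[] args) {
        String eAcute = fromCodePoint(0x00E9);   // fits in a single char
        String gClef = fromCodePoint(0x1D11E);   // above U+FFFF, needs a surrogate pair
        String klingon = fromCodePoint(0xF8D0);  // first pIqaD slot in the Private Use Area

        System.out.println(codeUnits(eAcute) + " char(s), " + codePoints(eAcute) + " code point(s)");
        System.out.println(codeUnits(gClef) + " char(s), " + codePoints(gClef) + " code point(s)");
        System.out.println("U+F8D0 private use? " + isPrivateUse(0xF8D0));
        System.out.println("pIqaD letter fits in one char? " + (codeUnits(klingon) == 1));
    }
}
```

So a Private Use mapping of Klingon would fit inside the 16-bit range. The catch is that nothing will display it unless a font that uses the same mapping is installed.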

Posted to nerd at February 10, 2006 11:40 AM
Comments

... and then as if to prove the point, you blog about it ;-).

Posted by: Robert at February 10, 2006 01:43 PM

Sweet holy flying underpants! That was definitely some nerdy shit you came out with. I presume you killed all of your colleagues after that to protect your honour?

Posted by: Johnny K at February 10, 2006 09:02 PM

No, we're all still alive. Just a little bit freaked out. :)

Posted by: Jeremy Higgs at February 10, 2006 11:37 PM

Apparently, you are but a pale reflection in geekdom of the person who actually tried to get Klingon into Unicode officially, failed, and created his own user encoding (from http://en.wikipedia.org/wiki/Klingon_language):

"In September 1997, Michael Everson made a proposal for encoding this in Unicode. The Unicode Technical Committee rejected the Klingon proposal in May 2001 on the grounds that research showed almost no use of the script for communication, and the vast majority of the people who did use Klingon employed the Latin alphabet by preference. Everson created a mapping of pIqaD into the Private Use Area of Unicode, which he listed in the ConScript Unicode Registry (U+F8D0 to U+F8FF see here and here). Since then several fonts using that encoding have appeared, and software for typing in pIqaD has become available. As a result, blogs in pIqaD have begun to appear, raising the possibility of reapplying for inclusion in Unicode when there is a sufficient corpus. Existing text in Romanization can easily be converted to pIqaD also."

So maybe it's just a matter of time then...

Posted by: Rob Meyer at February 11, 2006 02:55 AM

Duh...didn't notice that you already linked to the rejected code set page...

Posted by: Rob Meyer at February 13, 2006 04:50 PM