How toLowerCase and isLowerCase Interact: Not a Bug.

by Charles Miller on September 9, 2003

Cedric thinks he's found 64,000 bugs in the Java Character class. He ran the following test against every available char, and found about 64,000 characters that are not lower-case after you call toLowerCase() on them:

if (! Character.isLowerCase(Character.toLowerCase(i)))

If you want to really trip out, try this test:

if (Character.isUpperCase(Character.toLowerCase(i)))

On my Mac that still returns 63 matches. There are 63 characters that are still upper-case after you've called toLowerCase() on them!

This is not a bug. The Javadoc for the Character class explains that the toLowerCase() method does not necessarily return a lower-case letter. It returns either the lower-case equivalent of the character or, if the character has no lower-case equivalent, the original character unchanged.
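To make the fallback behaviour concrete, here is a minimal sketch. The class name ToLowerCaseFallback and the helper isUnchangedByLowering are my own, made up for illustration; only Character.toLowerCase and Character.isLowerCase come from the JDK.

```java
final class ToLowerCaseFallback {

    /** True when toLowerCase has nothing to map ch to and hands it back unchanged. */
    static boolean isUnchangedByLowering(char ch) {
        return Character.toLowerCase(ch) == ch;
    }

    public static void main(String[] args) {
        // A letter with a lower-case form, a letter that is already lower-case,
        // a digit, a symbol, and a CJK ideograph (U+4E2D).
        char[] samples = {'A', 'a', '7', '%', '\u4E2D'};
        for (char ch : samples) {
            char lowered = Character.toLowerCase(ch);
            System.out.printf("U+%04X -> U+%04X unchanged=%b isLowerCase=%b%n",
                    (int) ch, (int) lowered,
                    isUnchangedByLowering(ch), Character.isLowerCase(lowered));
        }
    }
}
```

The digit and the symbol come back unchanged, and isLowerCase() is false for both, which is exactly the case Cedric's first test counts as a failure.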

In addition, the definition of upper-case in the same Javadoc covers three classes of characters: those with lower-case equivalents, those marked "Capital Letter" in the Unicode spec, and those marked "Capital Ligature". Thus you can call toLowerCase() on a character and have the result still be upper-case, because it is one of the Unicode characters that is upper-case but has no lower-case equivalent to convert to.
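If you want to see which characters these are on your own JVM, something like the following sketch should do it. The class and method names are mine, not Cedric's, and the counts depend on the Unicode tables your JDK ships with, so don't expect to land on exactly 64,000 or 63. Character.getName needs Java 7 or later.

```java
import java.util.ArrayList;
import java.util.List;

final class StillUpperScan {

    /** Every char that isUpperCase() still accepts after toLowerCase(). */
    static List<Character> stillUpperAfterLowering() {
        List<Character> result = new ArrayList<>();
        // Loop over an int: a char counter would wrap around at 0xFFFF and never terminate.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            if (Character.isUpperCase(Character.toLowerCase(ch))) {
                result.add(ch);
            }
        }
        return result;
    }

    /** How many chars are not lower-case after toLowerCase() (Cedric's first test). */
    static int countNotLowerAfterLowering() {
        int count = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            if (!Character.isLowerCase(Character.toLowerCase((char) i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("not lower-case after toLowerCase(): " + countNotLowerAfterLowering());
        List<Character> stillUpper = stillUpperAfterLowering();
        System.out.println("still upper-case after toLowerCase(): " + stillUpper.size());
        for (char ch : stillUpper) {
            // Print the code point and its Unicode name rather than the glyph itself.
            System.out.printf("U+%04X %s%n", (int) ch, Character.getName(ch));
        }
    }
}
```

Reading the printed Unicode names is the quickest way to convince yourself that these are characters with no lower-case form to convert to, not conversion failures.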

What this does mean is that if your application makes an assumption about the alphabet its users are going to be using (in this case, the Latin-alphabet assumption that every alphabetic character is either upper- or lower-case), you're going to fall over when someone starts using a different alphabet.
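Here is a sketch of how that assumption tends to sneak into code. Everything in it (the CaseAssumption class, the LetterCase enum, the naive and classify methods) is hypothetical and only there to show the shape of the mistake.

```java
final class CaseAssumption {

    enum LetterCase { UPPER, LOWER, NEITHER }

    /** The Latin-alphabet assumption: anything that is not upper-case must be lower-case. */
    static LetterCase naive(char ch) {
        return Character.isUpperCase(ch) ? LetterCase.UPPER : LetterCase.LOWER;
    }

    /** Asks both questions and allows for letters that have no case at all. */
    static LetterCase classify(char ch) {
        if (Character.isUpperCase(ch)) {
            return LetterCase.UPPER;
        }
        if (Character.isLowerCase(ch)) {
            return LetterCase.LOWER;
        }
        return LetterCase.NEITHER;
    }

    public static void main(String[] args) {
        // Latin 'Q' and 'q', Hebrew alef (U+05D0), and a CJK ideograph (U+4E2D).
        char[] samples = {'Q', 'q', '\u05D0', '\u4E2D'};
        for (char ch : samples) {
            System.out.printf("U+%04X isLetter=%b naive=%s classify=%s%n",
                    (int) ch, Character.isLetter(ch), naive(ch), classify(ch));
        }
    }
}
```

The Hebrew and CJK characters are letters as far as isLetter() is concerned, but the naive version files them under lower-case simply because they are not upper-case.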
