The Curse of &apos;

July 1, 2003 6:04 PM

(Issue ROL-216)

Dear Roller maintainers: &apos; is not a valid HTML entity reference. The definitive list of HTML entity references is here, and &apos; is not on it.

&apos; was introduced as a standard entity in XML, and is therefore also standard in XHTML. But even if you are writing XHTML, if you want your pages to stay backwards compatible with browsers that do not support XHTML (and IE is one of them), you should avoid &apos;.

If you're desperate, you can use &#39; instead. (See also: the backwards-compatibility section of the XHTML standard.)
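As an illustration, here is a minimal sketch of an output escaper that follows this advice. The name `escape_html` and its mapping table are hypothetical, not Roller's actual code; the point is simply that the apostrophe maps to the numeric reference &#39;, which both HTML and XML parsers understand. (Python's standard `html.escape` takes the same approach, emitting the hex form &#x27;.)

```python
# Hypothetical helper (not Roller's code): escape text for a page that must
# render in both XHTML-aware and plain-HTML browsers. The apostrophe becomes
# the numeric reference &#39;, never &apos;, which HTML 4 does not define.
_HTML_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
}


def escape_html(text: str) -> str:
    """Escape each character independently, so nothing is double-escaped."""
    return "".join(_HTML_ESCAPES.get(ch, ch) for ch in text)


print(escape_html("Don't <b>"))  # Don&#39;t &lt;b&gt;
```

Mapping character by character, rather than chaining `str.replace` calls, avoids having to get the order of replacements right.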

Even if you're serving valid XHTML with an XML DOCTYPE, there is still significant controversy over whether user agents should handle it as XML unless it is also served with the text/xml MIME type (which would cause IE to display the page as a parse tree).

Most of the time, it is fine to leave apostrophes unescaped in XML. XML's escaping rules are such that if some character's meaning is unambiguous, it need not be escaped. For example, if you have the tag <foo>, you only need to escape it as &lt;foo>. By just escaping the less-than sign, you make the meaning of the greater-than unambiguous, and therefore it need not be escaped itself. Since apostrophes have no special meaning outside of tags, they need not be munged in regular text.
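A minimal sketch of the escaping rule described above, for XML element content. `escape_xml_text` is a hypothetical name used for illustration. One detail the rule glosses over: the XML specification also requires `>` to be escaped when it appears as part of the sequence `]]>` in content, so the sketch handles that case too. Apostrophes and quotes are left alone, and the standard-library parser accepts the result.

```python
import xml.etree.ElementTree as ET


def escape_xml_text(text: str) -> str:
    """Minimal escaping for XML character data (element content).

    '&' and '<' are always significant in text; '>' only needs escaping
    inside the sequence ']]>'. Apostrophes and quotes pass through as-is.
    """
    text = text.replace("&", "&amp;")  # must come first, to avoid double-escaping
    text = text.replace("<", "&lt;")
    return text.replace("]]>", "]]&gt;")


escaped = escape_xml_text("It's a <foo> tag")
print(escaped)  # It's a &lt;foo> tag

# The parser recovers the original text, apostrophe and all.
root = ET.fromstring("<p>" + escaped + "</p>")
print(root.text)  # It's a <foo> tag
```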

As an aside, it's amusing to see IE penalised for following the standard. :)

5 Comments

amen brutha! I wondered why those stupid &apos; crap things keep appearing in comments, and it's my fault! just because I use an evil browser...


From what I understand there is no text/xml MIME type. It's application/xml only.

-Russ

Actually, the escaping has nothing to do with XML. It was part of a general "escape HTML" process mostly targeted at cross-site scripting (I don't recall the details of why ' and " are escaped). I've commented out the conversion for now; we'll see what the other Roller committers have to say.

And we wanted to punish IE users. ;-)

argghhh... I grotted! [my fautl] sorry everyone ;-)

I've written applications which changed apostrophes into APOS tags (and other special characters into other tags) because these strings were safer to pass around near my MySQL databases. (Think of editing a message on a web page, previewing it, adding attachments, previewing it again, reading it off the database, etc.) By filtering and "escaping" ALL my web application input, I have so far avoided the kind of SQL-smashing nonsense that afflicts many other database-driven web applications out there ;)

At one point, I swear the IE+APOS thing worked. (Maybe five years ago, when I started doing this kind of thing and my primary test platform was IE.) I've since switched to &#XX; instead.
