One of the truly fun things about writing an application that runs against multiple databases is the heady joy of having to test said application against every database it claims to work with.
For the last day or two, a single test in our nightly build was failing against Oracle. Only against Oracle. The test in question creates three blog posts and checks that the "previous" and "next" links work between them. When running the test against Oracle, you could traverse forwards, but the link to go back wasn't there.
I ran the setup portion of the test manually, and somehow each post's "back" link pointed to the post you were viewing rather than the previous post. "Curious", I thought, and restarted the test server so I could attach a remote debugger.
Sod's law in full effect, the act of restarting the server in debug mode made the problem go away. All the links had magically reverted to pointing where they were supposed to point.
The query to find the previous post was mind-numbingly simple. You take the creation date of a post, and find the first post with a lesser date. We'd had problems with this sort of thing before - Hibernate maps dates to Oracle's DATE type (apparently changed to TIMESTAMP for Hibernate3) which only has a one-second granularity, but it's not a practical problem and the tests make sure the posts are spaced out sufficiently.
And anyway, if it was the data, why would restarting the application server make the problem go away? It must be something we're cache...
Just before we store a blog post, we set its creation date. Like any Java date, this is captured with millisecond precision. However, Oracle isn't storing our dates with the same precision, and zeroes the last few digits. So java.util.Date thinks the time is 1121235471964, but Oracle records it as 1121235471000.
Hibernate then shoves the original BlogPost object in the second-level cache. When you look up the post in future requests, you're getting back the original, millisecond-precise date from the cache. You pass that date back to the database to look for the previous blog post. Oracle is told to look for the first post before 1121235471964, a position for which 1121235471000 obviously qualifies. You then get back the post you started with.
When you restart the server the cached date goes away and you're left with only Oracle's version of things, so all your queries are back to normal.
The lesson, of course, is not that Hibernate sucks, or that Oracle sucks, or that Charles can't code his way out of a paper bag (although the fundamental truth is that all software sucks, and by corollary, all programmers suck). In the end, it's just a long-winded way to agree with Joel about leaky abstractions.
Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don't save us time learning.