Seen on Twitter today:

This is one of those things that always seems un-intuitive to developers who aren't familiar with SQL. I've heard a lot of explanations for it over the years, but this is probably the most succinct I've come across, thanks to Matt Ryall on the Atlassian intranet.
…there's actually a reason for this distinction. They're not equivalent at all, because NULL isn't a value that you can be equal to. It represents a value that is undefined.
For example, if my date-of-birth in a database is NULL (because it's unknown) and so it yours, they shouldn't be considered the same. They're two different, undefined values. A query where this might happen is:
select s1.name, s2.name from staff s1, staff s2 where s1.id != s2.id and s1.dob = s2.dob
Or, in the previous example, there's a difference between records where the thru_date is definitely not equal to the from_date, and ones where you just don't know.
The evil part is where this behaviour becomes an option that can per changed with a database wide setting (i.e. ANSI Nulls).
Then people rely on one behaviour and assume it should be always true.
Better that the SQL standard should address this an have allowed an abbreviated syntax to achieve the null = null case, which is often valid.
Another way of thinking about this is that in SQL, logic is ternary, not binary: a comparison can result in TRUE, FALSE, or UNKNOWN [Null]. NULL isn't just a data value, it's also a logical value. And in most implementations, WHERE will include the instances that are TRUE, and exclude the instances that are FALSE or UNKNOWN.
One *really* evil scenario is where you have a compound key that can be partially NULL. For example, suppose the primary key of your table is composed from First Name, Middle Name, Last Name, suffix (Jr., III, etc.) and that combination must be unique -- let's pretend we're Actor's Equity and this is their membership database, OK? Actors can have the same first name, middle name or last name, of course, but no two Equity members can have the same complete name. It's a rule. Lots of actors, of course, have no middle name, so it's going to be NULL. And we'd like the Unique Index on Name to allow, for instance, both William H. Macy and William Macy, and recognize that they are different. At the same time we'd also like it to disallow having two "William NULL Macy" keys because they are the same. Unfortunately, the creators of SQL didn't foresee this particular scenario, and it's a complete crapshoot as to what any particular implementation will do. Some will allow you to have two "William NULL Macy" records because, technically, they're not equal -- they're unknown. Others will take the more pragmatic approach and treat NULLs in a compound key as a special case.
A lot of this comes about because NULL seemed like a good idea at the time (kind of like Not A Number in the ANSI FP spec...) but in practice it gets used for lots of different kinds of "missing data":
-- Not Allowed (some subtypes of entity have a value, some don't)
-- Missing (should be there, it's a real-world problem that it isn't)
-- Not Known (should be there, we just don't have it yet)
-- Optional (this entity happens not to have one)
-- etc. etc.
And unfortunately the semantics that we would really want are different in different cases.
Now don't even get me started on the difference between the empty string and the NULL string...
I know all this stuff, but that doesn't make NULL any less of a problem. It's easy to forget to account for a NULL, especially when the bug is that there's some rows not returned for a query.
I try very hard to come up with good defaults for columns rather than use NULLs, whenever possible. It simplifies my SQL greatly, as well as simplifying code that consumes the data from the database.
Good old ternary logic!
@Carl: in that case, wouldn't setting the middle and/or last names to the empty string for people with only one or two (rather than three) names suffice? Unlike NULL, it's well-defined, can be compared against, etc.
"This is one of those things that always seems un-intuitive to developers who aren't familiar with SQL."
If it's un-intuitive, that immediately raises the question: what is its benefit?
It makes all the SQL semantics much more complex. Even if it were only used for ternary logic (which it is certainly not -- consider OUTER JOIN or aggregates), that alone makes the logic system so complex that basic tautologies no longer hold (for instance, "p or NOT p" is not always true).
It (almost) doesn't matter how long you've been using SQL. When things get complex, humans rely on these basic tautologies for reasoning.
@Carl: That example uses NULL incorrectly. NULL doesn't assert "this column contains no value," it asserts "the value in this column is indeterminate." If you know the value of a column, you set its value, you don't leave it at NULL.
This is why SQL Server (for instance) doesn't allow any of the components of a primary key to be nullable: if you don't know enough about a row to assign the values that uniquely identify it, you don't know enough about it to insert it into your table.
> If it's un-intuitive, that immediately raises the question: what is its benefit?
If you can't think of one, just use NOT NULL everywhere and you'll never run into this problem.
If I want to make a list of all my friends who have the same birthday as one of my other friends, can I do this if I don't know everyone's birthday? It should be pretty intuitive that the answer is no.
An argument can be made that a database *shouldn't allow* queries like that. But I suspect that if that were a configurable option everyone would turn it off.
Null is not value. It is not zero (which is a value). It is not comparable to to anything. Any "if" or comparison that involves a null is automatically FALSE because the inability for it to be compared it to anything....
That is no reason for from_date <> thru_date to return false. Er, NULL.
In a way, it would make more sense if NULL <> NULL always returned true. That would be more in line with what you say.
compound the evil!
WHERE TABLE1.COL1 || 'X' = TABLE2.COL1 || 'X'
reduces it to a binary compare in Oracle.
@Paul: "Null is not value."
You may be able to make that claim for outer joins and aggregates.
But SQL also uses NULL for 3VL -- 3 _valued_ logic. Meaning that NULL is the third truth _value_.
@Paul: "Any "if" or comparison that involves a null is automatically FALSE because the inability for it to be compared it to anything."
That statement is false. Comparisons involving NULL (most, anyway...) result in NULL, not FALSE. It so happens that in predicates (e.g. WHERE and HAVING clauses) NULL is treated as "false-like" and the tuple is omitted from the result. But in constraints (e.g. CHECK), NULL is treated as "true-like", and the constraint passes.
@Bart Lateur: "In a way, it would make more sense if NULL > NULL always returned true."
The above statement assumes that the following statement is a tautology:
p = q OR p != q
which doesn't hold for 3VL.
You have the right approach here: you're analyzing NULL semantics for internal consistency, and finding flaws. At minimum, NULL semantics mean that tautologies that you expect are now invalid.
With all of the obvious confusion in this thread, I think it's perfectly reasonable to ask what purpose NULL semantics serve. They violate basic tautologies that humans use for reasoning and there is nothing intuitive about their behavior in a complex case.
On top of all that, they are treated differently in different contexts. OUTER JOIN produces NULLs even if there are no NULLs in the database at all, so clearly NULL cannot mean "unknown" in that case. But SQL then treats NULL according to 3VL semantics in statements like "p > q".
Hello. I think the article is really interesting. I am even interested in reading more. How soon will you update your blog?