Monday, May 16, 2005

 

NULLs in Sybase ASE

Here is where my story begins. For potential performance reasons, I was trying to change a "where" clause that looked like this:

ta.b & isnull (tb.b, ta.b) = isnull (tb.b, ta.b)

to one that looked like this:

ta.b & tb.b = tb.b

(ta and tb are different tables)

But I wasn't sure if this would work if tb.b was null. Would ta.b & null = null? So I did this test:

declare @a int
declare @b int
select @a = null
select @b = 1
select @b where (@b & @a = @a)
go
-- This returns 1 result.

This means @b & @a = @b is true if @a is null. So I thought I could go ahead. But I was wrong! Observe:

create table TableA (a int, b int null)
insert into TableA values (1, 1)

create table TableB (a int, b int null)
insert into TableB values (1, null)

select ta.b from TableA ta, TableB tb
ta.b & isnull (tb.b, ta.b) = isnull (tb.b, ta.b)
go
-- This returns 1 result

select ta.b from TableA ta, TableB tb
ta.b & tb.b = tb.b
go
-- This returns 0 results

What is the explanation for this?

The quick answer is that Sybase ASE treats NULLs differently in search clauses (the first case) than in join clauses (the second case, when you use a "where").

Search clauses: NULL = NULL
Join clauses: NULL = nothing, not even itself

Where does this come from? Oddly enough, from the ANSI SQL-92 Standard itself. The following quote is from Sybooks (look for the description of the "ansinull" configuration parameter):

"The SQL standard requires that if either one of the two operands of an equality comparison is NULL, the result is UNKNOWN. Transact-SQL treats NULL values differently. If one of the operands is a column, parameter, or variable, and the other operand is the NULL constant or a parameter or variable whose value is NULL, the result is either TRUE or FALSE."

Incidentally, "ansinull" is a Sybase ASE configuration parameter that will toggle the treatment of NULLs from ANSI to Sybase ASE.

ANSI: NULL = NULL is NULL, which is neither TRUE nor FALSE, it is UNKNOWN
Sybase ASE: NULL = NULL is TRUE

For example, consider TableB above and run this query:

select * from TableB where b <> "1"

ANSINULL = ON: This returns 0 results (ie. not the NULL row)
ANSINULL = OFF: This returns 1 result (ie. the NULL row)

If you want to get a result, but don't want to mess with ANSINULL, here are two approaches:

select * from TableB where b != 1 or b IS NULL
select * from TableB where isnull(b,0) != 1

Also make note that in Sybase ASE, an empty string doesn't exist, because that is the string representation of NULL.

So what is a guy to do?

The answer is to remember that NULLs don't always behave as you'd expect, and to follow these guidelines:
1. Use functions like ISNULL and NULLIF, and even COALESCE
2. Don't use = or != with NULL, use 'IS NULL' and 'IS NOT NULL' instead

ISNULL: Takes two values, using the second one if and only if the first one is NULL
NULLIF: Takes an expression and a value, returns NULL if the expression is true, otherwise the value
COALESCE: Takes a list of at least 2 values, and returns the first one that is NOT NULL

Comments:
I was researching this subject today (because I have a colleague who claims using NULLs is very inefficient - for which I have found *no* evidence) and I also ran across this article too - http://www.aspfaq.com/show.asp?id=2073#. Acknowledging that your blog is a simple balanced report, while the other article is really down on NULLs, with all due respect, I sort of don't understand why it's not more intuitive to more people. You summarized the difference between WHERE/JOIN ON nicely - and when you think about what we do with our SQL it makes sense: obviously one should be able to SELECT all rows WHERE a column(s) are NULL. But when we're joining tables, isn't it always on some type of significant ID/key? In this normal case, rarely would you have the joiner column value be NULL. And in cases where those columns do allow NULLs, I can't think of a situation where you'd want to find "the" row in another table that has NULL for its JOIN column, unless it's to provide some default value in a lookup table for "unknown", but that isn't reason enough (at least for me).
 
Thanks, that's an excellent link. To quote the author:

"My suggestion is to use a default value in every case that makes sense, and declare all such columns explicitly as NOT NULL."

"play it safe and ALWAYS explicitly declare NULL or NOT NULL"

Those are both good suggestions.

I agree that NULLs are confusing. But the solution is not to avoid things that are confusing, but rather to understand them, and then use them properly.

And, even if you feel you do understand them, TEST your assumptions.
 
Hi Robert - this is a slow conversation...anyway I've been
doing RDBMs for a loooong time, and
if you go back to the sources you may discover why NULLs are so confusing. I'd agree they should not be avoided simply because they
are confusing, but there are actually very good (and controversial) reasons to avoid
using NULLs. And it stems from the
fact that they "aren't" - they aren't even "no value"; so how can one make meaningful comparisons.

It's a core issue that Codd and Date disagreed on - and they didn't disagree on much. Codd
felt they had to be grudgingly allowed as a practical matter. Date felt it badly undermined the
relational model.

It is fashionable these days to pretend people who understand relational theory are kooks who should not be allowed near a "SQL Server"; but Date shows a very interesting excercize in his book "Database In Depth" in which
he explains why a pure model prohibits NULLs, and more interestingly how easy it is to
NOT use NULLs by following certain
table design ideas.

That said, I now have to sign off, because I need to get back to my code that is broken because I'm
misunderstanding making comparisons to NULL in the query!
Seriously, I don't think "reality" is a good excuse to use NULLs if you have that control, but when you're maintaining existing dbs, that's a whole other story - so we need to understand how each vendor handles this notion of "not true/not false" (which is the essense of the issue - things in a database are assertations of "truth", absense asserts "False". So what's a null?
a third possible logic value, which can mean "missing", "does not apply", "I don't know yet", "who cares"....the real problem is knowing what is meant by it.

Roger Reid
Senior Systems Analyst DPW/NY
 
This comment has been removed by a blog administrator.
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?