Thread: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

(FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Ian Sillitoe"
Date:
This is probably a stupid question that has a very quick answer, however it would be great if someone could put me out of my misery...

I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql function) where a joining column can be NULL


-- for a given id, return a bunch of columns that I can use for joins

psql> select * from get_colnames_for_id('1.10.8');


 depth1 | depth2 | depth3 | depth4 | depth5 |

--------+--------+--------+--------+--------+
      1 |     10 |      8 |        |        |
(1 row)

--
psql> select * from table where depth1 = 1 AND depth2 = 10 AND depth3 = 8 AND depth4 IS NULL;

id | depth1 | depth2 | depth3 | depth4 | depth5 |        name          |

---+--------+--------+--------+--------+--------+----------------------+
1  |      1 |     10 |      8 |        |        | name for node 1.10.8 |

(1 row)

-- I (wrongly) expected the following to be equivalent

psql> select c.* from get_cathcode('1.10.8') c JOIN table t USING(depth1, depth2, depth3, depth4);

id | depth1 | depth2 | depth3 | depth4 | depth5 |   name    |

---+--------+--------+--------+--------+--------+-----------+
(0 rows)

-- Whereas the following works...

psql> select * from get_colnames_for_id('1.10.8.10');

 depth1 | depth2 | depth3 | depth4 | depth5 |

--------+--------+--------+--------+--------+
      1 |     10 |      8 |    10  |        |
(1 row)

psql> select c.* from get_cathcode('1.10.8.10') c JOIN table t USING(depth1, depth2, depth3, depth4);

id | depth1 | depth2 | depth3 | depth4 | depth5 |          name           |

---+--------+--------+--------+--------+--------+-------------------------+
2  |      1 |     10 |     8  |     10 |        | name for node 1.10.8.10 |
(0 rows)



So, I'm currently assuming from all this that joining on t1.col = t2.col doesn't make any sense when t1.col and t2.col are both NULL - since:

psql> SELECT (NULL = NULL) IS TRUE;
 ?column?
----------
 f
(1 row)

psql> SELECT (NULL IS NULL) IS TRUE;
 ?column?
----------
 t
(1 row)

Unless I've missed something, the docs on http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to suggest that the concept is an example of bad programming and the workaround (of switching on the 'transform_null_equals' config) is a hack. Is this all true or did my logic just get screwed up at some point? Unless I've just missed something obvious, it seems useful to be able to join two tables based on a condition where they share a NULL column - is there another way of doing this?

Many thanks,

Ian


Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Ian Sillitoe"
Date:
Apologies - just reread my post and I've confused matters with typos during the abtraction of my code. For the purposes of the example given, please read "get_colnames_for_id()" rather than "get_cathcode()"...

Cheers,

Ian

On Wed, Apr 2, 2008 at 5:49 PM, Ian Sillitoe <ian.sillitoe@googlemail.com> wrote:
This is probably a stupid question that has a very quick answer, however it would be great if someone could put me out of my misery...

I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql function) where a joining column can be NULL


-- for a given id, return a bunch of columns that I can use for joins

psql> select * from get_colnames_for_id('1.10.8');


 depth1 | depth2 | depth3 | depth4 | depth5 |

--------+--------+--------+--------+--------+
      1 |     10 |      8 |        |        |
(1 row)

--
psql> select * from table where depth1 = 1 AND depth2 = 10 AND depth3 = 8 AND depth4 IS NULL;

id | depth1 | depth2 | depth3 | depth4 | depth5 |        name          |

---+--------+--------+--------+--------+--------+----------------------+
1  |      1 |     10 |      8 |        |        | name for node 1.10.8 |

(1 row)

-- I (wrongly) expected the following to be equivalent

psql> select c.* from get_cathcode('1.10.8') c JOIN table t USING(depth1, depth2, depth3, depth4);

id | depth1 | depth2 | depth3 | depth4 | depth5 |   name    |

---+--------+--------+--------+--------+--------+-----------+
(0 rows)

-- Whereas the following works...

psql> select * from get_colnames_for_id('1.10.8.10');

 depth1 | depth2 | depth3 | depth4 | depth5 |

--------+--------+--------+--------+--------+
      1 |     10 |      8 |    10  |        |
(1 row)

psql> select c.* from get_cathcode('1.10.8.10') c JOIN table t USING(depth1, depth2, depth3, depth4);

id | depth1 | depth2 | depth3 | depth4 | depth5 |          name           |

---+--------+--------+--------+--------+--------+-------------------------+
2  |      1 |     10 |     8  |     10 |        | name for node 1.10.8.10 |
(0 rows)



So, I'm currently assuming from all this that joining on t1.col = t2.col doesn't make any sense when t1.col and t2.col are both NULL - since:

psql> SELECT (NULL = NULL) IS TRUE;
 ?column?
----------
 f
(1 row)

psql> SELECT (NULL IS NULL) IS TRUE;
 ?column?
----------
 t
(1 row)

Unless I've missed something, the docs on http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to suggest that the concept is an example of bad programming and the workaround (of switching on the 'transform_null_equals' config) is a hack. Is this all true or did my logic just get screwed up at some point? Unless I've just missed something obvious, it seems useful to be able to join two tables based on a condition where they share a NULL column - is there another way of doing this?

Many thanks,

Ian



Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
Martijn van Oosterhout
Date:
On Wed, Apr 02, 2008 at 05:49:37PM +0100, Ian Sillitoe wrote:
> This is probably a stupid question that has a very quick answer, however it
> would be great if someone could put me out of my misery...
>
> I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql
> function) where a joining column can be NULL

You can't, NULL is not a value like other values.

> Unless I've missed something, the docs on
> http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to
> suggest that the concept is an example of bad programming and the workaround
> (of switching on the 'transform_null_equals' config) is a hack. Is this all
> true or did my logic just get screwed up at some point? Unless I've just
> missed something obvious, it seems useful to be able to join two tables
> based on a condition where they share a NULL column - is there another way
> of doing this?

'transform_null_equals' won't help you at all here since it only help
in the very specific case of comparing with a constant. The easiest is
to think of NULL as meaning 'unknown'. Clearly you can't copare that
usefully with anything.

Perhaps you can use a marker like -1 to acheive the effect you want?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
Craig Ringer
Date:
Ian Sillitoe wrote:
> This is probably a stupid question that has a very quick answer, however it
> would be great if someone could put me out of my misery...
>
> I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql
> function) where a joining column can be NULL
>
Sounds like you might want something like:

SELECT * FROM tablea INNER JOIN tableb ON (NOT tablea.id IS DISTINCT
FROM tableb.tablea_id_fk);

which can also be written as:

SELECT * FROM tablea, tableb WHERE NOT tablea.id IS DISTINCT FROM
tableb.tableid_id_fk ;

There's been lots of recent discussion of IS DISTINCT FROM, which is why
it comes straight to mind.

If that's not what you meant (by NULL = NULL) then might you be looking
for an OUTER JOIN ?

--
Craig Ringer

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Ian Sillitoe"
Date:
Thanks for the reply - after a bit more poking around it seems that:

t1.col IS NOT DISTINCT FROM t2.col

should work - although I guess this means an upgrade from 8.1 to 8.3

Cheers,

Ian

On Wed, Apr 2, 2008 at 6:23 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Wed, Apr 02, 2008 at 05:49:37PM +0100, Ian Sillitoe wrote:
> This is probably a stupid question that has a very quick answer, however it
> would be great if someone could put me out of my misery...
>
> I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql
> function) where a joining column can be NULL

You can't, NULL is not a value like other values.

> Unless I've missed something, the docs on
> http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to
> suggest that the concept is an example of bad programming and the workaround
> (of switching on the 'transform_null_equals' config) is a hack. Is this all
> true or did my logic just get screwed up at some point? Unless I've just
> missed something obvious, it seems useful to be able to join two tables
> based on a condition where they share a NULL column - is there another way
> of doing this?

'transform_null_equals' won't help you at all here since it only help
in the very specific case of comparing with a constant. The easiest is
to think of NULL as meaning 'unknown'. Clearly you can't copare that
usefully with anything.

Perhaps you can use a marker like -1 to acheive the effect you want?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFH88EBIB7bNG8LQkwRAh7CAJ9ffmMnyE/OeJrTepSaOURb2WSRhACeMYql
tnrzLDVLyFfHhDqKiY02QOM=
=dhZf
-----END PGP SIGNATURE-----


Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
Ted Byers
Date:
--- Martijn van Oosterhout <kleptog@svana.org> wrote:
> 'transform_null_equals' won't help you at all here
> since it only help
> in the very specific case of comparing with a
> constant. The easiest is
> to think of NULL as meaning 'unknown'. Clearly you
> can't compare that
> usefully with anything.
>
Not even a null in another record ... (hence my
question below).  If the value is unknown, then it
could be anything, and (thinking as a mathematician
considering real numbers) the probability of two
records having null having their true, but unknown
values be the same is indistinguishable from 0. (with
integers or decimal numbers or floating point numbers,
that would be qualified with the clause, for practical
purposes :)

> Perhaps you can use a marker like -1 to achieve the
> effect you want?
>
Is that really valid, though, especially in a join? I
mean, if the column in question has multiple nulls, in
each of the tables, then how do you, of the DB, figure
out which of the rows containing nulls in the one
table match up with rows in the other table containing
nulls?  Or is the resultset supposed to be the product
of the two sets (match each row with a null in the one
table with each row with a null in the other)?  That,
for me, creates a nightmare situation where some of my
tables have tens of millions of rows, and if even 1%
of the rows contains null in the relevant column, I
don't even want to think about processing the
resultset that would be produced from such an idea
using these tables.

My joins always only involve primary keys, or indeces
on columns that prohibit nulls, so this problem
doesn't crop up in my code, but would I be not too far
from right in expecting that the rational thing to do
when creating a join on columns that allow nulls is to
exclude ALL rows, in either table, where the columns
involved are null?

Cheers,

Ted

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Ian Sillitoe"
Date:
Ah - thanks and apologies for not finding those previous discussions. Does anyone else feel this might be useful as a point on the NULL section of the FAQ (it certainly would have saved me an afternoon)?

Cheers,

Ian

On Wed, Apr 2, 2008 at 6:24 PM, Craig Ringer <craig@postnewspapers.com.au> wrote:
Ian Sillitoe wrote:
This is probably a stupid question that has a very quick answer, however it
would be great if someone could put me out of my misery...

I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql
function) where a joining column can be NULL
 
Sounds like you might want something like:

SELECT * FROM tablea INNER JOIN tableb ON (NOT tablea.id IS DISTINCT FROM tableb.tablea_id_fk);

which can also be written as:

SELECT * FROM tablea, tableb WHERE NOT tablea.id IS DISTINCT FROM tableb.tableid_id_fk ;

There's been lots of recent discussion of IS DISTINCT FROM, which is why it comes straight to mind.

If that's not what you meant (by NULL = NULL) then might you be looking for an OUTER JOIN ?

--
Craig Ringer

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Brent Wood"
Date:
>>> "Ian Sillitoe" <ian.sillitoe@googlemail.com> 03/04/08 5:49 AM >>>


I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql
function) where a joining column can be NULL


In  a join, no value can be ascribed to a null field, so the equivalence fails. You can do tests like IS NULL, which
strictlyspeaking is test for meeting a condition (that of not having any value), not a test for equivalence. As
(simplistically)the condition NULL does equal the condition NULL, (NULL = NULL) is true. 

The simplest approach is perhaps to have a value which does not occur naturally (like -1), as a substitute for nulls in
therelevant columns. I believe this can be achieved via a view in your case, (pun intended :-), but which may be less
efficientif performance is an issue: 

create view depth_v as
select d.id,
          d.name,
          case when (d.depth1 is null) then -1 else d.depth1 end as depth1,
          case when (d.depth2 is null) then -1 else d.depth2 end as depth2,
          case when (d.depth3 is null) then -1 else d.depth3 end as depth3,
          case when (d.depth4 is null) then -1 else d.depth4 end as depth4,
          case when (d.depth5 is null) then -1 else d.depth5 end as depth5
from depth_table d;

You could then join against this view instead of your underlying table, eg:

select c.* from get_cathcode('1.10.8') c JOIN depth_v t USING(depth1, depth2, depth3, depth4);

The view will not have any NULL values in the depth fields, so the join should work.

see: http://www.postgresql.org/docs/8.2/static/functions-conditional.html

(Incidentally, if you are storing bathymetry or CTD data, I'd be interested in seeing your db structures, as I may be
doingsome work in that area soon :-) 


HTH,

  Brent Wood

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Ian Sillitoe"
Date:
I completely take your points - so maybe I should be asking for advice on database design instead.

We are annotating nodes on a hierarchical structure where NULL implied an absence of a value in this depth of the hierarchy. As a method of enforcing this view, we use constraints to make sure we only get one combination of rows for a given node of the hierarchy (whether the columns are integers or NULLs). I'm not suggesting this was 'correct', I'm just saying this was what we were thinking.

We considered using a flag of some description instead, however this also seemed a bit messy (although in hindsight probably far less messy than the job I now face trying to recode this table).

psql> select * from tree;
 id | depth1 | depth2 | depth3 |           name
----+--------+--------+--------+--------------------------
  1 |      1 |        |        | Name for node_id "1"
  1 |      1 |      1 |        | Name for node_id "1.1"
  2 |      1 |      2 |        | Name for node_id "1.2"
  3 |      1 |      2 |      3 | Name for node_id "1.2.3"
(3 rows)

psql> select * from leaf_node;
   id   | depth1 | depth2 | depth3 | depth4 | depth5
--------+--------+--------+--------+--------+--------
 leaf_1 |      1 |      2 |      3 |      1 |      1
 leaf_2 |      1 |      2 |      3 |      1 |      2
 leaf_3 |      1 |      2 |      3 |      1 |      3
 leaf_4 |      1 |      2 |      3 |      2 |      1
 leaf_5 |      1 |      2 |      3 |      2 |      2
(5 rows)

So, is there a generally accept method of modelling nodes in hierarchical structures (i.e. the 'tree' table in the example above) that doesn't use NULLs? We started off using ids such as '1.1.2' as the PRIMARY KEY but concatenating strings to go between columns and ids ended up being too painful.

Would it be sensible to use an ARRAY of depths as the primary key? I guess this would mean it would be possible to join

psql> select * from tree;
  id  | depth1 | depth2 | depth3 |           name
------+--------+--------+--------+--------------------------
[1]   |      1 |        |        | Name for node_id "1"
[1,2] |      1 |      1 |        | Name for node_id "1.1"

Advice most welcome.

Cheers,

Ian

On Wed, Apr 2, 2008 at 6:48 PM, Ted Byers <r.ted.byers@rogers.com> wrote:
--- Martijn van Oosterhout <kleptog@svana.org> wrote:
> 'transform_null_equals' won't help you at all here
> since it only help
> in the very specific case of comparing with a
> constant. The easiest is
> to think of NULL as meaning 'unknown'. Clearly you
> can't compare that
> usefully with anything.
>
Not even a null in another record ... (hence my
question below).  If the value is unknown, then it
could be anything, and (thinking as a mathematician
considering real numbers) the probability of two
records having null having their true, but unknown
values be the same is indistinguishable from 0. (with
integers or decimal numbers or floating point numbers,
that would be qualified with the clause, for practical
purposes :)

> Perhaps you can use a marker like -1 to achieve the
> effect you want?
>
Is that really valid, though, especially in a join? I
mean, if the column in question has multiple nulls, in
each of the tables, then how do you, of the DB, figure
out which of the rows containing nulls in the one
table match up with rows in the other table containing
nulls?  Or is the resultset supposed to be the product
of the two sets (match each row with a null in the one
table with each row with a null in the other)?  That,
for me, creates a nightmare situation where some of my
tables have tens of millions of rows, and if even 1%
of the rows contains null in the relevant column, I
don't even want to think about processing the
resultset that would be produced from such an idea
using these tables.

My joins always only involve primary keys, or indeces
on columns that prohibit nulls, so this problem
doesn't crop up in my code, but would I be not too far
from right in expecting that the rational thing to do
when creating a join on columns that allow nulls is to
exclude ALL rows, in either table, where the columns
involved are null?

Cheers,

Ted

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
Tom Lane
Date:
"Ian Sillitoe" <ian.sillitoe@googlemail.com> writes:
> Thanks for the reply - after a bit more poking around it seems that:
> t1.col IS NOT DISTINCT FROM t2.col
> should work - although I guess this means an upgrade from 8.1 to 8.3

Note that while this will give you the right answers, it will probably
be dog-slow because the planner has exactly zero intelligence about it.
You'd be much better advised to reconsider your table design and get
rid of the assumption that NULLs should be considered "equal".

            regards, tom lane

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Richard Broersma"
Date:
On Wed, Apr 2, 2008 at 11:43 AM, Ian Sillitoe
<ian.sillitoe@googlemail.com> wrote:
> I completely take your points - so maybe I should be asking for advice on
> database design instead.

There are a couple of choice you can use:

First is the PostgreSQL contribe Module Ltree:
http://www.postgresql.org/docs/current/static/ltree.html

There is the common adjancy list model:
http://www.sqlsummit.com/AdjacencyList.htm

Celko's Nested set model: { probably provides the most functionality
but with hi costs for INSERT/UPDATE ops)
http://www.intelligententerprise.com/001020/celko.jhtml

and Lastly the Path enurmation model:
http://www.onlamp.com/pub/a/onlamp/2004/08/05/hierarchical_sql.html

One of these model should hopefully be a best fit for you.

--
Regards,
Richard Broersma Jr.

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
Andrew Sullivan
Date:
On Wed, Apr 02, 2008 at 05:49:37PM +0100, Ian Sillitoe wrote:
> Unless I've missed something, the docs on
> http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to
> suggest that the concept is an example of bad programming and the workaround
> (of switching on the 'transform_null_equals' config) is a hack. Is this all
> true or did my logic just get screwed up at some point?

Nope, it's all true.

The problem you are having is that one NULL is not the same as (and is not
not the same as) another NULL.  NULL values are not equivalent to any other
value, incuding other NULLs.  This is why some database people don't like to
allow NULLs at all.

> Unless I've just
> missed something obvious, it seems useful to be able to join two tables
> based on a condition where they share a NULL column - is there another way
> of doing this?

. . .AND t1.column IS NULL AND t2.othercolumn IS NULL.

Alternatively, you can use coalesce and join on some value, like this:

. . .AND coalesce(t1.column, 0) = coalesce(t2.othercolumn,0);

This is a bit of a hack, and won't work in every case (if you don't have a
value that you know can't be in either table, you're out of luck).

A

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Ian Sillitoe"
Date:
Thanks very much for this and to all the reponses. As a side note, I've only joined this mailing list this afternoon, but it already seems like an good choice (albeit one I should have made about 2 years ago :P) - very active and very helpful - excellent work, many thanks indeed.

Cheers,

Ian

On Wed, Apr 2, 2008 at 8:08 PM, Richard Broersma <richard.broersma@gmail.com> wrote:
On Wed, Apr 2, 2008 at 11:43 AM, Ian Sillitoe
<ian.sillitoe@googlemail.com> wrote:
> I completely take your points - so maybe I should be asking for advice on
> database design instead.

There are a couple of choice you can use:

First is the PostgreSQL contribe Module Ltree:
http://www.postgresql.org/docs/current/static/ltree.html

There is the common adjancy list model:
http://www.sqlsummit.com/AdjacencyList.htm

Celko's Nested set model: { probably provides the most functionality
but with hi costs for INSERT/UPDATE ops)
http://www.intelligententerprise.com/001020/celko.jhtml

and Lastly the Path enurmation model:
http://www.onlamp.com/pub/a/onlamp/2004/08/05/hierarchical_sql.html

One of these model should hopefully be a best fit for you.

--
Regards,
Richard Broersma Jr.

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
Ted Byers
Date:
--- Ian Sillitoe <ian.sillitoe@googlemail.com> wrote:

> I completely take your points - so maybe I should be
> asking for advice on
> database design instead.
>
> We are annotating nodes on a hierarchical structure
> where NULL implied an

I don't mean to be rude, but yuck.  Why provide a
record for data that isn't there?

I recently put together a database (at present at a
very early prototype stage) to handle biological data.
 Considering ONLY the taxonomic portion of it, I opted
for a general hierarchical model.  Maybe not the most
efficient, yet, but no waste, yet.

In what is an over simplification, I created a taxon
table, with columns for a unique ID number, taxonomic
level (species, genus, &c. with all the glorious
subcategories taxonomists of varius tripes are wont to
create/define).  The taxonomic levels are predefined
(taken from my references that deal with such
matters), in a lookup table.  Then, I have columns to
hold parent taxon ID number.

Of course, there is, in a middle layer, constraints
that prevents recording a species as a parent of a
genus, and other silliness (no linking a species
epithet directly to a class or order).  But you get
the idea.

An object oriented programming metaphore might be that
of a singly linked list.  And of course, I have
deliberately obfuscated the complexity arising from
having to handle synonyms both usefully and
gracefully, but the core idea is simple, and there are
no nulls, except for taxa representing a whole
kingdom.  Last I checked, there were no taxa more
general than the kingdom, and there's only a handful
of kingdoms.  If you don't have data on subclass or
superfamily or subspecies, you just don't put it in.
Therefore no nulls!

I have no idea if this model would work for you, but
maybe it will help.

Cheers,

Ted

Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From
"Ian Sillitoe"
Date:

> We are annotating nodes on a hierarchical structure
> where NULL implied an

I don't mean to be rude, but yuck.  Why provide a
record for data that isn't there?

No offence taken - I'm trying to improve an old (partially inherited) system, hence the original post.

I have no idea if this model would work for you, but
maybe it will help.

That approach all sounds sensible, however going through the links that Richard Broersma sent over in a previous post - I'm currently leaning towards the flexibility and intuitive interface that ltree contrib module appears to offer.

http://www.postgresql.org/docs/8.3/interactive/ltree.html

Cheers,

Ian