Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL' - Mailing list pgsql-general

From Ian Sillitoe
Subject Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'
Date
Msg-id c6ff42340804021143p7e688b6ch8c0f4f98feaec0ea@mail.gmail.com
Whole thread Raw
In response to Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'  (Ted Byers <r.ted.byers@rogers.com>)
Responses Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'  ("Richard Broersma" <richard.broersma@gmail.com>)
Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'  (Ted Byers <r.ted.byers@rogers.com>)
List pgsql-general
I completely take your points - so maybe I should be asking for advice on database design instead.

We are annotating nodes on a hierarchical structure where NULL implied an absence of a value in this depth of the hierarchy. As a method of enforcing this view, we use constraints to make sure we only get one combination of rows for a given node of the hierarchy (whether the columns are integers or NULLs). I'm not suggesting this was 'correct', I'm just saying this was what we were thinking.

We considered using a flag of some description instead, however this also seemed a bit messy (although in hindsight probably far less messy than the job I now face trying to recode this table).

psql> select * from tree;
 id | depth1 | depth2 | depth3 |           name
----+--------+--------+--------+--------------------------
  1 |      1 |        |        | Name for node_id "1"
  1 |      1 |      1 |        | Name for node_id "1.1"
  2 |      1 |      2 |        | Name for node_id "1.2"
  3 |      1 |      2 |      3 | Name for node_id "1.2.3"
(3 rows)

psql> select * from leaf_node;
   id   | depth1 | depth2 | depth3 | depth4 | depth5
--------+--------+--------+--------+--------+--------
 leaf_1 |      1 |      2 |      3 |      1 |      1
 leaf_2 |      1 |      2 |      3 |      1 |      2
 leaf_3 |      1 |      2 |      3 |      1 |      3
 leaf_4 |      1 |      2 |      3 |      2 |      1
 leaf_5 |      1 |      2 |      3 |      2 |      2
(5 rows)

So, is there a generally accept method of modelling nodes in hierarchical structures (i.e. the 'tree' table in the example above) that doesn't use NULLs? We started off using ids such as '1.1.2' as the PRIMARY KEY but concatenating strings to go between columns and ids ended up being too painful.

Would it be sensible to use an ARRAY of depths as the primary key? I guess this would mean it would be possible to join

psql> select * from tree;
  id  | depth1 | depth2 | depth3 |           name
------+--------+--------+--------+--------------------------
[1]   |      1 |        |        | Name for node_id "1"
[1,2] |      1 |      1 |        | Name for node_id "1.1"

Advice most welcome.

Cheers,

Ian

On Wed, Apr 2, 2008 at 6:48 PM, Ted Byers <r.ted.byers@rogers.com> wrote:
--- Martijn van Oosterhout <kleptog@svana.org> wrote:
> 'transform_null_equals' won't help you at all here
> since it only help
> in the very specific case of comparing with a
> constant. The easiest is
> to think of NULL as meaning 'unknown'. Clearly you
> can't compare that
> usefully with anything.
>
Not even a null in another record ... (hence my
question below).  If the value is unknown, then it
could be anything, and (thinking as a mathematician
considering real numbers) the probability of two
records having null having their true, but unknown
values be the same is indistinguishable from 0. (with
integers or decimal numbers or floating point numbers,
that would be qualified with the clause, for practical
purposes :)

> Perhaps you can use a marker like -1 to achieve the
> effect you want?
>
Is that really valid, though, especially in a join? I
mean, if the column in question has multiple nulls, in
each of the tables, then how do you, of the DB, figure
out which of the rows containing nulls in the one
table match up with rows in the other table containing
nulls?  Or is the resultset supposed to be the product
of the two sets (match each row with a null in the one
table with each row with a null in the other)?  That,
for me, creates a nightmare situation where some of my
tables have tens of millions of rows, and if even 1%
of the rows contains null in the relevant column, I
don't even want to think about processing the
resultset that would be produced from such an idea
using these tables.

My joins always only involve primary keys, or indeces
on columns that prohibit nulls, so this problem
doesn't crop up in my code, but would I be not too far
from right in expecting that the rational thing to do
when creating a join on columns that allow nulls is to
exclude ALL rows, in either table, where the columns
involved are null?

Cheers,

Ted

pgsql-general by date:

Previous
From: "Brent Wood"
Date:
Subject: Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'
Next
From: Tom Lane
Date:
Subject: Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'