Thread: (FAQ?) JOIN condition - 'WHERE NULL = NULL'
This is probably a stupid question that has a very quick answer, however it would be great if someone could put me out of my misery...
I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql function) where a joining column can be NULL
-- for a given id, return a bunch of columns that I can use for joins
psql> select * from get_colnames_for_id('1.10.8');
depth1 | depth2 | depth3 | depth4 | depth5 |
--------+--------+--------+--------+--------+
1 | 10 | 8 | | |
(1 row)
--
psql> select * from table where depth1 = 1 AND depth2 = 10 AND depth3 = 8 AND depth4 IS NULL;
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+----------------------+
1 | 1 | 10 | 8 | | | name for node 1.10.8 |
(1 row)
-- I (wrongly) expected the following to be equivalent
psql> select c.* from get_cathcode('1.10.8') c JOIN table t USING(depth1, depth2, depth3, depth4);
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+-----------+
(0 rows)
-- Whereas the following works...
psql> select * from get_colnames_for_id('1.10.8.10');
depth1 | depth2 | depth3 | depth4 | depth5 |
--------+--------+--------+--------+--------+
1 | 10 | 8 | 10 | |
(1 row)
psql> select c.* from get_cathcode('1.10.8.10') c JOIN table t USING(depth1, depth2, depth3, depth4);
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+-------------------------+
2 | 1 | 10 | 8 | 10 | | name for node 1.10.8.10 |
(0 rows)
So, I'm currently assuming from all this that joining on t1.col = t2.col doesn't make any sense when t1.col and t2.col are both NULL - since:
psql> SELECT (NULL = NULL) IS TRUE;
?column?
----------
f
(1 row)
psql> SELECT (NULL IS NULL) IS TRUE;
?column?
----------
t
(1 row)
Unless I've missed something, the docs on http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to suggest that the concept is an example of bad programming and the workaround (of switching on the 'transform_null_equals' config) is a hack. Is this all true or did my logic just get screwed up at some point? Unless I've just missed something obvious, it seems useful to be able to join two tables based on a condition where they share a NULL column - is there another way of doing this?
Many thanks,
Ian
I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql function) where a joining column can be NULL
-- for a given id, return a bunch of columns that I can use for joins
psql> select * from get_colnames_for_id('1.10.8');
depth1 | depth2 | depth3 | depth4 | depth5 |
--------+--------+--------+--------+--------+
1 | 10 | 8 | | |
(1 row)
--
psql> select * from table where depth1 = 1 AND depth2 = 10 AND depth3 = 8 AND depth4 IS NULL;
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+----------------------+
1 | 1 | 10 | 8 | | | name for node 1.10.8 |
(1 row)
-- I (wrongly) expected the following to be equivalent
psql> select c.* from get_cathcode('1.10.8') c JOIN table t USING(depth1, depth2, depth3, depth4);
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+-----------+
(0 rows)
-- Whereas the following works...
psql> select * from get_colnames_for_id('1.10.8.10');
depth1 | depth2 | depth3 | depth4 | depth5 |
--------+--------+--------+--------+--------+
1 | 10 | 8 | 10 | |
(1 row)
psql> select c.* from get_cathcode('1.10.8.10') c JOIN table t USING(depth1, depth2, depth3, depth4);
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+-------------------------+
2 | 1 | 10 | 8 | 10 | | name for node 1.10.8.10 |
(0 rows)
So, I'm currently assuming from all this that joining on t1.col = t2.col doesn't make any sense when t1.col and t2.col are both NULL - since:
psql> SELECT (NULL = NULL) IS TRUE;
?column?
----------
f
(1 row)
psql> SELECT (NULL IS NULL) IS TRUE;
?column?
----------
t
(1 row)
Unless I've missed something, the docs on http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to suggest that the concept is an example of bad programming and the workaround (of switching on the 'transform_null_equals' config) is a hack. Is this all true or did my logic just get screwed up at some point? Unless I've just missed something obvious, it seems useful to be able to join two tables based on a condition where they share a NULL column - is there another way of doing this?
Many thanks,
Ian
Apologies - just reread my post and I've confused matters with typos during the abtraction of my code. For the purposes of the example given, please read "get_colnames_for_id()" rather than "get_cathcode()"...
Cheers,
Ian
Cheers,
Ian
On Wed, Apr 2, 2008 at 5:49 PM, Ian Sillitoe <ian.sillitoe@googlemail.com> wrote:
This is probably a stupid question that has a very quick answer, however it would be great if someone could put me out of my misery...
I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql function) where a joining column can be NULL
-- for a given id, return a bunch of columns that I can use for joins
psql> select * from get_colnames_for_id('1.10.8');
depth1 | depth2 | depth3 | depth4 | depth5 |
--------+--------+--------+--------+--------+
1 | 10 | 8 | | |
(1 row)
--
psql> select * from table where depth1 = 1 AND depth2 = 10 AND depth3 = 8 AND depth4 IS NULL;
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+----------------------+
1 | 1 | 10 | 8 | | | name for node 1.10.8 |
(1 row)
-- I (wrongly) expected the following to be equivalent
psql> select c.* from get_cathcode('1.10.8') c JOIN table t USING(depth1, depth2, depth3, depth4);
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+-----------+
(0 rows)
-- Whereas the following works...
psql> select * from get_colnames_for_id('1.10.8.10');
depth1 | depth2 | depth3 | depth4 | depth5 |
--------+--------+--------+--------+--------+
1 | 10 | 8 | 10 | |
(1 row)
psql> select c.* from get_cathcode('1.10.8.10') c JOIN table t USING(depth1, depth2, depth3, depth4);
id | depth1 | depth2 | depth3 | depth4 | depth5 | name |
---+--------+--------+--------+--------+--------+-------------------------+
2 | 1 | 10 | 8 | 10 | | name for node 1.10.8.10 |
(0 rows)
So, I'm currently assuming from all this that joining on t1.col = t2.col doesn't make any sense when t1.col and t2.col are both NULL - since:
psql> SELECT (NULL = NULL) IS TRUE;
?column?
----------
f
(1 row)
psql> SELECT (NULL IS NULL) IS TRUE;
?column?
----------
t
(1 row)
Unless I've missed something, the docs on http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to suggest that the concept is an example of bad programming and the workaround (of switching on the 'transform_null_equals' config) is a hack. Is this all true or did my logic just get screwed up at some point? Unless I've just missed something obvious, it seems useful to be able to join two tables based on a condition where they share a NULL column - is there another way of doing this?
Many thanks,
Ian
On Wed, Apr 02, 2008 at 05:49:37PM +0100, Ian Sillitoe wrote: > This is probably a stupid question that has a very quick answer, however it > would be great if someone could put me out of my misery... > > I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql > function) where a joining column can be NULL You can't, NULL is not a value like other values. > Unless I've missed something, the docs on > http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to > suggest that the concept is an example of bad programming and the workaround > (of switching on the 'transform_null_equals' config) is a hack. Is this all > true or did my logic just get screwed up at some point? Unless I've just > missed something obvious, it seems useful to be able to join two tables > based on a condition where they share a NULL column - is there another way > of doing this? 'transform_null_equals' won't help you at all here since it only help in the very specific case of comparing with a constant. The easiest is to think of NULL as meaning 'unknown'. Clearly you can't copare that usefully with anything. Perhaps you can use a marker like -1 to acheive the effect you want? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
Ian Sillitoe wrote: > This is probably a stupid question that has a very quick answer, however it > would be great if someone could put me out of my misery... > > I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql > function) where a joining column can be NULL > Sounds like you might want something like: SELECT * FROM tablea INNER JOIN tableb ON (NOT tablea.id IS DISTINCT FROM tableb.tablea_id_fk); which can also be written as: SELECT * FROM tablea, tableb WHERE NOT tablea.id IS DISTINCT FROM tableb.tableid_id_fk ; There's been lots of recent discussion of IS DISTINCT FROM, which is why it comes straight to mind. If that's not what you meant (by NULL = NULL) then might you be looking for an OUTER JOIN ? -- Craig Ringer
Thanks for the reply - after a bit more poking around it seems that:
t1.col IS NOT DISTINCT FROM t2.col
should work - although I guess this means an upgrade from 8.1 to 8.3
Cheers,
Ian
t1.col IS NOT DISTINCT FROM t2.col
should work - although I guess this means an upgrade from 8.1 to 8.3
Cheers,
Ian
On Wed, Apr 2, 2008 at 6:23 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Wed, Apr 02, 2008 at 05:49:37PM +0100, Ian Sillitoe wrote:You can't, NULL is not a value like other values.
> This is probably a stupid question that has a very quick answer, however it
> would be great if someone could put me out of my misery...
>
> I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql
> function) where a joining column can be NULL'transform_null_equals' won't help you at all here since it only help
> Unless I've missed something, the docs on
> http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to
> suggest that the concept is an example of bad programming and the workaround
> (of switching on the 'transform_null_equals' config) is a hack. Is this all
> true or did my logic just get screwed up at some point? Unless I've just
> missed something obvious, it seems useful to be able to join two tables
> based on a condition where they share a NULL column - is there another way
> of doing this?
in the very specific case of comparing with a constant. The easiest is
to think of NULL as meaning 'unknown'. Clearly you can't copare that
usefully with anything.
Perhaps you can use a marker like -1 to acheive the effect you want?
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFH88EBIB7bNG8LQkwRAh7CAJ9ffmMnyE/OeJrTepSaOURb2WSRhACeMYql
tnrzLDVLyFfHhDqKiY02QOM=
=dhZf
-----END PGP SIGNATURE-----
--- Martijn van Oosterhout <kleptog@svana.org> wrote: > 'transform_null_equals' won't help you at all here > since it only help > in the very specific case of comparing with a > constant. The easiest is > to think of NULL as meaning 'unknown'. Clearly you > can't compare that > usefully with anything. > Not even a null in another record ... (hence my question below). If the value is unknown, then it could be anything, and (thinking as a mathematician considering real numbers) the probability of two records having null having their true, but unknown values be the same is indistinguishable from 0. (with integers or decimal numbers or floating point numbers, that would be qualified with the clause, for practical purposes :) > Perhaps you can use a marker like -1 to achieve the > effect you want? > Is that really valid, though, especially in a join? I mean, if the column in question has multiple nulls, in each of the tables, then how do you, of the DB, figure out which of the rows containing nulls in the one table match up with rows in the other table containing nulls? Or is the resultset supposed to be the product of the two sets (match each row with a null in the one table with each row with a null in the other)? That, for me, creates a nightmare situation where some of my tables have tens of millions of rows, and if even 1% of the rows contains null in the relevant column, I don't even want to think about processing the resultset that would be produced from such an idea using these tables. My joins always only involve primary keys, or indeces on columns that prohibit nulls, so this problem doesn't crop up in my code, but would I be not too far from right in expecting that the rational thing to do when creating a join on columns that allow nulls is to exclude ALL rows, in either table, where the columns involved are null? Cheers, Ted
Ah - thanks and apologies for not finding those previous discussions. Does anyone else feel this might be useful as a point on the NULL section of the FAQ (it certainly would have saved me an afternoon)?
Cheers,
Ian
Cheers,
Ian
On Wed, Apr 2, 2008 at 6:24 PM, Craig Ringer <craig@postnewspapers.com.au> wrote:
Ian Sillitoe wrote:Sounds like you might want something like:This is probably a stupid question that has a very quick answer, however it
would be great if someone could put me out of my misery...
I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql
function) where a joining column can be NULL
SELECT * FROM tablea INNER JOIN tableb ON (NOT tablea.id IS DISTINCT FROM tableb.tablea_id_fk);
which can also be written as:
SELECT * FROM tablea, tableb WHERE NOT tablea.id IS DISTINCT FROM tableb.tableid_id_fk ;
There's been lots of recent discussion of IS DISTINCT FROM, which is why it comes straight to mind.
If that's not what you meant (by NULL = NULL) then might you be looking for an OUTER JOIN ?
--
Craig Ringer
>>> "Ian Sillitoe" <ian.sillitoe@googlemail.com> 03/04/08 5:49 AM >>> I'm trying to JOIN two tables (well a table and a resultset from a PL/pgsql function) where a joining column can be NULL In a join, no value can be ascribed to a null field, so the equivalence fails. You can do tests like IS NULL, which strictlyspeaking is test for meeting a condition (that of not having any value), not a test for equivalence. As (simplistically)the condition NULL does equal the condition NULL, (NULL = NULL) is true. The simplest approach is perhaps to have a value which does not occur naturally (like -1), as a substitute for nulls in therelevant columns. I believe this can be achieved via a view in your case, (pun intended :-), but which may be less efficientif performance is an issue: create view depth_v as select d.id, d.name, case when (d.depth1 is null) then -1 else d.depth1 end as depth1, case when (d.depth2 is null) then -1 else d.depth2 end as depth2, case when (d.depth3 is null) then -1 else d.depth3 end as depth3, case when (d.depth4 is null) then -1 else d.depth4 end as depth4, case when (d.depth5 is null) then -1 else d.depth5 end as depth5 from depth_table d; You could then join against this view instead of your underlying table, eg: select c.* from get_cathcode('1.10.8') c JOIN depth_v t USING(depth1, depth2, depth3, depth4); The view will not have any NULL values in the depth fields, so the join should work. see: http://www.postgresql.org/docs/8.2/static/functions-conditional.html (Incidentally, if you are storing bathymetry or CTD data, I'd be interested in seeing your db structures, as I may be doingsome work in that area soon :-) HTH, Brent Wood
I completely take your points - so maybe I should be asking for advice on database design instead.
We are annotating nodes on a hierarchical structure where NULL implied an absence of a value in this depth of the hierarchy. As a method of enforcing this view, we use constraints to make sure we only get one combination of rows for a given node of the hierarchy (whether the columns are integers or NULLs). I'm not suggesting this was 'correct', I'm just saying this was what we were thinking.
We considered using a flag of some description instead, however this also seemed a bit messy (although in hindsight probably far less messy than the job I now face trying to recode this table).
psql> select * from tree;
id | depth1 | depth2 | depth3 | name
----+--------+--------+--------+--------------------------
1 | 1 | | | Name for node_id "1"
1 | 1 | 1 | | Name for node_id "1.1"
2 | 1 | 2 | | Name for node_id "1.2"
3 | 1 | 2 | 3 | Name for node_id "1.2.3"
(3 rows)
psql> select * from leaf_node;
id | depth1 | depth2 | depth3 | depth4 | depth5
--------+--------+--------+--------+--------+--------
leaf_1 | 1 | 2 | 3 | 1 | 1
leaf_2 | 1 | 2 | 3 | 1 | 2
leaf_3 | 1 | 2 | 3 | 1 | 3
leaf_4 | 1 | 2 | 3 | 2 | 1
leaf_5 | 1 | 2 | 3 | 2 | 2
(5 rows)
So, is there a generally accept method of modelling nodes in hierarchical structures (i.e. the 'tree' table in the example above) that doesn't use NULLs? We started off using ids such as '1.1.2' as the PRIMARY KEY but concatenating strings to go between columns and ids ended up being too painful.
Would it be sensible to use an ARRAY of depths as the primary key? I guess this would mean it would be possible to join
psql> select * from tree;
id | depth1 | depth2 | depth3 | name
------+--------+--------+--------+--------------------------
[1] | 1 | | | Name for node_id "1"
[1,2] | 1 | 1 | | Name for node_id "1.1"
Advice most welcome.
Cheers,
Ian
We are annotating nodes on a hierarchical structure where NULL implied an absence of a value in this depth of the hierarchy. As a method of enforcing this view, we use constraints to make sure we only get one combination of rows for a given node of the hierarchy (whether the columns are integers or NULLs). I'm not suggesting this was 'correct', I'm just saying this was what we were thinking.
We considered using a flag of some description instead, however this also seemed a bit messy (although in hindsight probably far less messy than the job I now face trying to recode this table).
psql> select * from tree;
id | depth1 | depth2 | depth3 | name
----+--------+--------+--------+--------------------------
1 | 1 | | | Name for node_id "1"
1 | 1 | 1 | | Name for node_id "1.1"
2 | 1 | 2 | | Name for node_id "1.2"
3 | 1 | 2 | 3 | Name for node_id "1.2.3"
(3 rows)
psql> select * from leaf_node;
id | depth1 | depth2 | depth3 | depth4 | depth5
--------+--------+--------+--------+--------+--------
leaf_1 | 1 | 2 | 3 | 1 | 1
leaf_2 | 1 | 2 | 3 | 1 | 2
leaf_3 | 1 | 2 | 3 | 1 | 3
leaf_4 | 1 | 2 | 3 | 2 | 1
leaf_5 | 1 | 2 | 3 | 2 | 2
(5 rows)
So, is there a generally accept method of modelling nodes in hierarchical structures (i.e. the 'tree' table in the example above) that doesn't use NULLs? We started off using ids such as '1.1.2' as the PRIMARY KEY but concatenating strings to go between columns and ids ended up being too painful.
Would it be sensible to use an ARRAY of depths as the primary key? I guess this would mean it would be possible to join
psql> select * from tree;
id | depth1 | depth2 | depth3 | name
------+--------+--------+--------+--------------------------
[1] | 1 | | | Name for node_id "1"
[1,2] | 1 | 1 | | Name for node_id "1.1"
Advice most welcome.
Cheers,
Ian
On Wed, Apr 2, 2008 at 6:48 PM, Ted Byers <r.ted.byers@rogers.com> wrote:
--- Martijn van Oosterhout <kleptog@svana.org> wrote:> can't compare that
> 'transform_null_equals' won't help you at all here
> since it only help
> in the very specific case of comparing with a
> constant. The easiest is
> to think of NULL as meaning 'unknown'. Clearly you
> usefully with anything.
>
Not even a null in another record ... (hence my
question below). If the value is unknown, then it
could be anything, and (thinking as a mathematician
considering real numbers) the probability of two
records having null having their true, but unknown
values be the same is indistinguishable from 0. (with
integers or decimal numbers or floating point numbers,
that would be qualified with the clause, for practical
purposes :)
> Perhaps you can use a marker like -1 to achieve the
> effect you want?
>
Is that really valid, though, especially in a join? I
mean, if the column in question has multiple nulls, in
each of the tables, then how do you, of the DB, figure
out which of the rows containing nulls in the one
table match up with rows in the other table containing
nulls? Or is the resultset supposed to be the product
of the two sets (match each row with a null in the one
table with each row with a null in the other)? That,
for me, creates a nightmare situation where some of my
tables have tens of millions of rows, and if even 1%
of the rows contains null in the relevant column, I
don't even want to think about processing the
resultset that would be produced from such an idea
using these tables.
My joins always only involve primary keys, or indeces
on columns that prohibit nulls, so this problem
doesn't crop up in my code, but would I be not too far
from right in expecting that the rational thing to do
when creating a join on columns that allow nulls is to
exclude ALL rows, in either table, where the columns
involved are null?
Cheers,
Ted
"Ian Sillitoe" <ian.sillitoe@googlemail.com> writes: > Thanks for the reply - after a bit more poking around it seems that: > t1.col IS NOT DISTINCT FROM t2.col > should work - although I guess this means an upgrade from 8.1 to 8.3 Note that while this will give you the right answers, it will probably be dog-slow because the planner has exactly zero intelligence about it. You'd be much better advised to reconsider your table design and get rid of the assumption that NULLs should be considered "equal". regards, tom lane
On Wed, Apr 2, 2008 at 11:43 AM, Ian Sillitoe <ian.sillitoe@googlemail.com> wrote: > I completely take your points - so maybe I should be asking for advice on > database design instead. There are a couple of choice you can use: First is the PostgreSQL contribe Module Ltree: http://www.postgresql.org/docs/current/static/ltree.html There is the common adjancy list model: http://www.sqlsummit.com/AdjacencyList.htm Celko's Nested set model: { probably provides the most functionality but with hi costs for INSERT/UPDATE ops) http://www.intelligententerprise.com/001020/celko.jhtml and Lastly the Path enurmation model: http://www.onlamp.com/pub/a/onlamp/2004/08/05/hierarchical_sql.html One of these model should hopefully be a best fit for you. -- Regards, Richard Broersma Jr.
On Wed, Apr 02, 2008 at 05:49:37PM +0100, Ian Sillitoe wrote: > Unless I've missed something, the docs on > http://www.postgresql.org/docs/8.1/static/functions-comparison.html seem to > suggest that the concept is an example of bad programming and the workaround > (of switching on the 'transform_null_equals' config) is a hack. Is this all > true or did my logic just get screwed up at some point? Nope, it's all true. The problem you are having is that one NULL is not the same as (and is not not the same as) another NULL. NULL values are not equivalent to any other value, incuding other NULLs. This is why some database people don't like to allow NULLs at all. > Unless I've just > missed something obvious, it seems useful to be able to join two tables > based on a condition where they share a NULL column - is there another way > of doing this? . . .AND t1.column IS NULL AND t2.othercolumn IS NULL. Alternatively, you can use coalesce and join on some value, like this: . . .AND coalesce(t1.column, 0) = coalesce(t2.othercolumn,0); This is a bit of a hack, and won't work in every case (if you don't have a value that you know can't be in either table, you're out of luck). A
Thanks very much for this and to all the reponses. As a side note, I've only joined this mailing list this afternoon, but it already seems like an good choice (albeit one I should have made about 2 years ago :P) - very active and very helpful - excellent work, many thanks indeed.
Cheers,
Ian
Cheers,
Ian
On Wed, Apr 2, 2008 at 8:08 PM, Richard Broersma <richard.broersma@gmail.com> wrote:
On Wed, Apr 2, 2008 at 11:43 AM, Ian SillitoeThere are a couple of choice you can use:
<ian.sillitoe@googlemail.com> wrote:
> I completely take your points - so maybe I should be asking for advice on
> database design instead.
First is the PostgreSQL contribe Module Ltree:
http://www.postgresql.org/docs/current/static/ltree.html
There is the common adjancy list model:
http://www.sqlsummit.com/AdjacencyList.htm
Celko's Nested set model: { probably provides the most functionality
but with hi costs for INSERT/UPDATE ops)
http://www.intelligententerprise.com/001020/celko.jhtml
and Lastly the Path enurmation model:
http://www.onlamp.com/pub/a/onlamp/2004/08/05/hierarchical_sql.html
One of these model should hopefully be a best fit for you.
--
Regards,
Richard Broersma Jr.
--- Ian Sillitoe <ian.sillitoe@googlemail.com> wrote: > I completely take your points - so maybe I should be > asking for advice on > database design instead. > > We are annotating nodes on a hierarchical structure > where NULL implied an I don't mean to be rude, but yuck. Why provide a record for data that isn't there? I recently put together a database (at present at a very early prototype stage) to handle biological data. Considering ONLY the taxonomic portion of it, I opted for a general hierarchical model. Maybe not the most efficient, yet, but no waste, yet. In what is an over simplification, I created a taxon table, with columns for a unique ID number, taxonomic level (species, genus, &c. with all the glorious subcategories taxonomists of varius tripes are wont to create/define). The taxonomic levels are predefined (taken from my references that deal with such matters), in a lookup table. Then, I have columns to hold parent taxon ID number. Of course, there is, in a middle layer, constraints that prevents recording a species as a parent of a genus, and other silliness (no linking a species epithet directly to a class or order). But you get the idea. An object oriented programming metaphore might be that of a singly linked list. And of course, I have deliberately obfuscated the complexity arising from having to handle synonyms both usefully and gracefully, but the core idea is simple, and there are no nulls, except for taxa representing a whole kingdom. Last I checked, there were no taxa more general than the kingdom, and there's only a handful of kingdoms. If you don't have data on subclass or superfamily or subspecies, you just don't put it in. Therefore no nulls! I have no idea if this model would work for you, but maybe it will help. Cheers, Ted
> We are annotating nodes on a hierarchical structureI don't mean to be rude, but yuck. Why provide a
> where NULL implied an
record for data that isn't there?
No offence taken - I'm trying to improve an old (partially inherited) system, hence the original post.
I have no idea if this model would work for you, but
maybe it will help.
That approach all sounds sensible, however going through the links that Richard Broersma sent over in a previous post - I'm currently leaning towards the flexibility and intuitive interface that ltree contrib module appears to offer.
http://www.postgresql.org/docs/8.3/interactive/ltree.html
Cheers,
Ian