Thread: Tree structure table normalization problem (do I need a trigger?)

Tree structure table normalization problem (do I need a trigger?)

From
Frank Joerdens
Date:
In a recent thread (How to represent a tree-structure in a relational
database) I asked how to do a tree structure in SQL, and got lots of
suggestions (thanks!), of which I chose the one below:

create table Category (
CategoryID       int4  not null  primary key,
ParentCategoryID int4  not null  REFERENCES Category (CategoryID),
CategoryName     varchar(100)
);

The one described in Joe Celko's article (the one with the worm that
travels along the edge of the tree . . . ) seemed more evolved but
requires fairly complex SQL stuff, I thought, for simple operations that
are straighforward with the above model. However, I have a problem now
which seems non-trivial: I am at some point in the tree, say 3 nodes
down from the root, but I don't know where I am exactly (across which
nodes would I travel along the shortest path to the top?) and would like
to find out. This is, again, not really difficult if I know how deep
into the tree I am, in which case I can simply do (I know that I am 3
nodes from the root and that my current node number is x):

SELECT A1.CategoryID, A2.CategoryID, A3.CategoryID FROM Category AS A1,
Category AS A2, Category AS A3 WHERE A3.CategoryID=x AND
A3.ParentCategoryID=A2.CategoryID AND A2.ParentCategoryID=A1.CategoryID;

(This is probably very expensive if the tree gets really deep, but I
don't expect that to happen in my database anytime soon.)

So I introduced a 'level' column into the above schema, where I store
the information about how deep I am into the tree to have it readily
available when I want to compute the path to the top. Unfortunately,
this is dangerous because the information is already implicit in the
original schema. The problem is that the only way I can see that you
would get at the information is by walking up the tree step-by-step
until you get a zero value (which is assigned to the root). This means
you need a loop control structure which means you have to write a
PL/pgSQL procedure (or some other procedure) that is run by a trigger to
update the level column on insert or update, as in

WHILE (CategoryID is not null) LOOP   Run SQL statement to get the next higher-up node's CategoryID and
increment a counter.
END LOOP;
Return counter and insert value into level column.

This seems to feasible but not really as straightforward as one might
hope. Is there an easier way?

- Frank


Re: Tree structure table normalization problem (do I need a trigger?)

From
Tulassay Zsolt
Date:
On Tue, 19 Dec 2000, Frank Joerdens wrote:

> In a recent thread (How to represent a tree-structure in a relational
> database) I asked how to do a tree structure in SQL, and got lots of
> suggestions (thanks!), of which I chose the one below:
> 
> create table Category (
> CategoryID       int4  not null  primary key,
> ParentCategoryID int4  not null  REFERENCES Category (CategoryID),
> CategoryName     varchar(100)
> );
> 
> The one described in Joe Celko's article (the one with the worm that
> travels along the edge of the tree . . . ) seemed more evolved but
> requires fairly complex SQL stuff, I thought, for simple operations that
> are straighforward with the above model. However, I have a problem now
> which seems non-trivial: I am at some point in the tree, say 3 nodes
> down from the root, but I don't know where I am exactly (across which
> nodes would I travel along the shortest path to the top?) and would like
> to find out. This is, again, not really difficult if I know how deep
> into the tree I am, in which case I can simply do (I know that I am 3
> nodes from the root and that my current node number is x):

That's exactly what the 'worm' stuff is all about...
If you use the structure above, you have to use recursion in order to
find out the path, which is AFAIK missing from SQL (not counting some
DB2 feature which allows recursion, in a strange way, but we're talking
Postgres here). This solution also does not require any client-side
programming, since you only have to use one single statement,
regardless of which level you are on in the tree.

The SQL stuff of that nested set structure is fairly easy, I wrote some
quick'n'dirty plpgsql functions that will do inserts, updates, deletes
from the tree, display level number etc.
I can send it to you if you like (please allow a few days since I
have several exams at the university this week).

Zsolt Tulassay



> 
> SELECT A1.CategoryID, A2.CategoryID, A3.CategoryID FROM Category AS A1,
> Category AS A2, Category AS A3 WHERE A3.CategoryID=x AND
> A3.ParentCategoryID=A2.CategoryID AND A2.ParentCategoryID=A1.CategoryID;
> 
> (This is probably very expensive if the tree gets really deep, but I
> don't expect that to happen in my database anytime soon.)
> 
> So I introduced a 'level' column into the above schema, where I store
> the information about how deep I am into the tree to have it readily
> available when I want to compute the path to the top. Unfortunately,
> this is dangerous because the information is already implicit in the
> original schema. The problem is that the only way I can see that you
> would get at the information is by walking up the tree step-by-step
> until you get a zero value (which is assigned to the root). This means
> you need a loop control structure which means you have to write a
> PL/pgSQL procedure (or some other procedure) that is run by a trigger to
> update the level column on insert or update, as in
> 
> WHILE (CategoryID is not null) LOOP
>     Run SQL statement to get the next higher-up node's CategoryID and
> increment a counter.
> END LOOP;
> Return counter and insert value into level column.
> 
> This seems to feasible but not really as straightforward as one might
> hope. Is there an easier way?
> 
> - Frank
> 



Re: Tree structure table normalization problem (do I need atrigger?)

From
Frank Joerdens
Date:
Tulassay Zsolt wrote:
[ . . . ]
> The SQL stuff of that nested set structure is fairly easy, I wrote some
> quick'n'dirty plpgsql functions that will do inserts, updates, deletes
> from the tree, display level number etc.

What scared me about it in particular was one scenario where you try to delete a subtree.
This would normally leave a gap, since the structure is based on the worm's ability to get
from one node to the next with an increment of just 1. Once you had a subtree deleted,
you'd either would have to have the worm leap-frog (a leaping frog-worm?!! :)) the gap or
update an entire half of the tree to close it . . . then my brain started to hurt and I
gave up.

> I can send it to you if you like (please allow a few days since I
> have several exams at the university this week).

Sure, I'd like to have a look at it!

Thanks, Frank


Re: Tree structure table normalization problem (do I need atrigger?)

From
Neil Thompson
Date:

On Tue, 19 Dec 2000, Frank Joerdens wrote:

> Tulassay Zsolt wrote:
> [ . . . ]
> > I can send it to you if you like (please allow a few days since I
> > have several exams at the university this week).
>
> Sure, I'd like to have a look at it!

I'd like to have a look at it as well, please.

Cheers! (Relax...have a homebrew)

Neil



Re: Tree structure table normalization problem (do I need a trigger?)

From
"Josh Berkus"
Date:
Frank,

> However, I have
> a problem now
> which seems non-trivial: I am at some point in the tree,
> say 3 nodes
> down from the root, but I don't know where I am exactly
> (across which
> nodes would I travel along the shortest path to the top?)
> and would like
> to find out. This is, again, not really difficult if I
> know how deep
> into the tree I am, in which case I can simply do (I know
> that I am 3
> nodes from the root and that my current node number is
> x):

This is exactly why my model includes a "Level" column.  It
was more important to me to have the easy queriability of
the "redundant" level info than to have the fluid
flexibility of a tree without it.  The choice sorta depends
on what you're storing in the tree.

> (This is probably very expensive if the tree gets really
> deep, but I
> don't expect that to happen in my database anytime soon.)

Not really.  You're querying (hopefully) two indexed fields
within the same table, refrenced to itself.  Once you've run
it a few times, even the elaborate UNION query I posted will
run very quickly - on my table (~300 items) it runs <2
seconds.

> This means
> you need a loop control structure which means you have to
> write a
> PL/pgSQL procedure (or some other procedure) that is run
> by a trigger to
> update the level column on insert or update, as in

> This seems to feasible but not really as straightforward
> as one might
> hope. Is there an easier way?

Hmmm.  I don't know, Frank.  That strikes me as a really
good, straightforward workaround to your problem.  I'm not
sure what you could do that would be simpler.  This is
practically a textbook example of why triggers are necessary
to retain relational integrity.

-Josh Berkus



Re: Tree structure table normalization problem (do I need atrigger?)

From
Tulassay Zsolt
Date:

On Tue, 19 Dec 2000, Frank Joerdens wrote:

> Tulassay Zsolt wrote:
> [ . . . ]
> > The SQL stuff of that nested set structure is fairly easy, I wrote some
> > quick'n'dirty plpgsql functions that will do inserts, updates, deletes
> > from the tree, display level number etc.
> 
> What scared me about it in particular was one scenario where you try to delete a subtree.
> This would normally leave a gap, since the structure is based on the worm's ability to get
> from one node to the next with an increment of just 1. Once you had a subtree deleted,
> you'd either would have to have the worm leap-frog (a leaping frog-worm?!! :)) the gap or
> update an entire half of the tree to close it . . . then my brain started to hurt and I
> gave up.
> 
that's exactly why i wrote a deletion function which updates the nodes'
"left" and "right" values to close the gap.
sure it causes a bit of overhead to update many nodes, but if you
don't delete subtrees all the time, you can live with this.

Zsolt Tulassay




Re: Tree structure table normalization problem (do I need a trigger?)

From
Frank Joerdens
Date:
Josh Berkus wrote:
[ . . . ]
> This is exactly why my model includes a "Level" column.

I looked at your post from a few days ago again; you did indeed explain about the level
column. I missed that somehow and had to reinvent the wheel . . .

> > This means
> > you need a loop control structure which means you have to
> > write a
> > PL/pgSQL procedure (or some other procedure) that is run
> > by a trigger to
> > update the level column on insert or update, as in
> 
> > This seems to feasible but not really as straightforward
> > as one might
> > hope. Is there an easier way?
> 
> Hmmm.  I don't know, Frank.  That strikes me as a really
> good, straightforward workaround to your problem.  I'm not
> sure what you could do that would be simpler.  This is
> practically a textbook example of why triggers are necessary
> to retain relational integrity.

Cool. And I didn't consult a textbook ;). Actually, it's even simpler than I described
above: The function that you run when the trigger fires is plain vanilla sql with a littel
subselect thrown in:

create function update_level(int4)
returns int4
as 'update index set level=(A.level+1) from index as A where A.id = (select parentid from
index where id = $1 ) and index.id = $1; select 1 as ignore_this;'
LANGUAGE 'sql';
. . . i.e. you just get the level from the higher-up node's level plus 1, rather than
walking to the top of the tree and counting the steps. This _doesn't_ work though if you
move an entire subtree within the hierarchy to another level. Then you'd need to have a
function that walks through the entire subtree to update the level column for every single
node . . . hmmm. I'll think about it. I don't think I'll need it for the current project
since I'll only allow the moving around of end nodes.

Cheers,
Frank


Re: Tree structure table normalization problem (do I need a trigger?)

From
Ron Peterson
Date:
Frank Joerdens wrote:
> 
> In a recent thread (How to represent a tree-structure in a relational
> database) I asked how to do a tree structure in SQL, and got lots of
> suggestions (thanks!), of which I chose the one below:
> 
> create table Category (
> CategoryID       int4  not null  primary key,
> ParentCategoryID int4  not null  REFERENCES Category (CategoryID),
> CategoryName     varchar(100)
> );
> 
> The one described in Joe Celko's article (the one with the worm that
> travels along the edge of the tree . . . ) seemed more evolved but
> requires fairly complex SQL stuff, I thought, for simple operations that
> are straighforward with the above model.

SQL99 (which is what SQL3 became) defines a recursive data model that
allows you to implement these types of structures.  IBM's DB2 implements
at least a subset of this standard (this is based on hearsay, not
personal experience).  Oracle also provides some SQL extensions to allow
you to create recursive queries, but they are nonstandard (CONNECT BY,
LEVELS, ...).

I don't find any of the solutions to this problem using SQL92
satisfactory.  Celko's tree structure can require updates to every node
in the tree for operations on a single node.  And once you start writing
procedural code, you're obviating SQL itself.

So for myself, I've basically decided to hold my horses and find other
interesting things to do until the SQL99 standard finds widespread
adoption.  I realize this might not be a satisfactory answer, but if you
can afford to wait, a better solution should be on the way.

-Ron-