Thread: Tree structure table normalization problem (do I need a trigger?)
In a recent thread (How to represent a tree-structure in a relational database) I asked how to do a tree structure in SQL, and got lots of suggestions (thanks!), of which I chose the one below: create table Category ( CategoryID int4 not null primary key, ParentCategoryID int4 not null REFERENCES Category (CategoryID), CategoryName varchar(100) ); The one described in Joe Celko's article (the one with the worm that travels along the edge of the tree . . . ) seemed more evolved but requires fairly complex SQL stuff, I thought, for simple operations that are straighforward with the above model. However, I have a problem now which seems non-trivial: I am at some point in the tree, say 3 nodes down from the root, but I don't know where I am exactly (across which nodes would I travel along the shortest path to the top?) and would like to find out. This is, again, not really difficult if I know how deep into the tree I am, in which case I can simply do (I know that I am 3 nodes from the root and that my current node number is x): SELECT A1.CategoryID, A2.CategoryID, A3.CategoryID FROM Category AS A1, Category AS A2, Category AS A3 WHERE A3.CategoryID=x AND A3.ParentCategoryID=A2.CategoryID AND A2.ParentCategoryID=A1.CategoryID; (This is probably very expensive if the tree gets really deep, but I don't expect that to happen in my database anytime soon.) So I introduced a 'level' column into the above schema, where I store the information about how deep I am into the tree to have it readily available when I want to compute the path to the top. Unfortunately, this is dangerous because the information is already implicit in the original schema. The problem is that the only way I can see that you would get at the information is by walking up the tree step-by-step until you get a zero value (which is assigned to the root). This means you need a loop control structure which means you have to write a PL/pgSQL procedure (or some other procedure) that is run by a trigger to update the level column on insert or update, as in WHILE (CategoryID is not null) LOOP Run SQL statement to get the next higher-up node's CategoryID and increment a counter. END LOOP; Return counter and insert value into level column. This seems to feasible but not really as straightforward as one might hope. Is there an easier way? - Frank
On Tue, 19 Dec 2000, Frank Joerdens wrote: > In a recent thread (How to represent a tree-structure in a relational > database) I asked how to do a tree structure in SQL, and got lots of > suggestions (thanks!), of which I chose the one below: > > create table Category ( > CategoryID int4 not null primary key, > ParentCategoryID int4 not null REFERENCES Category (CategoryID), > CategoryName varchar(100) > ); > > The one described in Joe Celko's article (the one with the worm that > travels along the edge of the tree . . . ) seemed more evolved but > requires fairly complex SQL stuff, I thought, for simple operations that > are straighforward with the above model. However, I have a problem now > which seems non-trivial: I am at some point in the tree, say 3 nodes > down from the root, but I don't know where I am exactly (across which > nodes would I travel along the shortest path to the top?) and would like > to find out. This is, again, not really difficult if I know how deep > into the tree I am, in which case I can simply do (I know that I am 3 > nodes from the root and that my current node number is x): That's exactly what the 'worm' stuff is all about... If you use the structure above, you have to use recursion in order to find out the path, which is AFAIK missing from SQL (not counting some DB2 feature which allows recursion, in a strange way, but we're talking Postgres here). This solution also does not require any client-side programming, since you only have to use one single statement, regardless of which level you are on in the tree. The SQL stuff of that nested set structure is fairly easy, I wrote some quick'n'dirty plpgsql functions that will do inserts, updates, deletes from the tree, display level number etc. I can send it to you if you like (please allow a few days since I have several exams at the university this week). Zsolt Tulassay > > SELECT A1.CategoryID, A2.CategoryID, A3.CategoryID FROM Category AS A1, > Category AS A2, Category AS A3 WHERE A3.CategoryID=x AND > A3.ParentCategoryID=A2.CategoryID AND A2.ParentCategoryID=A1.CategoryID; > > (This is probably very expensive if the tree gets really deep, but I > don't expect that to happen in my database anytime soon.) > > So I introduced a 'level' column into the above schema, where I store > the information about how deep I am into the tree to have it readily > available when I want to compute the path to the top. Unfortunately, > this is dangerous because the information is already implicit in the > original schema. The problem is that the only way I can see that you > would get at the information is by walking up the tree step-by-step > until you get a zero value (which is assigned to the root). This means > you need a loop control structure which means you have to write a > PL/pgSQL procedure (or some other procedure) that is run by a trigger to > update the level column on insert or update, as in > > WHILE (CategoryID is not null) LOOP > Run SQL statement to get the next higher-up node's CategoryID and > increment a counter. > END LOOP; > Return counter and insert value into level column. > > This seems to feasible but not really as straightforward as one might > hope. Is there an easier way? > > - Frank >
Tulassay Zsolt wrote: [ . . . ] > The SQL stuff of that nested set structure is fairly easy, I wrote some > quick'n'dirty plpgsql functions that will do inserts, updates, deletes > from the tree, display level number etc. What scared me about it in particular was one scenario where you try to delete a subtree. This would normally leave a gap, since the structure is based on the worm's ability to get from one node to the next with an increment of just 1. Once you had a subtree deleted, you'd either would have to have the worm leap-frog (a leaping frog-worm?!! :)) the gap or update an entire half of the tree to close it . . . then my brain started to hurt and I gave up. > I can send it to you if you like (please allow a few days since I > have several exams at the university this week). Sure, I'd like to have a look at it! Thanks, Frank
On Tue, 19 Dec 2000, Frank Joerdens wrote: > Tulassay Zsolt wrote: > [ . . . ] > > I can send it to you if you like (please allow a few days since I > > have several exams at the university this week). > > Sure, I'd like to have a look at it! I'd like to have a look at it as well, please. Cheers! (Relax...have a homebrew) Neil
Frank, > However, I have > a problem now > which seems non-trivial: I am at some point in the tree, > say 3 nodes > down from the root, but I don't know where I am exactly > (across which > nodes would I travel along the shortest path to the top?) > and would like > to find out. This is, again, not really difficult if I > know how deep > into the tree I am, in which case I can simply do (I know > that I am 3 > nodes from the root and that my current node number is > x): This is exactly why my model includes a "Level" column. It was more important to me to have the easy queriability of the "redundant" level info than to have the fluid flexibility of a tree without it. The choice sorta depends on what you're storing in the tree. > (This is probably very expensive if the tree gets really > deep, but I > don't expect that to happen in my database anytime soon.) Not really. You're querying (hopefully) two indexed fields within the same table, refrenced to itself. Once you've run it a few times, even the elaborate UNION query I posted will run very quickly - on my table (~300 items) it runs <2 seconds. > This means > you need a loop control structure which means you have to > write a > PL/pgSQL procedure (or some other procedure) that is run > by a trigger to > update the level column on insert or update, as in > This seems to feasible but not really as straightforward > as one might > hope. Is there an easier way? Hmmm. I don't know, Frank. That strikes me as a really good, straightforward workaround to your problem. I'm not sure what you could do that would be simpler. This is practically a textbook example of why triggers are necessary to retain relational integrity. -Josh Berkus
On Tue, 19 Dec 2000, Frank Joerdens wrote: > Tulassay Zsolt wrote: > [ . . . ] > > The SQL stuff of that nested set structure is fairly easy, I wrote some > > quick'n'dirty plpgsql functions that will do inserts, updates, deletes > > from the tree, display level number etc. > > What scared me about it in particular was one scenario where you try to delete a subtree. > This would normally leave a gap, since the structure is based on the worm's ability to get > from one node to the next with an increment of just 1. Once you had a subtree deleted, > you'd either would have to have the worm leap-frog (a leaping frog-worm?!! :)) the gap or > update an entire half of the tree to close it . . . then my brain started to hurt and I > gave up. > that's exactly why i wrote a deletion function which updates the nodes' "left" and "right" values to close the gap. sure it causes a bit of overhead to update many nodes, but if you don't delete subtrees all the time, you can live with this. Zsolt Tulassay
Josh Berkus wrote: [ . . . ] > This is exactly why my model includes a "Level" column. I looked at your post from a few days ago again; you did indeed explain about the level column. I missed that somehow and had to reinvent the wheel . . . > > This means > > you need a loop control structure which means you have to > > write a > > PL/pgSQL procedure (or some other procedure) that is run > > by a trigger to > > update the level column on insert or update, as in > > > This seems to feasible but not really as straightforward > > as one might > > hope. Is there an easier way? > > Hmmm. I don't know, Frank. That strikes me as a really > good, straightforward workaround to your problem. I'm not > sure what you could do that would be simpler. This is > practically a textbook example of why triggers are necessary > to retain relational integrity. Cool. And I didn't consult a textbook ;). Actually, it's even simpler than I described above: The function that you run when the trigger fires is plain vanilla sql with a littel subselect thrown in: create function update_level(int4) returns int4 as 'update index set level=(A.level+1) from index as A where A.id = (select parentid from index where id = $1 ) and index.id = $1; select 1 as ignore_this;' LANGUAGE 'sql'; . . . i.e. you just get the level from the higher-up node's level plus 1, rather than walking to the top of the tree and counting the steps. This _doesn't_ work though if you move an entire subtree within the hierarchy to another level. Then you'd need to have a function that walks through the entire subtree to update the level column for every single node . . . hmmm. I'll think about it. I don't think I'll need it for the current project since I'll only allow the moving around of end nodes. Cheers, Frank
Frank Joerdens wrote: > > In a recent thread (How to represent a tree-structure in a relational > database) I asked how to do a tree structure in SQL, and got lots of > suggestions (thanks!), of which I chose the one below: > > create table Category ( > CategoryID int4 not null primary key, > ParentCategoryID int4 not null REFERENCES Category (CategoryID), > CategoryName varchar(100) > ); > > The one described in Joe Celko's article (the one with the worm that > travels along the edge of the tree . . . ) seemed more evolved but > requires fairly complex SQL stuff, I thought, for simple operations that > are straighforward with the above model. SQL99 (which is what SQL3 became) defines a recursive data model that allows you to implement these types of structures. IBM's DB2 implements at least a subset of this standard (this is based on hearsay, not personal experience). Oracle also provides some SQL extensions to allow you to create recursive queries, but they are nonstandard (CONNECT BY, LEVELS, ...). I don't find any of the solutions to this problem using SQL92 satisfactory. Celko's tree structure can require updates to every node in the tree for operations on a single node. And once you start writing procedural code, you're obviating SQL itself. So for myself, I've basically decided to hold my horses and find other interesting things to do until the SQL99 standard finds widespread adoption. I realize this might not be a satisfactory answer, but if you can afford to wait, a better solution should be on the way. -Ron-