Thread: Optimizing the implementation of an optimized tree
Hi, Recently I needed to store a hiearchy of pages on a website in Postgres in a way that would allow for fast extraction of parts of the tree for menu generation and "path" specification (which nodes in the tree make up the path from the root to my position?). This can be accomplished by letting each node in the tree have an l and r value with values determined by traversing the edge of the tree and assigning the value of integer that is incremented at each node visit to l and doing the same thing for r, this time traversing the edge of the tree backwards. The tree then becomes: CREATE TABLE tree ( id serial PRIMARY KEY, parent int REFERENCES tree, l int DEFAULT NULL, r int DEFAULT NULL, ); The parent id strictly not needed, but I chose to include it for convenience. I can easily extract a complete branch like this: SELECT treeNode.id FROM tree treeNode, tree treeParent WHERE treeNode.l BETWEEN treeParent.l AND treeParent.r AND treeParent.id = $1 ORDER BY treeNode.l And the "path" from the root to a particular node like this: SELECT treeNode.id FROM tree treeNode, tree currentNode WHERE currentNode.r BETWEEN treeNode.l AND treeNode.r AND currentNode.id = $1 ORDER BY treeNode.l; Now, in order to avoid having to maintain the values of l and r for each node when the tree is modified I created triggers in PL/PGSQL that handle INSERTs of new nodes into the tree, DELETEs of existing nodes and UPDATEs of a node's position in the tree (i.e. the parant field). The implementation is included as an attachment. I am very interested in comments on the implementation - especially hints on possible optimizations. It seems to me that the PL/PGSQL-definition of each function is parsed at each call. Is this true? Is it possible to avoid this? It is my understanding that Postgres does not yet support triggers that fire on the update of a paricular column. Therefore I have a check in the UPDATE trigger that checks if the parent-field was modified. Is there any was to do this in a nicer manner? Regards, Christian CREATE TABLE tree ( id serial PRIMARY KEY, parent int REFERENCES tree, l int DEFAULT NULL, r int DEFAULT NULL, ); -- Are these a good idea? CREATE INDEX tree_l_idx ON tree (l); CREATE INDEX tree_r_idx ON tree (r); CREATE OR REPLACE FUNCTION handle_tree_insert() RETURNS opaque AS ' DECLARE parentR INTEGER; BEGIN IF NEW.l ISNULL THEN parentR := ( SELECT r FROM tree WHERE id = NEW.parent ); UPDATE tree SET l=l+2 WHERE l >= parentR; UPDATE tree SET r=r+2 WHERE r >= parentR; NEW.l := parentR; NEW.r := parentR + 1; END IF; RETURN NEW; END; ' LANGUAGE PLPGSQL; CREATE TRIGGER on_tree_insert BEFORE INSERT ON tree FOR EACH ROW EXECUTE PROCEDURE handle_tree_insert(); CREATE OR REPLACE FUNCTION handle_tree_delete() RETURNS opaque AS ' DECLARE decr INTEGER; BEGIN -- RAISE NOTICE ''Gets here, ID %'', OLD.id; IF NOT OLD.l ISNULL THEN decr := (((OLD.r - OLD.l - 1) / 2 ) + 1) * 2; -- RAISE NOTICE ''...and executes stuff (decrementing with %)'', decr; UPDATE tree SET parent = NULL, l = NULL, r = NULL WHERE l > OLD.l AND r < OLD.r; DELETE FROM tree WHERE parent ISNULL AND l ISNULL AND r ISNULL; UPDATE tree SET l = l - decr WHERE l > OLD.l; UPDATE tree SET r = r - decr WHERE r > OLD.r; END IF; RETURN OLD; END; ' LANGUAGE PLPGSQL; CREATE TRIGGER on_tree_delete AFTER DELETE ON tree FOR EACH ROW EXECUTE PROCEDURE handle_tree_delete(); CREATE OR REPLACE FUNCTION handle_tree_update() RETURNS opaque AS ' DECLARE parentR INTEGER; span INTEGER; dist INTEGER; BEGIN IF (NOT OLD.parent ISNULL) AND (NOT NEW.parent ISNULL) AND (OLD.parent <> NEW.parent) THEN span := OLD.r - OLD.l + 1; -- Get the R value of the new parent parentR := ( SELECT r FROM tree WHERE id = NEW.parent ); -- Determine the distance dist = parentR - OLD.l; -- Make the gab UPDATE tree SET l = l + span WHERE l >= parentR; UPDATE tree SET r = r + span WHERE r >= parentR; -- Move our branch to the newly created gab -- Decrement nodes after the old position to close the old gab -- If dist is < 0 then our own branch was moved <span> steps - this must be taken into consideration IF (dist > 0) THEN UPDATE tree SET l = l + (dist), r = r + (dist) WHERE l >= OLD.l AND r <= OLD.r; UPDATE tree SET l = l - span WHERE l >= OLD.l; UPDATE tree SET r = r - span WHERE r >= OLD.r; ELSE UPDATE tree SET l = l + (dist - span), r = r + (dist - span) WHERE l >= (OLD.l + span) AND r <= (OLD.r +span); UPDATE tree SET l = l - span WHERE l >= (OLD.l + span); UPDATE tree SET r = r - span WHERE r >= (OLD.r + span); END IF; END IF; RETURN NULL; END; ' LANGUAGE PLPGSQL; CREATE TRIGGER on_tree_update AFTER UPDATE ON tree FOR EACH ROW EXECUTE PROCEDURE handle_tree_update();
Why not try Oleg and Teodor's tree module? http://cddb.sai.msu.su/~megera/postgres/gist/ You have to expend a little effort in implementing it as the README's in Russian :) Still has examples tho. Chris > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Christian Rishoej > Sent: Tuesday, 7 May 2002 9:02 AM > To: pgsql-sql@postgresql.org > Cc: Anders Johannsen > Subject: [SQL] Optimizing the implementation of an optimized tree > > > > > Hi, > > Recently I needed to store a hiearchy of pages on a website in Postgres > in a way that would allow for fast extraction of parts of the tree for > menu generation and "path" specification (which nodes in the tree make > up the path from the root to my position?). > > This can be accomplished by letting each node in the tree have an l and > r value with values determined by traversing the edge of the tree and > assigning the value of integer that is incremented at each node visit to > l and doing the same thing for r, this time traversing the edge of the > tree backwards. > > The tree then becomes: > > CREATE TABLE tree ( > id serial PRIMARY KEY, > parent int REFERENCES tree, > l int DEFAULT NULL, > r int DEFAULT NULL, > ); > > The parent id strictly not needed, but I chose to include it for > convenience. > > I can easily extract a complete branch like this: > > SELECT treeNode.id > FROM tree treeNode, tree treeParent > WHERE treeNode.l BETWEEN treeParent.l AND treeParent.r > AND treeParent.id = $1 > ORDER BY treeNode.l > > And the "path" from the root to a particular node like this: > > SELECT treeNode.id > FROM tree treeNode, tree currentNode > WHERE currentNode.r BETWEEN treeNode.l AND treeNode.r > AND currentNode.id = $1 > ORDER BY treeNode.l; > > Now, in order to avoid having to maintain the values of l and r for each > node when the tree is modified I created triggers in PL/PGSQL that > handle INSERTs of new nodes into the tree, DELETEs of existing nodes and > UPDATEs of a node's position in the tree (i.e. the parant field). > > The implementation is included as an attachment. > > I am very interested in comments on the implementation - especially > hints on possible optimizations. > > It seems to me that the PL/PGSQL-definition of each function is parsed > at each call. Is this true? Is it possible to avoid this? > > It is my understanding that Postgres does not yet support triggers that > fire on the update of a paricular column. Therefore I have a check in > the UPDATE trigger that checks if the parent-field was modified. Is > there any was to do this in a nicer manner? > > Regards, > Christian >
On Tue, 2002-05-07 at 04:00, Christopher Kings-Lynne wrote: > Why not try Oleg and Teodor's tree module? > > http://cddb.sai.msu.su/~megera/postgres/gist/ > > You have to expend a little effort in implementing it as the README's in > Russian :) Still has examples tho. As far as I can tell Oleg and Teotor are doing *balanced* trees, and not trees usable for containing a hiearchy of nodes where each node can have 0-N children. It is my understanding that I cannot rely on the organization of a balanced tree - hence the name "balanced" (nodes are inserted where some algotithm thinks it fits in, not where I want it. Imagine if your directory-hiearchy on your filesystem was a balanced tree...). Does anyone have a clue on how to optimize triggers? Can I use another language than PL/PGSQL? Would C be crazy when I do lots of SQL's and no calculation in the trigger? Christian
On 7 May 2002, Christian Rishoej wrote: > On Tue, 2002-05-07 at 04:00, Christopher Kings-Lynne wrote: > > > Why not try Oleg and Teodor's tree module? > > > > http://cddb.sai.msu.su/~megera/postgres/gist/ > > > > You have to expend a little effort in implementing it as the README's in > > Russian :) Still has examples tho. > > As far as I can tell Oleg and Teotor are doing *balanced* trees, and not > trees usable for containing a hiearchy of nodes where each node can have > 0-N children. > > It is my understanding that I cannot rely on the organization of a > balanced tree - hence the name "balanced" (nodes are inserted where some > algotithm thinks it fits in, not where I want it. Imagine if your > directory-hiearchy on your filesystem was a balanced tree...). Hmm, you decide where to insert new node. It's my fault we don't have english documentation for our module. But it was designed to work with catalog like structure. Here is the link to my original posting: http://fts.postgresql.org/db/mw/msg.html?mid=1352122 Test data (snapshot DMOZ) is available from http://www.sai.msu.su/~megera/postgres/gist/tree/dmoz-full.sql.gz There are functions entree_level(entree) and bitree_level(bitree) which returns next level of the node. To add node: select entree_next(tid) from dmoz where tid <* '1.2.3.*.0' order by tid desc limit 1; but you're free to add node by hand: test=# create table tree ( node bitree ); CREATE test=# insert into tree values ('22.9.15.19.12'); INSERT 41728161 1 est=# select * from tree; node ---------------22.9.15.19.12 (1 row) > > Does anyone have a clue on how to optimize triggers? Can I use another > language than PL/PGSQL? Would C be crazy when I do lots of SQL's and no > calculation in the trigger? > > Christian > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83