Thread: Trees: maintaining pathnames
My existing tree implementation reflects the files contained on disk. The full pathname to a particlar file is obtained from the path to the parent directory. I am now considering putting this information into a field in the table. Attached you will find the pg_dump from my test database (2.4k) if you want to test with this setup and in case what I have pasted below contains an error. Here is the table and the test data: create table tree(id int not null, parent_id int, name text not null, pathname text not null, primary key (id)); insert into tree (id, name, pathname) values (1, 'usr', '/usr'); insert into tree (id, name, parent_id, pathname) values (2, 'ports', 1, '/usr/ports'); insert into tree values (3, 2, 'security', 'test'); select * from tree; test=# select * from tree;id | parent_id | name | pathname ----+-----------+----------+--------------------- 1 | | usr | /usr 2 | 1 | ports | /usr/ports 3| 2 | security | /usr/ports/security (3 rows) The goal is to ensure that pathname always contains the correct value. Here are the functions/triggers which I created in order to attain that goal. This function ensures that the pathname is set correctly when a row is inserted or changed. create or replace function tree_pathname_set() returns opaque as ' DECLARE parent_pathname text; BEGIN RAISE NOTICE \'into tree_pathname_set with %:%:%\', new.id, new.name, new.pathname; select pathname into parent_pathname from tree where id = new.parent_id; if found then new.pathname = parent_pathname || \'/\' || new.name; else new.pathname= \'/\' || new.name; end if; RETURN new; END;' language 'plpgsql';\ create trigger tree_pathname_set before insert or update on tree for each row execute procedure tree_pathname_set(); This function ensures that any childre of a recently modified row are also kept up to date. create or replace function tree_pathname_set_children() returns opaque as 'BEGIN RAISE NOTICE \'into tree_pathname_set_children with %:%:%\', new.id, new.name, new.pathname; update tree set pathname = new.pathname || \'/\' || name where parent_id = new.id; RETURN new; END;' language 'plpgsql'; create trigger tree_pathname_set_children after insert or update on tree for each row execute procedure tree_pathname_set_children(); NOTE: the above is "insert or update" but as I typed this I realize that only update is sufficent. A change to the top level row is shown below: test=# update tree set name = 'dan' where id = 1; NOTICE: into tree_pathname_set with 1:dan:/usr NOTICE: into tree_pathname_set_children with 1:dan:/dan NOTICE: into tree_pathname_set with 2:ports:/dan/ports NOTICE: into tree_pathname_set_children with 2:ports:/dan/ports NOTICE: into tree_pathname_set with 3:security:/dan/ports/security NOTICE: into tree_pathname_set_children with 3:security:/dan/ports/security UPDATE 1 test=# select * from tree;id | parent_id | name | pathname ----+-----------+----------+--------------------- 1 | | dan | /dan 2 | 1 | ports | /dan/ports 3| 2 | security | /dan/ports/security (3 rows) test=# Suggestions, comment, open ridicule, most welcome. thanks.
Dan, > My existing tree implementation reflects the files contained on disk. > The > full pathname to a particlar file is obtained from the path to the > parent > directory. I am now considering putting this information into a > field in > the table. <snip> > Suggestions, comment, open ridicule, most welcome. thanks. This is a fine implementation using the adjacency list model of tree design. However, I think you may find that the string-based tree implementation in /contrib/ltree is more suited to your purposes, and easier to maintain. -Josh Berkus
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 NotDashEscaped: You need GnuPG to verify this message Instead of storing the path in each row, why not let Postgres take care of computing it with a function? Then make a view and you've got the same table, without all the triggers. CREATE TABLE tree (id INTEGER NOT NULL,parent_id INTEGER,"name" TEXT NOT NULL,PRIMARY KEY (id) ); INSERT INTO tree VALUES (1,NULL,''); INSERT INTO tree VALUES (2,1,'usr'); INSERT INTO tree VALUES (3,1,'tmp'); INSERT INTO tree VALUES (4,1,'home'); INSERT INTO tree VALUES (5,4,'greg'); INSERT INTO tree VALUES (6,5,'etc'); CREATE OR REPLACE FUNCTION pathname(INTEGER) RETURNS TEXT AS ' DECLARE mypath TEXT; myname TEXT; myid INTEGER; BEGIN SELECT parent_id,name FROM tree WHERE id=$1 INTO myid,mypath; IF mypath IS NULL THEN RETURN ''No such id\n''; END IF; LOOP SELECT parent_id,name FROM tree WHERE id=myid INTO myid,myname; mypath := ''/'' || mypath; EXIT WHEN myid ISNULL; mypath := myname || mypath; END LOOP; RETURN mypath; END; ' LANGUAGE 'plpgsql'; CREATE VIEW mytree AS SELECT *, PATHNAME(id) AS path FROM tree; SELECT * FROM tree ORDER BY id; id | parent_id | name ----+-----------+------ 1 | | 2 | 1 | usr 3 | 1 | tmp 4 | 1 | home 5 | 4 | greg6 | 5 | etc (6 rows) SELECT * FROM mytree ORDER BY id; id | parent_id | name | path ----+-----------+------+---------------- 1 | | | / 2 | 1 | usr | /usr 3 | 1 | tmp | /tmp4 | 1 | home | /home 5 | 4 | greg | /home/greg 6 | 5 | etc | /home/greg/etc (6 rows) UPDATE tree SET name='users' WHERE id=4; SELECT * FROM mytree ORDER BY id; id | parent_id | name | path ----+-----------+-------+----------------- 1 | | | / 2 | 1 | usr | /usr 3 | 1 | tmp |/tmp 4 | 1 | users | /users 5 | 4 | greg | /users/greg 6 | 5 | etc | /users/greg/etc (6 rows) Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200211172015 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE92D9RvJuQZxSWSsgRAn2oAKDyIcrtgB8v1fAMY3B/ITKZ+lBlYgCfXRMe W/xntabEsfuEdseo44cAXbY= =MANm -----END PGP SIGNATURE-----
On 18 Nov 2002 at 1:09, greg@turnstep.com wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > NotDashEscaped: You need GnuPG to verify this message > > > Instead of storing the path in each row, why not let Postgres > take care of computing it with a function? Then make a view > and you've got the same table, without all the triggers. This is how it is now done. I wanted to be able to so this fairly quickly: select * from tree where pathname like '/usr/local/%' in order to get the subtree below a given point. Sorry I didn't mention that before. > > CREATE TABLE tree ( > id INTEGER NOT NULL, > parent_id INTEGER, > "name" TEXT NOT NULL, > PRIMARY KEY (id) > ); > > > INSERT INTO tree VALUES (1,NULL,''); > INSERT INTO tree VALUES (2,1,'usr'); > INSERT INTO tree VALUES (3,1,'tmp'); > INSERT INTO tree VALUES (4,1,'home'); > INSERT INTO tree VALUES (5,4,'greg'); > INSERT INTO tree VALUES (6,5,'etc'); > > CREATE OR REPLACE FUNCTION pathname(INTEGER) > RETURNS TEXT AS > ' > > DECLARE > mypath TEXT; > myname TEXT; > myid INTEGER; > > BEGIN > > SELECT parent_id,name FROM tree WHERE id=$1 INTO myid,mypath; > IF mypath IS NULL THEN > RETURN ''No such id\n''; > END IF; > > LOOP > SELECT parent_id,name FROM tree WHERE id=myid INTO myid,myname; > mypath := ''/'' || mypath; > EXIT WHEN myid IS NULL; > mypath := myname || mypath; > END LOOP; > > RETURN mypath; > > END; > ' LANGUAGE 'plpgsql'; > > CREATE VIEW mytree AS SELECT *, PATHNAME(id) AS path FROM tree; > > SELECT * FROM tree ORDER BY id; > > id | parent_id | name > ----+-----------+------ > 1 | | > 2 | 1 | usr > 3 | 1 | tmp > 4 | 1 | home > 5 | 4 | greg > 6 | 5 | etc > (6 rows) > > SELECT * FROM mytree ORDER BY id; > > id | parent_id | name | path > ----+-----------+------+---------------- > 1 | | | / > 2 | 1 | usr | /usr > 3 | 1 | tmp | /tmp > 4 | 1 | home | /home > 5 | 4 | greg | /home/greg > 6 | 5 | etc | /home/greg/etc > (6 rows) > > UPDATE tree SET name='users' WHERE id=4; > > SELECT * FROM mytree ORDER BY id; > > id | parent_id | name | path > ----+-----------+-------+----------------- > 1 | | | / > 2 | 1 | usr | /usr > 3 | 1 | tmp | /tmp > 4 | 1 | users | /users > 5 | 4 | greg | /users/greg > 6 | 5 | etc | /users/greg/etc > (6 rows) That's good. Thank you. -- Dan Langille : http://www.langille.org/
On 17 Nov 2002 at 14:51, Josh Berkus wrote: > Dan, > > > My existing tree implementation reflects the files contained on disk. > > The > > full pathname to a particlar file is obtained from the path to the > > parent > > directory. I am now considering putting this information into a > > field in > > the table. > <snip> > > Suggestions, comment, open ridicule, most welcome. thanks. > > This is a fine implementation using the adjacency list model of tree > design. However, I think you may find that the string-based tree > implementation in /contrib/ltree is more suited to your purposes, and > easier to maintain. That looks interesting. I have installed that onto a test server and I'm playing around with it.[1] The contrib/ltree project implements a tree via text parsing. Below I show the test data it created. For my usage, I'm not sure I need it. I have implemented the "Adjacency List" tree implementation (that's what I've been told). In short, my tree contains three basic fields: id, name, parent_id. Given that I'm considering adding a new field path_name to the tree, I can't see the ltree package will give me anything more than I can get from like. My main reason for adding path_name was doing queries such as: select * from tree where path_name like '/path/to/parent/%' which will return me all the descendants of a give node (in this case '/path/to/parent/'.[2] I have discussed [offlist] the option of using a secondary table to store the pathname (i.e. a cach table) which would be updated using a loop in the tigger instead of using cascading triggers. I would prefer to keep the pathname in the same table. In my application, I have about 120,000 nodes in the tree. I am using PL/pgSQL quite a lot. Perhaps moving the triggers to C at a later date may provide a speed increase if the tree expands considerably. Also, it is noted that those triggers set the pathname twice, once in the before, and once in the after trigger. I'll try to optimize that for a future "release". ltreetest=# \d List of relationsName | Type | Owner ------+-------+-------test | table | dan (1 row) ltreetest=# select * from test; path -----------------------------------------------TopTop.ScienceTop.Science.AstronomyTop.Science.Astronomy.AstrophysicsTop.Science.Astronomy.CosmologyTop.HobbiesTop.Hobbies.Amateurs_AstronomyTop.CollectionsTop.Collections.PicturesTop.Collections.Pictures.AstronomyTop.Collections.Pictures.Astronomy.StarsTop.Collections.Pictures.Astronomy.GalaxiesTop.Collections.Pictures.Astronomy.Astronauts (13 rows) [1] - For other following on, I had to do the following: - downloaded the 7.2 version of the code from http://www.sai.msu.su/~megera/postgres/gist/ltree/ - installed using gmake not make - grabbed the sample file from http://developer.postgresql.org/cvsweb.cgi/pgsql- server/contrib/ltree/ltreetest.sql [2] - My application involves mirroring a file system (directories and files). FWIW, in this instances, files are not renamed, they are deleted and recreated elsewhere. -- Dan Langille : http://www.langille.org/
Dan Langille wrote: > Given that I'm considering adding a new field path_name to the tree, > I can't see the ltree package will give me anything more than I can > get from like. My main reason for adding path_name was doing queries > such as: > > select * from tree where path_name like '/path/to/parent/%' > > which will return me all the descendants of a give node (in this case > '/path/to/parent/'.[2] FWIW, you could also do this with connectby() in contrib/tablefunc (new in 7.3; see the README for syntax details): test=# SELECT t.* FROM connectby('tree', 'id', 'parent_id', '1', 0, '~') AS c(id int, parent_id int, level int, branch text), tree t WHERE t.id = c.id; id | parent_id | name ----+-----------+-------------------- 1 | | Top 2 | 1 | Science 3 | 2 | Astronomy 4 | 3 | Astrophysics 5 | 3 | Cosmology 6 | 1 | Hobbies 7 | 6 | Amateurs_Astronomy 8 | 1 | Collections 9 | 8 | Pictures 10 | 9 | Astronomy 11 | 10 | Stars 12 | 10 | Galaxies 13| 10 | Astronauts (13 rows) test=# SELECT t.* FROM connectby('tree', 'id', 'parent_id', '6', 0, '~') AS c(id int, parent_id int, level int, branch text), tree t WHERE t.id = c.id; id | parent_id | name ----+-----------+-------------------- 6 | 1 | Hobbies 7 | 6 | Amateurs_Astronomy (2 rows) test=# SELECT t.* FROM connectby('tree', 'id', 'parent_id', '8', 0, '~') AS c(id int, parent_id int, level int, branch text), tree t WHERE t.id = c.id; id | parent_id | name ----+-----------+------------- 8 | 1 | Collections 9 | 8 | Pictures 10 | 9 | Astronomy 11 | 10 | Stars 12 | 10 | Galaxies 13 | 10 | Astronauts You could also do: CREATE OR REPLACE FUNCTION node_id(text) returns int as 'select id from tree where name = $1' language 'sql'; test=# SELECT t.* FROM connectby('tree', 'id', 'parent_id', node_id('Science'), 0) AS c(id int, parent_id int, level int), tree t WHERE t.id = c.id; id | parent_id | name ----+-----------+-------------- 2 | 1 | Science 3 | 2 | Astronomy 4 | 3 | Astrophysics 5 | 3 | Cosmology (4 rows) > > I have discussed [offlist] the option of using a secondary table to > store the pathname (i.e. a cach table) which would be updated using a > loop in the tigger instead of using cascading triggers. I would > prefer to keep the pathname in the same table. > > In my application, I have about 120,000 nodes in the tree. I am > using PL/pgSQL quite a lot. Perhaps moving the triggers to C at a > later date may provide a speed increase if the tree expands > considerably. I've tested connectby() on a table with about 220,000 nodes. It is pretty fast (about 1 sec to return a branch with 3500 nodes), and is entirely dynamic (requires no triggers). Joe
On 20 Nov 2002 at 15:20, Dan Langille wrote: > On 17 Nov 2002 at 14:51, Josh Berkus wrote: > > > Dan, > > > > > My existing tree implementation reflects the files contained on > > > disk. > > > The > > > full pathname to a particlar file is obtained from the path to the > > > parent directory. I am now considering putting this information > > > into a field in the table. > > <snip> > > > Suggestions, comment, open ridicule, most welcome. thanks. > > > > This is a fine implementation using the adjacency list model of tree > > design. However, I think you may find that the string-based tree > > implementation in /contrib/ltree is more suited to your purposes, > > and easier to maintain. > > That looks interesting. I have installed that onto a test server and > I'm playing around with it. FWIW, the ltree seems to implement a tree through text manipulation. I already have a tree (using a sinble table with id, parent_id). Therefore, I think ltree is not an option in this situation. My creation of the pathname was to save processing time. I'll talk more about that in my next post. -- Dan Langille : http://www.langille.org/
On 17 Nov 2002 at 11:39, Dan Langille wrote: > My existing tree implementation reflects the files contained on disk. > The full pathname to a particlar file is obtained from the path to the > parent directory. I am now considering putting this information into > a field in the table. > > Attached you will find the pg_dump from my test database (2.4k) if you > want to test with this setup and in case what I have pasted below > contains an error. > > Here is the table and the test data: > > create table tree(id int not null, parent_id int, name text not null, > pathname text not null, primary key (id)); > > insert into tree (id, name, pathname) values (1, 'usr', '/usr'); > insert into tree (id, name, parent_id, pathname) values (2, 'ports', > 1, '/usr/ports'); insert into tree values (3, 2, 'security', 'test'); > > select * from tree; > > test=# select * from tree; > id | parent_id | name | pathname > ----+-----------+----------+--------------------- > 1 | | usr | /usr > 2 | 1 | ports | /usr/ports > 3 | 2 | security | /usr/ports/security > (3 rows) > > > The goal is to ensure that pathname always contains the correct value. I am now trying another method, which involves the use of a cache table. In short, we store the pathname in another table. create table tree_pathnames ( id int4 not null, pathname text not null, primary key(id), foreign key (id) referencestree(id) on delete cascade on update cascade ); I populated this table with the following: insert into tree_pathnames select id, pathname from tree; My next task was to create a function which would cascade a change to tree.name throughout tree_pathname. Here is what I came up with: create or replace function tree_pathname_set_children(int4, text) returns int as 'DECLARE node ALIAS for $1;path ALIAS for $2;children record; BEGIN FOR children IN SELECT ep.id, ep.pathname, e.name FROM element_pathnames ep, element e WHERE ep.id = e.id AND e.parent_id = node LOOP -- children.pathname = path || ''/'' || children.name; RAISE NOTICE ''in tree_pathname_set_children %/%'',path, children.name ; UPDATE element_pathnames set pathname = path || ''/'' || children.name where id = children.id; perform tree_pathname_set_children(children.id, path || ''/'' || children.name); END LOOP; return 0;END;' language 'plpgsql'; This function is invoked from within the trigger on tree: create or replace function tree_pathnames() returns opaque as ' DECLARE parent_pathname text; my_pathname text; BEGIN if old.name <> new.name then select pathname into parent_pathname from tree_pathnames where id = new.parent_id; if found then my_pathname = parent_pathname || \'/\' ||new.name; else my_pathname = \'/\' || new.name; end if; new.pathname = my_pathname; update tree_pathnames set pathname = my_pathname where id = new.id; perform tree_pathname_set_children(new.id,my_pathname); end if; RETURN new; END;' language 'plpgsql'; drop trigger tree_pathnames on element; create trigger tree_pathnames before update on element for each row execute procedure tree_pathnames(); I have done only preliminary testing on this, but it seems to work fine for my application. Comments please. -- Dan Langille : http://www.langille.org/
Dan, Looks good to me. It's the same thing I do for the Celko tree structures in one application -- I have a cache table holding such things as level and parent_id for each node, values which can only be generated from the source tables through slow aggregates. Also, the use of a child table to hold the pathnames should cure your "cascading trigger" problem. -Josh Berkus