Thread: Trees: maintaining pathnames

Trees: maintaining pathnames

From
Dan Langille
Date:
My existing tree implementation reflects the files contained on disk.  The
full pathname to a particlar file is obtained from the path to the parent
directory.  I am now considering putting this information into a field in
the table.

Attached you will find the pg_dump from my test database (2.4k) if you
want to test with this setup and in case what I have pasted below
contains an error.

Here is the table and the test data:

create table tree(id int not null, parent_id int, name text not null,
pathname text not null, primary key (id));

insert into tree (id, name, pathname) values (1, 'usr', '/usr');
insert into tree (id, name, parent_id, pathname) values (2, 'ports', 1,
'/usr/ports');
insert into tree values (3, 2, 'security', 'test');

select * from tree;

test=# select * from tree;id | parent_id |   name   |      pathname
----+-----------+----------+--------------------- 1 |           | usr      | /usr 2 |         1 | ports    | /usr/ports
3|         2 | security | /usr/ports/security
 
(3 rows)


The goal is to ensure that pathname always contains the correct value.
Here are the functions/triggers which I created in order to attain that
goal.

This function ensures that the pathname is set correctly when a row is
inserted or changed.

create or replace function tree_pathname_set()
returns opaque
as '

DECLARE      parent_pathname   text;
BEGIN       RAISE NOTICE \'into tree_pathname_set with %:%:%\', new.id,
new.name, new.pathname;       select pathname       into parent_pathname       from tree       where id =
new.parent_id;      if found then          new.pathname = parent_pathname || \'/\' || new.name;       else
new.pathname= \'/\' || new.name;       end if;       RETURN new;   END;'
 
language 'plpgsql';\

create trigger tree_pathname_set before insert or update on tree
for each row execute procedure tree_pathname_set();


This function ensures that any childre of a recently modified row are also
kept up to date.

create or replace function tree_pathname_set_children()
returns opaque
as 'BEGIN       RAISE NOTICE \'into tree_pathname_set_children with %:%:%\',
new.id, new.name, new.pathname;
       update tree set pathname = new.pathname || \'/\' || name where
parent_id = new.id;
       RETURN new;   END;'
language 'plpgsql';

create trigger tree_pathname_set_children after insert or update on tree
for each row execute procedure tree_pathname_set_children();

NOTE: the above is "insert or update" but as I typed this I realize that
only update is sufficent.

A change to the top level row is shown below:

test=# update tree set name = 'dan' where id = 1;
NOTICE:  into tree_pathname_set with 1:dan:/usr
NOTICE:  into tree_pathname_set_children with 1:dan:/dan
NOTICE:  into tree_pathname_set with 2:ports:/dan/ports
NOTICE:  into tree_pathname_set_children with 2:ports:/dan/ports
NOTICE:  into tree_pathname_set with 3:security:/dan/ports/security
NOTICE:  into tree_pathname_set_children with
3:security:/dan/ports/security
UPDATE 1
test=# select * from tree;id | parent_id |   name   |      pathname
----+-----------+----------+--------------------- 1 |           | dan      | /dan 2 |         1 | ports    | /dan/ports
3|         2 | security | /dan/ports/security
 
(3 rows)

test=#

Suggestions, comment, open ridicule, most welcome.  thanks.

Re: Trees: maintaining pathnames

From
"Josh Berkus"
Date:
Dan,

> My existing tree implementation reflects the files contained on disk.
>  The
> full pathname to a particlar file is obtained from the path to the
> parent
> directory.  I am now considering putting this information into a
> field in
> the table.
<snip>
> Suggestions, comment, open ridicule, most welcome.  thanks.

This is a fine implementation using the adjacency list model of tree
design.  However, I think you may find that the string-based tree
implementation in /contrib/ltree is more suited to your purposes, and
easier to maintain.

-Josh Berkus


Re: Trees: maintaining pathnames

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message


Instead of storing the path in each row, why not let Postgres 
take care of computing it with a function? Then make a view 
and you've got the same table, without all the triggers.

CREATE TABLE tree (id        INTEGER NOT NULL,parent_id INTEGER,"name"    TEXT NOT NULL,PRIMARY KEY (id)
);


INSERT INTO tree VALUES (1,NULL,'');
INSERT INTO tree VALUES (2,1,'usr');
INSERT INTO tree VALUES (3,1,'tmp');
INSERT INTO tree VALUES (4,1,'home');
INSERT INTO tree VALUES (5,4,'greg');
INSERT INTO tree VALUES (6,5,'etc');

CREATE OR REPLACE FUNCTION pathname(INTEGER)
RETURNS TEXT AS
'

DECLARE  mypath TEXT; myname TEXT; myid   INTEGER;

BEGIN
 SELECT parent_id,name FROM tree WHERE id=$1 INTO myid,mypath; IF mypath IS NULL THEN   RETURN ''No such id\n''; END
IF;
 LOOP   SELECT parent_id,name FROM tree WHERE id=myid INTO myid,myname;   mypath := ''/'' || mypath;   EXIT WHEN myid
ISNULL;   mypath := myname || mypath; END LOOP;
 

RETURN mypath;

END;
' LANGUAGE 'plpgsql';

CREATE VIEW mytree AS SELECT *, PATHNAME(id) AS path FROM tree;

SELECT * FROM tree ORDER BY id;
id | parent_id | name 
----+-----------+------ 1 |           |  2 |         1 | usr 3 |         1 | tmp 4 |         1 | home 5 |         4 |
greg6 |         5 | etc
 
(6 rows)

SELECT * FROM mytree ORDER BY id;
id | parent_id | name |      path      
----+-----------+------+---------------- 1 |           |      | / 2 |         1 | usr  | /usr 3 |         1 | tmp  |
/tmp4 |         1 | home | /home 5 |         4 | greg | /home/greg 6 |         5 | etc  | /home/greg/etc
 
(6 rows)

UPDATE tree SET name='users' WHERE id=4;

SELECT * FROM mytree ORDER BY id;
id | parent_id | name  |      path       
----+-----------+-------+----------------- 1 |           |       | / 2 |         1 | usr   | /usr 3 |         1 | tmp
|/tmp 4 |         1 | users | /users 5 |         4 | greg  | /users/greg 6 |         5 | etc   | /users/greg/etc
 
(6 rows)


Greg Sabino Mullane  greg@turnstep.com
PGP Key: 0x14964AC8 200211172015

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE92D9RvJuQZxSWSsgRAn2oAKDyIcrtgB8v1fAMY3B/ITKZ+lBlYgCfXRMe
W/xntabEsfuEdseo44cAXbY=
=MANm
-----END PGP SIGNATURE-----




Re: Trees: maintaining pathnames

From
"Dan Langille"
Date:
On 18 Nov 2002 at 1:09, greg@turnstep.com wrote:

> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> NotDashEscaped: You need GnuPG to verify this message
> 
> 
> Instead of storing the path in each row, why not let Postgres 
> take care of computing it with a function? Then make a view 
> and you've got the same table, without all the triggers.

This is how it is now done.  I wanted to be able to so this fairly 
quickly:
  select * from tree where pathname like '/usr/local/%'

in order to get the subtree below a given point.  Sorry I didn't 
mention that before.

> 
> CREATE TABLE tree (
>  id        INTEGER NOT NULL,
>  parent_id INTEGER,
>  "name"    TEXT NOT NULL,
>  PRIMARY KEY (id)
> );
> 
> 
> INSERT INTO tree VALUES (1,NULL,'');
> INSERT INTO tree VALUES (2,1,'usr');
> INSERT INTO tree VALUES (3,1,'tmp');
> INSERT INTO tree VALUES (4,1,'home');
> INSERT INTO tree VALUES (5,4,'greg');
> INSERT INTO tree VALUES (6,5,'etc');
> 
> CREATE OR REPLACE FUNCTION pathname(INTEGER)
> RETURNS TEXT AS
> '
> 
> DECLARE 
>   mypath TEXT;
>   myname TEXT;
>   myid   INTEGER;
> 
> BEGIN
> 
>   SELECT parent_id,name FROM tree WHERE id=$1 INTO myid,mypath;
>   IF mypath IS NULL THEN
>     RETURN ''No such id\n'';
>   END IF;
> 
>   LOOP
>     SELECT parent_id,name FROM tree WHERE id=myid INTO myid,myname;
>     mypath := ''/'' || mypath;
>     EXIT WHEN myid IS NULL;
>     mypath := myname || mypath;
>   END LOOP;
> 
> RETURN mypath;
> 
> END;
> ' LANGUAGE 'plpgsql';
> 
> CREATE VIEW mytree AS SELECT *, PATHNAME(id) AS path FROM tree;
> 
> SELECT * FROM tree ORDER BY id;
> 
>  id | parent_id | name 
> ----+-----------+------
>   1 |           | 
>   2 |         1 | usr
>   3 |         1 | tmp
>   4 |         1 | home
>   5 |         4 | greg
>   6 |         5 | etc
> (6 rows)
> 
> SELECT * FROM mytree ORDER BY id;
> 
>  id | parent_id | name |      path      
> ----+-----------+------+----------------
>   1 |           |      | /
>   2 |         1 | usr  | /usr
>   3 |         1 | tmp  | /tmp
>   4 |         1 | home | /home
>   5 |         4 | greg | /home/greg
>   6 |         5 | etc  | /home/greg/etc
> (6 rows)
> 
> UPDATE tree SET name='users' WHERE id=4;
> 
> SELECT * FROM mytree ORDER BY id;
> 
>  id | parent_id | name  |      path       
> ----+-----------+-------+-----------------
>   1 |           |       | /
>   2 |         1 | usr   | /usr
>   3 |         1 | tmp   | /tmp
>   4 |         1 | users | /users
>   5 |         4 | greg  | /users/greg
>   6 |         5 | etc   | /users/greg/etc
> (6 rows)

That's good.  Thank you.
-- 
Dan Langille : http://www.langille.org/



Re: Trees: maintaining pathnames

From
"Dan Langille"
Date:
On 17 Nov 2002 at 14:51, Josh Berkus wrote:

> Dan,
> 
> > My existing tree implementation reflects the files contained on disk.
> >  The
> > full pathname to a particlar file is obtained from the path to the
> > parent
> > directory.  I am now considering putting this information into a
> > field in
> > the table.
> <snip>
> > Suggestions, comment, open ridicule, most welcome.  thanks.
> 
> This is a fine implementation using the adjacency list model of tree
> design.  However, I think you may find that the string-based tree
> implementation in /contrib/ltree is more suited to your purposes, and
> easier to maintain.

That looks interesting.  I have installed that onto a test server and 
I'm playing around with it.[1]  The contrib/ltree project implements 
a tree via text parsing.  Below I show the test data it created.

For my usage, I'm not sure I need it.  I have implemented the 
"Adjacency List" tree implementation (that's what I've been told).  
In short, my tree contains three basic fields: id, name, parent_id.

Given that I'm considering adding a new field path_name to the tree, 
I can't see the ltree package will give me anything more than I can 
get from like. My main reason for adding path_name was doing queries 
such as:
  select * from tree where path_name like '/path/to/parent/%'

which will return me all the descendants of a give node (in this case 
'/path/to/parent/'.[2]

I have discussed [offlist] the option of using a secondary table to 
store the pathname (i.e. a cach table) which would be updated using a 
loop in the tigger instead of using cascading triggers.  I would 
prefer to keep the pathname in the same table.

In my application, I have about 120,000 nodes in the tree.  I am 
using PL/pgSQL quite a lot.  Perhaps moving the triggers to C at a 
later date may provide a speed increase if the tree expands 
considerably.

Also, it is noted that those triggers set the pathname twice, once in 
the before, and once in the after trigger.  I'll try to optimize that 
for a future "release".

ltreetest=# \d List of relationsName | Type  | Owner
------+-------+-------test | table | dan
(1 row)

ltreetest=# select * from test;                    path

-----------------------------------------------TopTop.ScienceTop.Science.AstronomyTop.Science.Astronomy.AstrophysicsTop.Science.Astronomy.CosmologyTop.HobbiesTop.Hobbies.Amateurs_AstronomyTop.CollectionsTop.Collections.PicturesTop.Collections.Pictures.AstronomyTop.Collections.Pictures.Astronomy.StarsTop.Collections.Pictures.Astronomy.GalaxiesTop.Collections.Pictures.Astronomy.Astronauts
(13 rows)



[1] - For other following on, I had to do the following:

- downloaded the 7.2 version of the code from 
http://www.sai.msu.su/~megera/postgres/gist/ltree/

- installed using gmake not make
- grabbed the sample file from 
http://developer.postgresql.org/cvsweb.cgi/pgsql-
server/contrib/ltree/ltreetest.sql

[2] - My application involves mirroring a file system (directories 
and files).  FWIW, in this instances, files are not renamed, they are 
deleted and recreated elsewhere.
-- 
Dan Langille : http://www.langille.org/



Re: Trees: maintaining pathnames

From
Joe Conway
Date:
Dan Langille wrote:
> Given that I'm considering adding a new field path_name to the tree, 
> I can't see the ltree package will give me anything more than I can 
> get from like. My main reason for adding path_name was doing queries 
> such as:
> 
>    select * from tree where path_name like '/path/to/parent/%'
> 
> which will return me all the descendants of a give node (in this case 
> '/path/to/parent/'.[2]

FWIW, you could also do this with connectby() in contrib/tablefunc (new in 
7.3; see the README for syntax details):

test=# SELECT t.* FROM connectby('tree', 'id', 'parent_id', '1', 0, '~') AS 
c(id int, parent_id int, level int, branch text), tree t WHERE t.id = c.id; id | parent_id |        name
----+-----------+--------------------  1 |           | Top  2 |         1 | Science  3 |         2 | Astronomy  4 |
   3 | Astrophysics  5 |         3 | Cosmology  6 |         1 | Hobbies  7 |         6 | Amateurs_Astronomy  8 |
1 | Collections  9 |         8 | Pictures 10 |         9 | Astronomy 11 |        10 | Stars 12 |        10 | Galaxies
13|        10 | Astronauts
 
(13 rows)

test=# SELECT t.* FROM connectby('tree', 'id', 'parent_id', '6', 0, '~') AS 
c(id int, parent_id int, level int, branch text), tree t WHERE t.id = c.id; id | parent_id |        name
----+-----------+--------------------  6 |         1 | Hobbies  7 |         6 | Amateurs_Astronomy
(2 rows)

test=# SELECT t.* FROM connectby('tree', 'id', 'parent_id', '8', 0, '~') AS 
c(id int, parent_id int, level int, branch text), tree t WHERE t.id = c.id; id | parent_id |    name
----+-----------+-------------  8 |         1 | Collections  9 |         8 | Pictures 10 |         9 | Astronomy 11 |
    10 | Stars 12 |        10 | Galaxies 13 |        10 | Astronauts
 


You could also do:

CREATE OR REPLACE FUNCTION node_id(text) returns int as 'select id from tree 
where name = $1' language 'sql';

test=# SELECT t.* FROM connectby('tree', 'id', 'parent_id', 
node_id('Science'), 0) AS c(id int, parent_id int, level int), tree t WHERE 
t.id = c.id; id | parent_id |     name
----+-----------+--------------  2 |         1 | Science  3 |         2 | Astronomy  4 |         3 | Astrophysics  5 |
      3 | Cosmology
 
(4 rows)


> 
> I have discussed [offlist] the option of using a secondary table to 
> store the pathname (i.e. a cach table) which would be updated using a 
> loop in the tigger instead of using cascading triggers.  I would 
> prefer to keep the pathname in the same table.
> 
> In my application, I have about 120,000 nodes in the tree.  I am 
> using PL/pgSQL quite a lot.  Perhaps moving the triggers to C at a 
> later date may provide a speed increase if the tree expands 
> considerably.

I've tested connectby() on a table with about 220,000 nodes. It is pretty fast 
(about 1 sec to return a branch with 3500 nodes), and is entirely dynamic 
(requires no triggers).

Joe



Re: Trees: maintaining pathnames

From
"Dan Langille"
Date:
On 20 Nov 2002 at 15:20, Dan Langille wrote:

> On 17 Nov 2002 at 14:51, Josh Berkus wrote:
> 
> > Dan,
> > 
> > > My existing tree implementation reflects the files contained on
> > > disk.
> > >  The
> > > full pathname to a particlar file is obtained from the path to the
> > > parent directory.  I am now considering putting this information
> > > into a field in the table.
> > <snip>
> > > Suggestions, comment, open ridicule, most welcome.  thanks.
> > 
> > This is a fine implementation using the adjacency list model of tree
> > design.  However, I think you may find that the string-based tree
> > implementation in /contrib/ltree is more suited to your purposes,
> > and easier to maintain.
> 
> That looks interesting.  I have installed that onto a test server and
> I'm playing around with it.

FWIW, the ltree seems to implement a tree through text manipulation.  
I already have a tree (using a sinble table with id, parent_id).  
Therefore, I think ltree is not an option in this situation.

My creation of the pathname was to save processing time.  I'll talk 
more about that in my next post.
-- 
Dan Langille : http://www.langille.org/



Re: Trees: maintaining pathnames

From
"Dan Langille"
Date:
On 17 Nov 2002 at 11:39, Dan Langille wrote:

> My existing tree implementation reflects the files contained on disk. 
> The full pathname to a particlar file is obtained from the path to the
> parent directory.  I am now considering putting this information into
> a field in the table.
> 
> Attached you will find the pg_dump from my test database (2.4k) if you
> want to test with this setup and in case what I have pasted below
> contains an error.
> 
> Here is the table and the test data:
> 
> create table tree(id int not null, parent_id int, name text not null,
> pathname text not null, primary key (id));
> 
> insert into tree (id, name, pathname) values (1, 'usr', '/usr');
> insert into tree (id, name, parent_id, pathname) values (2, 'ports',
> 1, '/usr/ports'); insert into tree values (3, 2, 'security', 'test');
> 
> select * from tree;
> 
> test=# select * from tree;
>  id | parent_id |   name   |      pathname
> ----+-----------+----------+---------------------
>   1 |           | usr      | /usr
>   2 |         1 | ports    | /usr/ports
>   3 |         2 | security | /usr/ports/security
> (3 rows)
> 
> 
> The goal is to ensure that pathname always contains the correct value.

I am now trying another method, which involves the use of a cache 
table.  In short, we store the pathname in another table.

create table tree_pathnames (   id int4 not null,   pathname text not null,   primary key(id),   foreign key (id)
referencestree(id)    on delete cascade on update cascade
 
);

I populated this table with the following:
  insert into tree_pathnames select id, pathname from tree;

My next task was to create a function which would cascade a change to 
tree.name throughout tree_pathname.  Here is what I came up with:

create or replace function tree_pathname_set_children(int4, text) 
returns int as 
'DECLARE
node ALIAS for $1;path ALIAS for $2;children record;
BEGIN    FOR children IN SELECT ep.id, ep.pathname, e.name                      FROM element_pathnames ep, element e
                WHERE ep.id       = e.id                       AND e.parent_id = node LOOP
 
--           children.pathname = path ||  ''/'' || children.name;        RAISE NOTICE ''in tree_pathname_set_children
%/%'',path, 
 
children.name ;       UPDATE element_pathnames set pathname = path ||  ''/'' || 
children.name where id = children.id;       perform tree_pathname_set_children(children.id, path ||  ''/'' 
|| children.name);    END LOOP;
    return 0;END;'

language 'plpgsql';

This function is invoked from within the trigger on tree:

create or replace function tree_pathnames() returns opaque as '  DECLARE     parent_pathname   text;     my_pathname
  text;  BEGIN     if old.name <> new.name then        select pathname          into parent_pathname          from
tree_pathnames        where id = new.parent_id;        if found then           my_pathname =  parent_pathname || \'/\'
||new.name;       else           my_pathname = \'/\' || new.name;        end if;
 
        new.pathname = my_pathname;        update tree_pathnames set pathname = my_pathname where id = 
new.id;        perform tree_pathname_set_children(new.id,my_pathname);     end if;
     RETURN new;  END;'

language 'plpgsql';

 drop trigger tree_pathnames on element;
create trigger tree_pathnames before update on element for each row
execute procedure tree_pathnames();

I have done only preliminary testing on this, but it seems to work 
fine for my application.

Comments please.
-- 
Dan Langille : http://www.langille.org/



Re: Trees: maintaining pathnames

From
"Josh Berkus"
Date:
Dan,

Looks good to me.  It's the same thing I do for the Celko tree
structures in one application -- I have a cache table holding such
things as level and parent_id for each node, values which can only be
generated from the source tables through slow aggregates.

Also, the use of a child table to hold the pathnames should cure your
"cascading trigger" problem.

-Josh Berkus