Thread: How to represent a tree-structure in a relational database

How to represent a tree-structure in a relational database

From

Frank Joerdens

Date:

13 December 2000, 14:02:48

I am just thinking about the data model for a little content management system that I am
currently planning. Individual articles are sorted under different categories which branch
into subcategories, sub-subcategories etc. up to a depth of about 6 or 7 levels. The
structure should be extensible, i.e. it must be possible to add levels. What I am thinking
now is that you would keep the index in a separate index table (linked with the primary
key in the articles table), which would have 6 or 7 fields initially, and that you'd add
columns with the alter table command, if need be, to make the structure deeper. Is this
the recommended way to go about it? It feels pretty 'right' to me now but since the
problem should be fairly common, there must be other people who have thought and written
about it and there might even be a recognized 'optimal' solution to the problem.

Comments?

- Frank

Re: How to represent a tree-structure in a relational database

From

Josh Berkus

Date:

13 December 2000, 14:48:14

Frank,
Please look in the list archives.  About 2 months ago this topic came
up and was discussed extensively (including a creative solution by yours
truly).
                -Josh Berkus
-- 
______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology      josh@agliodbs.com   and data management solutions       (415) 436-9166  for law firms, small
businesses      fax  436-0137   and non-profit organizations.       pager 338-4078                               San
Francisco

Re: How to represent a tree-structure in a relational database

From

Mathijs Brands

Date:

13 December 2000, 14:51:31

On Wed, Dec 13, 2000 at 04:48:47PM +0100, Frank Joerdens allegedly wrote:
> I am just thinking about the data model for a little content management system that I am
> currently planning. Individual articles are sorted under different categories which branch
> into subcategories, sub-subcategories etc. up to a depth of about 6 or 7 levels. The
> structure should be extensible, i.e. it must be possible to add levels. What I am thinking
> now is that you would keep the index in a separate index table (linked with the primary
> key in the articles table), which would have 6 or 7 fields initially, and that you'd add
> columns with the alter table command, if need be, to make the structure deeper. Is this
> the recommended way to go about it? It feels pretty 'right' to me now but since the
> problem should be fairly common, there must be other people who have thought and written
> about it and there might even be a recognized 'optimal' solution to the problem.
> 
> Comments?

Yeah. I've built something similar.

The way I've done it: Give each record a unique ID (generated with a sequence) and store the records in a table. Create
asecond table in which you store parent id-child id combinations.

 So:
   1 - Automotive transport   2 - Cars   3 - Motorcycles
   Store in the table:     1-2     1-3
 There's one main category (Automotive transport) which has two sub-categories:   Cars & Motorcyles

The way I'd do it if I had to do it again: Give each record a unique id, generated by the application. Denote levels
withextra letters.

 So:
  AA   - Automotive transport  AAAA - Cars  AAAB - Motorcycles
 The structures has the added bonus of making it very easy to determine all the sub-categories of a category, no matter
howdeep the tree is below the category you're looking at. With the first approach it is not possible to do this in a
singleSQL query. You could do this with a function, I guess.

I hope this is of some use to you.

Cheers,

Mathijs
-- 
"Borrowers of books -- those mutilators of collections, spoilers of thesymmetry of shelves, and creators of odd
volumes."       Charles Lamb (1775-1834)

Re: How to represent a tree-structure in a relational database

From

miguel sofer

Date:

13 December 2000, 15:01:12

I once started writing a small paper on this subject; it is still in a 
rather preliminary state.

You can download the draft (and some ill documented code, 53kB) from  http://www.utdt.edu/~mig/sql-trees

Miguel

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 12/13/00, 12:48:47 PM, Frank Joerdens <frank@joerdens.de> wrote 
regarding [SQL] How to represent a tree-structure in a relational database:


> I am just thinking about the data model for a little content management 
system that I am
> currently planning. Individual articles are sorted under different 
categories which branch
> into subcategories, sub-subcategories etc. up to a depth of about 6 or 7 
levels. The
> structure should be extensible, i.e. it must be possible to add levels. 
What I am thinking
> now is that you would keep the index in a separate index table (linked 
with the primary
> key in the articles table), which would have 6 or 7 fields initially, and 
that you'd add
> columns with the alter table command, if need be, to make the structure 
deeper. Is this
> the recommended way to go about it? It feels pretty 'right' to me now but 
since the
> problem should be fairly common, there must be other people who have 
thought and written
> about it and there might even be a recognized 'optimal' solution to the 
problem.

> Comments?

> - Frank

Re: How to represent a tree-structure in a relational database

From

Frank Joerdens

Date:

13 December 2000, 17:06:09

On Wed, Dec 13, 2000 at 11:38:18AM -0800, Stuart Statman wrote:
[ . . . ]
> I would suggest, instead, to create a table that represents your hierarchy
> without adding columns. For example :
> 
> create table Category (
> CategoryID       int4  not null  primary key,
> ParentCategoryID int4  not null  REFERENCES Category (CategoryID),
> CategoryName     varchar(100)
> );
> 
> Add a CategoryID with an FK reference to this table, and your work is done.
> 
> Then adding, inserting, removing, or moving layers in the hierarchy becomes
> quite simple. This also preserves hierarchical integrity, where subcategory
> a of subcategory b will also remain a subcategory of category c if
> subcategory b is a subcategory of subcategory c, where I'm not sure your
> model will preserve or guarantee that. (Does that sentence deserve a prize?)

Cool. That looks like my solution. I had actually seen it someplace
before, but didn't make the connection with my problem. 

Ta, Frank

Re: How to represent a tree-structure in a relational database

From

Frank Joerdens

Date:

13 December 2000, 17:09:14

On Wed, Dec 13, 2000 at 11:04:13AM -0800, Josh Berkus wrote:
> Frank,
> 
>     Please look in the list archives.  About 2 months ago this topic came
> up and was discussed extensively (including a creative solution by yours
> truly).

Hm, neither my archives nor a search on the postgresql.org page turned
up the thread you mention. Do you recall which list it was and what the
title of the thread was?

Thanks, Frank

Re: How to represent a tree-structure in a relational database

From

clayton cottingham

Date:

13 December 2000, 17:16:32

Frank Joerdens wrote:
> 
> On Wed, Dec 13, 2000 at 11:04:13AM -0800, Josh Berkus wrote:
> > Frank,
> >
> >       Please look in the list archives.  About 2 months ago this topic came
> > up and was discussed extensively (including a creative solution by yours
> > truly).
> 
> Hm, neither my archives nor a search on the postgresql.org page turned
> up the thread you mention. Do you recall which list it was and what the
> title of the thread was?
> 
> Thanks, Frank


yes i  recall!!

i managed to implement something to that effect

a lot was done using postgres and perl

anyone need code fragements?

RE: How to represent a tree-structure in a relational database

From

"Stuart Statman"

Date:

13 December 2000, 18:00:27

> The way I'd do it if I had to do it again:
>   Give each record a unique id, generated by the application.
> Denote levels with extra letters.
>
>   So:
>
>    AA   - Automotive transport
>    AAAA - Cars
>    AAAB - Motorcycles
>
> The structures has the added bonus of making it very easy to
> determine all the
> sub-categories of a category, no matter how deep the tree is
> below the category
> you're looking at. With the first approach it is not possible
> to do this in a
> single SQL query. You could do this with a function, I guess.

The problem with this method is if you need to insert a category, or move a
category. You'll need to re-id a bunch of categories, and bubble those
changes out to every table that refers to this table.

Stuart Statman
Director of Software Development
Slam Media, Inc.

Attachment

Stuart Statman.vcf

RE: How to represent a tree-structure in a relational database

From

"Stuart Statman"

Date:

13 December 2000, 18:03:31

> What I am thinking now is that you would keep the index
> in a separate index table linked with the primary
> key in the articles table), which would have 6 or 7 fields
> initially, and that you'd add columns with the alter table
> command, if need be, to make the structure deeper.

I would suggest, instead, to create a table that represents your hierarchy
without adding columns. For example :

create table Category (
CategoryID       int4  not null  primary key,
ParentCategoryID int4  not null  REFERENCES Category (CategoryID),
CategoryName     varchar(100)
);

Add a CategoryID with an FK reference to this table, and your work is done.

Then adding, inserting, removing, or moving layers in the hierarchy becomes
quite simple. This also preserves hierarchical integrity, where subcategory
a of subcategory b will also remain a subcategory of category c if
subcategory b is a subcategory of subcategory c, where I'm not sure your
model will preserve or guarantee that. (Does that sentence deserve a prize?)

In general, if you know that you will need to periodically alter a table to
add columns, you should come up with a different model that doesn't require
adding columns.

Stuart Statman
Director of Software Development
Slam Media, Inc.

Attachment

Stuart Statman.vcf

Re: How to represent a tree-structure in a relational database

From

Josh Berkus

Date:

13 December 2000, 18:12:18

Frank, etc:

> > create table Category (
> > CategoryID       int4  not null  primary key,
> > ParentCategoryID int4  not null  REFERENCES Category (CategoryID),
> > CategoryName     varchar(100)
> > );

That was it.  I also gave an example of a UNION query that would display
the whole category tree in ASCII format:

I've done this before for one project.  Here's what you do:

CREATE TABLE sample_heirarchy (       unique_id       SERIAL CONSTRAINT PRIMARY KEY,       node_linkup     INT4,
node_level     INT2,       label           VARCHAR(30)       data            whatever       );
 

Then you use the unique_id and node_linkup fields to create a heirarchy
of data nodes, with an indefinite number of levels, where the
node_linkup of each lower level equals the id of its parent record.  For
example:

id      linkup  level           label           data
3       0       1               Node1           Node1
4       3       2               Node1.1         Node1.1
6       3       2               Node1.2         Node1.2
7       6       3               Node1.2.1       Node1.2.1
5       0       1               Node2           Node2

etc.

You can then access the whole heirarchy through moderately complex, but
very fast-executing UNION queries.  The one drawback is that you need to
know in advance the maximum number of levels (3 in this example), but
I'm sure someone on this list can find a way around that:

SELECT n1.unique_id, n1.label, n1.data, n1.node_level, n1.unique_id AS
level1,       0 AS level2, 0 AS level3
FROM sample_heirarchy n1
WHERE n1.node_level = 1
UNION ALL
SELECT n2.unique_id, n2.label, n2.data, n2.node_level, n1.unique_id,        n2.unique_id, 0
FROM sample_heirarchy n2, sample_heirarchy n1
WHERE n1.unique_id = n2.node_linkup       AND n2.node_level = 2
UNION ALL
SELECT n3.unique_id, n3.label, n3.data, n3.node_level, n1.unique_id,        n2.unique_id, n3.unique_id
FROM sample_heirarchy n1, sample_heirarchy n2, sample_heirarchy n3
WHERE n1.unique_id = n2.node_linkup AND       n2.unique_id = n3.node_linkup       AND n3.node_level = 3
ORDER BY level1, level2, level3

Should produce this output (pardon any parsing errors; I'm not at a
PGSQL terminal right now):

unique_id       label           data    level   level1  level2  level3
3               Node1           Node1     1     3       0       0
4               Node1.1         Node1.1   2     3       4       0
6               Node1.2         Node1.2   2     3       6       0
7               Node1.2.1       Node1.2.1 3     3       6       7
5               Node2           Node2     1     7       0       0
etc.

This sorts them in numerical (id) order, but one could just as easily
substitute the labels or data for the various levels and sort them
alphabetically (although you do need to allow for NULL sort order on
your database, and any label duplicates).

The advantages of this structure are:
1. It allows you to create, assign, and re-assign nodes freely all over
the heirarchy ... just change the level and/or linkup.
2. Aside from the Union query above, the table structure allows for any
number of levels, unlike a set or relationally linked tables.
3. Because the display query is entirely once table linking to itself on
(hopefully) indexed fields, in my expreience it runs very, very fast.
4. My PHP developer has reprogrammed the easily available PHP Tree
Control to uses this table structure (I don't know if he's giving it
out, but he said it wasn't very difficult).

-Josh Berkus

-- 
______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology      josh@agliodbs.com   and data management solutions       (415) 436-9166  for law firms, small
businesses      fax  436-0137   and non-profit organizations.       pager 338-4078                               San
Francisco

RE: How to represent a tree-structure in a relational database

From

"Stuart Statman"

Date:

13 December 2000, 19:29:58

[Josh Berkus]

> I've done this before for one project.  Here's what you do:
>
> CREATE TABLE sample_heirarchy (
>         unique_id       SERIAL CONSTRAINT PRIMARY KEY,
>         node_linkup     INT4,
>         node_level      INT2,
>         label           VARCHAR(30)
>         data            whatever
>         );
>
> Then you use the unique_id and node_linkup fields to create a heirarchy
> of data nodes, with an indefinite number of levels, where the
> node_linkup of each lower level equals the id of its parent record.  For
> example:
>
> id      linkup  level           label           data
> 3       0       1               Node1           Node1
> 4       3       2               Node1.1         Node1.1
> 6       3       2               Node1.2         Node1.2
> 7       6       3               Node1.2.1       Node1.2.1
> 5       0       1               Node2           Node2

I don't think I'd be comfortable with having the node_level column in the
table structure. First, because you can derive that value using a function,
it's duplicate data. Second, if you decide to take an entire segment of your
hierarchy and move it under another node (by changing the value of
node_linkup/ParentCategoryID), you'll need to recalculate all of those
node_level values. And all the node_level values underneath it.

> You can then access the whole heirarchy through moderately complex, but
> very fast-executing UNION queries.  The one drawback is that you need to
> know in advance the maximum number of levels (3 in this example), but
> I'm sure someone on this list can find a way around that:

I can think of another way to do this, though it would be a little complex
and would involve temp tables. Select all of your top level nodes into a
temp table. Create a new table with a new column for the new level. Select
the children of the top level nodes into the temp table, followed by those
top level nodes themselves, with a 0 in the new column and a flag indicating
not to expand again. Create a new temp table just like the last but with
another column for the new level, and repeat the above process from the
first temp table to the second, only expanding the latest children, but
copying all records over. Keep doing it until there are no more new
children.

Alternately, if you didn't need each level to have it's own column, but
didn't mind an x.x.x.x kind of notation, you could use one temp table, and
just append '.0' to the end of every copied-over parent node.

Basically, both methods are simulations of recursing the tree, but you get
to do each level all at once using an insert ... select. If you wanted, you
could even use a counter, to identify which level each node appeared in.

Clearly, this could also be done with cursors and recursive

> 4. My PHP developer has reprogrammed the easily available PHP Tree
> Control to uses this table structure (I don't know if he's giving it
> out, but he said it wasn't very difficult).

We've done a similar thing for Java. It was ridiculously easy to create a
TreeModel wrapped around this data. Almost too easy; it made me feel dirty.

Stuart Statman
Director of Software Development
Slam Media, Inc.

Attachment

Stuart Statman.vcf

Re: How to represent a tree-structure in a relational database

From

Josh Berkus

Date:

13 December 2000, 19:51:46

Stuart,

> I don't think I'd be comfortable with having the node_level column in the
> table structure. First, because you can derive that value using a function,
> it's duplicate data. Second, if you decide to take an entire segment of your
> hierarchy and move it under another node (by changing the value of
> node_linkup/ParentCategoryID), you'll need to recalculate all of those
> node_level values. And all the node_level values underneath it.

I can see that.  I suppose it depends on the data you're storing.  The
project I was working on tracked grocery inventory for a delivery
service, and thus each item had a fixed "level" in the heirarcy (Food
Class, Food Type, Manufacturer, and Item) and thus while items might get
reassigned *across* the heirarcy, they did not get re-assigned *up and
down* the heirarcy.

Also, I can't think of a way to represent the tree in pure SQL without
having the level identifiers (and a fixed number of levels).

> We've done a similar thing for Java. It was ridiculously easy to create a
> TreeModel wrapped around this data. Almost too easy; it made me feel dirty.

Great.  Maybe I'll buy it from you if I ever need to use Java :-)
                -Josh

-- 
______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology      josh@agliodbs.com   and data management solutions       (415) 436-9166  for law firms, small
businesses      fax  436-0137   and non-profit organizations.       pager 338-4078                               San
Francisco

Re: How to represent a tree-structure in a relational database

From

Mathijs Brands

Date:

13 December 2000, 19:51:52

On Wed, Dec 13, 2000 at 12:09:06PM -0800, Stuart Statman allegedly wrote:
> > The way I'd do it if I had to do it again:
> >   Give each record a unique id, generated by the application.
> > Denote levels with extra letters.
> >
> >   So:
> >
> >    AA   - Automotive transport
> >    AAAA - Cars
> >    AAAB - Motorcycles
> >
> > The structures has the added bonus of making it very easy to
> > determine all the
> > sub-categories of a category, no matter how deep the tree is
> > below the category
> > you're looking at. With the first approach it is not possible
> > to do this in a
> > single SQL query. You could do this with a function, I guess.
> 
> The problem with this method is if you need to insert a category, or move a
> category. You'll need to re-id a bunch of categories, and bubble those
> changes out to every table that refers to this table.

You can solve the last problem by using an extra table that maps unique
record id's (numerical) to hierarchical id's (text).

Inserting a category, in my case, does not require me to start updating
lots of records, since I would only use these strings to store
hierarchical information. I'm sorting categories based on alphabet,
which you can overrule by increasing the 'weight' of a category, which
is a numerical value attached to every category and which normally has a
vaue of 1.

However, changing the level of a category would require me to modify all
categories below that. In my case, this wouldn't be a problem.  We're
using this stuff for a Yahoo style directory which (atm) has about 2500
different categories. I'm generating a complete tree of all categories
and the websites in them once a day, storing them in a souped up DBM
style database. For each record I store the children, not the parent. If
changing the underlying structure takes a couple of minutes, than this
is acceptable.

As you can see my number of categories is rather small. If you're going
to use this for a forum or something similar, you may run into problems.
However, how often do you want to move a thread...

Cheers,

Mathijs
--
"Where is human nature so weak as in a bookstore!"        Henry Ward Beecher  (1813-1887)

Re: How to represent a tree-structure in a relational database

From

Mathijs Brands

Date:

13 December 2000, 20:04:07

On Wed, Dec 13, 2000 at 04:49:51PM -0800, Josh Berkus allegedly wrote:
> Stuart,
> 
> > I don't think I'd be comfortable with having the node_level column in the
> > table structure. First, because you can derive that value using a function,
> > it's duplicate data. Second, if you decide to take an entire segment of your
> > hierarchy and move it under another node (by changing the value of
> > node_linkup/ParentCategoryID), you'll need to recalculate all of those
> > node_level values. And all the node_level values underneath it.
> 
> I can see that.  I suppose it depends on the data you're storing.  The
> project I was working on tracked grocery inventory for a delivery
> service, and thus each item had a fixed "level" in the heirarcy (Food
> Class, Food Type, Manufacturer, and Item) and thus while items might get
> reassigned *across* the heirarcy, they did not get re-assigned *up and
> down* the heirarcy.

Indeed. If the structure 'rarely' changes, having to do an expensive
update may be acceptable, if it increase the overall performance
significantly.

> Also, I can't think of a way to represent the tree in pure SQL without
> having the level identifiers (and a fixed number of levels).

Storing only the parent for a record doesn't require you to keep track
of levels, since this information can be reconstructed by following the
chain of parent id's until you reach the top of your tree.

Storing the children for each record (like I'm doing) works exactly the
same. Just follow the 'path' (for instance 'Automotive Transport/Cars')
to find the category you're looking for.

Cheers,

Mathijs
--
"A book is a fragile creature.  It suffers the wear of time,it fears rodents, the elements, clumsy hands."
UmbertoEco

RE: How to represent a tree-structure in a relational database

From

"Opec Kemp \( Ozemail \)"

Date:

13 December 2000, 20:04:26

Don't if this will help but there is a really good book that discuss this problem in details.
The book is called "SQL for Smarties" by Joe Celko. It covers lots of advance topics 
(tree being one of them). Very good book. Check out on Amazon:

http://www.amazon.com/exec/obidos/ASIN/1558605762/qid=976755796/sr=1-1/106-0241434-0557209

Just me $0.02 

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:    Behalf Of Josh Berkus
> Sent: Thursday, December 14, 2000 10:50 AM
> To: sqllist
> Subject: Re: [SQL] How to represent a tree-structure in a relational
> database
> 
> 
> Stuart,
> 
> > I don't think I'd be comfortable with having the node_level column in the
> > table structure. First, because you can derive that value using a function,
> > it's duplicate data. Second, if you decide to take an entire segment of your
> > hierarchy and move it under another node (by changing the value of
> > node_linkup/ParentCategoryID), you'll need to recalculate all of those
> > node_level values. And all the node_level values underneath it.
> 
> I can see that.  I suppose it depends on the data you're storing.  The
> project I was working on tracked grocery inventory for a delivery
> service, and thus each item had a fixed "level" in the heirarcy (Food
> Class, Food Type, Manufacturer, and Item) and thus while items might get
> reassigned *across* the heirarcy, they did not get re-assigned *up and
> down* the heirarcy.
> 
> Also, I can't think of a way to represent the tree in pure SQL without
> having the level identifiers (and a fixed number of levels).
> 
> > We've done a similar thing for Java. It was ridiculously easy to create a
> > TreeModel wrapped around this data. Almost too easy; it made me feel dirty.
> 
> Great.  Maybe I'll buy it from you if I ever need to use Java :-)
> 
>                     -Josh
> 
> -- 
> ______AGLIO DATABASE SOLUTIONS___________________________
>                                         Josh Berkus
>    Complete information technology      josh@agliodbs.com
>     and data management solutions       (415) 436-9166
>    for law firms, small businesses       fax  436-0137
>     and non-profit organizations.       pager 338-4078
>                                 San Francisco

Re: How to represent a tree-structure in a relational database

From

clayton cottingham

Date:

13 December 2000, 20:20:01

Mathijs Brands wrote:
> 
> On Wed, Dec 13, 2000 at 04:49:51PM -0800, Josh Berkus allegedly wrote:
> > Stuart,
> >
> > > I don't think I'd be comfortable with having the node_level column in the
> > > table structure. First, because you can derive that value using a function,
> > > it's duplicate data. Second, if you decide to take an entire segment of your
> > > hierarchy and move it under another node (by changing the value of
> > > node_linkup/ParentCategoryID), you'll need to recalculate all of those
> > > node_level values. And all the node_level values underneath it.
> >
> > I can see that.  I suppose it depends on the data you're storing.  The
> > project I was working on tracked grocery inventory for a delivery
> > service, and thus each item had a fixed "level" in the heirarcy (Food
> > Class, Food Type, Manufacturer, and Item) and thus while items might get
> > reassigned *across* the heirarcy, they did not get re-assigned *up and
> > down* the heirarcy.
> 
> Indeed. If the structure 'rarely' changes, having to do an expensive
> update may be acceptable, if it increase the overall performance
> significantly.
> 
> > Also, I can't think of a way to represent the tree in pure SQL without
> > having the level identifiers (and a fixed number of levels).
> 
> Storing only the parent for a record doesn't require you to keep track
> of levels, since this information can be reconstructed by following the
> chain of parent id's until you reach the top of your tree.
> 
> Storing the children for each record (like I'm doing) works exactly the
> same. Just follow the 'path' (for instance 'Automotive Transport/Cars')
> to find the category you're looking for.
> 
> Cheers,
> 
> Mathijs
> --
> "A book is a fragile creature.  It suffers the wear of time,
>  it fears rodents, the elements, clumsy hands."
>         Umberto Eco



this is the way i implemented too!

i used perl and modperl/apache to deploy

so i did use a perl module to store the top level of categories

so they where always avail to the starting page

mostly it ended up looking like the bidders edge horizontal line type
thing

again if anyone needs the code lemme know!

Re: How to represent a tree-structure in a relational database

From

"Robert B. Easter"

Date:

13 December 2000, 23:04:35

On Wednesday 13 December 2000 18:05, Josh Berkus wrote:
> Frank, etc:
> > > create table Category (
> > > CategoryID       int4  not null  primary key,
> > > ParentCategoryID int4  not null  REFERENCES Category (CategoryID),
> > > CategoryName     varchar(100)
> > > );

I made a message board with a hierarchy for topics/boards under which  
messages go into a post/reply hierarchy at www.comptechnews.com.  I used the 
parent_id idea like in the table above.

It's working OK and, yes, I can easily move nodes around without problems.  I 
can move/reparent Topic/boards and posts/replies (posts and replies are 
treated nearly the same, just that a post has a parent-post-id of 0 while a 
reply message's parent-post-id is non-zero).  This arrangement combined with 
the use of PL/pgSQL trigger functions can go a long way.

I did not use any foreign keys but just had the PL/pgSQL triggers do any 
checks I wanted myself.  So, its easy to make triggers that do things like 
automatically delete an entire hierarchy when a node(message or topic) is 
deleted.  Like, if you delete a post, it deletes all its replies 
automatically.  If you delete a reply, it deletes any child/replies that it 
might have etc.  If you delete a topic/board, it deletes all messages that 
were in that topic and any subtopics and messages in those subtopics 
recursively.  The PL/pgSQL can be used to RAISE EXCEPTION when something you 
don't want to happen is attempted (like deleting the root topic!), which then 
automatically ABORTs the transaction.

You will want to make heavy use of procedures in the database to ensure 
integrity.  You might be able to use FOREIGN KEY triggers also but with 
careful use of BEFORE and AFTER PL/pgSQL triggers. In my case, I had somekind 
of conflict between what my PL/pgSQL triggers where doing and what the 
foreign key triggers were doing. The stuff my PL/pgSQL were doing caused 
referential integrity violations sometimes.  You have to be real careful what 
goes into a BEFORE trigger and what goes into an AFTER trigger.  Trying to 
make a BEFORE trigger or an AFTER trigger do too much will end up in trouble 
so you have to split what you want to do into two triggers for a table.  The 
CONSTRAINT TRIGGERs that get created automatically by FOREIGN KEYs are called 
AFTER INSERT OR UPDATE|DELETE and are NOT DEFERRABLE and INITIALLY IMMEDIATE. 
You have to keep that in mind in writing your procedures.  In the end, I 
judged the contraint triggers to interfere with what I wanted to do and 
removed them.

>
> That was it.  I also gave an example of a UNION query that would display
> the whole category tree in ASCII format:
>
> I've done this before for one project.  Here's what you do:
>
> CREATE TABLE sample_heirarchy (
>         unique_id       SERIAL CONSTRAINT PRIMARY KEY,
>         node_linkup     INT4,
>         node_level      INT2,
>         label           VARCHAR(30)
>         data            whatever
>         );
>
> Then you use the unique_id and node_linkup fields to create a heirarchy
> of data nodes, with an indefinite number of levels, where the
> node_linkup of each lower level equals the id of its parent record.  For
> example:
>
> id      linkup  level           label           data
> 3       0       1               Node1           Node1
> 4       3       2               Node1.1         Node1.1
> 6       3       2               Node1.2         Node1.2
> 7       6       3               Node1.2.1       Node1.2.1
> 5       0       1               Node2           Node2
>
> etc.
>
> You can then access the whole heirarchy through moderately complex, but
> very fast-executing UNION queries.  The one drawback is that you need to
> know in advance the maximum number of levels (3 in this example), but
> I'm sure someone on this list can find a way around that:
>
> SELECT n1.unique_id, n1.label, n1.data, n1.node_level, n1.unique_id AS
> level1,
>         0 AS level2, 0 AS level3
> FROM sample_heirarchy n1
> WHERE n1.node_level = 1
> UNION ALL
> SELECT n2.unique_id, n2.label, n2.data, n2.node_level, n1.unique_id,
>         n2.unique_id, 0
> FROM sample_heirarchy n2, sample_heirarchy n1
> WHERE n1.unique_id = n2.node_linkup
>         AND n2.node_level = 2
> UNION ALL
> SELECT n3.unique_id, n3.label, n3.data, n3.node_level, n1.unique_id,
>         n2.unique_id, n3.unique_id
> FROM sample_heirarchy n1, sample_heirarchy n2, sample_heirarchy n3
> WHERE n1.unique_id = n2.node_linkup AND
>         n2.unique_id = n3.node_linkup
>         AND n3.node_level = 3
> ORDER BY level1, level2, level3
>
> Should produce this output (pardon any parsing errors; I'm not at a
> PGSQL terminal right now):
>
> unique_id       label           data    level   level1  level2  level3
> 3               Node1           Node1     1     3       0       0
> 4               Node1.1         Node1.1   2     3       4       0
> 6               Node1.2         Node1.2   2     3       6       0
> 7               Node1.2.1       Node1.2.1 3     3       6       7
> 5               Node2           Node2     1     7       0       0
> etc.
>
> This sorts them in numerical (id) order, but one could just as easily
> substitute the labels or data for the various levels and sort them
> alphabetically (although you do need to allow for NULL sort order on
> your database, and any label duplicates).
>
> The advantages of this structure are:
> 1. It allows you to create, assign, and re-assign nodes freely all over
> the heirarchy ... just change the level and/or linkup.
> 2. Aside from the Union query above, the table structure allows for any
> number of levels, unlike a set or relationally linked tables.
> 3. Because the display query is entirely once table linking to itself on
> (hopefully) indexed fields, in my expreience it runs very, very fast.
> 4. My PHP developer has reprogrammed the easily available PHP Tree
> Control to uses this table structure (I don't know if he's giving it
> out, but he said it wasn't very difficult).
>
> -Josh Berkus

-- 
-------- Robert B. Easter  reaster@comptechnews.com ---------
- CompTechNews Message Board   http://www.comptechnews.com/ -
- CompTechServ Tech Services   http://www.comptechserv.com/ -
---------- http://www.comptechnews.com/~reaster/ ------------

Re: How to represent a tree-structure in a relational database

From

Tulassay Zsolt

Date:

14 December 2000, 08:12:14

There actually is a model of tree structures in SQL databases which is
different from those mentioned earlier in that it represents the tree
as nested sets (ie. nodes are subsets of parent sets (parent nodes)).

There is a huge advantage in this model as it eliminates the need for
recursion. For example, if you want to get a specific node's parents,
grandparents and all other parents up the tree, you can do this in
_one single_ SELECT query. If you just stored the the id of the parent
of each node, you would need (n-1) queries if the node is at the n-th
level.

However, inserting or deleting nodes is more complex this way since you
have to keep the data structure "balanced". But in most cases you won't
insert new nodes all the time so you won't suffer from this overhead. And
on the other hand, the performance gain on queries might be enormous.

You can find the article dealing with this at
http://www.utdt.edu/~mig/sql-trees

Actually what I did was to implement _both_ models at the same time. I
took the nested set structure, and besides stored the parent id of all
nodes as well. The benefit of this is that by accepting a little more
overhead during inserts, I was able to use the advantages of both models.

If somebody is interested in it, I might share the code (a few tables
and some plpgsql functions). But the article is surely worth a read.

Zsolt

ps: I found the link a few months ago in the pgsql-sql archive :-)

On Wed, 13 Dec 2000, Josh Berkus wrote:

> Stuart,
> 
> > I don't think I'd be comfortable with having the node_level column in the
> > table structure. First, because you can derive that value using a function,
> > it's duplicate data. Second, if you decide to take an entire segment of your
> > hierarchy and move it under another node (by changing the value of
> > node_linkup/ParentCategoryID), you'll need to recalculate all of those
> > node_level values. And all the node_level values underneath it.
> 
> I can see that.  I suppose it depends on the data you're storing.  The
> project I was working on tracked grocery inventory for a delivery
> service, and thus each item had a fixed "level" in the heirarcy (Food
> Class, Food Type, Manufacturer, and Item) and thus while items might get
> reassigned *across* the heirarcy, they did not get re-assigned *up and
> down* the heirarcy.
> 
> Also, I can't think of a way to represent the tree in pure SQL without
> having the level identifiers (and a fixed number of levels).
> 
> > We've done a similar thing for Java. It was ridiculously easy to create a
> > TreeModel wrapped around this data. Almost too easy; it made me feel dirty.
> 
> Great.  Maybe I'll buy it from you if I ever need to use Java :-)
> 
>                     -Josh
> 
> -- 
> ______AGLIO DATABASE SOLUTIONS___________________________
>                                         Josh Berkus
>    Complete information technology      josh@agliodbs.com
>     and data management solutions       (415) 436-9166
>    for law firms, small businesses       fax  436-0137
>     and non-profit organizations.       pager 338-4078
>                                 San Francisco
>

Re: How to represent a tree-structure in a relational database

From

hubert depesz lubaczewski

Date:

14 December 2000, 10:39:36

somebody already showed table structure, but i'll ad some more code to this:

table:

CREATE TABLE groups ( id           INT4     NOT NULL DEFAULT NEXTVAL('groups_seq'), parent_id    INT4     NOT NULL
DEFAULT0, name         TEXT     NOT NULL DEFAULT '', active       BOOL     NOT NULL DEFAULT 't'::bool, PRIMARY KEY
(id)
);
INSERT INTO groups   (id)       VALUES (0);
ALTER TABLE groups   ADD FOREIGN KEY (parent_id  ) REFERENCES groups   (id);
CREATE UNIQUE INDEX groups_pn_u   ON groups   (parent_id, name, active);

at this point it seems to be pretty easy and obvious.
in my case i got to the point that i needed some more info about the branch of
tree. so i wrote:

REATE function getgrouppath(int4, text) returns text as ' DECLARE   sep ALIAS FOR $2;   aid int4;   wynik TEXT;   temp
RECORD;  b BOOL; BEGIN   b:=''t'';   wynik:='''';   aid:=$1;   while b loop     SELECT name, parent_id INTO temp FROM
groupsWHERE id=aid;     IF NOT FOUND THEN       return wynik;     END IF;     if wynik = '''' THEN
wynik:=temp.name;    else       wynik:=temp.name||sep||wynik;     END if;     IF temp.parent_id = 0 THEN
b:=''f'';    ELSE       aid:=temp.parent_id;     END if;   end loop;   return wynik; END;
 
' language 'plpgsql';

(sorry for polish variable names)
this function does one nice trick
when having structure like:
=> select id, parent_id, name, active from groups;id | parent_id |         name         | active
----+-----------+----------------------+-------- 0 |         0 |                      | t 1 |         0 | RTV
      | t 2 |         0 | AGD                  | t 3 |         0 | MP3                  | t 4 |         1 | Audio
        | t 5 |         2 | Lodówki              | t 6 |         2 | Kuchenki Mikrofalowe | t 7 |         4 | Sony
          | t 8 |         4 | Panasonic            | t
 
(9 rows)

i can:
=> select id, parent_id, name, active, getgrouppath(id, '/') from
groups;id | parent_id |         name         | active |       getgrouppath
----+-----------+----------------------+--------+-------------------------- 0 |         0 |                      | t
 | 1 |         0 | RTV                  | t      | RTV 2 |         0 | AGD                  | t      | AGD 3 |
0| MP3                  | t      | MP3 4 |         1 | Audio                | t      | RTV/Audio 5 |         2 |
Lodówki             | t      | AGD/Lodówki 6 |         2 | Kuchenki Mikrofalowe | t      | AGD/Kuchenki Mikrofalowe 7 |
       4 | Sony                 | t      | RTV/Audio/Sony 8 |         4 | Panasonic            | t      |
RTV/Audio/Panasonic


since for some reasons (indenting) i needed the level of branch i wrote:

CREATE FUNCTION grouplevel(int4) returns int4 AS ' DECLARE   baseid ALIAS FOR $1;   currid INT4;   reply INT4; BEGIN
reply:=1;  if baseid = 0 then return 0; END if;   SELECT parent_id INTO currid FROM groups where id=baseid;   while
currid>0loop     reply:=reply+1;     SELECT parent_id INTO currid FROM groups where id=currid;   END loop;   return
reply;END;
 
' language 'plpgsql';

which also seems pretty obvious.

to be complete i wrote two triggers which made me happy:

CREATE FUNCTION trg_recurs_act_g() RETURNS OPAQUE AS ' BEGIN   IF NEW.active=''f''::bool and OLD.active=''t''::bool
THEN    UPDATE articles SET active=''f''::bool WHERE group_id=NEW.id;     UPDATE groups SET active=''f''::bool WHERE
parent_id=NEW.idand id<>0;   ELSE     IF NEW.active=''t''::bool and OLD.active=''f''::bool AND NEW.id<>0 THEN
UPDATEgroups SET active=''t''::bool WHERE id=NEW.parent_id;     END IF;   END IF;   RETURN NEW; END;
 
' LANGUAGE 'plpgsql';

CREATE FUNCTION trg_recurs_act_a() RETURNS OPAQUE AS ' BEGIN   IF NEW.active=''t''::bool and OLD.active=''f''::bool
THEN    UPDATE groups SET active=''t''::bool WHERE id=NEW.group_id;   END IF;   RETURN NEW; END;
 
' LANGUAGE 'plpgsql';

CREATE TRIGGER groups_update_trg BEFORE UPDATE ON groups FOR EACH ROW EXECUTE PROCEDURE trg_recurs_act_g();
CREATE TRIGGER articles_update_trg BEFORE UPDATE ON articles FOR EACH ROW EXECUTE PROCEDURE trg_recurs_act_a();

as you can see those triggers use article table which structure is not
important at this moment (let's assume it has id, group_id, name and active).

i hope this code will help you a bit.

depesz

-- 
hubert depesz lubaczewski
------------------------------------------------------------------------    najwspanialszą rzeczą jaką dało nam
nowoczesnespołeczeństwo,     jest niesamowita wręcz łatwość unikania kontaktów z nim ...

Re: How to represent a tree-structure in a relationaldatabase

From

Alvar Freude

Date:

14 December 2000, 21:38:01

Hi,

miguel sofer wrote:
> 
> I once started writing a small paper on this subject; it is still in a
> rather preliminary state.
> 
> You can download the draft (and some ill documented code, 53kB) from
>   http://www.utdt.edu/~mig/sql-trees

ah, this looks very, very nice!
on page 5ff you describe the Postgres implementation, but the URL (page
5 bottom) is't complete -- can i find the files somewhere?
Included is a "tree_general.sql", but this seems not to be complete and
not the same version as the ps-file (First draft, may 6, 2000): in the
draft there is written about an base 160 encoding, tree_general.sql uses
base 159 encoding ;)

Ciao Alvar

-- 
Alvar C.H. Freude  |  alvar.freude@merz-akademie.de
   Demo: http://www.online-demonstration.org/  |  Mach mit!
Blast-DE: http://www.assoziations-blaster.de/   |  Blast-Dich-Fit
Blast-EN: http://www.a-blast.org/               |  Blast/english

Re: How to represent a tree-structure in a relationaldatabase

From

Alvar Freude

Date:

14 December 2000, 22:21:18

Hi,

>
> > I once started writing a small paper on this subject; it is still in a
> > rather preliminary state.
> >
> > You can download the draft (and some ill documented code, 53kB) from
> >   http://www.utdt.edu/~mig/sql-trees
>

i guess, with base 160 encoding there might be a problem: if postgres is
compiled with --enable-locale (e.g. for german umlauts), the ordering
isn't according to the ASCII number of the character, so for this
purpose it's needed to build the encoding table according to the locate
settings. Or simply sort it according the locale settings.


What's against using all characters >= 32, excluding special characters
with special meaninbg in LIKE and regexps? With base 208 encoding it's
possible to have 43264 elements on each level.


Ciao  Alvar


-- 
Alvar C.H. Freude  |  alvar.freude@merz-akademie.de
   Demo: http://www.online-demonstration.org/  |  Mach mit!
Blast-DE: http://www.assoziations-blaster.de/   |  Blast-Dich-Fit
Blast-EN: http://www.a-blast.org/               |  Blast/english

Re: How to represent a tree-structure in a relational database

From

Tulassay Zsolt

Date:

15 December 2000, 04:35:17


On Thu, 14 Dec 2000, Tulassay Zsolt wrote:

> 
> You can find the article dealing with this at
> http://www.utdt.edu/~mig/sql-trees
> 

sorry i pasted in the wrong url (this was mentioned in an earlier post)
the correct one is:

A look at SQL Trees (by Joe Celko)
http://www.dbmsmag.com/9603d06.html

Zsolt Tulassay

Re: How to represent a tree-structure in a relationaldatabase

From

miguel sofer

Date:

15 December 2000, 07:15:04

> > I once started writing a small paper on this subject; it is still in a
> > rather preliminary state.
> >
> > You can download the draft (and some ill documented code, 53kB) from
> >   http://www.utdt.edu/~mig/sql-trees

> ah, this looks very, very nice!

Thanks.

> on page 5ff you describe the Postgres implementation, but the URL (page
> 5 bottom) is't complete -- can i find the files somewhere?
> Included is a "tree_general.sql", but this seems not to be complete and
> not the same version as the ps-file (First draft, may 6, 2000): in the
> draft there is written about an base 160 encoding, tree_general.sql uses
> base 159 encoding ;)

Sorry, I never got around to completing this, or thinking any further. My 
other files are definitely not in a usable state right now. I hope to be 
able 
to improve things over the (southern) summer holidays, so there may be 
news 
soon - but do not hold your breadth!

I can't remember why I switched from base 160 to base 159; my guess now 
is that 
I got confused at coding time between the base and the maximal number 
(base-1): 
ie, it may be a mistake.

> What's against using all characters >= 32, excluding special characters
> with special meaning in LIKE and regexps? With base 208 encoding it's
> possible to have 43264 elements on each level.

Nothing, I guess. I probably got some kind of "start counting at zero" 
blockage when I started, and never really looked back on it, my shame. 
Hey, I told you it was rather preliminary ... Thanks for pointing it out.

> i guess, with base 160 encoding there might be a problem: if postgres is
> compiled with --enable-locale (e.g. for german umlauts), the ordering
> isn't according to the ASCII number of the character, so for this
> purpose it's needed to build the encoding table according to the locate
> settings. Or simply sort it according the locale settings.

Yes indeed; never thought about that one.

Cheers

Miguel

Re: How to represent a tree-structure in a relationaldatabase

From

Alvar Freude

Date:

15 December 2000, 18:15:09

Hi,

miguel sofer wrote:
> 
> Sorry, I never got around to completing this, or thinking any further. My
> other files are definitely not in a usable state right now. I hope to be
> able
> to improve things over the (southern) summer holidays, so there may be
> news
> soon - but do not hold your breadth!

OK, no prob.

I start to implement smth in Perl, so if it works ok and someone is
interested in it i can post it here.

and I'll try to use a kind of base255 encoding; it seams that postgres
only dislike the 0 byte, so if this works there is no real encoding
necessary -- simply start at 1 instead of 0. I hoppe this works :)

Ciao Alvar

-- 
Alvar C.H. Freude  |  alvar.freude@merz-akademie.de
   Demo: http://www.online-demonstration.org/  |  Mach mit!
Blast-DE: http://www.assoziations-blaster.de/   |  Blast-Dich-Fit
Blast-EN: http://www.a-blast.org/               |  Blast/english

Re: How to represent a tree-structure in a relational database

From

Ron Peterson

Date:

28 December 2000, 10:38:02

Ron Peterson wrote:
> 
> This structure is more 'normal' in the sense that nodes without children
> (in a tree, the leaf nodes) don't have records in the edge table.

Phghpth.  Should have had my coffee first.  The first data structure
given would only have a null parent id for the root node, not all the
leaf nodes.  My mistake.  Thought it might be politic to point that out
before someone (correctly) called me an idiot.

-Ron-

Re: How to represent a tree-structure in a relational database

From

Ron Peterson

Date:

29 December 2000, 22:38:05

Ron Peterson wrote:
> 
> CREATE TABLE category_edge (
>         parent  INTEGER
>                 NOT NULL
>                 REFERENCES category_node(id),
> 
>         child   INTEGER
>                 NOT NULL
>                 REFERENCES category_node(id)
> );

Just for the sake of anal-retentive completeness, I'd like to point out
that you'd probably want an id field in this table also.  Plus what the
heck else am I going to do on Christmas break?  ;)

On a completely unrelated topic: getting the PostgreSQL discussions
lists on a news server is great!!!  I was overwhelmed with mail that I
usually don't have time to deal with.  Now when I have a chance, I just
go see what's up on the news server.  Excellent and thanks!

-Ron-

Re: How to represent a tree-structure in a relational database

From

Ron Peterson

Date:

29 December 2000, 22:39:03

Stuart Statman wrote:
> 
> I would suggest, instead, to create a table that represents your hierarchy
> without adding columns. For example :
> 
> create table Category (
> CategoryID       int4  not null  primary key,
> ParentCategoryID int4  not null  REFERENCES Category (CategoryID),
> CategoryName     varchar(100)
> );

Another possibility would be to use two tables to represent the data
structure.

CREATE SEQUENCE category_node_id_seq;
CREATE TABLE category_node (name    TEXT    NOT NULL,
id    INTEGER    DEFAULT NEXTVAL('category_node_id_seq')    PRIMARY KEY
);

CREATE TABLE category_edge (parent    INTEGER    NOT NULL    REFERENCES category_node(id),
child    INTEGER    NOT NULL    REFERENCES category_node(id)
);

This structure is more 'normal' in the sense that nodes without children
(in a tree, the leaf nodes) don't have records in the edge table.

What either of these structures allow to do is create directed graph
structures.  If you'd like to constrain this structure to be a tree, you
have to enforce that restriction with procedural code.

-Ron-