Thread: How to represent a tree-structure in a relational database
I am just thinking about the data model for a little content management system that I am currently planning. Individual articles are sorted under different categories which branch into subcategories, sub-subcategories etc. up to a depth of about 6 or 7 levels. The structure should be extensible, i.e. it must be possible to add levels. What I am thinking now is that you would keep the index in a separate index table (linked with the primary key in the articles table), which would have 6 or 7 fields initially, and that you'd add columns with the alter table command, if need be, to make the structure deeper. Is this the recommended way to go about it? It feels pretty 'right' to me now but since the problem should be fairly common, there must be other people who have thought and written about it and there might even be a recognized 'optimal' solution to the problem. Comments? - Frank
Frank, Please look in the list archives. About 2 months ago this topic came up and was discussed extensively (including a creative solution by yours truly). -Josh Berkus -- ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 436-9166 for law firms, small businesses fax 436-0137 and non-profit organizations. pager 338-4078 San Francisco
On Wed, Dec 13, 2000 at 04:48:47PM +0100, Frank Joerdens allegedly wrote: > I am just thinking about the data model for a little content management system that I am > currently planning. Individual articles are sorted under different categories which branch > into subcategories, sub-subcategories etc. up to a depth of about 6 or 7 levels. The > structure should be extensible, i.e. it must be possible to add levels. What I am thinking > now is that you would keep the index in a separate index table (linked with the primary > key in the articles table), which would have 6 or 7 fields initially, and that you'd add > columns with the alter table command, if need be, to make the structure deeper. Is this > the recommended way to go about it? It feels pretty 'right' to me now but since the > problem should be fairly common, there must be other people who have thought and written > about it and there might even be a recognized 'optimal' solution to the problem. > > Comments? Yeah. I've built something similar. The way I've done it: Give each record a unique ID (generated with a sequence) and store the records in a table. Create asecond table in which you store parent id-child id combinations. So: 1 - Automotive transport 2 - Cars 3 - Motorcycles Store in the table: 1-2 1-3 There's one main category (Automotive transport) which has two sub-categories: Cars & Motorcyles The way I'd do it if I had to do it again: Give each record a unique id, generated by the application. Denote levels withextra letters. So: AA - Automotive transport AAAA - Cars AAAB - Motorcycles The structures has the added bonus of making it very easy to determine all the sub-categories of a category, no matter howdeep the tree is below the category you're looking at. With the first approach it is not possible to do this in a singleSQL query. You could do this with a function, I guess. I hope this is of some use to you. Cheers, Mathijs -- "Borrowers of books -- those mutilators of collections, spoilers of thesymmetry of shelves, and creators of odd volumes." Charles Lamb (1775-1834)
I once started writing a small paper on this subject; it is still in a rather preliminary state. You can download the draft (and some ill documented code, 53kB) from http://www.utdt.edu/~mig/sql-trees Miguel >>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<< On 12/13/00, 12:48:47 PM, Frank Joerdens <frank@joerdens.de> wrote regarding [SQL] How to represent a tree-structure in a relational database: > I am just thinking about the data model for a little content management system that I am > currently planning. Individual articles are sorted under different categories which branch > into subcategories, sub-subcategories etc. up to a depth of about 6 or 7 levels. The > structure should be extensible, i.e. it must be possible to add levels. What I am thinking > now is that you would keep the index in a separate index table (linked with the primary > key in the articles table), which would have 6 or 7 fields initially, and that you'd add > columns with the alter table command, if need be, to make the structure deeper. Is this > the recommended way to go about it? It feels pretty 'right' to me now but since the > problem should be fairly common, there must be other people who have thought and written > about it and there might even be a recognized 'optimal' solution to the problem. > Comments? > - Frank
On Wed, Dec 13, 2000 at 11:38:18AM -0800, Stuart Statman wrote: [ . . . ] > I would suggest, instead, to create a table that represents your hierarchy > without adding columns. For example : > > create table Category ( > CategoryID int4 not null primary key, > ParentCategoryID int4 not null REFERENCES Category (CategoryID), > CategoryName varchar(100) > ); > > Add a CategoryID with an FK reference to this table, and your work is done. > > Then adding, inserting, removing, or moving layers in the hierarchy becomes > quite simple. This also preserves hierarchical integrity, where subcategory > a of subcategory b will also remain a subcategory of category c if > subcategory b is a subcategory of subcategory c, where I'm not sure your > model will preserve or guarantee that. (Does that sentence deserve a prize?) Cool. That looks like my solution. I had actually seen it someplace before, but didn't make the connection with my problem. Ta, Frank
On Wed, Dec 13, 2000 at 11:04:13AM -0800, Josh Berkus wrote: > Frank, > > Please look in the list archives. About 2 months ago this topic came > up and was discussed extensively (including a creative solution by yours > truly). Hm, neither my archives nor a search on the postgresql.org page turned up the thread you mention. Do you recall which list it was and what the title of the thread was? Thanks, Frank
Frank Joerdens wrote: > > On Wed, Dec 13, 2000 at 11:04:13AM -0800, Josh Berkus wrote: > > Frank, > > > > Please look in the list archives. About 2 months ago this topic came > > up and was discussed extensively (including a creative solution by yours > > truly). > > Hm, neither my archives nor a search on the postgresql.org page turned > up the thread you mention. Do you recall which list it was and what the > title of the thread was? > > Thanks, Frank yes i recall!! i managed to implement something to that effect a lot was done using postgres and perl anyone need code fragements?
> The way I'd do it if I had to do it again: > Give each record a unique id, generated by the application. > Denote levels with extra letters. > > So: > > AA - Automotive transport > AAAA - Cars > AAAB - Motorcycles > > The structures has the added bonus of making it very easy to > determine all the > sub-categories of a category, no matter how deep the tree is > below the category > you're looking at. With the first approach it is not possible > to do this in a > single SQL query. You could do this with a function, I guess. The problem with this method is if you need to insert a category, or move a category. You'll need to re-id a bunch of categories, and bubble those changes out to every table that refers to this table. Stuart Statman Director of Software Development Slam Media, Inc.
Attachment
> What I am thinking now is that you would keep the index > in a separate index table linked with the primary > key in the articles table), which would have 6 or 7 fields > initially, and that you'd add columns with the alter table > command, if need be, to make the structure deeper. I would suggest, instead, to create a table that represents your hierarchy without adding columns. For example : create table Category ( CategoryID int4 not null primary key, ParentCategoryID int4 not null REFERENCES Category (CategoryID), CategoryName varchar(100) ); Add a CategoryID with an FK reference to this table, and your work is done. Then adding, inserting, removing, or moving layers in the hierarchy becomes quite simple. This also preserves hierarchical integrity, where subcategory a of subcategory b will also remain a subcategory of category c if subcategory b is a subcategory of subcategory c, where I'm not sure your model will preserve or guarantee that. (Does that sentence deserve a prize?) In general, if you know that you will need to periodically alter a table to add columns, you should come up with a different model that doesn't require adding columns. Stuart Statman Director of Software Development Slam Media, Inc.
Attachment
Frank, etc: > > create table Category ( > > CategoryID int4 not null primary key, > > ParentCategoryID int4 not null REFERENCES Category (CategoryID), > > CategoryName varchar(100) > > ); That was it. I also gave an example of a UNION query that would display the whole category tree in ASCII format: I've done this before for one project. Here's what you do: CREATE TABLE sample_heirarchy ( unique_id SERIAL CONSTRAINT PRIMARY KEY, node_linkup INT4, node_level INT2, label VARCHAR(30) data whatever ); Then you use the unique_id and node_linkup fields to create a heirarchy of data nodes, with an indefinite number of levels, where the node_linkup of each lower level equals the id of its parent record. For example: id linkup level label data 3 0 1 Node1 Node1 4 3 2 Node1.1 Node1.1 6 3 2 Node1.2 Node1.2 7 6 3 Node1.2.1 Node1.2.1 5 0 1 Node2 Node2 etc. You can then access the whole heirarchy through moderately complex, but very fast-executing UNION queries. The one drawback is that you need to know in advance the maximum number of levels (3 in this example), but I'm sure someone on this list can find a way around that: SELECT n1.unique_id, n1.label, n1.data, n1.node_level, n1.unique_id AS level1, 0 AS level2, 0 AS level3 FROM sample_heirarchy n1 WHERE n1.node_level = 1 UNION ALL SELECT n2.unique_id, n2.label, n2.data, n2.node_level, n1.unique_id, n2.unique_id, 0 FROM sample_heirarchy n2, sample_heirarchy n1 WHERE n1.unique_id = n2.node_linkup AND n2.node_level = 2 UNION ALL SELECT n3.unique_id, n3.label, n3.data, n3.node_level, n1.unique_id, n2.unique_id, n3.unique_id FROM sample_heirarchy n1, sample_heirarchy n2, sample_heirarchy n3 WHERE n1.unique_id = n2.node_linkup AND n2.unique_id = n3.node_linkup AND n3.node_level = 3 ORDER BY level1, level2, level3 Should produce this output (pardon any parsing errors; I'm not at a PGSQL terminal right now): unique_id label data level level1 level2 level3 3 Node1 Node1 1 3 0 0 4 Node1.1 Node1.1 2 3 4 0 6 Node1.2 Node1.2 2 3 6 0 7 Node1.2.1 Node1.2.1 3 3 6 7 5 Node2 Node2 1 7 0 0 etc. This sorts them in numerical (id) order, but one could just as easily substitute the labels or data for the various levels and sort them alphabetically (although you do need to allow for NULL sort order on your database, and any label duplicates). The advantages of this structure are: 1. It allows you to create, assign, and re-assign nodes freely all over the heirarchy ... just change the level and/or linkup. 2. Aside from the Union query above, the table structure allows for any number of levels, unlike a set or relationally linked tables. 3. Because the display query is entirely once table linking to itself on (hopefully) indexed fields, in my expreience it runs very, very fast. 4. My PHP developer has reprogrammed the easily available PHP Tree Control to uses this table structure (I don't know if he's giving it out, but he said it wasn't very difficult). -Josh Berkus -- ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 436-9166 for law firms, small businesses fax 436-0137 and non-profit organizations. pager 338-4078 San Francisco
[Josh Berkus] > I've done this before for one project. Here's what you do: > > CREATE TABLE sample_heirarchy ( > unique_id SERIAL CONSTRAINT PRIMARY KEY, > node_linkup INT4, > node_level INT2, > label VARCHAR(30) > data whatever > ); > > Then you use the unique_id and node_linkup fields to create a heirarchy > of data nodes, with an indefinite number of levels, where the > node_linkup of each lower level equals the id of its parent record. For > example: > > id linkup level label data > 3 0 1 Node1 Node1 > 4 3 2 Node1.1 Node1.1 > 6 3 2 Node1.2 Node1.2 > 7 6 3 Node1.2.1 Node1.2.1 > 5 0 1 Node2 Node2 I don't think I'd be comfortable with having the node_level column in the table structure. First, because you can derive that value using a function, it's duplicate data. Second, if you decide to take an entire segment of your hierarchy and move it under another node (by changing the value of node_linkup/ParentCategoryID), you'll need to recalculate all of those node_level values. And all the node_level values underneath it. > You can then access the whole heirarchy through moderately complex, but > very fast-executing UNION queries. The one drawback is that you need to > know in advance the maximum number of levels (3 in this example), but > I'm sure someone on this list can find a way around that: I can think of another way to do this, though it would be a little complex and would involve temp tables. Select all of your top level nodes into a temp table. Create a new table with a new column for the new level. Select the children of the top level nodes into the temp table, followed by those top level nodes themselves, with a 0 in the new column and a flag indicating not to expand again. Create a new temp table just like the last but with another column for the new level, and repeat the above process from the first temp table to the second, only expanding the latest children, but copying all records over. Keep doing it until there are no more new children. Alternately, if you didn't need each level to have it's own column, but didn't mind an x.x.x.x kind of notation, you could use one temp table, and just append '.0' to the end of every copied-over parent node. Basically, both methods are simulations of recursing the tree, but you get to do each level all at once using an insert ... select. If you wanted, you could even use a counter, to identify which level each node appeared in. Clearly, this could also be done with cursors and recursive > 4. My PHP developer has reprogrammed the easily available PHP Tree > Control to uses this table structure (I don't know if he's giving it > out, but he said it wasn't very difficult). We've done a similar thing for Java. It was ridiculously easy to create a TreeModel wrapped around this data. Almost too easy; it made me feel dirty. Stuart Statman Director of Software Development Slam Media, Inc.
Attachment
Stuart, > I don't think I'd be comfortable with having the node_level column in the > table structure. First, because you can derive that value using a function, > it's duplicate data. Second, if you decide to take an entire segment of your > hierarchy and move it under another node (by changing the value of > node_linkup/ParentCategoryID), you'll need to recalculate all of those > node_level values. And all the node_level values underneath it. I can see that. I suppose it depends on the data you're storing. The project I was working on tracked grocery inventory for a delivery service, and thus each item had a fixed "level" in the heirarcy (Food Class, Food Type, Manufacturer, and Item) and thus while items might get reassigned *across* the heirarcy, they did not get re-assigned *up and down* the heirarcy. Also, I can't think of a way to represent the tree in pure SQL without having the level identifiers (and a fixed number of levels). > We've done a similar thing for Java. It was ridiculously easy to create a > TreeModel wrapped around this data. Almost too easy; it made me feel dirty. Great. Maybe I'll buy it from you if I ever need to use Java :-) -Josh -- ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 436-9166 for law firms, small businesses fax 436-0137 and non-profit organizations. pager 338-4078 San Francisco
On Wed, Dec 13, 2000 at 12:09:06PM -0800, Stuart Statman allegedly wrote: > > The way I'd do it if I had to do it again: > > Give each record a unique id, generated by the application. > > Denote levels with extra letters. > > > > So: > > > > AA - Automotive transport > > AAAA - Cars > > AAAB - Motorcycles > > > > The structures has the added bonus of making it very easy to > > determine all the > > sub-categories of a category, no matter how deep the tree is > > below the category > > you're looking at. With the first approach it is not possible > > to do this in a > > single SQL query. You could do this with a function, I guess. > > The problem with this method is if you need to insert a category, or move a > category. You'll need to re-id a bunch of categories, and bubble those > changes out to every table that refers to this table. You can solve the last problem by using an extra table that maps unique record id's (numerical) to hierarchical id's (text). Inserting a category, in my case, does not require me to start updating lots of records, since I would only use these strings to store hierarchical information. I'm sorting categories based on alphabet, which you can overrule by increasing the 'weight' of a category, which is a numerical value attached to every category and which normally has a vaue of 1. However, changing the level of a category would require me to modify all categories below that. In my case, this wouldn't be a problem. We're using this stuff for a Yahoo style directory which (atm) has about 2500 different categories. I'm generating a complete tree of all categories and the websites in them once a day, storing them in a souped up DBM style database. For each record I store the children, not the parent. If changing the underlying structure takes a couple of minutes, than this is acceptable. As you can see my number of categories is rather small. If you're going to use this for a forum or something similar, you may run into problems. However, how often do you want to move a thread... Cheers, Mathijs -- "Where is human nature so weak as in a bookstore!" Henry Ward Beecher (1813-1887)
On Wed, Dec 13, 2000 at 04:49:51PM -0800, Josh Berkus allegedly wrote: > Stuart, > > > I don't think I'd be comfortable with having the node_level column in the > > table structure. First, because you can derive that value using a function, > > it's duplicate data. Second, if you decide to take an entire segment of your > > hierarchy and move it under another node (by changing the value of > > node_linkup/ParentCategoryID), you'll need to recalculate all of those > > node_level values. And all the node_level values underneath it. > > I can see that. I suppose it depends on the data you're storing. The > project I was working on tracked grocery inventory for a delivery > service, and thus each item had a fixed "level" in the heirarcy (Food > Class, Food Type, Manufacturer, and Item) and thus while items might get > reassigned *across* the heirarcy, they did not get re-assigned *up and > down* the heirarcy. Indeed. If the structure 'rarely' changes, having to do an expensive update may be acceptable, if it increase the overall performance significantly. > Also, I can't think of a way to represent the tree in pure SQL without > having the level identifiers (and a fixed number of levels). Storing only the parent for a record doesn't require you to keep track of levels, since this information can be reconstructed by following the chain of parent id's until you reach the top of your tree. Storing the children for each record (like I'm doing) works exactly the same. Just follow the 'path' (for instance 'Automotive Transport/Cars') to find the category you're looking for. Cheers, Mathijs -- "A book is a fragile creature. It suffers the wear of time,it fears rodents, the elements, clumsy hands." UmbertoEco
Don't if this will help but there is a really good book that discuss this problem in details. The book is called "SQL for Smarties" by Joe Celko. It covers lots of advance topics (tree being one of them). Very good book. Check out on Amazon: http://www.amazon.com/exec/obidos/ASIN/1558605762/qid=976755796/sr=1-1/106-0241434-0557209 Just me $0.02 > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto: Behalf Of Josh Berkus > Sent: Thursday, December 14, 2000 10:50 AM > To: sqllist > Subject: Re: [SQL] How to represent a tree-structure in a relational > database > > > Stuart, > > > I don't think I'd be comfortable with having the node_level column in the > > table structure. First, because you can derive that value using a function, > > it's duplicate data. Second, if you decide to take an entire segment of your > > hierarchy and move it under another node (by changing the value of > > node_linkup/ParentCategoryID), you'll need to recalculate all of those > > node_level values. And all the node_level values underneath it. > > I can see that. I suppose it depends on the data you're storing. The > project I was working on tracked grocery inventory for a delivery > service, and thus each item had a fixed "level" in the heirarcy (Food > Class, Food Type, Manufacturer, and Item) and thus while items might get > reassigned *across* the heirarcy, they did not get re-assigned *up and > down* the heirarcy. > > Also, I can't think of a way to represent the tree in pure SQL without > having the level identifiers (and a fixed number of levels). > > > We've done a similar thing for Java. It was ridiculously easy to create a > > TreeModel wrapped around this data. Almost too easy; it made me feel dirty. > > Great. Maybe I'll buy it from you if I ever need to use Java :-) > > -Josh > > -- > ______AGLIO DATABASE SOLUTIONS___________________________ > Josh Berkus > Complete information technology josh@agliodbs.com > and data management solutions (415) 436-9166 > for law firms, small businesses fax 436-0137 > and non-profit organizations. pager 338-4078 > San Francisco
Mathijs Brands wrote: > > On Wed, Dec 13, 2000 at 04:49:51PM -0800, Josh Berkus allegedly wrote: > > Stuart, > > > > > I don't think I'd be comfortable with having the node_level column in the > > > table structure. First, because you can derive that value using a function, > > > it's duplicate data. Second, if you decide to take an entire segment of your > > > hierarchy and move it under another node (by changing the value of > > > node_linkup/ParentCategoryID), you'll need to recalculate all of those > > > node_level values. And all the node_level values underneath it. > > > > I can see that. I suppose it depends on the data you're storing. The > > project I was working on tracked grocery inventory for a delivery > > service, and thus each item had a fixed "level" in the heirarcy (Food > > Class, Food Type, Manufacturer, and Item) and thus while items might get > > reassigned *across* the heirarcy, they did not get re-assigned *up and > > down* the heirarcy. > > Indeed. If the structure 'rarely' changes, having to do an expensive > update may be acceptable, if it increase the overall performance > significantly. > > > Also, I can't think of a way to represent the tree in pure SQL without > > having the level identifiers (and a fixed number of levels). > > Storing only the parent for a record doesn't require you to keep track > of levels, since this information can be reconstructed by following the > chain of parent id's until you reach the top of your tree. > > Storing the children for each record (like I'm doing) works exactly the > same. Just follow the 'path' (for instance 'Automotive Transport/Cars') > to find the category you're looking for. > > Cheers, > > Mathijs > -- > "A book is a fragile creature. It suffers the wear of time, > it fears rodents, the elements, clumsy hands." > Umberto Eco this is the way i implemented too! i used perl and modperl/apache to deploy so i did use a perl module to store the top level of categories so they where always avail to the starting page mostly it ended up looking like the bidders edge horizontal line type thing again if anyone needs the code lemme know!
On Wednesday 13 December 2000 18:05, Josh Berkus wrote: > Frank, etc: > > > create table Category ( > > > CategoryID int4 not null primary key, > > > ParentCategoryID int4 not null REFERENCES Category (CategoryID), > > > CategoryName varchar(100) > > > ); I made a message board with a hierarchy for topics/boards under which messages go into a post/reply hierarchy at www.comptechnews.com. I used the parent_id idea like in the table above. It's working OK and, yes, I can easily move nodes around without problems. I can move/reparent Topic/boards and posts/replies (posts and replies are treated nearly the same, just that a post has a parent-post-id of 0 while a reply message's parent-post-id is non-zero). This arrangement combined with the use of PL/pgSQL trigger functions can go a long way. I did not use any foreign keys but just had the PL/pgSQL triggers do any checks I wanted myself. So, its easy to make triggers that do things like automatically delete an entire hierarchy when a node(message or topic) is deleted. Like, if you delete a post, it deletes all its replies automatically. If you delete a reply, it deletes any child/replies that it might have etc. If you delete a topic/board, it deletes all messages that were in that topic and any subtopics and messages in those subtopics recursively. The PL/pgSQL can be used to RAISE EXCEPTION when something you don't want to happen is attempted (like deleting the root topic!), which then automatically ABORTs the transaction. You will want to make heavy use of procedures in the database to ensure integrity. You might be able to use FOREIGN KEY triggers also but with careful use of BEFORE and AFTER PL/pgSQL triggers. In my case, I had somekind of conflict between what my PL/pgSQL triggers where doing and what the foreign key triggers were doing. The stuff my PL/pgSQL were doing caused referential integrity violations sometimes. You have to be real careful what goes into a BEFORE trigger and what goes into an AFTER trigger. Trying to make a BEFORE trigger or an AFTER trigger do too much will end up in trouble so you have to split what you want to do into two triggers for a table. The CONSTRAINT TRIGGERs that get created automatically by FOREIGN KEYs are called AFTER INSERT OR UPDATE|DELETE and are NOT DEFERRABLE and INITIALLY IMMEDIATE. You have to keep that in mind in writing your procedures. In the end, I judged the contraint triggers to interfere with what I wanted to do and removed them. > > That was it. I also gave an example of a UNION query that would display > the whole category tree in ASCII format: > > I've done this before for one project. Here's what you do: > > CREATE TABLE sample_heirarchy ( > unique_id SERIAL CONSTRAINT PRIMARY KEY, > node_linkup INT4, > node_level INT2, > label VARCHAR(30) > data whatever > ); > > Then you use the unique_id and node_linkup fields to create a heirarchy > of data nodes, with an indefinite number of levels, where the > node_linkup of each lower level equals the id of its parent record. For > example: > > id linkup level label data > 3 0 1 Node1 Node1 > 4 3 2 Node1.1 Node1.1 > 6 3 2 Node1.2 Node1.2 > 7 6 3 Node1.2.1 Node1.2.1 > 5 0 1 Node2 Node2 > > etc. > > You can then access the whole heirarchy through moderately complex, but > very fast-executing UNION queries. The one drawback is that you need to > know in advance the maximum number of levels (3 in this example), but > I'm sure someone on this list can find a way around that: > > SELECT n1.unique_id, n1.label, n1.data, n1.node_level, n1.unique_id AS > level1, > 0 AS level2, 0 AS level3 > FROM sample_heirarchy n1 > WHERE n1.node_level = 1 > UNION ALL > SELECT n2.unique_id, n2.label, n2.data, n2.node_level, n1.unique_id, > n2.unique_id, 0 > FROM sample_heirarchy n2, sample_heirarchy n1 > WHERE n1.unique_id = n2.node_linkup > AND n2.node_level = 2 > UNION ALL > SELECT n3.unique_id, n3.label, n3.data, n3.node_level, n1.unique_id, > n2.unique_id, n3.unique_id > FROM sample_heirarchy n1, sample_heirarchy n2, sample_heirarchy n3 > WHERE n1.unique_id = n2.node_linkup AND > n2.unique_id = n3.node_linkup > AND n3.node_level = 3 > ORDER BY level1, level2, level3 > > Should produce this output (pardon any parsing errors; I'm not at a > PGSQL terminal right now): > > unique_id label data level level1 level2 level3 > 3 Node1 Node1 1 3 0 0 > 4 Node1.1 Node1.1 2 3 4 0 > 6 Node1.2 Node1.2 2 3 6 0 > 7 Node1.2.1 Node1.2.1 3 3 6 7 > 5 Node2 Node2 1 7 0 0 > etc. > > This sorts them in numerical (id) order, but one could just as easily > substitute the labels or data for the various levels and sort them > alphabetically (although you do need to allow for NULL sort order on > your database, and any label duplicates). > > The advantages of this structure are: > 1. It allows you to create, assign, and re-assign nodes freely all over > the heirarchy ... just change the level and/or linkup. > 2. Aside from the Union query above, the table structure allows for any > number of levels, unlike a set or relationally linked tables. > 3. Because the display query is entirely once table linking to itself on > (hopefully) indexed fields, in my expreience it runs very, very fast. > 4. My PHP developer has reprogrammed the easily available PHP Tree > Control to uses this table structure (I don't know if he's giving it > out, but he said it wasn't very difficult). > > -Josh Berkus -- -------- Robert B. Easter reaster@comptechnews.com --------- - CompTechNews Message Board http://www.comptechnews.com/ - - CompTechServ Tech Services http://www.comptechserv.com/ - ---------- http://www.comptechnews.com/~reaster/ ------------
There actually is a model of tree structures in SQL databases which is different from those mentioned earlier in that it represents the tree as nested sets (ie. nodes are subsets of parent sets (parent nodes)). There is a huge advantage in this model as it eliminates the need for recursion. For example, if you want to get a specific node's parents, grandparents and all other parents up the tree, you can do this in _one single_ SELECT query. If you just stored the the id of the parent of each node, you would need (n-1) queries if the node is at the n-th level. However, inserting or deleting nodes is more complex this way since you have to keep the data structure "balanced". But in most cases you won't insert new nodes all the time so you won't suffer from this overhead. And on the other hand, the performance gain on queries might be enormous. You can find the article dealing with this at http://www.utdt.edu/~mig/sql-trees Actually what I did was to implement _both_ models at the same time. I took the nested set structure, and besides stored the parent id of all nodes as well. The benefit of this is that by accepting a little more overhead during inserts, I was able to use the advantages of both models. If somebody is interested in it, I might share the code (a few tables and some plpgsql functions). But the article is surely worth a read. Zsolt ps: I found the link a few months ago in the pgsql-sql archive :-) On Wed, 13 Dec 2000, Josh Berkus wrote: > Stuart, > > > I don't think I'd be comfortable with having the node_level column in the > > table structure. First, because you can derive that value using a function, > > it's duplicate data. Second, if you decide to take an entire segment of your > > hierarchy and move it under another node (by changing the value of > > node_linkup/ParentCategoryID), you'll need to recalculate all of those > > node_level values. And all the node_level values underneath it. > > I can see that. I suppose it depends on the data you're storing. The > project I was working on tracked grocery inventory for a delivery > service, and thus each item had a fixed "level" in the heirarcy (Food > Class, Food Type, Manufacturer, and Item) and thus while items might get > reassigned *across* the heirarcy, they did not get re-assigned *up and > down* the heirarcy. > > Also, I can't think of a way to represent the tree in pure SQL without > having the level identifiers (and a fixed number of levels). > > > We've done a similar thing for Java. It was ridiculously easy to create a > > TreeModel wrapped around this data. Almost too easy; it made me feel dirty. > > Great. Maybe I'll buy it from you if I ever need to use Java :-) > > -Josh > > -- > ______AGLIO DATABASE SOLUTIONS___________________________ > Josh Berkus > Complete information technology josh@agliodbs.com > and data management solutions (415) 436-9166 > for law firms, small businesses fax 436-0137 > and non-profit organizations. pager 338-4078 > San Francisco >
somebody already showed table structure, but i'll ad some more code to this: table: CREATE TABLE groups ( id INT4 NOT NULL DEFAULT NEXTVAL('groups_seq'), parent_id INT4 NOT NULL DEFAULT0, name TEXT NOT NULL DEFAULT '', active BOOL NOT NULL DEFAULT 't'::bool, PRIMARY KEY (id) ); INSERT INTO groups (id) VALUES (0); ALTER TABLE groups ADD FOREIGN KEY (parent_id ) REFERENCES groups (id); CREATE UNIQUE INDEX groups_pn_u ON groups (parent_id, name, active); at this point it seems to be pretty easy and obvious. in my case i got to the point that i needed some more info about the branch of tree. so i wrote: REATE function getgrouppath(int4, text) returns text as ' DECLARE sep ALIAS FOR $2; aid int4; wynik TEXT; temp RECORD; b BOOL; BEGIN b:=''t''; wynik:=''''; aid:=$1; while b loop SELECT name, parent_id INTO temp FROM groupsWHERE id=aid; IF NOT FOUND THEN return wynik; END IF; if wynik = '''' THEN wynik:=temp.name; else wynik:=temp.name||sep||wynik; END if; IF temp.parent_id = 0 THEN b:=''f''; ELSE aid:=temp.parent_id; END if; end loop; return wynik; END; ' language 'plpgsql'; (sorry for polish variable names) this function does one nice trick when having structure like: => select id, parent_id, name, active from groups;id | parent_id | name | active ----+-----------+----------------------+-------- 0 | 0 | | t 1 | 0 | RTV | t 2 | 0 | AGD | t 3 | 0 | MP3 | t 4 | 1 | Audio | t 5 | 2 | Lodówki | t 6 | 2 | Kuchenki Mikrofalowe | t 7 | 4 | Sony | t 8 | 4 | Panasonic | t (9 rows) i can: => select id, parent_id, name, active, getgrouppath(id, '/') from groups;id | parent_id | name | active | getgrouppath ----+-----------+----------------------+--------+-------------------------- 0 | 0 | | t | 1 | 0 | RTV | t | RTV 2 | 0 | AGD | t | AGD 3 | 0| MP3 | t | MP3 4 | 1 | Audio | t | RTV/Audio 5 | 2 | Lodówki | t | AGD/Lodówki 6 | 2 | Kuchenki Mikrofalowe | t | AGD/Kuchenki Mikrofalowe 7 | 4 | Sony | t | RTV/Audio/Sony 8 | 4 | Panasonic | t | RTV/Audio/Panasonic since for some reasons (indenting) i needed the level of branch i wrote: CREATE FUNCTION grouplevel(int4) returns int4 AS ' DECLARE baseid ALIAS FOR $1; currid INT4; reply INT4; BEGIN reply:=1; if baseid = 0 then return 0; END if; SELECT parent_id INTO currid FROM groups where id=baseid; while currid>0loop reply:=reply+1; SELECT parent_id INTO currid FROM groups where id=currid; END loop; return reply;END; ' language 'plpgsql'; which also seems pretty obvious. to be complete i wrote two triggers which made me happy: CREATE FUNCTION trg_recurs_act_g() RETURNS OPAQUE AS ' BEGIN IF NEW.active=''f''::bool and OLD.active=''t''::bool THEN UPDATE articles SET active=''f''::bool WHERE group_id=NEW.id; UPDATE groups SET active=''f''::bool WHERE parent_id=NEW.idand id<>0; ELSE IF NEW.active=''t''::bool and OLD.active=''f''::bool AND NEW.id<>0 THEN UPDATEgroups SET active=''t''::bool WHERE id=NEW.parent_id; END IF; END IF; RETURN NEW; END; ' LANGUAGE 'plpgsql'; CREATE FUNCTION trg_recurs_act_a() RETURNS OPAQUE AS ' BEGIN IF NEW.active=''t''::bool and OLD.active=''f''::bool THEN UPDATE groups SET active=''t''::bool WHERE id=NEW.group_id; END IF; RETURN NEW; END; ' LANGUAGE 'plpgsql'; CREATE TRIGGER groups_update_trg BEFORE UPDATE ON groups FOR EACH ROW EXECUTE PROCEDURE trg_recurs_act_g(); CREATE TRIGGER articles_update_trg BEFORE UPDATE ON articles FOR EACH ROW EXECUTE PROCEDURE trg_recurs_act_a(); as you can see those triggers use article table which structure is not important at this moment (let's assume it has id, group_id, name and active). i hope this code will help you a bit. depesz -- hubert depesz lubaczewski ------------------------------------------------------------------------ najwspanialszą rzeczą jaką dało nam nowoczesnespołeczeństwo, jest niesamowita wręcz łatwość unikania kontaktów z nim ...
Hi, miguel sofer wrote: > > I once started writing a small paper on this subject; it is still in a > rather preliminary state. > > You can download the draft (and some ill documented code, 53kB) from > http://www.utdt.edu/~mig/sql-trees ah, this looks very, very nice! on page 5ff you describe the Postgres implementation, but the URL (page 5 bottom) is't complete -- can i find the files somewhere? Included is a "tree_general.sql", but this seems not to be complete and not the same version as the ps-file (First draft, may 6, 2000): in the draft there is written about an base 160 encoding, tree_general.sql uses base 159 encoding ;) Ciao Alvar -- Alvar C.H. Freude | alvar.freude@merz-akademie.de Demo: http://www.online-demonstration.org/ | Mach mit! Blast-DE: http://www.assoziations-blaster.de/ | Blast-Dich-Fit Blast-EN: http://www.a-blast.org/ | Blast/english
Hi, > > > I once started writing a small paper on this subject; it is still in a > > rather preliminary state. > > > > You can download the draft (and some ill documented code, 53kB) from > > http://www.utdt.edu/~mig/sql-trees > i guess, with base 160 encoding there might be a problem: if postgres is compiled with --enable-locale (e.g. for german umlauts), the ordering isn't according to the ASCII number of the character, so for this purpose it's needed to build the encoding table according to the locate settings. Or simply sort it according the locale settings. What's against using all characters >= 32, excluding special characters with special meaninbg in LIKE and regexps? With base 208 encoding it's possible to have 43264 elements on each level. Ciao Alvar -- Alvar C.H. Freude | alvar.freude@merz-akademie.de Demo: http://www.online-demonstration.org/ | Mach mit! Blast-DE: http://www.assoziations-blaster.de/ | Blast-Dich-Fit Blast-EN: http://www.a-blast.org/ | Blast/english
On Thu, 14 Dec 2000, Tulassay Zsolt wrote: > > You can find the article dealing with this at > http://www.utdt.edu/~mig/sql-trees > sorry i pasted in the wrong url (this was mentioned in an earlier post) the correct one is: A look at SQL Trees (by Joe Celko) http://www.dbmsmag.com/9603d06.html Zsolt Tulassay
> > I once started writing a small paper on this subject; it is still in a > > rather preliminary state. > > > > You can download the draft (and some ill documented code, 53kB) from > > http://www.utdt.edu/~mig/sql-trees > ah, this looks very, very nice! Thanks. > on page 5ff you describe the Postgres implementation, but the URL (page > 5 bottom) is't complete -- can i find the files somewhere? > Included is a "tree_general.sql", but this seems not to be complete and > not the same version as the ps-file (First draft, may 6, 2000): in the > draft there is written about an base 160 encoding, tree_general.sql uses > base 159 encoding ;) Sorry, I never got around to completing this, or thinking any further. My other files are definitely not in a usable state right now. I hope to be able to improve things over the (southern) summer holidays, so there may be news soon - but do not hold your breadth! I can't remember why I switched from base 160 to base 159; my guess now is that I got confused at coding time between the base and the maximal number (base-1): ie, it may be a mistake. > What's against using all characters >= 32, excluding special characters > with special meaning in LIKE and regexps? With base 208 encoding it's > possible to have 43264 elements on each level. Nothing, I guess. I probably got some kind of "start counting at zero" blockage when I started, and never really looked back on it, my shame. Hey, I told you it was rather preliminary ... Thanks for pointing it out. > i guess, with base 160 encoding there might be a problem: if postgres is > compiled with --enable-locale (e.g. for german umlauts), the ordering > isn't according to the ASCII number of the character, so for this > purpose it's needed to build the encoding table according to the locate > settings. Or simply sort it according the locale settings. Yes indeed; never thought about that one. Cheers Miguel
Hi, miguel sofer wrote: > > Sorry, I never got around to completing this, or thinking any further. My > other files are definitely not in a usable state right now. I hope to be > able > to improve things over the (southern) summer holidays, so there may be > news > soon - but do not hold your breadth! OK, no prob. I start to implement smth in Perl, so if it works ok and someone is interested in it i can post it here. and I'll try to use a kind of base255 encoding; it seams that postgres only dislike the 0 byte, so if this works there is no real encoding necessary -- simply start at 1 instead of 0. I hoppe this works :) Ciao Alvar -- Alvar C.H. Freude | alvar.freude@merz-akademie.de Demo: http://www.online-demonstration.org/ | Mach mit! Blast-DE: http://www.assoziations-blaster.de/ | Blast-Dich-Fit Blast-EN: http://www.a-blast.org/ | Blast/english
Ron Peterson wrote: > > This structure is more 'normal' in the sense that nodes without children > (in a tree, the leaf nodes) don't have records in the edge table. Phghpth. Should have had my coffee first. The first data structure given would only have a null parent id for the root node, not all the leaf nodes. My mistake. Thought it might be politic to point that out before someone (correctly) called me an idiot. -Ron-
Ron Peterson wrote: > > CREATE TABLE category_edge ( > parent INTEGER > NOT NULL > REFERENCES category_node(id), > > child INTEGER > NOT NULL > REFERENCES category_node(id) > ); Just for the sake of anal-retentive completeness, I'd like to point out that you'd probably want an id field in this table also. Plus what the heck else am I going to do on Christmas break? ;) On a completely unrelated topic: getting the PostgreSQL discussions lists on a news server is great!!! I was overwhelmed with mail that I usually don't have time to deal with. Now when I have a chance, I just go see what's up on the news server. Excellent and thanks! -Ron-
Stuart Statman wrote: > > I would suggest, instead, to create a table that represents your hierarchy > without adding columns. For example : > > create table Category ( > CategoryID int4 not null primary key, > ParentCategoryID int4 not null REFERENCES Category (CategoryID), > CategoryName varchar(100) > ); Another possibility would be to use two tables to represent the data structure. CREATE SEQUENCE category_node_id_seq; CREATE TABLE category_node (name TEXT NOT NULL, id INTEGER DEFAULT NEXTVAL('category_node_id_seq') PRIMARY KEY ); CREATE TABLE category_edge (parent INTEGER NOT NULL REFERENCES category_node(id), child INTEGER NOT NULL REFERENCES category_node(id) ); This structure is more 'normal' in the sense that nodes without children (in a tree, the leaf nodes) don't have records in the edge table. What either of these structures allow to do is create directed graph structures. If you'd like to constrain this structure to be a tree, you have to enforce that restriction with procedural code. -Ron-