Thread: Storing a tree
Hello, I have found a problem today to which I am unable to find the solution. I write to this list looking for help. I have been using PostgreSQL for about a year or so, and I manage a quite large database. I usually design extensions to it, create new tables and views, indexes, and many more. In short, it's not my first database application. I would like to store a tree in the database. A tree much like a directory tree, with un unknown depth. However, in my case, the order of the leafs (left to right) is important. I tried to implement the tree as a table: CREATE TABLE tree ( father_id int, son_id int ); Then, I can easily find all sons for a father. Even in correct order: SELECT son_id FROM tree WHERE father_id=1234 ORDER BY son_id; Even if I wish to use another ordering, I could: CREATE TABLE tree (father_id int, son_id int ); CREATE TABLE people ( person_id int, name text, age int ); and then SELECT people.person_id, people.name, people.age FROM tree, people WHERE tree.father_id=1234 AND tree.son_id=people.person_id ORDER BY people.age; For one generation, all works well. I could also extend that up to 2 generations, but not until the sons have no more sons. Could someone help me find a way to output the data in the following way ? Peter, 90 John (Peter's son), 65 Richard (John's son), 44 William (John's son), 45 Philip (William's son), 20 Tony (Peter's son), 70 Other (Tony's son), 50 Two things are crucial: ORDER and MULTIPLE GENERATIONS. The genealogic example is given only to avoid explaining the complexity of our application design. We are a company specialized in web server performance monitoring, so genealogic studies is not our core business ;-) This way, I thought I could simplify the understanding of the problem. I am certain that others have been faced to similar problems, so probably someone may help me. Thank you all for any lights you can shed on this problem. Antonio Fiol P.S. Other similar problems I can think of: Relation Boss --> Employee (though depth is finite in this case). History of people owning an object (though only one "son" per "father", so no ordering issue) A directory tree (without any files).
Antonio Fiol Bonnín wrote: > > I would like to store a tree in the database. A tree much like a > directory tree, with un unknown depth. > > However, in my case, the order of the leafs (left to right) is > important. > > I tried to implement the tree as a table: > > CREATE TABLE tree ( father_id int, son_id int ); There may be a better model for your data: the adjacency list model. Take a look at http://www.intelligententerprise.com/001020/celko.shtml If you want to implement it and you need/want to write stored procdures and/or functions to do the most common list manipulation tasks we might be able to work something out. I am a bit short of time at the moment, but I believe it would be great if this stuff ended up in /contrib/ (or is it there already?), and I am willing to invest time in that (but I am leaving on holiday next week). For more code, look at http://www.codebits.com/ntm/ (stored procedures for using this model with MS SQL Server). Jochem
Instead of the adjacency model, you can try the nested sets one. Here is the Celko's article on this issue: The usual example of a tree structure in SQL books is called an adjacency list model and it looks like this: CREATE TABLE Personnel (emp CHAR(10) NOT NULL PRIMARY KEY, boss CHAR(10) DEFAULT NULL REFERENCES Personnel(emp), salary DECIMAL(6,2) NOT NULL DEFAULT 100.00); Personnel emp boss salary =========================== 'Albert' 'NULL' 1000.00 'Bert' 'Albert' 900.00 'Chuck' 'Albert' 900.00 'Donna' 'Chuck' 800.00 'Eddie' 'Chuck' 700.00 'Fred' 'Chuck' 600.00 Another way of representing trees is to show them as nested sets. Since SQL is a set oriented language, this is a better model than the usual adjacency list approach you see in most text books. Let us define a simple Personnel table like this, ignoring the left (lft) and right (rgt) columns for now. This problem is always given with a column for the employee and one for his boss in the textbooks. This table without the lft and rgt columns is called the adjacency list model, after the graph theory technique of the same name; the pairs of nodes are adjacent to each other. CREATE TABLE Personnel (emp CHAR(10) NOT NULL PRIMARY KEY, lft INTEGER NOT NULL UNIQUE CHECK (lft > 0), rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1), CONSTRAINT order_okay CHECK (lft < rgt) ); Personnel emp lft rgt ====================== 'Albert' 1 12 'Bert' 2 3 'Chuck' 4 11 'Donna' 5 6 'Eddie' 7 8 'Fred' 9 10 The organizational chart would look like this as a directed graph: Albert (1,12) / \ / \ Bert (2,3) Chuck (4,11) / | \ / | \ / | \ / | \ Donna (5,6) Eddie (7,8) Fred (9,10) The first table is denormalized in several ways. We are modeling both the personnel and the organizational chart in one table. But for the sake of saving space, pretend that the names are job titles and that we have another table which describes the personnel that hold those positions. Another problem with the adjacency list model is that the boss and employee columns are the same kind of thing (i.e. names of personnel), and therefore should be shown in only one column in a normalized table. To prove that this is not normalized, assume that "Chuck" changes his name to "Charles"; you have to change his name in both columns and several places. The defining characteristic of a normalized table is that you have one fact, one place, one time. The final problem is that the adjacency list model does not model subordination. Authority flows downhill in a hierarchy, but If I fire Chuck, I disconnect all of his subordinates from Albert. There are situations (i.e. water pipes) where this is true, but that is not the expected situation in this case. To show a tree as nested sets, replace the nodes with ovals, then nest subordinate ovals inside each other. The root will be the largest oval and will contain every other node. The leaf nodes will be the innermost ovals with nothing else inside them and the nesting will show the hierarchical relationship. The rgt and lft columns (I cannot use the reserved words LEFT and RIGHT in SQL) are what shows the nesting. If that mental model does not work, then imagine a little worm crawling anti-clockwise along the tree. Every time he gets to the left or right side of a node, he numbers it. The worm stops when he gets all the way around the tree and back to the top. This is a natural way to model a parts explosion, since a final assembly is made of physically nested assemblies that final break down into separate parts. At this point, the boss column is both redundant and denormalized, so it can be dropped. Also, note that the tree structure can be kept in one table and all the information about a node can be put in a second table and they can be joined on employee number for queries. To convert the graph into a nested sets model think of a little worm crawling along the tree. The worm starts at the top, the root, makes a complete trip around the tree. When he comes to a node, he puts a number in the cell on the side that he is visiting and increments his counter. Each node will get two numbers, one of the right side and one for the left. Computer Science majors will recognize this as a modified preorder tree traversal algorithm. Finally, drop the unneeded Personnel.boss column which used to represent the edges of a graph. This has some predictable results that we can use for building queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*) FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees are defined by the BETWEEN predicate; etc. Here are two common queries which can be used to build others: 1. An employee and all their Supervisors, no matter how deep the tree. SELECT P2.* FROM Personnel AS P1, Personnel AS P2 WHERE P1.lft BETWEEN P2.lft AND P2.rgt AND P1.emp = :myemployee; 2. The employee and all subordinates. There is a nice symmetry here. SELECT P2.* FROM Personnel AS P1, Personnel AS P2 WHERE P1.lft BETWEEN P2.lft AND P2.rgt AND P2.emp = :myemployee; 3. Add a GROUP BY and aggregate functions to these basic queries and you have hierarchical reports. For example, the total salaries which each employee controls: SELECT P2.emp, SUM(S1.salary) FROM Personnel AS P1, Personnel AS P2, Salaries AS S1 WHERE P1.lft BETWEEN P2.lft AND P2.rgt AND P1.emp = S1.emp GROUP BY P2.emp; 4. To find the level of each node, so you can print the tree as an indented listing via a cursor. DECLARE PrintTree CURSOR FOR SELECT COUNT(P2.emp) AS indentation, P1.lft, P1.emp FROM Personnel AS P1, Personnel AS P2 WHERE P1.lft BETWEEN P2.lft AND P2.rgt GROUP BY P1.emp ORDER BY P1.lft; 5. The nested set model has an implied ordering of siblings which the adjacency list model does not. To insert a new node as the rightmost sibling. BEGIN DECLARE right_most_sibling INTEGER; SET right_most_sibling = (SELECT rgt FROM Personnel WHERE emp = :your_boss); UPDATE Personnel SET lft = CASE WHEN lft > right_most_sibling THEN lft + 2 ELSE lft END, rgt = CASE WHEN rgt >= right_most_sibling THEN rgt + 2 ELSE rgt END WHERE rgt >= right_most_sibling; INSERT INTO Personnel (emp, lft, rgt) VALUES ('New Guy', right_most_sibling, (right_most_sibling + 1)) END; 6. To convert an adjacency list model into a nested set model, use a push down stack algorithm. Assume that we have these tables: -- Tree holds the adjacency model CREATE TABLE Tree (emp CHAR(10) NOT NULL, boss CHAR(10)); INSERT INTO Tree SELECT emp, boss FROM Personnel; -- Stack starts empty, will holds the nested set model CREATE TABLE Stack (stack_top INTEGER NOT NULL, emp CHAR(10) NOT NULL, lft INTEGER, rgt INTEGER); BEGIN ATOMIC DECLARE counter INTEGER; DECLARE max_counter INTEGER; DECLARE current_top INTEGER; SET counter = 2; SET max_counter = 2 * (SELECT COUNT(*) FROM Tree); SET current_top = 1; INSERT INTO Stack SELECT 1, emp, 1, NULL FROM Tree WHERE boss IS NULL; DELETE FROM Tree WHERE boss IS NULL; WHILE counter <= (max_counter - 2) LOOP IF EXISTS (SELECT * FROM Stack AS S1, Tree AS T1 WHERE S1.emp = T1.boss AND S1.stack_top = current_top) THEN BEGIN -- push when top has subordinates and set lft value INSERT INTO Stack SELECT (current_top + 1), MIN(T1.emp), counter, NULL FROM Stack AS S1, Tree AS T1 WHERE S1.emp = T1.boss AND S1.stack_top = current_top; DELETE FROM Tree WHERE emp = (SELECT emp FROM Stack WHERE stack_top = current_top + 1); SET counter = counter + 1; SET current_top = current_top + 1; END ELSE BEGIN -- pop the stack and set rgt value UPDATE Stack SET rgt = counter, stack_top = -stack_top -- pops the stack WHERE stack_top = current_top SET counter = counter + 1; SET current_top = current_top - 1; END IF; END LOOP; END; This approach will be two to three orders of magnitude faster than the adjacency list model for subtree and aggregate operations. "Antonio Fiol Bonn�n" <fiol@w3ping.com> wrote in message news:3BEA7B0E.A6EFA5E6@w3ping.com... > Hello, > > I have found a problem today to which I am unable to find the solution. > I write to this list looking for help. > > I have been using PostgreSQL for about a year or so, and I manage a > quite large database. I usually design extensions to it, create new > tables and views, indexes, and many more. In short, it's not my first > database application. > > I would like to store a tree in the database. A tree much like a > directory tree, with un unknown depth. > > However, in my case, the order of the leafs (left to right) is > important. > > I tried to implement the tree as a table: > > CREATE TABLE tree ( father_id int, son_id int ); > > Then, I can easily find all sons for a father. Even in correct order: > > SELECT son_id FROM tree WHERE father_id=1234 ORDER BY son_id; > > Even if I wish to use another ordering, I could: > > CREATE TABLE tree (father_id int, son_id int ); > CREATE TABLE people ( person_id int, name text, age int ); > and then > SELECT people.person_id, people.name, people.age FROM tree, people WHERE > tree.father_id=1234 AND tree.son_id=people.person_id > ORDER BY people.age; > > For one generation, all works well. I could also extend that up to 2 > generations, but not until the sons have no more sons. Could someone > help me find a way to output the data in the following way ? > > > Peter, 90 > John (Peter's son), 65 > Richard (John's son), 44 > William (John's son), 45 > Philip (William's son), 20 > Tony (Peter's son), 70 > Other (Tony's son), 50 > > Two things are crucial: ORDER and MULTIPLE GENERATIONS. > > The genealogic example is given only to avoid explaining the complexity > of our application design. We are a company specialized in web server > performance monitoring, so genealogic studies is not our core business > ;-) This way, I thought I could simplify the understanding of the > problem. > > I am certain that others have been faced to similar problems, so > probably someone may help me. > > Thank you all for any lights you can shed on this problem. > > Antonio Fiol > > P.S. Other similar problems I can think of: > > Relation Boss --> Employee (though depth is finite in this case). > > History of people owning an object (though only one "son" per "father", > so no ordering issue) > > A directory tree (without any files). > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> This approach will be two to three orders of magnitude faster than the > adjacency list model for subtree and aggregate operations. That's great, thanks for posting that! I take it this would NOT work well on things that are updated often, like threaded Net message boards with hundreds of thousands of messages...
If you consider the approach using multiple trees, it may have quite a good performance even for something like a threaded message board, if you think of each thread as a different tree. Then trees are not enormous, and so updates would not be so slow. Performance will be poor, however, when updating very large trees. Antonio Fiol Micah Yoder wrote: > > This approach will be two to three orders of magnitude faster than the > > adjacency list model for subtree and aggregate operations. > > That's great, thanks for posting that! > > I take it this would NOT work well on things that are updated often, like > threaded Net message boards with hundreds of thousands of messages... > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org
Dear all, This tree problem is very interesting. I need a solution for my own XML developments. What do you think of using triggers and PL/pgSQL to maintain a set of results? On any insert/update/delete, we would update the tree node and the set of results. Has anyone already experienced this? Best regards, Jean-Michel POURE
On Saturday 10 November 2001 10:08 am, Antonio Fiol Bonnín wrote: > If you consider the approach using multiple trees, it may have quite a good > performance even for something like a threaded message board, if you think > of each thread as a different tree. Then trees are not enormous, and so > updates would not be so slow. > > Performance will be poor, however, when updating very large trees. Ok, I guess you're right. I was thinking all messages in a large database would be in the same tree. Of course, they would all be in the same table, but it would just need a 'toplevel' field for the root message ID for the thread (which may itself need to be in another table), and with an index on that field it shouldn't be too bad. -- Like to travel? http://TravTalk.org Micah Yoder Internet Development http://yoderdev.com
Christian Meunier schrieb: > Instead of the adjacency model, you can try the nested sets one. > Here is the Celko's article on this issue: Hello, as that very interesting article was on [SQL] and I got no answer there to a question, I'm so impolite to send my question here again: To limit the result to entries below one node, I'd use something like SELECT P1.lft, COUNT(P2.emp) AS indentation, P1.emp FROM Personnel AS P1, Personnel AS P2 WHERE P1.lft BETWEEN P2.lft AND P2.rgt AND p1.lft>(SELECT lft FROM personnel WHERE emp='Chuck') AND p1.rgt<(SELECT rgt FROM personnel WHERE emp='Chuck') GROUP BY P1.emp, p1.lft ORDER BY P1.lft; lft | indentation | emp -----+-------------+------------ 5 | 3 | Donna 7 | 3 | Eddie 9 | 3 | Fred (3 rows) for emp='Albert' it returns lft | indentation | emp -----+-------------+------------ 2 | 2 | Bert 4 | 2 | Chuck 5 | 3 | Donna 7 | 3 | Eddie 9 | 3 | Fred (5 rows) How to limit this result to (Albert's indentation)+1? Thanks, Knut Sübert
Antonio Fiol Bonnín schrieb: > If you consider the approach using multiple trees, it may have quite a good > performance even for something like a threaded message board, if you think > of each thread as a different tree. Then trees are not enormous, and so > updates would not be so slow. > > Performance will be poor, however, when updating very large trees. Hello Antonio, how would you organize multiple trees? Multiple trees in one table made by multiple top level entries wouldn't help, as far as I understand that model. It seems more a question, if the update/insert is in the left or right wing of the tree - maybe it could help a bit to make the tree grow in both directions (lowering left to <0 or altering rgt), depending on where the insert happens. The hard and expensive things become easy and cheap -- and vice versa, I fear. Bye, Knut Sübert
On Thu, Nov 08, 2001 at 01:31:10PM +0100, Antonio Fiol Bonn?n wrote: > Hello, > > I have found a problem today to which I am unable to find the solution. > I write to this list looking for help. try this for something different: http://www.utdt.edu/~mig/sql-trees/ tinus
On my model, I did it like this. I am not sure about its correctness or minimality or cost-effectiveness. select t3.lft,t3.rgt from tree t1, tree t2, tree t3 where t1.lft=19 and t2.lft between t1.lft and t1.rgt and t3.lft between t2.lft+1 and t2.rgt-1 and t2.lft!=t3.lft group by t3.lft,t3.rgt having count(*)=1; This is provided "as-is". ;-) Condition "t1.lft=19" is to be substituted by your condition to choose the parent node. Good luck! If you get other interesting requests/results with that model, could you please post them to the list or mail them to me? Antonio Fiol knut.suebert@web.de wrote: > Christian Meunier schrieb: > > Instead of the adjacency model, you can try the nested sets one. > > Here is the Celko's article on this issue: > > Hello, > > as that very interesting article was on [SQL] and I got no answer > there to a question, I'm so impolite to send my question here again: > > To limit the result to entries below one node, I'd use something like > > SELECT P1.lft, COUNT(P2.emp) AS indentation, P1.emp > FROM Personnel AS P1, Personnel AS P2 > WHERE P1.lft BETWEEN P2.lft AND P2.rgt > AND p1.lft>(SELECT lft FROM personnel WHERE emp='Chuck') > AND p1.rgt<(SELECT rgt FROM personnel WHERE emp='Chuck') > GROUP BY P1.emp, p1.lft ORDER BY P1.lft; > > lft | indentation | emp > -----+-------------+------------ > 5 | 3 | Donna > 7 | 3 | Eddie > 9 | 3 | Fred > (3 rows) > > for emp='Albert' it returns > > lft | indentation | emp > -----+-------------+------------ > 2 | 2 | Bert > 4 | 2 | Chuck > 5 | 3 | Donna > 7 | 3 | Eddie > 9 | 3 | Fred > (5 rows) > > How to limit this result to (Albert's indentation)+1? > > Thanks, > Knut Sübert > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html
Well, if you CREATE TABLE tree ( treeid int, lft int, rgt int ); CREATE INDEX tree_idx ON tree (treeid); And then you issue a SELECT query having a WHERE clause that includes "treeid=1234", you will only get your 1234 tree. OTOH, you are required to include that "treeid=1234" on EVERY query, as nodes are not uniquely identified by "lft". Workaround: Include a (nodeid int) field you may also have an index on and then use the (unique) nodeid to locate your reference node. Then you will need to include a "t1.treeid=t2.treeid" or something like that, to avoid crossing trees on your requests. Wow! re-reading my e-mail I found that understanding my words may be very difficult. Sorry! Antonio Fiol knut.suebert@web.de wrote: > Antonio Fiol Bonnín schrieb: > > If you consider the approach using multiple trees, it may have quite a good > > performance even for something like a threaded message board, if you think > > of each thread as a different tree. Then trees are not enormous, and so > > updates would not be so slow. > > > > Performance will be poor, however, when updating very large trees. > > Hello Antonio, > > how would you organize multiple trees? > > Multiple trees in one table made by multiple top level entries > wouldn't help, as far as I understand that model. It seems more a > question, if the update/insert is in the left or right wing of the > tree - maybe it could help a bit to make the tree grow in both > directions (lowering left to <0 or altering rgt), depending on where > the insert happens. > > The hard and expensive things become easy and cheap -- and vice versa, > I fear. > > Bye, > Knut Sübert > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster
On Mon, 12 Nov 2001, gravity wrote: > On Thu, Nov 08, 2001 at 01:31:10PM +0100, Antonio Fiol Bonn?n wrote: > > Hello, > > > > I have found a problem today to which I am unable to find the solution. > > I write to this list looking for help. > > try this for something different: > > http://www.utdt.edu/~mig/sql-trees/ How I'm doing a tree structure in SQL is ... I'll just cut/paste my notes: --- Fast SQL tree or hierarchy structure where you have varying levels of parent and child relationships. Typical use include Internet portal category listings, family tree, filesystem structure, or organization classifications by company, division, and departments. TOP One big advantage of using this method is | you can search starting at any node andall +---O 01 it's branches or subnodes by using one query. | | In addition getting the path by traversing | +---O 0101 back to the top can be done with just one | | | query instead of many recursive queries. | | +---O 010101 | | | The left diagram shows the relationship | | +---O 010102 between each node and their associated paths. | | Here we are using 2 chars for each node. This | +---O 0102 gives us a max of (using base 36) 36 * 36 | immediate childs per node. Since SQL CHAR +---O 02 fields have a max of 255 chars, we can have | a max depth of 255 / 2 = 127. By varying the +---O 03 char width of each node, we can increase or decrease the depth at the cost or value of the number of child per depth. CHAR field type gives us the possibility of using indexes on the column. If we for-go the indexes, we could use TEXT fields for more depths and childs. With this approach, you can easily move, delete any branch in that tree or move it else where. Another interesting thing is given a node or path, you can trace back all the way to the top using just one query. First you'd break down the path by generations using substring (in PHP, Perl, C) and select using the IN clause. For example: Assume a table like: CREATE SEQUENCE next_cat_id; CREATE TABLE "categories" ( "rec_id" int4 DEFAULT nextval('next_cat_id') PRIMARY KEY, "path" varchar(10) DEFAULT '' NOT NULL, "name" varchar(64) DEFAULT '' NOT NULL ); Find the trail from current node to the TOP, we first break down the node Node 010102 => (01, 0101, 010102) Then when you SELECT branch, name FROM tree WHERE branch IN (01, 0101, 010102) ORDER BY branch ASC You'd get back results in order from the oldest parent to the youngest child. Then pull the result as an array into your app and walk it to the top to show the trail. If you want to select all immediate child under a node: SELECT branch, name FROM tree WHERE branch IS LIKE '01__' If you want to move a branch to another branch: UPDATE tree SET $pathField = ('$toPath' || "; SUBSTRING(path, CHARACTER_LENGTH('$fromPath') + 1)) WHERE path LIKE '$fromPath%'" To delete all the nodes under a branch '01': DELETE FROM tree WHERE path IS LIKE "01%" Another nice thing is that you can seach for what ever, you can use the " IS LIKE '$parentNode%' and search from that node to it's deepest child. ---