Re: Making a tree with "millions and millions" of dynamic nodes - Mailing list pgsql-general

From joe.celko@northface.edu (--CELKO--)
Subject Re: Making a tree with "millions and millions" of dynamic nodes
Date
Msg-id a264e7ea.0312101319.7de8adb7@posting.google.com
Whole thread Raw
In response to Making a tree with "millions and millions" of dynamic nodes  (Christian Fowler <google@NOSPAM.gravesweeper.com>)
List pgsql-general
>> The depth should not exceed 6 or 7 levels. The initial import will
have about 6 million leaves, and 3 million branches. I would expect
the leaves to grow significantly, in number easily tripling. <<

That is not too bad -- depth would worry me more than breadth

>> 1. path generation speed <<

That is pretty easy in a nested set model:

 SELECT T1.node, (T1.rgt-T1.lft) AS depth
   FROM Tree AS T1, Tree AS T2
  WHERE T2.lft BETWEEN T1.lft AND T1.rgt
    AND T2.node = :my_guy
 ORDER BY depth;

The greater (T1.rgt-T1.lft) is, the closer the node is to the root

You can also convert depth into a sequential number by using.

SELECT T2.node, COUNT (T1.node)AS depth
   FROM Tree AS T1, Tree AS T2
  WHERE T2.lft BETWEEN T1.lft AND T1.rgt
  GROUP BY T2.part;

>> 2. immediate sibling speed <<

SELECT B.node AS boss, P.node
  FROM Tree AS P
       LEFT OUTER JOIN
       Tree AS B
       ON B.lft
          = (SELECT MAX(lft)
               FROM Tree AS S
              WHERE P.lft > S.lft
                ND P.lft <S.rgt);

>> 3. children count speed <<

 SELECT :my_guy, (rgt - lft +1)/2 AS subtree_size
   FROM Tree
  WHERE node = :my_guy;

>> I have implemented a first run using Joe Celko's Nested Sets (w/ a
mod to store tree level for speed). The update propigation issue is
the achilles heel of this approach for me. <<

It is not as bad as people first think. The tree structure is in one
table and the nodes are in another, so you can manipulating of two
integers and one key (perhaps another integer).  Since new siblings
are added to the right side of the existing line of siblings under the
parent, you average a bit less than half of the nodes being scanned in
practice.

There are some other tricks, like leaving a large gap between (lft)
and (rgt) numbers so you have room to insert new nodes.

>> So it seems Materialized Path is my only option, however I am
concerned about LIKE performance for the right hand side of the tree,
where the path is 8 digits x 6 levels = 48 chars. Should I be
concerned? <<

It does not sound too bad, assuming an ordered index.  I wonder if you
could improve performance of the LIKE predicates by reversing the path
string ...

>> ...  some reassuring words about real-time performance of a 20
million row tree, i'd love to hear <<

Sorry, the most I have played with is a few hundred thousand rows with
the nested set model.

pgsql-general by date:

Previous
From: "Bob Badour"
Date:
Subject: Re: Making a tree with "millions and millions" of dynamic nodes
Next
From: "Ramesh Krishnan"
Date:
Subject: can someone point me to links for olap/multi-dimension database in postgresql site.