Re: Making a tree with "millions and millions" of dynamic nodes - Mailing list pgsql-general

From Mikito Harakiri
Subject Re: Making a tree with "millions and millions" of dynamic nodes
Date
Msg-id 3kczb.23$2A3.117@news.oracle.com
Whole thread Raw
In response to Making a tree with "millions and millions" of dynamic nodes  (Christian Fowler <google@NOSPAM.gravesweeper.com>)
List pgsql-general
"Christian Fowler" <google@NOSPAM.gravesweeper.com> wrote in message
news:6b-dnQJnDI6YvlCiXTWc-g@speakeasy.net...
> For selection, it is crucial for me to get:
>
> 1. path generation speed

Path to the root? It is coded explicitly in the materialized path. All you
need to do is parsing the path and generating a list of prefixes that you
use within your query like this:

select * from hierarchy where path in ('1','1.2','1.2.1', '1.2.1.1')

If you have an index, and your rdbms optimiser is smart enough, the query
should execute fast.

> 2. immediate sibling speed

Here is the kludge:

select * from hierarchy where path in ('1.2.1.1','1.2.1.2','1.2.1.3',
'1.2.1.4','1.2.1.5')

Again, I assume that your query would use an index here.

If you get exactly 5 records, then you can't be sure that your node has that
many children. You have to repeat query like this:
select * from hierarchy where path in (
    '1.2.1.1','1.2.1.2','1.2.1.3', '1.2.1.4','1.2.1.5'
   ,'1.2.1.6','1.2.1.7','1.2.1.8', '1.2.1.9','1.2.1.10')
   ,'1.2.1.11','1.2.1.12','1.2.1.13', '1.2.1.4','1.2.1.15')
   ,'1.2.1.16','1.2.1.17','1.2.1.18', '1.2.1.19','1.2.1.20')
   ,'1.2.1.21','1.2.1.22','1.2.1.23', '1.2.1.4','1.2.1.25')

Yet again, there might be more than 25 children, so you'll have to run yet
again more "generos" query.

The pitfall here is that at some point the optimiser may get tired to do
"or" expansion, so that at some point you might end up with full table scan.
But, this is implementation dependent, so that you might be able to
influence optimizer decision.

As you see, I'm not worried how many bytes materialized path has, or if
parsing it takes more time than multiplying 2 integers. My concern is if
your query can leverage index or not.

> 3. children count speed

Children, or descendants? I guess no method can beat Celko's descendant's
formula
#descendants=(rgt-lft+1)/2
The count is implicitly stored at the root node, so that even for hierarchy
1M nodes we answer how many nodes are there instantly. This is also an
Achiles' heel of the method: any update to the hierarchy triggers  refresh
of the "counter". One aggregate is never enough, though: it's often useful
to know total salary too:-)

If you meant not descendants, but children, then use a method similar to
bullet #2.








pgsql-general by date:

Previous
From: Oleg Lebedev
Date:
Subject: Re: Storing passwords
Next
From: "Joe \"Nuke Me Xemu\" Foster"
Date:
Subject: Re: Making a tree with "millions and millions" of dynamic nodes