Thread: Optimizing the implementation of an optimized tree

Optimizing the implementation of an optimized tree

From

Christian Rishoej

Date:

06 May 2002, 21:02:42


Hi,

Recently I needed to store a hiearchy of pages on a website in Postgres
in a way that would allow for fast extraction of parts of the tree for
menu generation and "path" specification (which nodes in the tree make
up the path from the root to my position?).

This can be accomplished by letting each node in the tree have an l and
r value with values determined by traversing the edge of the tree and
assigning the value of integer that is incremented at each node visit to
l and doing the same thing for r, this time traversing the edge of the
tree backwards.

The tree then becomes:

CREATE TABLE tree (
        id      serial PRIMARY KEY,
        parent  int REFERENCES tree,
        l       int DEFAULT NULL,
        r       int DEFAULT NULL,
);

The parent id strictly not needed, but I chose to include it for
convenience.

I can easily extract a complete branch like this:

        SELECT treeNode.id
        FROM tree treeNode, tree treeParent
        WHERE treeNode.l BETWEEN treeParent.l AND treeParent.r
                AND treeParent.id = $1
        ORDER BY treeNode.l

And the "path" from the root to a particular node like this:

        SELECT treeNode.id
        FROM tree treeNode, tree currentNode
        WHERE currentNode.r BETWEEN treeNode.l AND treeNode.r
                AND currentNode.id = $1
        ORDER BY treeNode.l;

Now, in order to avoid having to maintain the values of l and r for each
node when the tree is modified I created triggers in PL/PGSQL that
handle INSERTs of new nodes into the tree, DELETEs of existing nodes and
UPDATEs of a node's position in the tree (i.e. the parant field).

The implementation is included as an attachment.

I am very interested in comments on the implementation - especially
hints on possible optimizations.

It seems to me that the PL/PGSQL-definition of each function is parsed
at each call. Is this true? Is it possible to avoid this?

It is my understanding that Postgres does not yet support triggers that
fire on the update of a paricular column. Therefore I have a check in
the UPDATE trigger that checks if the parent-field was modified. Is
there any was to do this in a nicer manner?

Regards,
Christian

CREATE TABLE tree (
    id     serial PRIMARY KEY,
    parent     int REFERENCES tree,
    l     int DEFAULT NULL,
    r     int DEFAULT NULL,
);

-- Are these a good idea?
CREATE INDEX tree_l_idx ON tree (l);
CREATE INDEX tree_r_idx ON tree (r);

CREATE OR REPLACE FUNCTION handle_tree_insert() RETURNS opaque AS '
    DECLARE
        parentR INTEGER;
    BEGIN
        IF NEW.l ISNULL THEN
            parentR := ( SELECT r FROM tree WHERE id = NEW.parent );
            UPDATE tree SET l=l+2 WHERE l >= parentR;
            UPDATE tree SET r=r+2 WHERE r >= parentR;
            NEW.l := parentR;
            NEW.r := parentR + 1;
        END IF;
        RETURN NEW;
    END;
' LANGUAGE PLPGSQL;

CREATE TRIGGER on_tree_insert BEFORE INSERT
    ON tree
    FOR EACH ROW
    EXECUTE PROCEDURE handle_tree_insert();


CREATE OR REPLACE FUNCTION handle_tree_delete() RETURNS opaque AS '
    DECLARE
        decr INTEGER;
    BEGIN
        -- RAISE NOTICE ''Gets here, ID %'', OLD.id;
        IF NOT OLD.l ISNULL THEN
            decr := (((OLD.r - OLD.l - 1) / 2 ) + 1) * 2;
            -- RAISE NOTICE ''...and executes stuff (decrementing with %)'', decr;

            UPDATE tree SET parent = NULL, l = NULL, r = NULL WHERE l > OLD.l AND r < OLD.r;
            DELETE FROM tree WHERE parent ISNULL AND l ISNULL AND r ISNULL;

            UPDATE tree SET l = l - decr WHERE l > OLD.l;
            UPDATE tree SET r = r - decr WHERE r > OLD.r;
        END IF;
        RETURN OLD;
    END;
' LANGUAGE PLPGSQL;

CREATE TRIGGER on_tree_delete AFTER DELETE
    ON tree
    FOR EACH ROW
    EXECUTE PROCEDURE handle_tree_delete();

CREATE OR REPLACE FUNCTION handle_tree_update() RETURNS opaque AS '
    DECLARE
        parentR INTEGER;
        span INTEGER;
        dist INTEGER;
    BEGIN
        IF (NOT OLD.parent ISNULL) AND (NOT NEW.parent ISNULL) AND (OLD.parent <> NEW.parent) THEN
            span := OLD.r - OLD.l + 1;

            -- Get the R value of the new parent
            parentR := ( SELECT r FROM tree WHERE id = NEW.parent );

            -- Determine the distance
            dist = parentR - OLD.l;

            -- Make the gab
            UPDATE tree SET l = l + span WHERE l >= parentR;
            UPDATE tree SET r = r + span WHERE r >= parentR;

            -- Move our branch to the newly created gab
            -- Decrement nodes after the old position to close the old gab
            -- If dist is < 0 then our own branch was moved <span> steps - this must be taken into consideration
            IF (dist > 0)
                THEN UPDATE tree SET l = l + (dist), r = r + (dist) WHERE l >= OLD.l AND r <= OLD.r;
                UPDATE tree SET l = l - span WHERE l >= OLD.l;
                UPDATE tree SET r = r - span WHERE r >= OLD.r;
            ELSE
                UPDATE tree SET l = l + (dist - span), r = r + (dist - span) WHERE l >= (OLD.l + span) AND r <= (OLD.r
+span); 
                UPDATE tree SET l = l - span WHERE l >= (OLD.l + span);
                UPDATE tree SET r = r - span WHERE r >= (OLD.r + span);
            END IF;

        END IF;
        RETURN NULL;
    END;
' LANGUAGE PLPGSQL;

CREATE TRIGGER on_tree_update AFTER UPDATE
    ON tree
    FOR EACH ROW
    EXECUTE PROCEDURE handle_tree_update();

Re: Optimizing the implementation of an optimized tree

From

"Christopher Kings-Lynne"

Date:

06 May 2002, 22:04:06

Why not try Oleg and Teodor's tree module?

http://cddb.sai.msu.su/~megera/postgres/gist/

You have to expend a little effort in implementing it as the README's in
Russian :)  Still has examples tho.

Chris

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Christian Rishoej
> Sent: Tuesday, 7 May 2002 9:02 AM
> To: pgsql-sql@postgresql.org
> Cc: Anders Johannsen
> Subject: [SQL] Optimizing the implementation of an optimized tree
>
>
>
>
> Hi,
>
> Recently I needed to store a hiearchy of pages on a website in Postgres
> in a way that would allow for fast extraction of parts of the tree for
> menu generation and "path" specification (which nodes in the tree make
> up the path from the root to my position?).
>
> This can be accomplished by letting each node in the tree have an l and
> r value with values determined by traversing the edge of the tree and
> assigning the value of integer that is incremented at each node visit to
> l and doing the same thing for r, this time traversing the edge of the
> tree backwards.
>
> The tree then becomes:
>
> CREATE TABLE tree (
>         id      serial PRIMARY KEY,
>         parent  int REFERENCES tree,
>         l       int DEFAULT NULL,
>         r       int DEFAULT NULL,
> );
>
> The parent id strictly not needed, but I chose to include it for
> convenience.
>
> I can easily extract a complete branch like this:
>
>         SELECT treeNode.id
>         FROM tree treeNode, tree treeParent
>         WHERE treeNode.l BETWEEN treeParent.l AND treeParent.r
>                 AND treeParent.id = $1
>         ORDER BY treeNode.l
>
> And the "path" from the root to a particular node like this:
>
>         SELECT treeNode.id
>         FROM tree treeNode, tree currentNode
>         WHERE currentNode.r BETWEEN treeNode.l AND treeNode.r
>                 AND currentNode.id = $1
>         ORDER BY treeNode.l;
>
> Now, in order to avoid having to maintain the values of l and r for each
> node when the tree is modified I created triggers in PL/PGSQL that
> handle INSERTs of new nodes into the tree, DELETEs of existing nodes and
> UPDATEs of a node's position in the tree (i.e. the parant field).
>
> The implementation is included as an attachment.
>
> I am very interested in comments on the implementation - especially
> hints on possible optimizations.
>
> It seems to me that the PL/PGSQL-definition of each function is parsed
> at each call. Is this true? Is it possible to avoid this?
>
> It is my understanding that Postgres does not yet support triggers that
> fire on the update of a paricular column. Therefore I have a check in
> the UPDATE trigger that checks if the parent-field was modified. Is
> there any was to do this in a nicer manner?
>
> Regards,
> Christian
>

Re: Optimizing the implementation of an optimized tree

From

Christian Rishoej

Date:

07 May 2002, 07:03:20

On Tue, 2002-05-07 at 04:00, Christopher Kings-Lynne wrote:

> Why not try Oleg and Teodor's tree module?
> 
> http://cddb.sai.msu.su/~megera/postgres/gist/
> 
> You have to expend a little effort in implementing it as the README's in
> Russian :)  Still has examples tho.

As far as I can tell Oleg and Teotor are doing *balanced* trees, and not
trees usable for containing a hiearchy of nodes where each node can have
0-N children.

It is my understanding that I cannot rely on the organization of a
balanced tree - hence the name "balanced" (nodes are inserted where some
algotithm thinks it fits in, not where I want it. Imagine if your
directory-hiearchy on your filesystem was a balanced tree...).

Does anyone have a clue on how to optimize triggers? Can I use another
language than PL/PGSQL? Would C be crazy when I do lots of SQL's and no
calculation in the trigger?

Christian

Re: Optimizing the implementation of an optimized tree

From

Oleg Bartunov

Date:

07 May 2002, 07:55:39

On 7 May 2002, Christian Rishoej wrote:

> On Tue, 2002-05-07 at 04:00, Christopher Kings-Lynne wrote:
>
> > Why not try Oleg and Teodor's tree module?
> >
> > http://cddb.sai.msu.su/~megera/postgres/gist/
> >
> > You have to expend a little effort in implementing it as the README's in
> > Russian :)  Still has examples tho.
>
> As far as I can tell Oleg and Teotor are doing *balanced* trees, and not
> trees usable for containing a hiearchy of nodes where each node can have
> 0-N children.
>
> It is my understanding that I cannot rely on the organization of a
> balanced tree - hence the name "balanced" (nodes are inserted where some
> algotithm thinks it fits in, not where I want it. Imagine if your
> directory-hiearchy on your filesystem was a balanced tree...).

Hmm, you decide where to insert new node. It's my fault we don't have
english documentation for our module. But it was designed to work with
catalog like structure. Here is the link to my original posting:
http://fts.postgresql.org/db/mw/msg.html?mid=1352122

Test data (snapshot DMOZ) is available from
http://www.sai.msu.su/~megera/postgres/gist/tree/dmoz-full.sql.gz

There are functions
entree_level(entree) and bitree_level(bitree) which returns next level of
the node. To add node:

select entree_next(tid) from dmoz where tid <* '1.2.3.*.0' order by tid
        desc limit 1;
 

but you're free to add node by hand:

test=# create table tree ( node bitree );
CREATE
test=# insert into tree values ('22.9.15.19.12');
INSERT 41728161 1
est=# select * from tree;    node
---------------22.9.15.19.12
(1 row)



>
> Does anyone have a clue on how to optimize triggers? Can I use another
> language than PL/PGSQL? Would C be crazy when I do lots of SQL's and no
> calculation in the trigger?
>
> Christian
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83