Thread: storing intermediate results from recursive plpgsql

storing intermediate results from recursive plpgsql

From
Fran Fabrizio
Date:
Hello,

I've got a plpgsql function that is recursive.  Basically, it traverses
a table that represents a tree, which in turn represents parent-child
relationships.  So, I have a function, get_descendants.  For each pass,
it gets the children of some id.  Then it recurses and looks for the
children of all of those children, etc...So, along the way, I'm building
a list of ids that represent the whole family.

For lack of a better idea, I'm storing the id's into a table on each
pass.  So, if I recurse three levels, I'm doing three inserts.  When the
recursion exits, I simply select the entire table and then I delete all
rows from it.  The performance hit I take is unacceptable, something
like .02 - .03 seconds per insert, and it's adding up due to the amount
of times I have to run this function.  The end result is that the web
page that displays this data takes many seconds to run.

Is there some sort of data structure in plpgsql (an array) that I can
use instead of the hack of inserting into a table on each pass and
selecting back out at the end?  I have to find a way to optimize this
process further.

Thanks,
Fran


Re: storing intermediate results from recursive plpgsql

From
wsheldah@lexmark.com
Date:

It sounds like each batch of children gets operated on three different times:
once when you select the children of a particular id, again when you insert them
into the temporary table, and a third time when you select from the temp table.

The first and easiest optimization would be to truncate the temp table instead
of deleting from it, if you're not doing that alread. That won't solve the real
problem though.

I have basically the same design. What I'm doing is issuing the selects from
perl, and storing the results in a perl hash structure. I only have to select
each batch of id's once this way. I'm sure this makes up for whatever I lose by
not doing it in a postgres function. Seems to work well. In some cases I'm using
Storable to cache the resulting perl hash in a Postgresql bytea field so I don't
always rebuild the entire tree from scratch.

You might also google for Joe Celko and his nested set model. It's a bit
complex, but looks like it could be a win, especially if you have a very high
ratio of selects to inserts/updates. Other people have tried other variations of
this; the mail archives of this list in just the last year have other ideas if
you look for tree implementations.

Good luck,

Wes Sheldahl



Fran Fabrizio <ffabrizio%mmrd.com@interlock.lexmark.com> on 12/13/2001 04:43:34
PM

To:   pgsql-general%postgresql.org@interlock.lexmark.com
cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject:  [GENERAL] storing intermediate results from recursive plpgsql



Hello,

I've got a plpgsql function that is recursive.  Basically, it traverses
a table that represents a tree, which in turn represents parent-child
relationships.  So, I have a function, get_descendants.  For each pass,
it gets the children of some id.  Then it recurses and looks for the
children of all of those children, etc...So, along the way, I'm building
a list of ids that represent the whole family.

For lack of a better idea, I'm storing the id's into a table on each
pass.  So, if I recurse three levels, I'm doing three inserts.  When the
recursion exits, I simply select the entire table and then I delete all
rows from it.  The performance hit I take is unacceptable, something
like .02 - .03 seconds per insert, and it's adding up due to the amount
of times I have to run this function.  The end result is that the web
page that displays this data takes many seconds to run.

Is there some sort of data structure in plpgsql (an array) that I can
use instead of the hack of inserting into a table on each pass and
selecting back out at the end?  I have to find a way to optimize this
process further.

Thanks,
Fran


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster





Re: storing intermediate results from recursive plpgsql

From
Fran Fabrizio
Date:
wsheldah@lexmark.com wrote:

> It sounds like each batch of children gets operated on three different times:
> once when you select the children of a particular id, again when you insert them
> into the temporary table, and a third time when you select from the temp table.

It's true that I select them and then insert them on each pass.  At the very end
after the recursion is when I select back out from the temp table, so that just
happens once.

>
> The first and easiest optimization would be to truncate the temp table instead
> of deleting from it, if you're not doing that alread. That won't solve the real
> problem though.

Indeed the delete also takes .02 seconds, but I only delete once whereas I insert
once per pass so optimizing the insert would be more helpful (though I will try
this one too).

> I have basically the same design. What I'm doing is issuing the selects from
> perl, and storing the results in a perl hash structure. I only have to select
> each batch of id's once this way. I'm sure this makes up for whatever I lose by
> not doing it in a postgres function. Seems to work well. In some cases I'm using
> Storable to cache the resulting perl hash in a Postgresql bytea field so I don't
> always rebuild the entire tree from scratch.

I really need to have it happen in the database so that I can do things like

select current_status from status where entity_id IN (select
get_descendants(12345));

Since get_descendants has so many applications/uses distributed across many client
apps, I really need it centralized.  Unless you mean plperl, which could be an
option but I was skeptical that moving from plpgsql to plperl would make anything
faster.

> You might also google for Joe Celko and his nested set model. It's a bit
> complex, but looks like it could be a win, especially if you have a very high
> ratio of selects to inserts/updates. Other people have tried other variations of

This is in fact my long term solution.  I bought his book last week and have begun
digesting this approach.  I was looking for something I can deploy in the meantime
to hold us over for a few weeks. =)

Thanks Wes, very helpful feedback!

-Fran


Re: storing intermediate results from recursive plpgsql

From
Philip Hallstrom
Date:
As someone else mentioned Joe Celko's book has some good stuff in it.  You
might also try this code snippet.  It's currently in PHP and happens
outside of the database, but you should get a feeling for it.  It works
pretty well for me.

http://stuff.adhesivemedia.com/php/heirarchial-sorting.php



-philip

On Thu, 13 Dec 2001, Fran Fabrizio wrote:

>
> Hello,
>
> I've got a plpgsql function that is recursive.  Basically, it traverses
> a table that represents a tree, which in turn represents parent-child
> relationships.  So, I have a function, get_descendants.  For each pass,
> it gets the children of some id.  Then it recurses and looks for the
> children of all of those children, etc...So, along the way, I'm building
> a list of ids that represent the whole family.
>
> For lack of a better idea, I'm storing the id's into a table on each
> pass.  So, if I recurse three levels, I'm doing three inserts.  When the
> recursion exits, I simply select the entire table and then I delete all
> rows from it.  The performance hit I take is unacceptable, something
> like .02 - .03 seconds per insert, and it's adding up due to the amount
> of times I have to run this function.  The end result is that the web
> page that displays this data takes many seconds to run.
>
> Is there some sort of data structure in plpgsql (an array) that I can
> use instead of the hack of inserting into a table on each pass and
> selecting back out at the end?  I have to find a way to optimize this
> process further.
>
> Thanks,
> Fran
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


Re: storing intermediate results from recursive plpgsql

From
Darren Ferguson
Date:
I use a function written in the PLTCL for a recursive stored procedure. It
could be written in PLPG but was written for speed in pltcl and i haven't
got round to updating it yet

Here it is though it might prove some use for you

CREATE OR REPLACE FUNCTION sp_go_up_loc_tree(VARCHAR) RETURNS VARCHAR AS '
    if { $1 == "" } {
        return ""
    }
    set inv_loc_id $1
    append return_list "$inv_loc_id "
    spi_exec -array C "SELECT inv_p_loc_id
                       FROM inv_loc_parents
                       WHERE inv_loc_id=$inv_loc_id" {
         spi_exec "SELECT sp_go_up_loc_tree($C(inv_p_loc_id)) AS parent"
         append return_list "$parent "
    }
    if { ![info exists return_list] } {
        return ""
    }
    return $return_list
' LANGUAGE 'pltcl';

This function returns a tcl list from it and you can basically pass back
whater you want.

TCL List has format { item1 } { item 2 } { item3 }

You could even return a tcl list of lists this would allow you to return
more than one thing i.e.

{ { item1 } { name1 } { description1 } } { { item2 } { name2 } {
description2 } }

This may help or it may not if you haven't used the TCL API and would like
mods written feel free to contact me and i will be most happy to write the
recursive function for you

Darren


Darren Ferguson
Software Engineer
Openband

On Thu, 13 Dec 2001, Philip Hallstrom wrote:

> As someone else mentioned Joe Celko's book has some good stuff in it.  You
> might also try this code snippet.  It's currently in PHP and happens
> outside of the database, but you should get a feeling for it.  It works
> pretty well for me.
>
> http://stuff.adhesivemedia.com/php/heirarchial-sorting.php
>
>
>
> -philip
>
> On Thu, 13 Dec 2001, Fran Fabrizio wrote:
>
> >
> > Hello,
> >
> > I've got a plpgsql function that is recursive.  Basically, it traverses
> > a table that represents a tree, which in turn represents parent-child
> > relationships.  So, I have a function, get_descendants.  For each pass,
> > it gets the children of some id.  Then it recurses and looks for the
> > children of all of those children, etc...So, along the way, I'm building
> > a list of ids that represent the whole family.
> >
> > For lack of a better idea, I'm storing the id's into a table on each
> > pass.  So, if I recurse three levels, I'm doing three inserts.  When the
> > recursion exits, I simply select the entire table and then I delete all
> > rows from it.  The performance hit I take is unacceptable, something
> > like .02 - .03 seconds per insert, and it's adding up due to the amount
> > of times I have to run this function.  The end result is that the web
> > page that displays this data takes many seconds to run.
> >
> > Is there some sort of data structure in plpgsql (an array) that I can
> > use instead of the hack of inserting into a table on each pass and
> > selecting back out at the end?  I have to find a way to optimize this
> > process further.
> >
> > Thanks,
> > Fran
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>


Re: storing intermediate results from recursive plpgsql

From
"Andrew Snow"
Date:

> This function returns a tcl list from it and you can
> basically pass back whater you want.
>
> TCL List has format { item1 } { item 2 } { item3 }

Just a quick thought

While this is a fine piece of programming and probably works very well
for those who use it;  when I see something like this I can't help but
feel something is very lacking in these arena of functions returning
sets of data in PostgreSQL.  I can't see how being able to return open
cursors from PL/PGsql functions is going to help much, either..


- Andrew