Thread: storing intermediate results from recursive plpgsql
Hello, I've got a plpgsql function that is recursive. Basically, it traverses a table that represents a tree, which in turn represents parent-child relationships. So, I have a function, get_descendants. For each pass, it gets the children of some id. Then it recurses and looks for the children of all of those children, etc...So, along the way, I'm building a list of ids that represent the whole family. For lack of a better idea, I'm storing the id's into a table on each pass. So, if I recurse three levels, I'm doing three inserts. When the recursion exits, I simply select the entire table and then I delete all rows from it. The performance hit I take is unacceptable, something like .02 - .03 seconds per insert, and it's adding up due to the amount of times I have to run this function. The end result is that the web page that displays this data takes many seconds to run. Is there some sort of data structure in plpgsql (an array) that I can use instead of the hack of inserting into a table on each pass and selecting back out at the end? I have to find a way to optimize this process further. Thanks, Fran
It sounds like each batch of children gets operated on three different times: once when you select the children of a particular id, again when you insert them into the temporary table, and a third time when you select from the temp table. The first and easiest optimization would be to truncate the temp table instead of deleting from it, if you're not doing that alread. That won't solve the real problem though. I have basically the same design. What I'm doing is issuing the selects from perl, and storing the results in a perl hash structure. I only have to select each batch of id's once this way. I'm sure this makes up for whatever I lose by not doing it in a postgres function. Seems to work well. In some cases I'm using Storable to cache the resulting perl hash in a Postgresql bytea field so I don't always rebuild the entire tree from scratch. You might also google for Joe Celko and his nested set model. It's a bit complex, but looks like it could be a win, especially if you have a very high ratio of selects to inserts/updates. Other people have tried other variations of this; the mail archives of this list in just the last year have other ideas if you look for tree implementations. Good luck, Wes Sheldahl Fran Fabrizio <ffabrizio%mmrd.com@interlock.lexmark.com> on 12/13/2001 04:43:34 PM To: pgsql-general%postgresql.org@interlock.lexmark.com cc: (bcc: Wesley Sheldahl/Lex/Lexmark) Subject: [GENERAL] storing intermediate results from recursive plpgsql Hello, I've got a plpgsql function that is recursive. Basically, it traverses a table that represents a tree, which in turn represents parent-child relationships. So, I have a function, get_descendants. For each pass, it gets the children of some id. Then it recurses and looks for the children of all of those children, etc...So, along the way, I'm building a list of ids that represent the whole family. For lack of a better idea, I'm storing the id's into a table on each pass. So, if I recurse three levels, I'm doing three inserts. When the recursion exits, I simply select the entire table and then I delete all rows from it. The performance hit I take is unacceptable, something like .02 - .03 seconds per insert, and it's adding up due to the amount of times I have to run this function. The end result is that the web page that displays this data takes many seconds to run. Is there some sort of data structure in plpgsql (an array) that I can use instead of the hack of inserting into a table on each pass and selecting back out at the end? I have to find a way to optimize this process further. Thanks, Fran ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
wsheldah@lexmark.com wrote: > It sounds like each batch of children gets operated on three different times: > once when you select the children of a particular id, again when you insert them > into the temporary table, and a third time when you select from the temp table. It's true that I select them and then insert them on each pass. At the very end after the recursion is when I select back out from the temp table, so that just happens once. > > The first and easiest optimization would be to truncate the temp table instead > of deleting from it, if you're not doing that alread. That won't solve the real > problem though. Indeed the delete also takes .02 seconds, but I only delete once whereas I insert once per pass so optimizing the insert would be more helpful (though I will try this one too). > I have basically the same design. What I'm doing is issuing the selects from > perl, and storing the results in a perl hash structure. I only have to select > each batch of id's once this way. I'm sure this makes up for whatever I lose by > not doing it in a postgres function. Seems to work well. In some cases I'm using > Storable to cache the resulting perl hash in a Postgresql bytea field so I don't > always rebuild the entire tree from scratch. I really need to have it happen in the database so that I can do things like select current_status from status where entity_id IN (select get_descendants(12345)); Since get_descendants has so many applications/uses distributed across many client apps, I really need it centralized. Unless you mean plperl, which could be an option but I was skeptical that moving from plpgsql to plperl would make anything faster. > You might also google for Joe Celko and his nested set model. It's a bit > complex, but looks like it could be a win, especially if you have a very high > ratio of selects to inserts/updates. Other people have tried other variations of This is in fact my long term solution. I bought his book last week and have begun digesting this approach. I was looking for something I can deploy in the meantime to hold us over for a few weeks. =) Thanks Wes, very helpful feedback! -Fran
As someone else mentioned Joe Celko's book has some good stuff in it. You might also try this code snippet. It's currently in PHP and happens outside of the database, but you should get a feeling for it. It works pretty well for me. http://stuff.adhesivemedia.com/php/heirarchial-sorting.php -philip On Thu, 13 Dec 2001, Fran Fabrizio wrote: > > Hello, > > I've got a plpgsql function that is recursive. Basically, it traverses > a table that represents a tree, which in turn represents parent-child > relationships. So, I have a function, get_descendants. For each pass, > it gets the children of some id. Then it recurses and looks for the > children of all of those children, etc...So, along the way, I'm building > a list of ids that represent the whole family. > > For lack of a better idea, I'm storing the id's into a table on each > pass. So, if I recurse three levels, I'm doing three inserts. When the > recursion exits, I simply select the entire table and then I delete all > rows from it. The performance hit I take is unacceptable, something > like .02 - .03 seconds per insert, and it's adding up due to the amount > of times I have to run this function. The end result is that the web > page that displays this data takes many seconds to run. > > Is there some sort of data structure in plpgsql (an array) that I can > use instead of the hack of inserting into a table on each pass and > selecting back out at the end? I have to find a way to optimize this > process further. > > Thanks, > Fran > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
I use a function written in the PLTCL for a recursive stored procedure. It could be written in PLPG but was written for speed in pltcl and i haven't got round to updating it yet Here it is though it might prove some use for you CREATE OR REPLACE FUNCTION sp_go_up_loc_tree(VARCHAR) RETURNS VARCHAR AS ' if { $1 == "" } { return "" } set inv_loc_id $1 append return_list "$inv_loc_id " spi_exec -array C "SELECT inv_p_loc_id FROM inv_loc_parents WHERE inv_loc_id=$inv_loc_id" { spi_exec "SELECT sp_go_up_loc_tree($C(inv_p_loc_id)) AS parent" append return_list "$parent " } if { ![info exists return_list] } { return "" } return $return_list ' LANGUAGE 'pltcl'; This function returns a tcl list from it and you can basically pass back whater you want. TCL List has format { item1 } { item 2 } { item3 } You could even return a tcl list of lists this would allow you to return more than one thing i.e. { { item1 } { name1 } { description1 } } { { item2 } { name2 } { description2 } } This may help or it may not if you haven't used the TCL API and would like mods written feel free to contact me and i will be most happy to write the recursive function for you Darren Darren Ferguson Software Engineer Openband On Thu, 13 Dec 2001, Philip Hallstrom wrote: > As someone else mentioned Joe Celko's book has some good stuff in it. You > might also try this code snippet. It's currently in PHP and happens > outside of the database, but you should get a feeling for it. It works > pretty well for me. > > http://stuff.adhesivemedia.com/php/heirarchial-sorting.php > > > > -philip > > On Thu, 13 Dec 2001, Fran Fabrizio wrote: > > > > > Hello, > > > > I've got a plpgsql function that is recursive. Basically, it traverses > > a table that represents a tree, which in turn represents parent-child > > relationships. So, I have a function, get_descendants. For each pass, > > it gets the children of some id. Then it recurses and looks for the > > children of all of those children, etc...So, along the way, I'm building > > a list of ids that represent the whole family. > > > > For lack of a better idea, I'm storing the id's into a table on each > > pass. So, if I recurse three levels, I'm doing three inserts. When the > > recursion exits, I simply select the entire table and then I delete all > > rows from it. The performance hit I take is unacceptable, something > > like .02 - .03 seconds per insert, and it's adding up due to the amount > > of times I have to run this function. The end result is that the web > > page that displays this data takes many seconds to run. > > > > Is there some sort of data structure in plpgsql (an array) that I can > > use instead of the hack of inserting into a table on each pass and > > selecting back out at the end? I have to find a way to optimize this > > process further. > > > > Thanks, > > Fran > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Don't 'kill -9' the postmaster > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
> This function returns a tcl list from it and you can > basically pass back whater you want. > > TCL List has format { item1 } { item 2 } { item3 } Just a quick thought While this is a fine piece of programming and probably works very well for those who use it; when I see something like this I can't help but feel something is very lacking in these arena of functions returning sets of data in PostgreSQL. I can't see how being able to return open cursors from PL/PGsql functions is going to help much, either.. - Andrew