Thread: storing intermediate results in recursive plpgsql functions

storing intermediate results in recursive plpgsql functions

From
Fran Fabrizio
Date:
Hello,

I have a table 'entity'.  It has an id column and a parent_id column
that refers to another entity in the same table.  I often need to
calculate all of the "descendants" of a particular id.  I wrote a
plpgsql function 'get_descendants' that, given an id, recurses over all
the children of that id, and the children's children, etc... As it
recurses, it puts the ids of the descendants into a one-column table
called 'descendants'.

Since the return of this function needs to be a setof int4, and as best
I can tell there's no way to return a setof from a recursive plpgsql
function (nowhere internally to store the intermediate result set), I
call this function from a wrapper sql function that simply calls the
recursive function, and then selects from the 'descendants' table.

This all works great until two users simultaneously call
get_descendants() wanting the descendants of two different people.  They
of course get mixed and matched in the descendants table.  I thought I
knew of a couple of different ways to possibly solve the problem but I'm
finding out that they don't work so well after all.

First I thought I would just have the function create 'descendants' as a
temporary table instead of a permanent one, but since table names get
converted to OIDs when I load the function into the db, I'm getting
Relation Not Found errors. I'm guessing that trying to refer to
temporary tables from stored procedures is not a common practice. :-)

Then I thought I'd just have the sql function obtain an access exclusive
lock on the 'descendants' table, call the recursive plpgsql function,
select the results from 'descendants', then unlock the 'descendants'
table.  Then the second instance of get_descendants would sit and wait
for descendants to come free.  That doesn't work because the sql
function wants the last statement called to be the one that returns the
setof int4, not the commit; that is necessary.

Right now the only way I can think of to do this is to have a bit flag
that the client can examine to see if anyone else is currently calling
get_descendants and to wait if so and try again to obtain the bit for
itself in a second or so.  I am the first to admit that this would be a
horrendous hack (yeah, like the rest of what I've already done isn't ;)

I feel like I'm missing something obvious.  But it seems the exact
combination I have here (sql function calling a recursive plpgsql
function and then selecting the result set from a table) is just enough
to thwart all the usual techniques.  I have a short mental list of
things I should have done/could do to maybe help:

1. Stop using recursive functions and represent the family tree in Joe
Celko fashion (I learned about his existence after the creation of this
database, unfortunately)

2. Figure out how to store an array of integers as the intermediate
result set on every pass of a recursive plpgsql function and return them
as a setof int4 when I'm done recursing.  I remember trying a while on
this and finding it impossible.

I still feel like I'm missing some obvious solution.  Anyone have any
insight?

Thanks,
Fran


Re: storing intermediate results in recursive plpgsql functions

From
Tom Lane
Date:
Fran Fabrizio <ffabrizio@mmrd.com> writes:
> I feel like I'm missing something obvious.

Well, you could use a temp table if you access it via EXECUTEd commands.
Or you could use a non-temp table and just not unlock; let the unlock
happen at normal end of transaction.  The latter is probably easier.

            regards, tom lane

Re: storing intermediate results in recursive plpgsql functions

From
Fran Fabrizio
Date:
This is a followup to a thread last Friday.  I'll restate the problem
and already-tried solutions, including Tom Lane's from Friday, then I'll
ask for more ideas. :-)

Requirement: I need a plpgsql function that can return to me the list of
all child id's given a parent id.  This will query a table 'entities'
with 'id' and 'parent_id' fields.  Therefore, the nature of the data is
such that this function will need to be recursive.  Additionally, this
function needs to be multiuser-safe in case two connections call it at
the same time with different parent id's.

Limitations/Assumptions (correct me if I am wrong):

1. plpgsql functions cannot return setof's.  So, if I want the set of
child ids, I need to wrap this plpgsql function with an sql function,
which can return setof.

2. The only way to pass the result set from the plpgsql to the sql
function is storing it externally to the function- i.e. in a (temporary
or permanent) table.

3.  I can create a table called descendants that has one column, id.  If
it's a permanent table, I'll have to lock it so that a concurrent call
to the function cannot intersperse id's in the result set.  If it's a
temporary table, it will only be visible to it's own connection anyhow.

4.  If I take the permanent table route, I run into a transaction
problem.  I cannot have the wrapper sql function lock the table, because
the last statement has to be a select that returns the setof integers,
not the commit to unlock the table.  If I put the commit before the
'select descendants.id from descendants', it's not multiuser-safe.  If I
omit the commit, the table stays locked for the rest of the life of the
connection (apprently sql functions do not automatically end
transactions/release locks like plpgsql functions do).  Similarly, if I
lock the table inside the inner plpgsql function, it gets unlocked
before the wrapper sql function returns the result set (because every
plpgsql function is run inside a transaction and the table gets released
as soon as the plpgsql function returns.)  So, it appears there's no
safe way to do it via permanent table.  This was one of Tom's
suggestions, but since a plpgsql function cannot return setof, and I
need to rely on a wrapper sql function for that, I'm not sure how that
could work.

5.  If I go with a temporary table, there is the 'when do i create it'
issue.  I cannot create it in the sql function because i need
conditional logic to say "if it doesnt already exist for this
connection, create it".  If I try to create it when it already exists, I
of course get an error.  On the other hand, I don't know of a way to
check for temporary table existence from within plpgsql in order to know
whether to create it or not.  If that is possible, I might have a
working solution there.  The plpgsql function would check for temp table
existence, create it if not, delete previous contents, and populate it.
  The wrapper would select from the temp table for the return set.
Multiuser safe since it's a temp table.  But not at all sure if it's
even possible.

At this point, I believe my only solution to this problem is to convert
the recursive data in the table into Joe Celko tree style (which would
reduce the problem to "select id from entities where left > (select left
from entities where id = $parentid) and right < (select right from
entities where id = $parentid).  However, if anyone can think of a way
to do this with the current table, this would be preferable for the
short and medium-term.  I appreciate all the help I've received so far,
hopefully we can get this solved before too long.

Thanks,
Fran


Re: storing intermediate results in recursive plpgsql functions

From
Tom Lane
Date:
Fran Fabrizio <ffabrizio@mmrd.com> writes:
> (apprently sql functions do not automatically end
> transactions/release locks like plpgsql functions do)

Say what?  There's no difference on that score for any kind of function;
none of them end transactions or release locks at exit.

            regards, tom lane

Re: storing intermediate results in recursive plpgsql functions

From
Fran Fabrizio
Date:
>>(apprently sql functions do not automatically end
>>transactions/release locks like plpgsql functions do)
>>
>
> Say what?  There's no difference on that score for any kind of function;
> none of them end transactions or release locks at exit.


If that's the case, then this is the second time in a week I've found
huge errors in this darn PostgreSQL Developer's Handbook I bought.  I'm
coming very close to tossing this thing in the garbage.

"When a PL/pgSQL function locks a table, the lock is released when the
PL/pgSQL function returns".

"You can't have transactions in PL/pgSQL functions.  Every function is
executed in one transaction."

This book is really starting to tick me off.  Grr.  I'm amazed that
there are errors of this magnitude.  The authors are Ewald Geschwinde
and Hans-Jurgen Schonig, does anyone know their credentials?

Thanks for clearing that up, Tom.

-Fran



Re: storing intermediate results in recursive plpgsql functions

From
Tom Lane
Date:
Fran Fabrizio <ffabrizio@mmrd.com> writes:
> If that's the case, then this is the second time in a week I've found
> huge errors in this darn PostgreSQL Developer's Handbook I bought.  I'm
> coming very close to tossing this thing in the garbage.

> "When a PL/pgSQL function locks a table, the lock is released when the
> PL/pgSQL function returns".

This is bogus...

> "You can't have transactions in PL/pgSQL functions.  Every function is
> executed in one transaction."

This is perfectly true: the transaction in which the calling query is
contained also contains the operations executed in the called function.
However, evidently the context misled you to think it meant that the
function has its own transaction.

> This book is really starting to tick me off.  Grr.  I'm amazed that
> there are errors of this magnitude.  The authors are Ewald Geschwinde
> and Hans-Jurgen Schonig, does anyone know their credentials?

Can't say that I recognize either name.  But you might as well let them
know of the mistakes you find, so that they can fix 'em in future
editions (if any).

            regards, tom lane

Re: storing intermediate results in recursive plpgsql functions

From
"Gregory Wood"
Date:
> >>(apprently sql functions do not automatically end
> >>transactions/release locks like plpgsql functions do)
> >
> > Say what?  There's no difference on that score for any kind of function;
> > none of them end transactions or release locks at exit.
>
> "When a PL/pgSQL function locks a table, the lock is released when the
> PL/pgSQL function returns".

I don't see any way in which that could be a true statement. Which is not to
say that it couldn't be a true statement :)

> "You can't have transactions in PL/pgSQL functions.  Every function is
> executed in one transaction."

This is a true statement (although not worded very clearly). Let me give a
shot at making it more understandable:

You can not start or end a transaction inside a function. The reason for
that is rather simple: PostgreSQL does not support nested transactions.
Since all functions are already wrapped in a transaction, implicit or
explicit, they are already inside a transaction, and therefore can not start
a new transaction. For the same reason, you can not end a transaction
because the function is considered an atomic action.

Greg


Re: storing intermediate results in recursive plpgsql

From
Vince Vielhaber
Date:
On Tue, 5 Mar 2002, Tom Lane wrote:

> Fran Fabrizio <ffabrizio@mmrd.com> writes:
> > If that's the case, then this is the second time in a week I've found
> > huge errors in this darn PostgreSQL Developer's Handbook I bought.  I'm
> > coming very close to tossing this thing in the garbage.
>
> > "When a PL/pgSQL function locks a table, the lock is released when the
> > PL/pgSQL function returns".
>
> This is bogus...
>
> > "You can't have transactions in PL/pgSQL functions.  Every function is
> > executed in one transaction."
>
> This is perfectly true: the transaction in which the calling query is
> contained also contains the operations executed in the called function.
> However, evidently the context misled you to think it meant that the
> function has its own transaction.

According to our docs:

----------
If you do a LOCK TABLE in PL/pgSQL, the lock will not be released until
the calling transaction is finished.

and

You also cannot have transactions in PL/pgSQL procedures. The entire
function (and other functions called from therein) is executed in a
transaction and PostgreSQL rolls back the results if something goes
wrong.
----------

So if the function is executed in a transaction and the transaction
ends when the function does, doesn't that mean the lock is released
when the function ends?

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH    email: vev@michvhf.com    http://www.pop4.net
         56K Nationwide Dialup from $16.00/mo at Pop4 Networking
        Online Campground Directory    http://www.camping-usa.com
       Online Giftshop Superstore    http://www.cloudninegifts.com
==========================================================================




Re: storing intermediate results in recursive plpgsql functions

From
Tom Lane
Date:
Vince Vielhaber <vev@michvhf.com> writes:
> So if the function is executed in a transaction and the transaction
> ends when the function does, doesn't that mean the lock is released
> when the function ends?

But the transaction *doesn't* end when the function does.

The function must have been called from some SQL query (eg, a SELECT)
and there is an explicit or implicit transaction wrapped around every
query issued by an application.  The earliest the lock can possibly be
released is when the current application-given query ends.  If the query
is inside an explicit BEGIN/COMMIT block then the lock will be held till
you COMMIT or ROLLBACK.

Feel free to propose better wording for that part of the plpgsql docs...

            regards, tom lane