Re: Using Expanded Objects other than Arrays from plpgsql - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Using Expanded Objects other than Arrays from plpgsql
Date
Msg-id 2581216.1729794746@sss.pgh.pa.us
Whole thread Raw
In response to Re: Using Expanded Objects other than Arrays from plpgsql  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> ... I'm still writing up
> details, but right now I'm envisioning completely separate sets of
> rules for the prosupport case versus the no-prosupport case.

So here is the design I've come up with for optimizing R/W expanded
object updates in plpgsql without any special knowledge from a
prosupport function.  AFAICS this requires no assumptions at all
about the behavior of called functions, other than the bare minimum
"you can't corrupt the object to the point where it wouldn't be
cleanly free-able".  In particular that means it can work for
user-written called functions in plpgsql, SQL, or whatever, not
only for C-coded functions.

There are two requirements to apply the optimization:

* If the assignment statement is within a BEGIN ... EXCEPTION block,
its target variable must be declared inside the most-closely-nested
such block.  This ensures that if an error is thrown from within the
assignment statement's expression, we do not care about the value
of the target variable, except to the extent of being able to clean
it up.

* The target variable must be referenced exactly once within the
RHS expression.  This avoids aliasing hazards such as we discussed
upthread.  But unlike the existing rule, that reference can be
anywhere in the RHS --- it doesn't have to be an argument of the
topmost function.  So for example we can optimize
    foo := fee(fi(fo(fum(foo), other_variable), ...));

While I've not tried to write any code yet, I think both of these
conditions should be reasonably easy to verify.

Given that those conditions are met and the current value of the
assignment target variable is a R/W expanded pointer, we can
execute the assignment as follows:

* Provide the R/W expanded pointer as the value of the Param
representing the sole reference.  At the same time, apply
TransferExpandedObject to reparent the object under the transient
eval_mcontext memory context that's being used to evaluate the
RHS expression, and then set the target variable's value to NULL.
(At this point, the R/W object has exactly the same logical
status as any intermediate calculation result that's palloc'd
in the eval_mcontext.)

* At successful completion of the RHS, assign the result to the
target variable normally.  This includes, if it's an R/W expanded
object, reparenting it under the calling function's main context.

If the RHS expression winds up returning the same expanded object
(probably mutated), then the outer function regains ownership of it
after no more than a couple of TransferExpandedObject calls, which
are cheap.  If the RHS returns something different, then either the
original expanded object got discarded during RHS evaluation, or it
will be cleaned up when we reset the eval_mcontext, so that it won't
be leaked.

I didn't originally foresee the need to transfer the object
into the transient memory context.  But this design avoids any
possibility of double-free attempts, which would be likely to
happen if we allow the outer function's variable to hold onto
a reference to the object while the RHS is evaluated.  A function
receiving an R/W reference is entitled to assume that that value
is not otherwise referenced and can be freed when it's no longer
needed, so it might well get freed during RHS evaluation.  By
converting the original R/W object into (effectively) a temporary
value within the RHS evaluation, we make that assumption valid.

So, while this design greatly expands the set of cases we can
optimize, it does lose some cases that the old approach could
support.  I envision addressing that by allowing a prosupport
function attached to the RHS' topmost function to "bless"
other cases as safe, using reasoning similar to the old rules.
(Or different rules, even, but it's on the prosupport function
to be sure it's safe.)  I don't have a detailed design in mind,
but I'm thinking along the lines of just passing the whole RHS
expression to the prosupport function and letting it decide
what's safe.  In any case, we don't need to even call the
prosupport function unless there's an exception block or
multiple RHS references to the target variable.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: general purpose array_sort
Next
From: Greg Sabino Mullane
Date:
Subject: Re: Trigger more frequent autovacuums of heavy insert tables