Re: Rethinking plpgsql's assignment implementation - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Rethinking plpgsql's assignment implementation
Date
Msg-id CAFj8pRBUN3Q-6S7DR+F3nLhNntYoksFbL6RzFQsmMHKYobqJ3A@mail.gmail.com
Whole thread Raw
In response to Re: Rethinking plpgsql's assignment implementation  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Rethinking plpgsql's assignment implementation
List pgsql-hackers


út 15. 12. 2020 v 21:18 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
I realized that the speedup patch I posted yesterday is flawed: it's
too aggressive about applying the R/W param mechanism, instead of
not aggressive enough.

To review, the point of that logic is that if we have an assignment
like
        arrayvar := array_append(arrayvar, some-scalar-expression);
a naive implementation would have array_append construct an entire
new array, which we'd then have to copy into plpgsql's variable
storage.  Instead, if the array variable is in expanded-array
format (which plpgsql encourages it to be) then we can pass the
array parameter as a "read/write expanded datum", which array_append
recognizes as license to scribble right on its input and return the
modified input; that takes only O(1) time not O(N).  Then plpgsql's
assignment code notices that the expression result datum is the same
pointer already stored in the variable, so it does nothing.

With the patch at hand, a subscripted assignment a[i] := x becomes,
essentially,
        a := subscriptingref(a, i, x);
and we need to make the same sort of transformation to allow
array_set_element to scribble right on the original value of "a"
instead of making a copy.

However, we can't simply not consider the source expression "x",
as I proposed yesterday.  For example, if we have
        a := subscriptingref(a, i, f(array_append(a, x)));
it's not okay for array_append() to scribble on "a".  The R/W
param mechanism normally disallows any additional references to
the target variable, which would prevent this error, but I broke
that safety check with the 0007 patch.

After thinking about this awhile, I decided that plpgsql's R/W param
mechanism is really misdesigned.  Instead of requiring the assignment
source expression to be such that *all* its references to the target
variable could be passed as R/W, we really want to identify *one*
reference to the target variable to be passed as R/W, allowing any other
ones to be passed read/only as they would be by default.  As long as the
R/W reference is a direct argument to the top-level (hence last to be
executed) function in the expression, there is no harm in R/O references
being passed to other lower parts of the expression.  Nor is there any
use-case for more than one argument of the top-level function being R/W.

So the attached rewrite of the 0007 patch reimplements that logic to
identify one single Param that references the target variable, and
make only that Param pass a read/write reference, not any other
Params referencing the target variable.  This is a good change even
without considering the assignment-reimplementation proposal, because
even before this patchset we could have cases like
        arrayvar := array_append(arrayvar, arrayvar[i]);
The existing code would be afraid to optimize this, but it's in fact
safe.

I also re-attach the 0001-0006 patches, which have not changed, just
to keep the cfbot happy.


I run some performance tests and it looks very well.


                        regards, tom lane

pgsql-hackers by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: Confusing behavior of psql's \e
Next
From: Brar Piening
Date:
Subject: Re: Minor documentation error regarding streaming replication protocol