Thread: Converting plpgsql to use DTYPE_REC for named composite types

Converting plpgsql to use DTYPE_REC for named composite types

From
Tom Lane
Date:
Those with long memories will recall that for some time now I've been
arguing that plpgsql should be changed to treat all composite-type
variables (both "record" and named composite types) via its DTYPE_REC
code paths, instead of the current situation where it handles variables of
named composite types via its DTYPE_ROW code paths.  DTYPE_ROW is great
for what it was meant for, which is to allow multiple target variables
in usages like "SELECT ... INTO a, b, c" to be represented by a single
PLpgSQL_datum.  It's not great for composite-type variables.  There are
two really fundamental problems with doing things that way:

* DTYPE_ROW cannot represent a simple NULL value of the composite datum;
the closest it can manage is to set the component variables to nulls,
which is equivalent to ROW(NULL, NULL, ...), which is not the same thing
as a simple NULL.  We've had complaints about this before, eg:
https://www.postgresql.org/message-id/flat/52A9010D.3070202%40ultimeth.com
https://www.postgresql.org/message-id/flat/20120920210519.241290%40gmx.com

* If the composite type changes, say by adding or renaming a column,
plpgsql cannot hope to cope with that without fully recompiling the
function, since its symbol table (list of PLpgSQL_datums) would have
to change.  Currently it doesn't even try.  We've had complaints about
that too:
https://www.postgresql.org/message-id/flat/CAL870DWGgkr7W6ZW%3DjGeqE4bHi0E%3DTLwjcQx95C9pYAVL3%3DU%3DQ%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CA%2BTgmoYDf7dkXhKtk7u_YnAfSt47SgK5J8gD8C1JfSiouU194g%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CAFj8pRC8_-Vppe_sx7%2Bjn-4UQ_YVXGdhWP5O0rtmv6qZxShmFg%40mail.gmail.com

A slightly lesser problem is that we've been discouraged from adding
features like allowing composite-typed variables to be given initializers
or marked CONSTANT because we'd have to fix two completely different
code paths, doubling the work needed.  This also applies to the issue
of fixing plpgsql to support domains over composites correctly, which
is what got me interested in doing something about it now.

Hence, attached is a proposed patch to convert plpgsql to use DTYPE_REC
for all user-declared composite-typed variables.  DTYPE_ROW remains, but
is used only for the purpose of collecting lists of target variables.

Since the main objection that's been raised to this change in the past
is possible performance loss in some cases, I've gone to considerable
trouble to try to minimize such losses.  Work remains to be done for
most of the feature issues mentioned above, though this does fix the
issue of allowing composite-typed variables to be simple NULLs.

(I did yield to the temptation to allow plpgsql functions to take
arguments declared as just "record", since there seems no good reason
why that's still disallowed.)

I believe I've gotten things to the point where this is acceptable from
a performance standpoint.  Testing various microbenchmarks on variables
declared as named composites, some things are faster and some are slower,
but nothing seems to get more than circa 5% worse.  The same benchmarks
on variables declared as "record" are uniformly improvements from HEAD,
by significant margins ranging up to 2X faster.  The worst issues I've
noted are with trivial trigger functions, where there's a noticeable
increase in startup overhead.  I have some basically-independent patches
that can buy that back, but since this patch is more than large enough,
I'll post those as a separate thread.

The core idea of the patch is to introduce an implementation of composite
values as "expanded objects", extending the infrastructure that we used
to improve performance of plpgsql array variables in commit 1dc5ebc90
and follow-on patches.  So far, only plpgsql has been taught about such
objects --- later it might be interesting to extend some of the core
operations such as FieldSelect to deal with them explicitly.

My plan is that expandedrecord.c will grow the ability to store values of
domains-over-composite, including the ability to apply domain constraint
checks during assignments.  That's not there yet, though some comments
make reference to it, and a few bits of code are ready for it.

Also, this expands the idea I had in commit 687f096ea to get the typcache
to assign unique-within-a-process numbers to different versions of a
composite type, so that dependent code can easily recognize that a change
has happened.  Now, the numbers are unique within a process across all
composite types, rather than being just unique per type.  Since they're
64-bit counters there seems no risk of wraparound within the lifespan
of a backend process.

I added a bunch of new regression test cases.  Those are mainly meant
to ensure adequate test coverage of the new code.  Running those cases
on HEAD shows no behavioral changes except the expected ones around
handling of composite NULLs and the addition of RECORD as an allowed
argument type.

I'll stick this into the January commitfest, but I'd like to get it
reviewed and committed pretty soon, because there are follow-on patches
that need to get done in time for v11 --- in particular, we need to close
out the lack of plpgsql support for domains-over-composite.

            regards, tom lane


Attachment

Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Pavel Stehule
Date:
Hi

I'll stick this into the January commitfest, but I'd like to get it
reviewed and committed pretty soon, because there are follow-on patches
that need to get done in time for v11 --- in particular, we need to close
out the lack of plpgsql support for domains-over-composite.


I didn't checked code - just I did some performance tests and I am thinking so performance is very good.

Master's record type has 50% speed of row type in my test. Patched has +/- same speed.

I see very small slowdown for row type .. about 3% but I think so it is acceptable - I tested some worst case.

Unfortunately - it breaks and very breaks all plpgsql related extensions - pldebug, plprofiler, plpgsql_check. On second hand, there are only few extensions of this kind.

Regards

Pavel


                        regards, tom lane


Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Pavel Stehule
Date:
Hi

2017-12-29 9:56 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

I'll stick this into the January commitfest, but I'd like to get it
reviewed and committed pretty soon, because there are follow-on patches
that need to get done in time for v11 --- in particular, we need to close
out the lack of plpgsql support for domains-over-composite.


I didn't checked code - just I did some performance tests and I am thinking so performance is very good.

Master's record type has 50% speed of row type in my test. Patched has +/- same speed.

I see very small slowdown for row type .. about 3% but I think so it is acceptable - I tested some worst case.

Unfortunately - it breaks and very breaks all plpgsql related extensions - pldebug, plprofiler, plpgsql_check. On second hand, there are only few extensions of this kind.


I checked the code:

Interesting part from test:

alter table mutable drop column f1;
alter table mutable add column f1 float8;
-- currently, this fails due to cached plan for "r.f1 + 1" expression
select sillyaddone(42);
ERROR:  type of parameter 4 (double precision) does not match that when preparing the plan (integer)
CONTEXT:  PL/pgSQL function sillyaddone(integer) line 1 at RETURN

In this case, can we invalidate plan cache? It can decrease a risk of runtime issues when tables are altered.

Because PLPGSQL_NSTYPE_ROW is removed, then "switch" statement is maybe useless

        if (ns != NULL && nnames == 2)
        {
            switch (ns->itemtype)
            {
                case PLPGSQL_NSTYPE_REC:
                    {
                        /*
                         * words 1/2 are a record name, so third word could be
                         * a field in this record.
                         */
                        PLpgSQL_rec *rec;
                        PLpgSQL_recfield *new;

                        rec = (PLpgSQL_rec *) (plpgsql_Datums[ns->itemno]);
                        new = plpgsql_build_recfield(rec, word3);

                        wdatum->datum = (PLpgSQL_datum *) new;
                        wdatum->ident = NULL;
                        wdatum->quoted = false; /* not used */
                        wdatum->idents = idents;
                        return true;
                    }

                default:
                    break;
            }
        }
    }

should be reduced

                   if (ns != NULL && nnames == 2 && ns->itemtype == PLPGSQL_NSTYPE_REC)
                    {
                        /*
                         * words 1/2 are a record name, so third word could be
                         * a field in this record.
                         */
                        PLpgSQL_rec *rec;
                        PLpgSQL_recfield *new;

                        rec = (PLpgSQL_rec *) (plpgsql_Datums[ns->itemno]);
                        new = plpgsql_build_recfield(rec, word3);

                        wdatum->datum = (PLpgSQL_datum *) new;
                        wdatum->ident = NULL;
                        wdatum->quoted = false; /* not used */
                        wdatum->idents = idents;
                        return true;
                    }


why is in exec_assign_value still case for PLPGSQL_DTYPE_ROW ?


Regards

Pavel

 
Regards

Pavel


                        regards, tom lane



Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Interesting part from test:

> alter table mutable drop column f1;
> alter table mutable add column f1 float8;
> -- currently, this fails due to cached plan for "r.f1 + 1" expression
> select sillyaddone(42);
> ERROR:  type of parameter 4 (double precision) does not match that when
> preparing the plan (integer)
> CONTEXT:  PL/pgSQL function sillyaddone(integer) line 1 at RETURN

> In this case, can we invalidate plan cache?

That's something I expect we can improve in followon patches, but
it seems a bit outside the scope of this patch (which is mighty
big already).

> Because PLPGSQL_NSTYPE_ROW is removed, then "switch" statement is maybe
> useless

I'd just as soon leave it as a switch, for possible future expansion.

> why is in exec_assign_value still case for PLPGSQL_DTYPE_ROW ?

Take it out and see ;-).  The whole point of having DTYPE_ROW
at all is to support stuff like "SELECT ... INTO a,b,c".

            regards, tom lane


Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Pavel Stehule
Date:


2017-12-29 18:38 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Interesting part from test:

> alter table mutable drop column f1;
> alter table mutable add column f1 float8;
> -- currently, this fails due to cached plan for "r.f1 + 1" expression
> select sillyaddone(42);
> ERROR:  type of parameter 4 (double precision) does not match that when
> preparing the plan (integer)
> CONTEXT:  PL/pgSQL function sillyaddone(integer) line 1 at RETURN

> In this case, can we invalidate plan cache?

That's something I expect we can improve in followon patches, but
it seems a bit outside the scope of this patch (which is mighty
big already).


ok

 
> Because PLPGSQL_NSTYPE_ROW is removed, then "switch" statement is maybe
> useless

I'd just as soon leave it as a switch, for possible future expansion.

I don't think so there will be some cases - but it is not a issue


> why is in exec_assign_value still case for PLPGSQL_DTYPE_ROW ?

Take it out and see ;-).  The whole point of having DTYPE_ROW
at all is to support stuff like "SELECT ... INTO a,b,c".

ok

Regards

Pavel


                        regards, tom lane

Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Tom Lane
Date:
I wrote:
> I'll stick this into the January commitfest, but I'd like to get it
> reviewed and committed pretty soon, because there are follow-on patches
> that need to get done in time for v11 --- in particular, we need to close
> out the lack of plpgsql support for domains-over-composite.

I hacked on the domain-support problem a bit and it worked out well,
so attached is a revised patch that incorporates that.  This caused
me to revise some assumptions about what expandedrecord.c's internal
invariants ought to be, so it's probably better to look at this as
a new patch rather than a diff from v1.

Mostly this is changes internal to expandedrecord.c, but I cleaned up
some code in plpgsql that tests for domain-ness, and I also added support
in ExecEvalFieldSelect for extracting a field directly from an expanded
record without flattening it into a tuple first.  It hadn't been clear
that that was worth the trouble before, but it definitely does come up
while applying domain constraints.  For example, having that fast path
makes about a 2X speed difference in a test like this:

create type two_ints as (f1 int, f2 int);
create domain ordered_ints as two_ints check((value).f1 <= (value).f2);

\timing on

do $$
declare d ordered_ints;
begin
  for i in 1..3000000 loop
    d.f2 := i;
    d.f1 := i;
  end loop;
end$$;


There are still a couple of soft spots having to do with enforcing
domain constraints against null composite values, e.g. if there's
a constraint that would reject a null value we don't notice that
at DECLARE time.  I think there's not much point in being strict
about that until we have support for initializers for composite
variables.  Which is definitely worth doing but it seems like it
could be a separate patch.

The patches in <11986.1514407114@sss.pgh.pa.us> still apply over this
with just some line-number offsets, so I won't post a new version
of those.

            regards, tom lane


Attachment

Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Tom Lane
Date:
I wrote:
> I hacked on the domain-support problem a bit and it worked out well,
> so attached is a revised patch that incorporates that.  This caused
> me to revise some assumptions about what expandedrecord.c's internal
> invariants ought to be, so it's probably better to look at this as
> a new patch rather than a diff from v1.

Oh, I'd not looked at the documentation angle of this.  Here's a short
add-on patch which just adds the fact that "record" is now a valid
argument type, and removes a no-longer-correct claim that system columns
are always inaccessible in row variables.

The documentation draws a distinction between "record" and "row" variables
which is now rather artificial so far as the code is concerned.  But it's
not totally pointless either, since it's still true that a variable
declared "record" will mutate its type in a way that a
named-composite-type variable will not.  I'm inclined to leave that text
as is; anyone think differently?

            regards, tom lane
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 7d23ed4..86d28fb 100644
*** a/doc/src/sgml/plpgsql.sgml
--- b/doc/src/sgml/plpgsql.sgml
***************
*** 123,129 ****
       and they can return a result of any of these types.  They can also
       accept or return any composite type (row type) specified by name.
       It is also possible to declare a <application>PL/pgSQL</application>
!      function as returning <type>record</type>, which means that the result
       is a row type whose columns are determined by specification in the
       calling query, as discussed in <xref linkend="queries-tablefunctions"/>.
      </para>
--- 123,131 ----
       and they can return a result of any of these types.  They can also
       accept or return any composite type (row type) specified by name.
       It is also possible to declare a <application>PL/pgSQL</application>
!      function as accepting <type>record</type>, which means that any
!      composite type will do as input, or
!      as returning <type>record</type>, which means that the result
       is a row type whose columns are determined by specification in the
       calling query, as discussed in <xref linkend="queries-tablefunctions"/>.
      </para>
*************** user_id users.user_id%TYPE;
*** 672,685 ****
     </para>

     <para>
-     Only the user-defined columns of a table row are accessible in a
-     row-type variable, not the OID or other system columns (because the
-     row could be from a view).  The fields of the row type inherit the
-     table's field size or precision for data types such as
-     <type>char(<replaceable>n</replaceable>)</type>.
-    </para>
-
-    <para>
      Here is an example of using composite types.  <structname>table1</structname>
      and <structname>table2</structname> are existing tables having at least the
      mentioned fields:
--- 674,679 ----

Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Pavel Stehule
Date:


2017-12-30 0:16 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
I wrote:
> I'll stick this into the January commitfest, but I'd like to get it
> reviewed and committed pretty soon, because there are follow-on patches
> that need to get done in time for v11 --- in particular, we need to close
> out the lack of plpgsql support for domains-over-composite.

I hacked on the domain-support problem a bit and it worked out well,
so attached is a revised patch that incorporates that.  This caused
me to revise some assumptions about what expandedrecord.c's internal
invariants ought to be, so it's probably better to look at this as
a new patch rather than a diff from v1.

Mostly this is changes internal to expandedrecord.c, but I cleaned up
some code in plpgsql that tests for domain-ness, and I also added support
in ExecEvalFieldSelect for extracting a field directly from an expanded
record without flattening it into a tuple first.  It hadn't been clear
that that was worth the trouble before, but it definitely does come up
while applying domain constraints.  For example, having that fast path
makes about a 2X speed difference in a test like this:

create type two_ints as (f1 int, f2 int);
create domain ordered_ints as two_ints check((value).f1 <= (value).f2);

\timing on

do $$
declare d ordered_ints;
begin
  for i in 1..3000000 loop
    d.f2 := i;
    d.f1 := i;
  end loop;
end$$;


There are still a couple of soft spots having to do with enforcing
domain constraints against null composite values, e.g. if there's
a constraint that would reject a null value we don't notice that
at DECLARE time.  I think there's not much point in being strict
about that until we have support for initializers for composite
variables.  Which is definitely worth doing but it seems like it
could be a separate patch.

The patches in <11986.1514407114@sss.pgh.pa.us> still apply over this
with just some line-number offsets, so I won't post a new version
of those.


all test passed on my comp too.

I think so these patches can be useful for schema variables too.

Regards

Pavel
 
                        regards, tom lane


Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Tom Lane
Date:
Here's a version of this rebased up to HEAD, fixing a couple of trivial
merge conflicts and incorporating the docs delta I posted separately.

(I'd supposed this patch was still OK because the patch tester said so,
but I now see that the tester was only testing the docs delta :-(.)

            regards, tom lane


Attachment

Re: Converting plpgsql to use DTYPE_REC for named composite types

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2017-12-30 0:16 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
>>> I'll stick this into the January commitfest, but I'd like to get it
>>> reviewed and committed pretty soon, because there are follow-on patches
>>> that need to get done in time for v11 --- in particular, we need to close
>>> out the lack of plpgsql support for domains-over-composite.

> all test passed on my comp too.

Is anyone planning on doing further review of this patch than Pavel
already did?  If not, I'd like to go ahead and push it, since there's
still followup stuff that I'd like to address for v11.  As noted upthread,
there are places where plpgsql fails to enforce DOMAIN NOT NULL
constraints, and I think I had some other squishy things in my notes.

            regards, tom lane