Converting plpgsql to use DTYPE_REC for named composite types - Mailing list pgsql-hackers

From Tom Lane
Subject Converting plpgsql to use DTYPE_REC for named composite types
Date
Msg-id 8962.1514399547@sss.pgh.pa.us
Whole thread Raw
Responses Re: Converting plpgsql to use DTYPE_REC for named composite types
Re: Converting plpgsql to use DTYPE_REC for named composite types
List pgsql-hackers
Those with long memories will recall that for some time now I've been
arguing that plpgsql should be changed to treat all composite-type
variables (both "record" and named composite types) via its DTYPE_REC
code paths, instead of the current situation where it handles variables of
named composite types via its DTYPE_ROW code paths.  DTYPE_ROW is great
for what it was meant for, which is to allow multiple target variables
in usages like "SELECT ... INTO a, b, c" to be represented by a single
PLpgSQL_datum.  It's not great for composite-type variables.  There are
two really fundamental problems with doing things that way:

* DTYPE_ROW cannot represent a simple NULL value of the composite datum;
the closest it can manage is to set the component variables to nulls,
which is equivalent to ROW(NULL, NULL, ...), which is not the same thing
as a simple NULL.  We've had complaints about this before, eg:
https://www.postgresql.org/message-id/flat/52A9010D.3070202%40ultimeth.com
https://www.postgresql.org/message-id/flat/20120920210519.241290%40gmx.com

* If the composite type changes, say by adding or renaming a column,
plpgsql cannot hope to cope with that without fully recompiling the
function, since its symbol table (list of PLpgSQL_datums) would have
to change.  Currently it doesn't even try.  We've had complaints about
that too:
https://www.postgresql.org/message-id/flat/CAL870DWGgkr7W6ZW%3DjGeqE4bHi0E%3DTLwjcQx95C9pYAVL3%3DU%3DQ%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CA%2BTgmoYDf7dkXhKtk7u_YnAfSt47SgK5J8gD8C1JfSiouU194g%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CAFj8pRC8_-Vppe_sx7%2Bjn-4UQ_YVXGdhWP5O0rtmv6qZxShmFg%40mail.gmail.com

A slightly lesser problem is that we've been discouraged from adding
features like allowing composite-typed variables to be given initializers
or marked CONSTANT because we'd have to fix two completely different
code paths, doubling the work needed.  This also applies to the issue
of fixing plpgsql to support domains over composites correctly, which
is what got me interested in doing something about it now.

Hence, attached is a proposed patch to convert plpgsql to use DTYPE_REC
for all user-declared composite-typed variables.  DTYPE_ROW remains, but
is used only for the purpose of collecting lists of target variables.

Since the main objection that's been raised to this change in the past
is possible performance loss in some cases, I've gone to considerable
trouble to try to minimize such losses.  Work remains to be done for
most of the feature issues mentioned above, though this does fix the
issue of allowing composite-typed variables to be simple NULLs.

(I did yield to the temptation to allow plpgsql functions to take
arguments declared as just "record", since there seems no good reason
why that's still disallowed.)

I believe I've gotten things to the point where this is acceptable from
a performance standpoint.  Testing various microbenchmarks on variables
declared as named composites, some things are faster and some are slower,
but nothing seems to get more than circa 5% worse.  The same benchmarks
on variables declared as "record" are uniformly improvements from HEAD,
by significant margins ranging up to 2X faster.  The worst issues I've
noted are with trivial trigger functions, where there's a noticeable
increase in startup overhead.  I have some basically-independent patches
that can buy that back, but since this patch is more than large enough,
I'll post those as a separate thread.

The core idea of the patch is to introduce an implementation of composite
values as "expanded objects", extending the infrastructure that we used
to improve performance of plpgsql array variables in commit 1dc5ebc90
and follow-on patches.  So far, only plpgsql has been taught about such
objects --- later it might be interesting to extend some of the core
operations such as FieldSelect to deal with them explicitly.

My plan is that expandedrecord.c will grow the ability to store values of
domains-over-composite, including the ability to apply domain constraint
checks during assignments.  That's not there yet, though some comments
make reference to it, and a few bits of code are ready for it.

Also, this expands the idea I had in commit 687f096ea to get the typcache
to assign unique-within-a-process numbers to different versions of a
composite type, so that dependent code can easily recognize that a change
has happened.  Now, the numbers are unique within a process across all
composite types, rather than being just unique per type.  Since they're
64-bit counters there seems no risk of wraparound within the lifespan
of a backend process.

I added a bunch of new regression test cases.  Those are mainly meant
to ensure adequate test coverage of the new code.  Running those cases
on HEAD shows no behavioral changes except the expected ones around
handling of composite NULLs and the addition of RECORD as an allowed
argument type.

I'll stick this into the January commitfest, but I'd like to get it
reviewed and committed pretty soon, because there are follow-on patches
that need to get done in time for v11 --- in particular, we need to close
out the lack of plpgsql support for domains-over-composite.

            regards, tom lane


Attachment

pgsql-hackers by date:

Previous
From: Jeremy Finzel
Date:
Subject: Re: Deadlock in multiple CIC.
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] pow support for pgbench