Converting plpgsql to use DTYPE_REC for named composite types - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Converting plpgsql to use DTYPE_REC for named composite types |
Date | |
Msg-id | 8962.1514399547@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Converting plpgsql to use DTYPE_REC for named composite types
Re: Converting plpgsql to use DTYPE_REC for named composite types |
List | pgsql-hackers |
Those with long memories will recall that for some time now I've been arguing that plpgsql should be changed to treat all composite-type variables (both "record" and named composite types) via its DTYPE_REC code paths, instead of the current situation where it handles variables of named composite types via its DTYPE_ROW code paths. DTYPE_ROW is great for what it was meant for, which is to allow multiple target variables in usages like "SELECT ... INTO a, b, c" to be represented by a single PLpgSQL_datum. It's not great for composite-type variables. There are two really fundamental problems with doing things that way: * DTYPE_ROW cannot represent a simple NULL value of the composite datum; the closest it can manage is to set the component variables to nulls, which is equivalent to ROW(NULL, NULL, ...), which is not the same thing as a simple NULL. We've had complaints about this before, eg: https://www.postgresql.org/message-id/flat/52A9010D.3070202%40ultimeth.com https://www.postgresql.org/message-id/flat/20120920210519.241290%40gmx.com * If the composite type changes, say by adding or renaming a column, plpgsql cannot hope to cope with that without fully recompiling the function, since its symbol table (list of PLpgSQL_datums) would have to change. Currently it doesn't even try. We've had complaints about that too: https://www.postgresql.org/message-id/flat/CAL870DWGgkr7W6ZW%3DjGeqE4bHi0E%3DTLwjcQx95C9pYAVL3%3DU%3DQ%40mail.gmail.com https://www.postgresql.org/message-id/flat/CA%2BTgmoYDf7dkXhKtk7u_YnAfSt47SgK5J8gD8C1JfSiouU194g%40mail.gmail.com https://www.postgresql.org/message-id/flat/CAFj8pRC8_-Vppe_sx7%2Bjn-4UQ_YVXGdhWP5O0rtmv6qZxShmFg%40mail.gmail.com A slightly lesser problem is that we've been discouraged from adding features like allowing composite-typed variables to be given initializers or marked CONSTANT because we'd have to fix two completely different code paths, doubling the work needed. This also applies to the issue of fixing plpgsql to support domains over composites correctly, which is what got me interested in doing something about it now. Hence, attached is a proposed patch to convert plpgsql to use DTYPE_REC for all user-declared composite-typed variables. DTYPE_ROW remains, but is used only for the purpose of collecting lists of target variables. Since the main objection that's been raised to this change in the past is possible performance loss in some cases, I've gone to considerable trouble to try to minimize such losses. Work remains to be done for most of the feature issues mentioned above, though this does fix the issue of allowing composite-typed variables to be simple NULLs. (I did yield to the temptation to allow plpgsql functions to take arguments declared as just "record", since there seems no good reason why that's still disallowed.) I believe I've gotten things to the point where this is acceptable from a performance standpoint. Testing various microbenchmarks on variables declared as named composites, some things are faster and some are slower, but nothing seems to get more than circa 5% worse. The same benchmarks on variables declared as "record" are uniformly improvements from HEAD, by significant margins ranging up to 2X faster. The worst issues I've noted are with trivial trigger functions, where there's a noticeable increase in startup overhead. I have some basically-independent patches that can buy that back, but since this patch is more than large enough, I'll post those as a separate thread. The core idea of the patch is to introduce an implementation of composite values as "expanded objects", extending the infrastructure that we used to improve performance of plpgsql array variables in commit 1dc5ebc90 and follow-on patches. So far, only plpgsql has been taught about such objects --- later it might be interesting to extend some of the core operations such as FieldSelect to deal with them explicitly. My plan is that expandedrecord.c will grow the ability to store values of domains-over-composite, including the ability to apply domain constraint checks during assignments. That's not there yet, though some comments make reference to it, and a few bits of code are ready for it. Also, this expands the idea I had in commit 687f096ea to get the typcache to assign unique-within-a-process numbers to different versions of a composite type, so that dependent code can easily recognize that a change has happened. Now, the numbers are unique within a process across all composite types, rather than being just unique per type. Since they're 64-bit counters there seems no risk of wraparound within the lifespan of a backend process. I added a bunch of new regression test cases. Those are mainly meant to ensure adequate test coverage of the new code. Running those cases on HEAD shows no behavioral changes except the expected ones around handling of composite NULLs and the addition of RECORD as an allowed argument type. I'll stick this into the January commitfest, but I'd like to get it reviewed and committed pretty soon, because there are follow-on patches that need to get done in time for v11 --- in particular, we need to close out the lack of plpgsql support for domains-over-composite. regards, tom lane
Attachment
pgsql-hackers by date: