Thread: ALTER TABLE ADD COLUMN fast default
Attached is a patch for $subject. It's based on Serge Rielau's patch of about a year ago, taking into account comments made at the time, with bitrot removed and other enhancements such as documentation. Essentially it stores a value in pg_attribute that is used when the stored tuple is missing the attribute. This works unless the default expression is volatile, in which case a table rewrite is forced as happens now. Comments welcome. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
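For readers skimming the archive, here is a minimal SQL sketch of the behavior described above (the table and column names are invented for illustration):

    -- With the patch, a non-volatile default is stored once in pg_attribute
    -- and supplied whenever a stored tuple is missing the attribute,
    -- so no table rewrite is needed:
    CREATE TABLE tab (id int);
    INSERT INTO tab SELECT generate_series(1, 1000000);
    ALTER TABLE tab ADD COLUMN flag boolean DEFAULT false;   -- no rewrite

    -- A volatile default still forces a rewrite, as happens now:
    ALTER TABLE tab ADD COLUMN stamp timestamptz DEFAULT clock_timestamp();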
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > Attached is a patch for $subject. It's based on Serge Rielau's patch of > about a year ago, taking into account comments made at the time, with > bitrot removed and other enhancements such as documentation. > Essentially it stores a value in pg_attribute that is used when the > stored tuple is missing the attribute. This works unless the default > expression is volatile, in which case a table rewrite is forced as > happens now. I'm quite disturbed both by the sheer invasiveness of this patch (eg, changing the API of heap_attisnull()) and by the amount of logic it adds to fundamental tuple-deconstruction code paths. I think we'll need to ask some hard questions about whether the feature gained is worth the distributed performance loss that will ensue. regards, tom lane
On 12/06/2017 10:16 AM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> Attached is a patch for $subject. It's based on Serge Rielau's patch of >> about a year ago, taking into account comments made at the time, with >> bitrot removed and other enhancements such as documentation. >> Essentially it stores a value in pg_attribute that is used when the >> stored tuple is missing the attribute. This works unless the default >> expression is volatile, in which case a table rewrite is forced as >> happens now. > I'm quite disturbed both by the sheer invasiveness of this patch > (eg, changing the API of heap_attisnull()) and by the amount of logic > it adds to fundamental tuple-deconstruction code paths. I think > we'll need to ask some hard questions about whether the feature gained > is worth the distributed performance loss that will ensue. > > Thanks for the prompt comments. I'm sure we can reduce the footprint some. w.r.t. heap_attisnull, only a handful of call sites actually need the new functionality, 8 or possibly 10 (I need to check more on those in relcache.c). So instead of changing the function we could provide a new one to be used at those sites. I'll work on that and submit a new patch fairly shortly. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 12/06/2017 10:16 AM, Tom Lane wrote: >> I'm quite disturbed both by the sheer invasiveness of this patch >> (eg, changing the API of heap_attisnull()) and by the amount of logic >> it adds to fundamental tuple-deconstruction code paths. I think >> we'll need to ask some hard questions about whether the feature gained >> is worth the distributed performance loss that will ensue. > I'm sure we can reduce the footprint some. > w.r.t. heap_attisnull, only a handful of call sites actually need the > new functionality, 8 or possibly 10 (I need to check more on those in > relcache.c). So instead of changing the function we could provide a new > one to be used at those sites. I'll work on that and submit a new patch > fairly shortly. Meh, I'm not at all sure that's an improvement. A version of heap_attisnull that ignores the possibility of a "missing" attribute value will give the *wrong answer* if the other version should have been used instead. Furthermore it'd be an attractive nuisance because people would fail to notice if an existing call were now unsafe. If we're to do this I think we have no choice but to impose that compatibility break on third-party code. In general, you need to be thinking about this in terms of making sure that it's impossible to accidentally not update code that needs to be updated. Another example is that I noticed that in some places the patch has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because the former will fail to copy the missing-attribute support data. I think that's an astonishingly bad idea. There is basically no situation in which CreateTupleDescCopy defined in that way will ever be safe to use. The missing-attribute info is now a fundamental part of a tupdesc and it has to be copied always, just as much as e.g. atttypid. I would opine that it's a mistake to design TupleDesc as though the missing-attribute data had anything to do with default values. It may have originated from the same place but it's a distinct thing. Better to store it in a separate sub-structure. regards, tom lane
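To make the compatibility break concrete, here is a sketch of the signature change being debated (reconstructed from the discussion; the exact parameter name is an assumption):

    /* before: an attribute physically absent from a short heap tuple
     * is simply reported as null */
    bool heap_attisnull(HeapTuple tup, int attnum);

    /* after: callers must supply the tupdesc, so an absent attribute can
     * resolve to its stored "missing" value rather than to null */
    bool heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc);

This is why a second, tupdesc-less variant would be an attractive nuisance: any call site left on the old function silently gets the pre-patch answer.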
On 12/06/2017 11:05 AM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 12/06/2017 10:16 AM, Tom Lane wrote: >>> I'm quite disturbed both by the sheer invasiveness of this patch >>> (eg, changing the API of heap_attisnull()) and by the amount of logic >>> it adds to fundamental tuple-deconstruction code paths. I think >>> we'll need to ask some hard questions about whether the feature gained >>> is worth the distributed performance loss that will ensue. >> I'm sure we can reduce the footprint some. >> w.r.t. heap_attisnull, only a handful of call sites actually need the >> new functionality, 8 or possibly 10 (I need to check more on those in >> relcache.c). So instead of changing the function we could provide a new >> one to be used at those sites. I'll work on that and submit a new patch >> fairly shortly. > Meh, I'm not at all sure that's an improvement. A version of > heap_attisnull that ignores the possibility of a "missing" attribute value > will give the *wrong answer* if the other version should have been used > instead. Furthermore it'd be an attractive nuisance because people would > fail to notice if an existing call were now unsafe. If we're to do this > I think we have no choice but to impose that compatibility break on > third-party code. Fair point. > > In general, you need to be thinking about this in terms of making sure > that it's impossible to accidentally not update code that needs to be > updated. Another example is that I noticed that in some places the patch > has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because > the former will fail to copy the missing-attribute support data. > I think that's an astonishingly bad idea. There is basically no situation > in which CreateTupleDescCopy defined in that way will ever be safe to use. > The missing-attribute info is now a fundamental part of a tupdesc and it > has to be copied always, just as much as e.g. atttypid. > > I would opine that it's a mistake to design TupleDesc as though the > missing-attribute data had anything to do with default values. It may > have originated from the same place but it's a distinct thing. Better > to store it in a separate sub-structure. OK, will work on all that. Thanks again. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/06/2017 11:26 AM, Andrew Dunstan wrote: >> In general, you need to be thinking about this in terms of making sure >> that it's impossible to accidentally not update code that needs to be >> updated. Another example is that I noticed that in some places the patch >> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because >> the former will fail to copy the missing-attribute support data. >> I think that's an astonishingly bad idea. There is basically no situation >> in which CreateTupleDescCopy defined in that way will ever be safe to use. >> The missing-attribute info is now a fundamental part of a tupdesc and it >> has to be copied always, just as much as e.g. atttypid. >> >> I would opine that it's a mistake to design TupleDesc as though the >> missing-attribute data had anything to do with default values. It may >> have originated from the same place but it's a distinct thing. Better >> to store it in a separate sub-structure. > > OK, will work on all that. Thanks again. > Looking closer at this. First, there is exactly one place where the patch does s/CreateTupleDescCopy/CreateTupleDescCopyConstr/. Making this like atttypid suggests that it belongs right outside the TupleConstr structure. But then it seems to me that the change could well end up being a darn sight more invasive. For example, should we be passing the number of missing values to CreateTemplateTupleDesc and CreateTupleDesc? I'm happy to do whatever work is required, but I want to have a firm understanding of the design before I spend lots of time cutting code. Once I understand how tupdesc.h should look I should be good. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/12/2017 02:13 PM, Andrew Dunstan wrote: > > On 12/06/2017 11:26 AM, Andrew Dunstan wrote: >>> In general, you need to be thinking about this in terms of making sure >>> that it's impossible to accidentally not update code that needs to be >>> updated. Another example is that I noticed that in some places the patch >>> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because >>> the former will fail to copy the missing-attribute support data. >>> I think that's an astonishingly bad idea. There is basically no situation >>> in which CreateTupleDescCopy defined in that way will ever be safe to use. >>> The missing-attribute info is now a fundamental part of a tupdesc and it >>> has to be copied always, just as much as e.g. atttypid. >>> >>> I would opine that it's a mistake to design TupleDesc as though the >>> missing-attribute data had anything to do with default values. It may >>> have originated from the same place but it's a distinct thing. Better >>> to store it in a separate sub-structure. >> OK, will work on all that. Thanks again. >> > > > Looking closer at this. First, there is exactly one place where the > patch does s/CreateTupleDescCopy/CreateTupleDescCopyConstr/. > > Making this like atttypid suggests that it belongs right outside the > TupleConstr structure. But then it seems to me that the change could > well end up being a darn sight more invasive. For example, should we be > passing the number of missing values to CreateTemplateTupleDesc and > CreateTupleDesc? I'm happy to do whatever work is required, but I want > to have a firm understanding of the design before I spend lots of time > cutting code. Once I understand how tupdesc.h should look I should be good. > Here is a new version of the patch. It separates the missing values constructs and logic from the default values constructs and logic. The missing values now sit alongside the default values in TupleConstr. In some places the separation makes the logic a good bit cleaner. Some of the logic is also reworked to remove an assumption that the missing values structure is populated in attnum order. Also, code to support the missing structures is added to equalTupleDescs. We still have that one extra place where we need to call CreateTupleDescCopyConstr instead of CreateTupleDescCopy. I haven't seen an easy way around that. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
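For orientation, the shape of the new structures can be pieced together from the field names quoted later in the thread (a sketch assembled from those fragments, not the literal patch text):

    /* one entry per attribute that has a "missing" value for pre-ALTER rows */
    typedef struct AttrMissing
    {
        AttrNumber  amnum;          /* attribute number */
        bool        ammissingNull;  /* is the missing value NULL? */
        Datum       ammissing;      /* the value itself, when not NULL */
    } AttrMissing;

    typedef struct TupleConstr
    {
        AttrDefault *defval;        /* default values, as before */
        ConstrCheck *check;         /* check constraints, as before */
        AttrMissing *missing;       /* missing attribute values, NULL if none */
        uint16      num_defval;
        uint16      num_check;
        uint16      num_missing;
        bool        has_not_null;
    } TupleConstr;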
On 12/31/2017 06:48 PM, Andrew Dunstan wrote: > > Here is a new version of the patch. It separates the missing values > constructs and logic from the default values constructs and logic. The > missing values now sit alongside the default values in TupleConstr. In > some places the separation makes the logic a good bit cleaner. > > Some of the logic is also reworked to remove an assumption that the > missing values structure is populated in attnum order. Also, code to > support the missing structures is added to equalTupleDescs. > > We still have that one extra place where we need to call > CreateTupleDescCopyConstr instead of CreateTupleDescCopy. I haven't seen > an easy way around that. > New version of the patch that fills in the remaining piece in equalTupleDescs. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2 January 2018 at 05:01, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > New version of the patch that fills in the remaining piece in > equalTupleDescs. This no longer applies to current master. Can you send an updated patch? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 01/16/2018 12:13 AM, David Rowley wrote: > On 2 January 2018 at 05:01, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> New version of the patch that fills in the remaining piece in >> equalTupleDescs. > This no longer applies to current master. Can you send an updated patch? > Yeah, got caught by the bki/pg_attribute changes on Friday. Here's an updated version. Thanks for looking. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, Jan 17, 2018 at 2:21 AM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > Yeah, got caught by the bki/pg_attribute changes on Friday. Here's an > updated version. Thanks for looking. A boring semi-automated update: this no longer compiles, because 8561e4840c8 added a new call to heap_attisnull(). Pretty sure it just needs NULL in the third argument. -- Thomas Munro http://www.enterprisedb.com
On Thu, Jan 25, 2018 at 6:10 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Jan 17, 2018 at 2:21 AM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> Yeah, got caught by the bki/pg_attribute changes on Friday. Here's an >> updated version. Thanks for looking. > > A boring semi-automated update: this no longer compiles, because > 8561e4840c8 added a new call to heap_attisnull(). Pretty sure it just > needs NULL in the third argument. > Yeah, thanks. Revised patch attached. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi, On 2018-01-26 10:53:12 +1030, Andrew Dunstan wrote: > Yeah, thanks. Revised patch attached. Given that this patch touches code that's a huge bottleneck in a lot of cases, I think this needs benchmarks that heavily exercise tuple deforming. Greetings, Andres Freund
On 02/01/2018 02:54 PM, Andres Freund wrote: > Hi, > > On 2018-01-26 10:53:12 +1030, Andrew Dunstan wrote: >> Yeah, thanks. Revised patch attached. > > Given that this patch touches code that's a huge bottleneck in a lot of > cases, I think this needs benchmarks that heavily exercise tuple > deforming. > That's a reasonable request, I guess, and I tried to collect such data today. I measured two cases:

1) Table with 1000 columns and 64 rows, with no ALTER TABLE statements adding columns with 'fast' defaults. This is meant to measure the best case, with minimal impact from the patch. See the create.sql script attached to this message, and the query.sql which was used for tests using pgbench like this:

    pgbench -n -f query.sql -T 15 test

After 100 runs, the results (tps) look like this:

              min     max     median
    --------------------------------
    master   1827    1873    1860
    patched  2023    2066    2056

That is, the patch apparently improves the performance by about 10% (according to perf profiles this is due to slot_deform_tuple getting cheaper). So this case seems fine.

2) Table with 64 rows and 1000 columns, all added by ALTER TABLE with fast default without rewrite. See create-alter.sql. Using the same query.sql as before, this shows a significant drop to only about 40 tps (from ~2000 tps for master). The profile looks something like this:

    + 98.87% 98.87% postgres [.] slot_getmissingattrs
    + 98.77%  0.00% postgres [.] PortalRun
    + 98.77%  0.00% postgres [.] ExecAgg
    + 98.74%  0.01% postgres [.] ExecInterpExpr

which is kinda understandable, although the 2000 to 40 tps drop seems pretty significant. But then again, this case is constructed as a fairly extreme corner case.

However, there seems to be some sort of bug. When I did VACUUM FULL, I expected it to replace the "missing" default values with actual values stored in the heap rows, eliminating the performance impact. But the default values got lost and were replaced by NULL values, which seems like a clear data loss scenario. I'm not quite sure what's wrong, but try this:

    \i create-alter.sql

    -- this returns 64, which is correct
    SELECT COUNT(*) FROM t;

    -- this actually returns 64 rows with values "1"
    SELECT c1000 FROM t;

    -- this returns 63, which is incorrect (should be 64)
    SELECT count(c1000) FROM t;

    VACUUM FULL t;

    -- suddenly we only get NULL values for all 64 rows
    SELECT c1000 FROM t;

    -- and now we get 0 (instead of 64)
    SELECT count(c1000) FROM t;

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
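For readers without the attachments, the create-alter.sql setup plausibly looks like the following, based on the description above (the real script is attached; the column type and default value here are assumptions):

    -- 64 rows first, then 1000 columns added with fast defaults (no rewrite)
    CREATE TABLE t (id int);
    INSERT INTO t SELECT generate_series(1, 64);

    DO $$
    BEGIN
        FOR i IN 1..1000 LOOP
            EXECUTE format('ALTER TABLE t ADD COLUMN c%s int DEFAULT 1', i);
        END LOOP;
    END $$;

    -- query.sql then aggregates the last column:
    SELECT sum(c1000) FROM t;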
On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > Yeah, thanks. Revised patch attached.

FYI the identity regression test started failing recently with this patch applied (maybe due to commit 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)

    *** /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/expected/identity.out 2018-02-04 00:56:12.146303616 +0000
    --- /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/identity.out 2018-02-04 01:05:55.217932032 +0000
    ***************
    *** 257,268 ****
      INSERT INTO itest13 VALUES (1), (2), (3);
      -- add column to populated table
      ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
      SELECT * FROM itest13;
    !  a | b | c
    ! ---+---+---
    !  1 | 1 | 1
    !  2 | 2 | 2
    !  3 | 3 | 3
      (3 rows)

      -- various ALTER COLUMN tests
    --- 257,269 ----
      INSERT INTO itest13 VALUES (1), (2), (3);
      -- add column to populated table
      ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
    + ERROR:  column "c" contains null values
      SELECT * FROM itest13;
    !  a | b
    ! ---+---
    !  1 | 1
    !  2 | 2
    !  3 | 3
      (3 rows)

      -- various ALTER COLUMN tests

-- Thomas Munro http://www.enterprisedb.com
On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> Yeah, thanks. Revised patch attached. > > FYI the identity regression test started failing recently with this > patch applied (maybe due to commit > 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?) > Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan >> <andrew.dunstan@2ndquadrant.com> wrote: >>> Yeah, thanks. Revised patch attached. >> >> FYI the identity regression test started failing recently with this >> patch applied (maybe due to commit >> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?) >> > > Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it. > Here's a version that fixes the above issue and also the issue with VACUUM that Tomas Vondra reported. I'm still working on the issue with aggregates that Tomas also reported. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 09/02/18 06:24, Andrew Dunstan wrote: > On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan >>> <andrew.dunstan@2ndquadrant.com> wrote: >>>> Yeah, thanks. Revised patch attached. >>> >>> FYI the identity regression test started failing recently with this >>> patch applied (maybe due to commit >>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?) >>> >> >> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it. >> > > Here's a version that fixes the above issue and also the issue with > VACUUM that Tomas Vondra reported. I'm still working on the issue with > aggregates that Tomas also reported. > I see the patch does not update the ALTER TABLE docs section which discusses table rewrites, and it seems like it should. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Feb 11, 2018 at 2:50 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >> >> >> Here's a version that fixes the above issue and also the issue with >> VACUUM that Tomas Vondra reported. I'm still working on the issue with >> aggregates that Tomas also reported. >> > > I see the patch does not update the ALTER TABLE docs section which > discusses table rewrites, and it seems like it should. > Umm, it changes the second and third paras of the Notes section, which refer to rewrites. Not sure what else it should change. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Feb 9, 2018 at 3:54 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan >>> <andrew.dunstan@2ndquadrant.com> wrote: >>>> Yeah, thanks. Revised patch attached. >>> >>> FYI the identity regression test started failing recently with this >>> patch applied (maybe due to commit >>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?) >>> >> >> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it. >> > > Here's a version that fixes the above issue and also the issue with > VACUUM that Tomas Vondra reported. I'm still working on the issue with > aggregates that Tomas also reported. > This version fixes that issue. Thanks to Tomas for his help in finding where the problem was. There are now no outstanding issues that I'm aware of. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Feb 13, 2018 at 6:28 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Fri, Feb 9, 2018 at 3:54 PM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan >> <andrew.dunstan@2ndquadrant.com> wrote: >>> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro >>> <thomas.munro@enterprisedb.com> wrote: >>>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan >>>> <andrew.dunstan@2ndquadrant.com> wrote: >>>>> Yeah, thanks. Revised patch attached. >>>> >>>> FYI the identity regression test started failing recently with this >>>> patch applied (maybe due to commit >>>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?) >>>> >>> >>> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it. >>> >> >> Here's a version that fixes the above issue and also the issue with >> VACUUM that Tomas Vondra reported. I'm still working on the issue with >> aggregates that Tomas also reported. >> > > This version fixes that issue. Thanks to Tomas for his help in finding > where the problem was. There are now no outstanding issues that I'm > aware of. > The attached version largely fixes the performance degradation that Tomas Vondra reported, replacing an O(N^2) piece of code in slot_getmissingattrs() by one that is O(N). cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
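For intuition about the fix: the O(N^2) shape was one scan of the missing-values array per attribute being filled, while the O(N) version plausibly does a single-pass fill like the following (a sketch assembled from fragments quoted in the review below, not the literal patch code):

    /* initialize all the missing attributes to null first ... */
    for (attno = startAttNum; attno < slot->tts_tupleDescriptor->natts; attno++)
    {
        slot->tts_values[attno] = (Datum) 0;
        slot->tts_isnull[attno] = true;
    }

    /* ... then one pass over the array fills in the non-null values */
    for (i = 0; i < constr->num_missing; i++)
    {
        int         missattn = attrmiss[i].amnum - 1;

        if (missattn >= startAttNum && !attrmiss[i].ammissingNull)
        {
            slot->tts_values[missattn] = attrmiss[i].ammissing;
            slot->tts_isnull[missattn] = false;
        }
    }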
On 02/16/2018 10:46 PM, Andrew Dunstan wrote: > On Tue, Feb 13, 2018 at 6:28 PM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> This version fixes that issue. Thanks to Tomas for his help in finding >> where the problem was. There are now no outstanding issues that I'm >> aware of. >> > > The attached version largely fixes the performance degradation > that Tomas Vondra reported, replacing an O(N^2) piece of code in > slot_getmissingattrs() by one that is O(N). > Seems fine to me - for comparison, numbers for master and v9/v10 patch versions look like this:

              create.sql   create-alter.sql
    -----------------------------------------
    master       1120           1320
    v9           1100             50
    v10          1130           1370

This particular machine has a laptop-class CPU, so the timings are somewhat noisy - which explains the small tps differences. What I find interesting is that if I do VACUUM FULL after running the SQL script, I get this:

              create.sql   create-alter.sql
    -----------------------------------------
    master       1120           1450
    v9           1100           1320
    v10          1100           1450

That is, the throughput for the create-alter case jumps up by about 100 tps, for some reason. Not sure why, though ... Anyway, I consider the performance to be OK. But perhaps Andres could comment on this too, as he requested the benchmarks. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17 February 2018 at 10:46, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > The attached version largely fixes the performance degradation > that Tomas Vondra reported, replacing an O(N^2) piece of code in > slot_getmissingattrs() by one that is O(N).

I've looked over this patch and had a play around with it. Just a couple of things that I noticed while reading:

1. I believe "its" should be "it's" here:

    + /*
    +  * Yes, and its of the by-value kind Copy in the Datum
    +  */

copy does not need an upper case 'C'. There should also be a comma after "kind". There's a similar problem with the following:

    + /*
    +  * Yes, and its fixed length Copy it out and have teh Datum
    +  * point to it.
    +  */

there should be a comma after "length" too. Also, "teh" ...

2. "dfferent" -> "different"

    +-- Test a large sample of dfferent datatypes

I'll try to spend some time reviewing the code in detail soon. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Feb 18, 2018 at 2:52 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 17 February 2018 at 10:46, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> The attached version largely fixes the performance degradation >> that Tomas Vondra reported, replacing an O(N^2) piece of code in >> slot_getmissingattrs() by one that is O(N). > > I've looked over this patch and had a play around with it. > > Just a couple of things that I noticed while reading: > [ various typos ] > > I'll try to spend some time reviewing the code in detail soon. > Thanks. I'll fix the typos in the next version of the patch, hopefully after your review. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 19 February 2018 at 13:48, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Sun, Feb 18, 2018 at 2:52 PM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> I'll try to spend some time reviewing the code in detail soon. >> > > Thanks. I'll fix the typos in the next version of the patch, hopefully > after your review.

Here's a more complete review... I reserve the right to be wrong about a few things and be nitpicky on a few more...

3. All doc changes are indented 1 space too much. Many lines have also been broken needlessly before 80 chars.

4. Surplus "it" at end of line.

    + * Return the missing value of an attribute, or NULL if it
    + * there is no missing value.

5. "sttributes" -> "attributes"

    + /* initialize all the missing sttributes to null */

6. Surplus space before >=

    + if (missattn >= startAttNum && !attrmiss[i].ammissingNull)

7. Why bother with the else here?

    + if (slot->tts_tupleDescriptor->constr)
    + {
    +     missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;
    +     attrmiss = slot->tts_tupleDescriptor->constr->missing;
    + }
    + else
    + {
    +     missingnum = -1;
    +     attrmiss = NULL;
    + }

You may as well just do:

    + if (!slot->tts_tupleDescriptor->constr)
    +     return;
    + missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;
    + attrmiss = slot->tts_tupleDescriptor->constr->missing;

8. -1; should be - 1;

    + missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;

9. Surplus braces can be removed:

    + if (attno < attnum) {
    - slot->tts_values[attno] = (Datum) 0;
    - slot->tts_isnull[attno] = true;
    + slot_getmissingattrs(slot, attno);
      }

10. You've made a behavioral change in the following code:

    - for (; attno < attnum; attno++)
    + if (attno < attnum)
      {
    - slot->tts_values[attno] = (Datum) 0;
    - slot->tts_isnull[attno] = true;
    + slot_getmissingattrs(slot, attno);
      }

Previously this just used to initialise up to attnum, but slot_getmissingattrs does not know about attnum and always seems to fill the entire slot with all atts in the TupleDesc.

This may cause some performance regression in cases where only part of the tuple needs to be deformed. I think Tomas' benchmarks used the final column in the table, so likely wouldn't have noticed this, although it may have if he'd aggregated one of the middle columns.

Once that's fixed, you could ditch the Min() call in the following code:

    attno = Min(attno, attnum);

    slot_deform_tuple(slot, attno);

    /*
     * If tuple doesn't have all the atts indicated by tupleDesc, read the
     * rest as NULLs or missing values
     */
    if (attno < attnum)
    {
        slot_getmissingattrs(slot, attno);
    }

And change it to something more like:

    /*
     * If tuple doesn't have all the atts indicated by tupleDesc, deform as
     * much as possible, then fill the remaining required atts with nulls or
     * missing values.
     */
    if (attno < attnum)
    {
        slot_deform_tuple(slot, attno);
        slot_getmissingattrs(slot, attno, attnum); // <-- (includes new parameter)
    }
    else
        slot_deform_tuple(slot, attnum);

Which should result in a small performance increase due to not having to compare attno < attnum twice. Although, perhaps the average compiler may optimise this anyway. I've not checked.

11. Is there a reason why the following code in getmissingattr() can't be made into a bsearch?

    + for (missingnum = tupleDesc->constr->num_missing - 1;
    +      missingnum >= 0; missingnum--)
    + {
    +     if (attrmiss[missingnum].amnum == attnum)
    +     {
    +         if (attrmiss[missingnum].ammissingNull)
    +         {
    +             *isnull = true;
    +             return (Datum) 0;
    +         }
    +         else
    +         {
    +             *isnull = false;
    +             return attrmiss[missingnum].ammissing;
    +         }
    +     }
    + }
    + Assert(false); /* should not be able to get here */
    + }

As far as I can see, the attrmiss is sorted by amnum. But maybe I just failed to find a case where it's not. I'd imagine doing this would further improve Tomas' benchmark score for the patched version, since he was performing a sum() on the final column. Although, if done, I'm actually holding some regard to the fact that users may one day complain that their query became slower after a table rewrite :-)

Update: I've stumbled upon the following code:

    + /*
    +  * We have a dependency on the attrdef array being filled in in
    +  * ascending attnum order. This should be guaranteed by the index
    +  * driving the scan. But we want to be double sure
    +  */
    + if (!(attp->attnum > attnum))
    +     elog(ERROR, "attribute numbers not ascending");

and the code below this seems to also assemble the missing values in attnum order. While I'm here, I'd rather see this if test written as:

    if (attp->attnum <= attnum)

Since the attnum order in the missing values appears to be well defined in ascending order, you can also simplify the comparison logic in equalTupleDescs(). The inner-most nested loop shouldn't be required.

12. Additional braces not required in:

    + if (tupleDesc &&
    +     attnum <= tupleDesc->natts &&
    +     TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing)
    + {
    +     return false;
    + }
    + else
    + {
    +     return true;
    + }

13. Also, just on study of the following condition:

    + if (tupleDesc &&
    +     attnum <= tupleDesc->natts &&
    +     TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing)

I think that instead of testing "attnum <= tupleDesc->natts", you should remove that condition and just add:

    Assert(!tupleDesc || attnum <= tupleDesc->natts);

to the start of the function. No code should ever pass an attnum greater than the tupleDesc's natts, so there's probably not much point in checking that now if we didn't before.

14. In expand_tuple, the following can be declared in inner scope:

    + HeapTupleHeader targetTHeader;

(It would be nice if expand_tuple was a bit easier on the eye, but I have no suggestions to improve it right now)

15. Why not: slot->tts_values[missattnum] = (Datum) 0;

    + slot->tts_values[missattnum] = PointerGetDatum(NULL);

There are a few other places you've done this. I'm unsure if there's any sort of standard to follow.

16. Additional braces not required in:

    {
    - *isnull = true;
    - return (Datum) 0;
    + return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull);
    }

Likewise in:

    - for (; attnum < tdesc_natts; attnum++)
    + if (attnum < tdesc_natts)
      {
    - slot->tts_values[attnum] = (Datum) 0;
    - slot->tts_isnull[attnum] = true;
    + slot_getmissingattrs(slot, attnum);
      }
    +

and:

    - for (; attno < attnum; attno++)
    + if (attno < attnum)
      {
    - slot->tts_values[attno] = (Datum) 0;
    - slot->tts_isnull[attno] = true;
    + slot_getmissingattrs(slot, attno);
      }

Maybe just fix this in all places your patch is touching.

17. Unrelated change:

    diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
    index ed90a02303..3b0be39101 100644
    --- a/src/backend/catalog/genbki.pl
    +++ b/src/backend/catalog/genbki.pl
    @@ -416,7 +416,6 @@ sub morph_row_for_pgattr
         }
         elsif ($priornotnull)
         {
    -
             # attnotnull will automatically be set if the type is
             # fixed-width and prior columns are all NOT NULL ---
             # compare DefineAttr in bootstrap.c. oidvector and

18. Can you explain what you mean by "doomed" here?

    + * If we add a column that is not null and there is no missing value
    + * (i.e. the missing value is NULL) then this ADD COLUMN is doomed.
    + * Unless the table is empty...

19. A few lines below are beyond 80 chars:

    + if (HeapTupleHeaderGetNatts(tuple.t_data) < RelationGetDescr(erm->relation)->natts)
    + {
    +     copyTuple = heap_expand_tuple(&tuple,
    +                                   RelationGetDescr(erm->relation));

Likewise in:

    + if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) < slot->tts_tupleDescriptor->natts)

and:

    + if (att_tup->atthasdef && rd_att->constr && rd_att->constr->num_defval > 0)

20. "missin" -> "missing", "Fill" -> "fill"

    + * If the column has a default or missin value Fill it into the
    + * attrdef array

21. Spaces instead of tabs. Maybe just pgindent the whole patch?

    + AttrMissing *missing;   /* missing attributes values, NULL if none */
      uint16      num_defval;
      uint16      num_check;
    + uint16      num_missing;

22. Missing <tab><space> before " 24"

    +#define Anum_pg_attribute_attmissingval 24

Also, missing <tab> before " 15"

    +#define Anum_pg_attribute_atthasmissing 15

23. Looks like you'll need to find a new home for the test, since the comment claims we want no more than 19. You've added the 20th.

    # NB: temp.sql does a reconnect which transiently uses 2 connections,
    # so keep this parallel group to at most 19 tests
    # ----------
    -test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml
    +test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml fast_default

I'll await the next version before looking again. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2

On Mon, Feb 19, 2018 at 1:18 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 19 February 2018 at 13:48, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> On Sun, Feb 18, 2018 at 2:52 PM, David Rowley >> <david.rowley@2ndquadrant.com> wrote: >>> I'll try to spend some time reviewing the code in detail soon. >>> >> >> Thanks. I'll fix the typos in the next version of the patch, hopefully >> after your review. > > Here's a more complete review... I reserve the right to be wrong about > a few things and be nitpicky on a few more...

I've dealt with most of these issues. The only change of substance is the suggested change in slot_getmissingattrs().

> Once that's fixed, you could ditch the Min() call in the following code:
>
>     attno = Min(attno, attnum);
>
>     slot_deform_tuple(slot, attno);
>
>     /*
>      * If tuple doesn't have all the atts indicated by tupleDesc, read the
>      * rest as NULLs or missing values
>      */
>     if (attno < attnum)
>     {
>         slot_getmissingattrs(slot, attno);
>     }
>
> And change it to something more like:
>
>     /*
>      * If tuple doesn't have all the atts indicated by tupleDesc, deform as
>      * much as possible, then fill the remaining required atts with nulls or
>      * missing values.
>      */
>     if (attno < attnum)
>     {
>         slot_deform_tuple(slot, attno);
>         slot_getmissingattrs(slot, attno, attnum); // <-- (includes new parameter)
>     }
>     else
>         slot_deform_tuple(slot, attnum);
>
> Which should result in a small performance increase due to not having
> to compare attno < attnum twice. Although, perhaps the average compiler
> may optimise this anyway. I've not checked.

Maybe it would make a difference, but it seems unlikely to be significant.

> 11. Is there a reason why the following code in getmissingattr() can't
> be made into a bsearch?
> [...]
> As far as I can see, the attrmiss is sorted by amnum. But maybe I just
> failed to find a case where it's not.
>
> I'd imagine doing this would further improve Tomas' benchmark score
> for the patched version, since he was performing a sum() on the final
> column.
>
> Although, if done, I'm actually holding some regard to the fact that
> users may one day complain that their query became slower after a
> table rewrite :-)
>
> Update: I've stumbled upon the following code:
>
>     + /*
>     +  * We have a dependency on the attrdef array being filled in in
>     +  * ascending attnum order. This should be guaranteed by the index
>     +  * driving the scan. But we want to be double sure
>     +  */
>     + if (!(attp->attnum > attnum))
>     +     elog(ERROR, "attribute numbers not ascending");
>
> and the code below this seems to also assemble the missing values in
> attnum order.
>
> While I'm here, I'd rather see this if test written as:
> if (attp->attnum <= attnum)
>
> Since the attnum order in the missing values appears to be well
> defined in ascending order, you can also simplify the comparison logic
> in equalTupleDescs(). The inner-most nested loop shouldn't be
> required.

Well, this comment in the existing code in equalTupleDescs() suggests that in fact we can't rely on the attrdef items being in attnum order:

    /*
     * We can't assume that the items are always read from the system
     * catalogs in the same order; so use the adnum field to identify
     * the matching item to compare.
     */

And in any case that comment is out of date, since the code no longer ties missing values to default values. So I've removed the comment and the error check.

There are probably a few optimisations we can make if we can assume that the missing values in the Tuple Descriptor are in attnum order. But it's unclear to me why they should be and the attrdef entries not.

> 15. Why not: slot->tts_values[missattnum] = (Datum) 0;
>
>     + slot->tts_values[missattnum] = PointerGetDatum(NULL);
>
> There are a few other places you've done this. I'm unsure if there's
> any sort of standard to follow.

This is a pretty widely used idiom in the source.

> I'll await the next version before looking again.

Thanks, attached. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi, On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote: > Anyway, I consider the performance to be OK. But perhaps Andres could > comment on this too, as he requested the benchmarks. My performance concerns were less about CREATE TABLE related things than about analytics workloads or such, where deforming is the primary bottleneck. I think it should be ok, but doing a before/after tpc-h of scale 5-10 or so wouldn't be a bad thing to verify. Greetings, Andres Freund
On Tue, Feb 20, 2018 at 5:03 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2 Wow, my c&p was working overtime. Good thing this won't do anyone any good. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for making those updates. On 20 February 2018 at 19:33, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Mon, Feb 19, 2018 at 1:18 PM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> Since the attnum order in the missing values appears to be well >> defined in ascending order, you can also simplify the comparison logic >> in equalTupleDescs(). The inner-most nested loop shouldn't be >> required. > > Well, this comment in the existing code in equalTupleDescs() suggests > that in fact we can't rely on the attrdef items being in attnum order: > > /* > * We can't assume that the items are always read from the system > * catalogs in the same order; so use the adnum field to identify > * the matching item to compare. > */

On closer look, perhaps the reason that comment claims the order is not well defined is that the use of the index depends on the value of criticalRelcachesBuilt in the following:

    pg_attribute_scan = systable_beginscan(pg_attribute_desc,
                                           AttributeRelidNumIndexId,
                                           criticalRelcachesBuilt,
                                           NULL, 2, skey);

Otherwise, I see no reason for them to be out of order, as if I grep for instances of "->missing = " I only see the place where the missing attr details are set in RelationBuildTupleDesc() and one other place in CreateTupleDescCopyConstr(), where it's simply just copied via memcpy(). Nevertheless, it would be interesting to see how much a bsearch in getmissingattr would help Tomas' case. Though, perhaps you're happy enough with the performance already. I'll make another pass over the updated code soon to see if I can see anything else. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 20 February 2018 at 23:10, David Rowley <david.rowley@2ndquadrant.com> wrote: > Nevertheless, it would be interesting to see how much a bsearch in > getmissingattr would help Tomas' case. Though, perhaps you're happy > enough with the performance already.

I thought I'd give this a quick test using the attached (incomplete) patch. My findings made me realise that Tomas' case actually exploited a best case for the patch. Tomas' query.sql is performing a sum(c1000), which is the final column in the table. No doubt he's done this since it's the typical thing to do to get a worst case for tuple deforming, but he might not have realised it was the best case for your patch, due to the way your for loop in getmissingattr starts with the final missing value first. I noticed this when I tested the mockup bsearch code. I was surprised to see that it actually slowed performance a little with the sum(c1000) case. The entire results, using:

    pgbench -n -T 60 -j 1 -c 1 -f query.sql postgres

are as follows.

Using: select sum(c1000) from t;

    v11 + bsearch + create.sql:       tps = 1479.267518
    v11 + bsearch + create-alter.sql: tps = 1366.885968
    v11 + create.sql:                 tps = 1544.378091
    v11 + create-alter.sql:           tps = 1416.608320

(both the create.sql results should be about the same here since they're not really exercising any new code.) Notice that the bsearch version is slightly slower, 1366 tps vs 1416. If I use sum(c10) instead, I get:

Using: select sum(c10) from t;

    v11 + bsearch + create-alter.sql: tps = 1445.940061
    v11 + bsearch + create.sql:       tps = 3320.102543
    v11 + create.sql:                 tps = 3330.131437
    v11 + create-alter.sql:           tps = 1398.635251

The bsearch here does help recover some performance, but it's a small improvement on what is quite a big drop from the create.sql case. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
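For reference, the shape of the change being tested is presumably something like the following (a sketch using the patch's field names; David's actual mockup is in the attachment):

    /* comparator for bsearch: missing-value entries are sorted by amnum */
    static int
    missing_cmp(const void *key, const void *elem)
    {
        AttrNumber  attnum = *(const AttrNumber *) key;
        const AttrMissing *m = (const AttrMissing *) elem;

        return (int) attnum - (int) m->amnum;
    }

    /* in getmissingattr(), replacing the backwards linear scan */
    AttrMissing *found = bsearch(&attnum,
                                 tupleDesc->constr->missing,
                                 tupleDesc->constr->num_missing,
                                 sizeof(AttrMissing),
                                 missing_cmp);

The numbers above make sense in this light: the backwards linear scan finds the final column immediately, while the bsearch pays O(log N) regardless of which attribute is probed.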
On 20/02/18 07:42, Andres Freund wrote: > Hi, > > On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote: >> Anyway, I consider the performance to be OK. But perhaps Andres could >> comment on this too, as he requested the benchmarks. > > My performance concerns were less about CREATE TABLE related things than > about analytics workloads or such, where deforming is the primary > bottleneck. I think it should be ok, but doing a before/after tpc-h of > scale 5-10 or so wouldn't be a bad thing to verify. > The test Tomas is doing is an analytical query; it's running sum() on the new fast default column. He uses the create and create-alter names as a comparison between the case when the table was created with the columns and the case when the columns were added using fast default. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On February 20, 2018 5:03:58 AM PST, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 20/02/18 07:42, Andres Freund wrote: >> Hi, >> >> On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote: >>> Anyway, I consider the performance to be OK. But perhaps Andres could >>> comment on this too, as he requested the benchmarks. >> >> My performance concerns were less about CREATE TABLE related things than >> about analytics workloads or such, where deforming is the primary >> bottleneck. I think it should be ok, but doing a before/after tpc-h of >> scale 5-10 or so wouldn't be a bad thing to verify. >> > The test Tomas is doing is an analytical query; it's running sum() on the > new fast default column. > > He uses the create and create-alter names as a comparison between the case > when the table was created with the columns and the case when the columns > were added using fast default. It's still a fairly simplistic test case. Running some queries with reasonably well known characteristics seems like a good idea regardless. It's not like a scale 5 run takes that long. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
... btw, I've not read this patch in any detail, but the recent thread about toast tables for system catalogs prompts me to wonder what happens if a "fast default" value is large enough to require out-of-line toasting. I can easily think of problems that will ensue if we try to support that case, because right now the toast mechanisms assume that OOL toasted values can only be referenced from the associated table. regards, tom lane
Hi, On 2018-02-20 13:07:30 -0500, Tom Lane wrote: > ... btw, I've not read this patch in any detail, but the recent thread > about toast tables for system catalogs prompts me to wonder what happens > if a "fast default" value is large enough to require out-of-line toasting. Hm, interesting. > I can easily think of problems that will ensue if we try to support that > case, because right now the toast mechanisms assume that OOL toasted > values can only be referenced from the associated table. What problem are you seeing with treating the toasted value as being from pg_attribute? I'm only drinking my first coffee just now, so I might be missing something... Now we certainly would need to make sure that the corresponding pg_attribute row containing the default value doesn't go away too early, but that shouldn't be much of a problem given that we never remove them. I wondered for a second if there are problematic cases where the default value is referenced by an index, and then the default-adding transaction rolls back. But I can't construct anything realistically problematic. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-02-20 13:07:30 -0500, Tom Lane wrote: >> I can easily think of problems that will ensue if we try to support that >> case, because right now the toast mechanisms assume that OOL toasted >> values can only be referenced from the associated table. > What problem are you seeing with treating the toasted value as being from > pg_attribute? I'm only drinking my first coffee just now, so I might be > missing something... Locking/vacuuming is exactly what I'm worried about. Right now, a transaction cannot reference a toast value without holding at least AccessShare on the parent table. That would not be true of OOL fast defaults that are living in pg_attribute's toast table (if it had one). If you think this isn't potentially problematic, see the problems that forced us into hacks like 08e261cbc. I guess the fix equivalent to 08e261cbc would be to require any toasted fast-default value to be detoasted into line whenever it's copied into a tupledesc, rather than possibly being fetched later. > Now we certainly would need to make sure that the corresponding > pg_attribute row containing the default value doesn't go away too early, > but that shouldn't be much of a problem given that we never remove > them. I wondered for a second if there are problematic cases where the > default value is referenced by an index, and then the default-adding > transaction rolls back. But I can't construct anything realistically > problematic. Oooh ... but no, because we don't allow toasted values to be copied into indexes, for (I think) exactly this reason. See index_form_tuple. regards, tom lane
Hi, On 2018-02-20 13:50:54 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-02-20 13:07:30 -0500, Tom Lane wrote: > >> I can easily think of problems that will ensue if we try to support that > >> case, because right now the toast mechanisms assume that OOL toasted > >> values can only be referenced from the associated table. > > > What problem are you seeing with treating the toasted value as being from > > pg_attribute? I'm only drinking my first coffee just now, so I might be > > missing something... > > Locking/vacuuming is exactly what I'm worried about. Right now, a > transaction cannot reference a toast value without holding at least > AccessShare on the parent table. Is that actually reliably true? Doesn't e.g. using a variable from a rolled-back subtransaction in plpgsql circumvent this already? There are a few bugs around this though, so that's not really a convincing argument of there not being any danger :/ > That would not be true of OOL fast defaults that are living in > pg_attribute's toast table (if it had one). If you think this isn't > potentially problematic, see the problems that forced us into hacks > like 08e261cbc. I don't think the equivalent problem quite applies here, the OOL default should afaics be immutable. But yea, it's a concern. > I guess the fix equivalent to 08e261cbc would be to require any toasted > fast-default value to be detoasted into line whenever it's copied into > a tupledesc, rather than possibly being fetched later. Yea, that's probably a good idea. Correctness aside, it's probably better to detoast once rather than do so for every single row for performance reasons (detoasting the same row over and over in multiple backends would lead to quite the contention on used buffer headers and relation locks). > > Now we certainly would need to make sure that the corresponding > > pg_attribute row containing the default value doesn't go away too early, > > but that shouldn't be much of a problem given that we never remove > > them. I wondered for a second if there are problematic cases where the > > default value is referenced by an index, and then the default-adding > > transaction rolls back. But I can't construct anything realistically > > problematic. > > Oooh ... but no, because we don't allow toasted values to be copied into > indexes, for (I think) exactly this reason. See index_form_tuple. I don't think that'd necessarily provide full protection - what about an index recheck during a lossy bitmap index scan for example? But given that any such index creation would also be rolled back, I'm not too concerned. Greetings, Andres Freund
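A sketch of the kind of fix being suggested here - eagerly flattening any out-of-line value at the moment the missing value is loaded into a tupdesc, so no later access depends on pg_attribute's toast table (illustrative only; the surrounding variable names are assumptions):

    /* force a toasted missing value in-line when copying it into the
     * tupdesc, rather than leaving an out-of-line TOAST pointer behind */
    if (!attr->attbyval && attr->attlen == -1 &&
        VARATT_IS_EXTERNAL(DatumGetPointer(missingval)))
        missingval = PointerGetDatum(
            heap_tuple_fetch_attr((struct varlena *) DatumGetPointer(missingval)));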
On 02/20/2018 06:43 PM, Andres Freund wrote: > On February 20, 2018 5:03:58 AM PST, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >> On 20/02/18 07:42, Andres Freund wrote: >>> On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote: >>>> Anyway, I consider the performance to be OK. But perhaps Andres could >>>> comment on this too, as he requested the benchmarks. >>> >>> My performance concerns were less about CREATE TABLE related things than >>> about analytics workloads or such, where deforming is the primary >>> bottleneck. I think it should be ok, but doing a before/after tpc-h of >>> scale 5-10 or so wouldn't be a bad thing to verify. >> >> The test Tomas is doing is an analytical query; it's running sum() on the >> new fast default column. >> >> He uses the create and create-alter names as a comparison between the case >> when the table was created with the columns and the case when the columns >> were added using fast default. > > It's still a fairly simplistic test case. Running some queries with > reasonably well known characteristics seems like a good idea > regardless. It's not like a scale 5 run takes that long. > I can do that, although I don't really expect any surprises there. The simple tests I did were supposed to test two corner cases - best case, when there are no columns with "fast default", and worst case when all columns (and tuples) have "fast default". Per David's findings, the second test was not perfectly right, but that's a separate issue. The question is how the schema for TPC-H should look. Because if you just do the usual test without any ALTER TABLE ADD COLUMN, then you really won't get any difference at all. The fast default stuff will be completely "inactive". So, we need to tweak the TPC-H schema somehow ... We need to (a) add some new columns with default values, and (b) tweak some of the queries to use those new columns. Suggestions? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2018-02-20 20:57:36 +0100, Tomas Vondra wrote: > The question is how the schema for TPC-H should look. Because if > you just do the usual test without any ALTER TABLE ADD COLUMN, then you > really won't get any difference at all. The fast default stuff will be > completely "inactive". Well, there'll still be changed behaviour due to the changed sizes of structures and new out-of-line function calls. The deforming code is quite sensitive to such changes. I think just verifying that there's no meaningful effect with/without the patch is good enough. As soon as there are actual new columns that take advantage of the new functionality, I think some degradation is fair game; you got some benefit for it. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-02-20 20:57:36 +0100, Tomas Vondra wrote: >> The question is how the schema for TPC-H should look. Because if >> you just do the usual test without any ALTER TABLE ADD COLUMN, then you >> really won't get any difference at all. The fast default stuff will be >> completely "inactive". > I think just verifying that there's no meaningful effect with/without > the patch is good enough. As soon as there are actual new columns that > take advantage of the new functionality, I think some degradation is fair > game; you got some benefit for it. Yeah, I think what we want to know is how much penalty people are paying for the feature to exist even though they aren't using it. That seems to break down into two cases: (1) use of a table that's never had any added columns; (2) use of a table that has had columns added, but they just default to null the same as before. Is there more relcache setup time anyway? Is deforming slower even if there are no fast defaults? Case (1) could be answered by a straight TPC-H test with/without patch. Case (2) seems a bit harder, since presumably you want to measure accesses to tuples that are "short" compared to the current table tupdesc, but for which the extra columns are still default-null. regards, tom lane
On 02/20/2018 09:14 PM, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: >> On 2018-02-20 20:57:36 +0100, Tomas Vondra wrote: >>> The question is what the schema for TPC-H should look like. Because if >>> you just do the usual test without any ALTER TABLE ADD COLUMN, then you >>> really won't get any difference at all. The fast default stuff will be >>> completely "inactive". > >> I think just verifying that there's no meaningful effect with/without >> the patch is good enough. As soon as there are actual new columns that >> take advantage of the new functionality, I think some degradation is fair >> game; you got some benefit for it. > > Yeah, I think what we want to know is how much penalty people are paying > for the feature to exist even though they aren't using it. That seems > to break down into two cases: > > (1) use of a table that's never had any added columns; > > (2) use of a table that has had columns added, but they just default > to null the same as before. Is there more relcache setup time anyway? > Is deforming slower even if there are no fast defaults? > > Case (1) could be answered by a straight TPC-H test with/without patch. I don't quite understand why this case would need the TPC-H tests, or why TPC-H would give us more than the very focused tests we've already done. The first test was testing a fairly short query, where any such additional overhead would be much more obvious compared to the TPC-H queries, which usually do a lot of other expensive stuff. I'm fine with doing the tests, of course. > Case (2) seems a bit harder, since presumably you want to measure > accesses to tuples that are "short" compared to the current table > tupdesc, but for which the extra columns are still default-null. > Yeah. I think we can add a column or two to the "fact" tables (particularly lineitem), and then tweak the queries to also compute aggregates on these new columns. Unless someone has better ideas, I'll do such tests once the machine completes the stuff that's running now. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
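The kind of TPC-H tweak being proposed might look roughly like this (hypothetical column names; the actual test schema may well differ):

-- Add columns to lineitem with non-volatile defaults, so existing rows
-- get fast defaults rather than forcing a table rewrite.
ALTER TABLE lineitem ADD COLUMN l_bonus numeric DEFAULT 0;
ALTER TABLE lineitem ADD COLUMN l_flag int DEFAULT 1;

-- Tweak a query to aggregate over the new columns, so deforming has to
-- go through the missing-attribute path for all pre-existing rows.
SELECT l_returnflag, sum(l_extendedprice), sum(l_bonus), avg(l_flag)
FROM lineitem
GROUP BY l_returnflag;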
Hi, On 2018-02-20 21:28:40 +0100, Tomas Vondra wrote: > I don't quite understand why this case would need the TPC-H tests, or > why TPC-H would give us more than the very focused tests we've already > done. Because a more complex query shows the effect of changed cache access costs better than a trivial query. Simplistic queries will often not show, e.g., the cost of additional branch predictor usage, because the branch history is large enough to fit the simple query. But once you go to a more complex query, that's not necessarily the case anymore. > The first test was testing a fairly short query, where any such > additional overhead would be much more obvious compared to the TPC-H > queries, which usually do a lot of other expensive stuff. Unfortunately such reasoning IME doesn't work well with cpu-bound stuff. Greetings, Andres Freund
On 02/20/2018 09:36 PM, Andres Freund wrote: > Hi, > > On 2018-02-20 21:28:40 +0100, Tomas Vondra wrote: >> I don't quite understand why this case would need the TPC-H tests, or >> why TPC-H would give us more than the very focused tests we've already >> done. > > Because a more complex query shows the effect of changed cache access > costs better than a trivial query. Simplistic queries will often > not show, e.g., the cost of additional branch predictor usage, because the > branch history is large enough to fit the simple query. But once you go > to a more complex query, that's not necessarily the case anymore. > > >> The first test was testing a fairly short query, where any such >> additional overhead would be much more obvious compared to the TPC-H >> queries, which usually do a lot of other expensive stuff. > > Unfortunately such reasoning IME doesn't work well with cpu-bound stuff. > OK, point taken. I'll do the tests and report the results. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 21 February 2018 at 00:38, David Rowley <david.rowley@2ndquadrant.com> wrote: > Using: select sum(c10) from t; > ... > v11 + create.sql: tps = 3330.131437 > v11 + create-alter.sql: tps = 1398.635251 It seems the difference between these two cases is down to slot_getsomeattrs being asked to deform up to attnum 1000 for the create-alter.sql case, and only up to attnum 10 for the create.sql case. Both plans are using physical tlists per EXPLAIN VERBOSE. I've not managed to narrow down the reason for the difference yet. Looks like it might take a bit of time with a debugger to find the point where the code paths of the two cases diverge. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
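For readers without the scripts at hand, the two cases presumably differ along these lines (a reconstruction from the script names and the discussion, not the scripts themselves):

-- create.sql: all 1000 columns are declared in CREATE TABLE, so every
-- stored tuple physically contains c1 .. c1000.
CREATE TABLE t (c1 int, c2 int, /* ... */ c1000 int);

-- create-alter.sql: the columns are added afterwards with defaults, so
-- stored tuples stay short and the values come from attmissingval.
CREATE TABLE t ();
ALTER TABLE t ADD COLUMN c1 int DEFAULT 1;
-- ... repeated up to ...
ALTER TABLE t ADD COLUMN c1000 int DEFAULT 1000;

-- The benchmark query touches a single column, yet in the second case
-- slot_getsomeattrs was being asked to deform up to attnum 1000.
SELECT sum(c10) FROM t;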
David Rowley <david.rowley@2ndquadrant.com> writes: > It seems the difference between these two cases is down to > slot_getsomeattrs being asked to deform up to attnum 1000 for the > create-alter.sql case, and only up to attnum 10 for the create.sql > case. Both plans are using physical tlists per EXPLAIN VERBOSE. I've > not managed to narrow down the reason for the difference yet. There's logic in the execExpr compiler that detects the last attnum we actually reference, if memory serves. regards, tom lane
Hi, On 2018-02-20 17:03:07 +1030, Andrew Dunstan wrote: > + <row> > + <entry><structfield>attmissingval</structfield></entry> > + <entry><type>bytea</type></entry> > + <entry></entry> > + <entry> > + This column has a binary representation of the value used when the > + column is entirely missing from the row, as happens when the column is > + added after the row is created. The value is only used when > + <structname>atthasmissing</structname> is true. > + </entry> > + </row> Hm. Why is this a bytea, and why is that a good idea? In pg_statistics, which has to deal with something similar-ish, we deal with the ambiguity of the storage by storing anyarrays, which then can display the contents properly thanks to the embedded oid. Now it'd be a bit sad to store a one element array in the table. But that seems better than storing a bytea that's typeless and can't be viewed reasonably. The best solution would be to create a variadic any type that can actually be stored, but I see why you'd not want to go there necessarily. But also, a bit higher level, is it a good idea to store this information directly into pg_attribute? That table is already one of the hotter system catalog tables, and very frequently the largest. If we suddenly end up storing a lot of default values in there, possibly ones that aren't ever actually used because all the default values are long since actually stored in the table, we'll spend more time doing cache lookups. I'm somewhat tempted to say that the default data should be in a separate catalog table. Or possibly use pg_attrdef. > > +/* > + * Return the missing value of an attribute, or NULL if there isn't one. > + */ > +static Datum > +getmissingattr(TupleDesc tupleDesc, > + int attnum, bool *isnull) > +{ > + int missingnum; > + Form_pg_attribute att; > + AttrMissing *attrmiss; > + > + Assert(attnum <= tupleDesc->natts); > + Assert(attnum > 0); > + > + att = TupleDescAttr(tupleDesc, attnum - 1); > + > + if (att->atthasmissing) > + { > + Assert(tupleDesc->constr); > + Assert(tupleDesc->constr->num_missing > 0); > + > + attrmiss = tupleDesc->constr->missing; > + > + /* > + * Look through the tupledesc's attribute missing values > + * for the one that corresponds to this attribute. > + */ > + for (missingnum = tupleDesc->constr->num_missing - 1; > + missingnum >= 0; missingnum--) > + { > + if (attrmiss[missingnum].amnum == attnum) > + { > + if (attrmiss[missingnum].ammissingNull) > + { > + *isnull = true; > + return (Datum) 0; > + } > + else > + { > + *isnull = false; > + return attrmiss[missingnum].ammissing; > + } > + } > + } I think requiring this is a bad idea. If, and only if, there's default attributes with missing values we should build an array that can be directly indexed by attribute number, accessing the missing value. What you're doing here is essentially O(n^2) afaict, with n being the number of columns missing values. > +/* > + * Fill in missing values for a TupleTableSlot > + */ > +static void > +slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum) > +{ > + AttrMissing *attrmiss; > + int missingnum; > + int missattnum; > + int i; > + > + /* initialize all the missing attributes to null */ > + for (missattnum = lastAttNum - 1; missattnum >= startAttNum; missattnum--) > + { > + slot->tts_values[missattnum] = PointerGetDatum(NULL); > + slot->tts_isnull[missattnum] = true; > + } Any reason not to memset this in one (well two) go with memset? Also, how come you're using PointerGetDatum()? 
Normally we just use (Datum) 0, imo it's confusing to use PointerGetDatum(), as it implies things that are quite possibly not the case. > +/* > + * Fill in one attribute, either a data value or a bit in the null bitmask > + */ Imo this isn't sufficient documentation to explain what this function does. Even if you just add "per-attribute helper for heap_fill_tuple and other routines building tuples" or such, I think it'd be clearer. > +static inline void > +fill_val(Form_pg_attribute att, > + bits8 **bit, > + int *bitmask, > + char **dataP, > + uint16 *infomask, > + Datum datum, > + bool isnull) > +{ > + Size data_length; > + char *data = *dataP; > + > + /* > + * If we're building a null bitmap, set the appropriate bit for the > + * current column value here. > + */ > + if (bit != NULL) > + { > + if (*bitmask != HIGHBIT) > + *bitmask <<= 1; > + else > + { > + *bit += 1; > + **bit = 0x0; > + *bitmask = 1; > + } > + > + if (isnull) > + { > + *infomask |= HEAP_HASNULL; > + return; > + } > + > + **bit |= *bitmask; > + } > + > + Assert(att); Why isn't this asserted at the beginning? > @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc, > * ---------------- > */ > bool > -heap_attisnull(HeapTuple tup, int attnum) > +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc) > { > + /* > + * We allow a NULL tupledesc for relations not expected to have > + * missing values, such as catalog relations and indexes. > + */ > + Assert(!tupleDesc || attnum <= tupleDesc->natts); This seems fairly likely to hide bugs. > if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data)) > - return true; > + { > + return !(tupleDesc && > + TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing); Brrr. > +/* > + * Expand a tuple with missing values, or NULLs. The source tuple must have > + * less attributes than the required number. Only one of targetHeapTuple > + * and targetMinimalTuple may be supplied. The other argument must be NULL. > + */ I can't quite make head or tails of this comment "with missing values, or NULLs"? > +static void > +expand_tuple(HeapTuple *targetHeapTuple, > + MinimalTuple *targetMinimalTuple, > + HeapTuple sourceTuple, > + TupleDesc tupleDesc) > +{ > + AttrMissing *attrmiss = NULL; > + int attnum; > + int missingnum; > + int firstmissingnum = 0; > + int num_missing = 0; > + bool hasNulls = HeapTupleHasNulls(sourceTuple); > + HeapTupleHeader targetTHeader; > + HeapTupleHeader sourceTHeader = sourceTuple->t_data; > + int sourceNatts = HeapTupleHeaderGetNatts(sourceTHeader); > + int natts = tupleDesc->natts; > + int sourceIndicatorLen; > + int targetIndicatorLen; > + Size sourceDataLen = sourceTuple->t_len - sourceTHeader->t_hoff; > + Size targetDataLen; > + Size len; > + int hoff; > + bits8 *nullBits = NULL; > + int bitMask = 0; > + char *targetData; > + uint16 *infoMask; > + > + Assert((targetHeapTuple && !targetMinimalTuple) > + || (!targetHeapTuple && targetMinimalTuple)); > + > + Assert(sourceNatts < natts); > + > + sourceIndicatorLen = (hasNulls ? BITMAPLEN(sourceNatts) : 0); Maybe I'm just grumpy right now (did response work on 2016 taxes just now, brrrrr), but I don't find "indicatorLen" a descriptive term. > + targetDataLen = sourceDataLen; > + > + if (tupleDesc->constr && > + tupleDesc->constr->num_missing) > + { > + /* > + * If there are missing values we want to put them into the tuple. > + * Before that we have to compute the extra length for the values > + * array and the variable length data. 
> + */ > + num_missing = tupleDesc->constr->num_missing; > + attrmiss = tupleDesc->constr->missing; You're accessing a hell of a lot of expensive stuff outside outside of this if block. E.g. HeapTupleHeaderGetNatts specifically is the prime source of cache misses in a lot of workloads. Now a smart compiler might be able to optimize this away, but... > + /* > + * Find the first item in attrmiss for which we don't have a value in > + * the source (i.e. where its amnum is > sourceNatts). We can ignore > + * all the missing entries before that. > + */ > + for (firstmissingnum = 0; > + firstmissingnum < num_missing > + && attrmiss[firstmissingnum].amnum <= sourceNatts; > + firstmissingnum++) > + { /* empty */ > + } > + > + /* > + * If there are no more missing values everything else must be > + * NULL > + */ > + if (firstmissingnum >= num_missing) > + { > + hasNulls = true; > + } > + > + /* > + * Now walk the missing attributes one by one. If there is a > + * missing value make space for it. Otherwise, it's going to be NULL. > + */ > + for (attnum = sourceNatts + 1; > + attnum <= natts; > + attnum++) > + { > + Form_pg_attribute att = TupleDescAttr(tupleDesc, attnum - 1); > + > + for (missingnum = firstmissingnum; missingnum < num_missing; missingnum++) > + { > + > + /* > + * If there is a missing value for this attribute calculate > + * the required space if it's not NULL. > + */ > + if (attrmiss[missingnum].amnum == attnum) > + { > + if (attrmiss[missingnum].ammissingNull) > + hasNulls = true; > + else > + { > + targetDataLen = att_align_datum(targetDataLen, > + att->attalign, > + att->attlen, > + attrmiss[missingnum].ammissing); > + > + targetDataLen = att_addlength_pointer(targetDataLen, > + att->attlen, > + attrmiss[missingnum].ammissing); > + } > + break; > + } > + } > + if (missingnum > num_missing) > + { > + /* there is no missing value for this attribute */ > + hasNulls = true; > + } > + } > + } /* end if have missing values */ > + else > + { > + /* > + * If there are no missing values at all then NULLS must be allowed, > + * since some of the attributes are known to be absent. > + */ > + hasNulls = true; Why are we doing anything at all in this branch? ISTM we're reforming the tuple for absolutely no gain here, just making it larger by including additionall null bits? > + /* > + * Allocate and zero the space needed. Note that the tuple body and > + * HeapTupleData management structure are allocated in one chunk. > + */ > + if (targetHeapTuple) > + { > + len += offsetof(HeapTupleHeaderData, t_bits); > + hoff = len = MAXALIGN(len); /* align user data safely */ > + len += targetDataLen; So len isn't the length of the tuple, but the null bitmap length? That seems, uh, non-obvious. > + *targetHeapTuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len); > + (*targetHeapTuple)->t_data > + = targetTHeader > + = (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE); Yuck. > + (*targetHeapTuple)->t_len = len; > + (*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid; > + ItemPointerSetInvalid(&((*targetHeapTuple)->t_self)); Seems like a lot of this code would be more readable by storing *targetHeapTuple in a local var. 
> + targetTHeader->t_infomask = sourceTHeader->t_infomask; > + targetTHeader->t_hoff = hoff; > + HeapTupleHeaderSetNatts(targetTHeader, natts); > + HeapTupleHeaderSetDatumLength(targetTHeader, len); > + HeapTupleHeaderSetTypeId(targetTHeader, tupleDesc->tdtypeid); > + HeapTupleHeaderSetTypMod(targetTHeader, tupleDesc->tdtypmod); > + /* We also make sure that t_ctid is invalid unless explicitly set */ > + ItemPointerSetInvalid(&(targetTHeader->t_ctid)); This seems fairly expensive, why aren't we just memcpy'ing the old header into the new one? > + if (targetIndicatorLen > 0) > + { > + if (sourceIndicatorLen > 0) > + { > + /* if bitmap pre-existed copy in - all is set */ can't parse. > + memcpy(nullBits, > + ((char *) sourceTHeader) > + + offsetof(HeapTupleHeaderData, t_bits), > + sourceIndicatorLen); > + nullBits += sourceIndicatorLen - 1; > + } > + else > + { > + sourceIndicatorLen = BITMAPLEN(sourceNatts); > + /* Set NOT NULL for all existing attributes */ > + memset(nullBits, 0xff, sourceIndicatorLen); > + > + nullBits += sourceIndicatorLen - 1; > + > + if (sourceNatts & 0x07) > + { > + /* build the mask (inverted!) */ > + bitMask = 0xff << (sourceNatts & 0x07); > + /* Voila */ > + *nullBits = ~bitMask; > + } It's late, I'm tired. But how can this be correct? Also, this code is massively under-commented. > + > +/* > + * Fill in the missing values for an ordinary HeapTuple > + */ > +MinimalTuple > +minimal_expand_tuple(HeapTuple sourceTuple, TupleDesc tupleDesc) > +{ > + MinimalTuple minimalTuple; > + > + expand_tuple(NULL, &minimalTuple, sourceTuple, tupleDesc); > + return minimalTuple; > +} > + > +/* > + * Fill in the missing values for a minimal HeapTuple > + */ > +HeapTuple > +heap_expand_tuple(HeapTuple sourceTuple, TupleDesc tupleDesc) > +{ > + HeapTuple heapTuple; > + > + expand_tuple(&heapTuple, NULL, sourceTuple, tupleDesc); > + return heapTuple; > +} Comments appear to be swapped. > /* ---------------- > * heap_copy_tuple_as_datum > * > @@ -1012,13 +1422,10 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc, > > /* > * If tuple doesn't have all the atts indicated by tupleDesc, read the > - * rest as null > + * rest as null, or a missing value if there is one. > */ "a missing value" isn't there quite possibly multiple ones? > /* > @@ -1192,8 +1599,7 @@ slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull) > tup = tuple->t_data; > if (attnum > HeapTupleHeaderGetNatts(tup)) > { > - *isnull = true; > - return (Datum) 0; > + return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull); > } This turned a ~4 instruction case into considerably more. I've recently removed one of the major slot_getattr sources (hashagg comparisons), but we still call this super frequently for hashagg's hash computations. I suggest micro-benchmarking that case. 
> Oid > StoreAttrDefault(Relation rel, AttrNumber attnum, > - Node *expr, bool is_internal) > + Node *expr, bool is_internal, bool add_column_mode) > { > char *adbin; > char *adsrc; > @@ -1939,9 +1945,20 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, > Relation attrrel; > HeapTuple atttup; > Form_pg_attribute attStruct; > + Form_pg_attribute defAttStruct; > Oid attrdefOid; > ObjectAddress colobject, > defobject; > + ExprState *exprState; > + Expr *expr2 = (Expr *) expr; > + EState *estate = NULL; > + ExprContext *econtext; > + char *missingBuf = NULL; > + Datum valuesAtt[Natts_pg_attribute]; > + bool nullsAtt[Natts_pg_attribute]; > + bool replacesAtt[Natts_pg_attribute]; > + Datum missingval = (Datum) 0; > + bool missingIsNull = true; > > /* > * Flatten expression to string form for storage. > @@ -1956,6 +1973,44 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, > RelationGetRelid(rel)), > false, false); > > + /* > + * Compute the missing value > + */ > + expr2 = expression_planner(expr2); > + > + exprState = ExecInitExpr(expr2, NULL); This should instead use ExecPrepareExpr() > + if (add_column_mode) > + { > + missingval = ExecEvalExpr(exprState, econtext, > + &missingIsNull); > + > + defAttStruct = TupleDescAttr(rel->rd_att, attnum - 1); > + > + if (missingIsNull) > + { > + missingval = PointerGetDatum(NULL); (Datum) 0; > + } > + else if (defAttStruct->attbyval) > + { > + missingBuf = palloc(VARHDRSZ + sizeof(Datum)); > + memcpy(VARDATA(missingBuf), &missingval, sizeof(Datum)); > + SET_VARSIZE(missingBuf, VARHDRSZ + sizeof(Datum)); > + missingval = PointerGetDatum(missingBuf); > + } > + else if (defAttStruct->attlen >= 0) > + { > + missingBuf = palloc(VARHDRSZ + defAttStruct->attlen); > + memcpy(VARDATA(missingBuf), DatumGetPointer(missingval), > + defAttStruct->attlen); > + SET_VARSIZE(missingBuf, > + VARHDRSZ + defAttStruct->attlen); > + missingval = PointerGetDatum(missingBuf); > + } What if we get an extended datum back in the varlena case? Even if we wouldn't change the storage from bytea to something else, you really would need to force detoasting here. > + } > + > /* > * Make the pg_attrdef entry. > */ > @@ -1996,7 +2051,22 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, > attStruct = (Form_pg_attribute) GETSTRUCT(atttup); > if (!attStruct->atthasdef) > { > - attStruct->atthasdef = true; > + MemSet(valuesAtt, 0, sizeof(valuesAtt)); > + MemSet(nullsAtt, false, sizeof(nullsAtt)); > + MemSet(replacesAtt, false, sizeof(replacesAtt)); > + valuesAtt[Anum_pg_attribute_atthasdef - 1] = true; > + replacesAtt[Anum_pg_attribute_atthasdef - 1] = true; > + if (add_column_mode) > + { > + valuesAtt[Anum_pg_attribute_atthasmissing - 1] = true; > + replacesAtt[Anum_pg_attribute_atthasmissing - 1] = true; > + valuesAtt[Anum_pg_attribute_attmissingval - 1] = missingval; > + replacesAtt[Anum_pg_attribute_attmissingval - 1] = true; > + nullsAtt[Anum_pg_attribute_attmissingval - 1] = missingIsNull; > + } > + atttup = heap_modify_tuple(atttup, RelationGetDescr(attrrel), > + valuesAtt, nullsAtt, replacesAtt); > + This seems weird. We've above formed a tuple, now we just modify it again? 
> @@ -2296,7 +2377,14 @@ AddRelationNewConstraints(Relation rel, > (IsA(expr, Const) &&((Const *) expr)->constisnull)) > continue; > > - defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal); > + /* If the default is volatile we cannot use a missing value */ > + if (contain_volatile_functions((Node *) expr)) > + { > + colDef->missingMode = false; > + } > + defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal, > + colDef->missingMode); Doesn't this have to be immutable rather than just stable? I mean just storing the value of now() doesn't seem like it'd do the trick? > @@ -663,7 +671,23 @@ ExecFetchSlotTuple(TupleTableSlot *slot) > * If we have a regular physical tuple then just return it. > */ > if (TTS_HAS_PHYSICAL_TUPLE(slot)) > - return slot->tts_tuple; > + { > + if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) < > + slot->tts_tupleDescriptor->natts) > + { > + MemoryContext oldContext = MemoryContextSwitchTo(slot->tts_mcxt); > + > + slot->tts_tuple = heap_expand_tuple(slot->tts_tuple, > + slot->tts_tupleDescriptor); > + slot->tts_shouldFree = true; > + MemoryContextSwitchTo(oldContext); > + return slot->tts_tuple; > + } > + else > + { > + return slot->tts_tuple; > + } > + } I'm extremely dubious about this. HeapTupleHeaderGetNatts() commonly causes cachemisses and therefore causes slowdowns when unnecessarily done. Additionally here we're expanding the tuple to include columns that aren't even likely to be accessed. This is a fairly hot codepath, and made hugely more expensive by this. > @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc > return false; /* out of order */ > if (att_tup->attisdropped) > return false; /* table contains dropped columns */ > + if (att_tup->atthasmissing) > + return false; /* table contains cols with missing values */ Hm, so incrementally building a table definitions with defaults, even if there aren't ever any rows, or if there's a subsequent table rewrite, will cause slowdowns. If so, shouldn't a rewrite remove all atthasmissing values? > @@ -493,7 +494,10 @@ RelationBuildTupleDesc(Relation relation) > @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation) > MemoryContextAllocZero(CacheMemoryContext, > relation->rd_rel->relnatts * > sizeof(AttrDefault)); > - attrdef[ndef].adnum = attp->attnum; > + attrdef[ndef].adnum = attnum; > attrdef[ndef].adbin = NULL; > + > ndef++; > } > + > + /* Likewise for a missing value */ > + if (attp->atthasmissing) > + { As I've written somewhere above, I don't think it's a good idea to do this preemptively. Tupledescs are very commonly copied, and the missing attributes are quite likely never used. IOW, we should just remember the attmissingattr value and fill in the corresponding attmissingval on-demand. > + Datum missingval; > + bool missingNull; > + > + if (attrmiss == NULL) > + attrmiss = (AttrMissing *) > + MemoryContextAllocZero(CacheMemoryContext, > + relation->rd_rel->relnatts * > + sizeof(AttrMissing)); > + > + /* Do we have a missing value? */ > + missingval = SysCacheGetAttr(ATTNUM, pg_attribute_tuple, > + Anum_pg_attribute_attmissingval, > + &missingNull); Shouldn't this be a normal heap_getattr rather than a SysCacheGetAttr access? * SysCacheGetAttr * * Given a tuple previously fetched by SearchSysCache(), * extract a specific attribute. 
> @@ -620,7 +691,19 @@ RelationBuildTupleDesc(Relation relation) > AttrDefaultFetch(relation); > } > else > + { > constr->num_defval = 0; > + } > + > + if (nmissing > 0) > + { > + if (nmissing < relation->rd_rel->relnatts) > + constr->missing = (AttrMissing *) > + repalloc(attrmiss, nmissing * sizeof(AttrMissing)); > + else > + constr->missing = attrmiss; > + } > + constr->num_missing = nmissing; Am I understanding correctly that you're *downsizing* the allocation with repalloc here? That seems likely to commonly lead to some wastage... > @@ -4059,10 +4142,6 @@ AttrDefaultFetch(Relation relation) > > systable_endscan(adscan); > heap_close(adrel, AccessShareLock); > - > - if (found != ndef) > - elog(WARNING, "%d attrdef record(s) missing for rel %s", > - ndef - found, RelationGetRelationName(relation)); > } Hm, it's not obvious why this is a good thing? > diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h > index 415efbab97..ed30f98e33 100644 > --- a/src/include/access/tupdesc.h > +++ b/src/include/access/tupdesc.h > @@ -14,6 +14,7 @@ > #ifndef TUPDESC_H > #define TUPDESC_H > > +#include "postgres.h" > #include "access/attnum.h" postgres.h should never be included in headers; why was it added here? This doesn't seem ready yet. Greetings, Andres Freund
On Wed, Feb 21, 2018 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: > Hi, > [ Long and useful review] > This doesn't seem ready yet. > Thanks. I'm working through the issues you raised. Will reply in a few days. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Feb 21, 2018 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: > Hi, Comments interspersed. > > On 2018-02-20 17:03:07 +1030, Andrew Dunstan wrote: >> + <row> >> + <entry><structfield>attmissingval</structfield></entry> >> + <entry><type>bytea</type></entry> >> + <entry></entry> >> + <entry> >> + This column has a binary representation of the value used when the >> + column is entirely missing from the row, as happens when the column is >> + added after the row is created. The value is only used when >> + <structname>atthasmissing</structname> is true. >> + </entry> >> + </row> > > Hm. Why is this a bytea, and why is that a good idea? In pg_statistics, > which has to deal with something similar-ish, we deal with the ambiguity > of the storage by storing anyarrays, which then can display the contents > properly thanks to the embedded oid. Now it'd be a bit sad to store a > one element array in the table. But that seems better than storing a > bytea that's typeless and can't be viewed reasonably. The best solution > would be to create a variadic any type that can actually be stored, but > I see why you'd not want to go there necessarily. The one-element array idea is a slight hack, but it's not that bad, and I agree a good deal better than using a bytea. I've made that change. > > But also, a bit higher level, is it a good idea to store this > information directly into pg_attribute? That table is already one of the > hotter system catalog tables, and very frequently the largest. If we > suddenly end up storing a lot of default values in there, possibly ones > that aren't ever actually used because all the default values are long > since actually stored in the table, we'll spend more time doing cache > lookups. > > I'm somewhat tempted to say that the default data should be in a > separate catalog table. Or possibly use pg_attrdef. > This was debated quite some time ago. See for example <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us> The consensus then seemed to be to store them in pg_attribute. If not, I think we'd need a new catalog - pg_attrdef doesn't seem right. They would have to go on separate rows, and we'd be storing values rather than expressions, so it would get somewhat ugly. >> >> +/* >> + * Return the missing value of an attribute, or NULL if there isn't one. >> + */ >> +static Datum >> +getmissingattr(TupleDesc tupleDesc, >> + int attnum, bool *isnull) >> +{ >> + int missingnum; >> + Form_pg_attribute att; >> + AttrMissing *attrmiss; >> + >> + Assert(attnum <= tupleDesc->natts); >> + Assert(attnum > 0); >> + >> + att = TupleDescAttr(tupleDesc, attnum - 1); >> + >> + if (att->atthasmissing) >> + { >> + Assert(tupleDesc->constr); >> + Assert(tupleDesc->constr->num_missing > 0); >> + >> + attrmiss = tupleDesc->constr->missing; >> + >> + /* >> + * Look through the tupledesc's attribute missing values >> + * for the one that corresponds to this attribute. >> + */ >> + for (missingnum = tupleDesc->constr->num_missing - 1; >> + missingnum >= 0; missingnum--) >> + { >> + if (attrmiss[missingnum].amnum == attnum) >> + { >> + if (attrmiss[missingnum].ammissingNull) >> + { >> + *isnull = true; >> + return (Datum) 0; >> + } >> + else >> + { >> + *isnull = false; >> + return attrmiss[missingnum].ammissing; >> + } >> + } >> + } > > I think requiring this is a bad idea. If, and only if, there's default > attributes with missing values we should build an array that can be > directly indexed by attribute number, accessing the missing value. 
What > you're doing here is essentially O(n^2) afaict, with n being the number > of columns missing values. > > Done >> +/* >> + * Fill in missing values for a TupleTableSlot >> + */ >> +static void >> +slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum) >> +{ >> + AttrMissing *attrmiss; >> + int missingnum; >> + int missattnum; >> + int i; >> + >> + /* initialize all the missing attributes to null */ >> + for (missattnum = lastAttNum - 1; missattnum >= startAttNum; missattnum--) >> + { >> + slot->tts_values[missattnum] = PointerGetDatum(NULL); >> + slot->tts_isnull[missattnum] = true; >> + } > > Any reason not to memset this in one (well two) go with memset? Also, > how come you're using PointerGetDatum()? Normally we just use (Datum) 0, > imo it's confusing to use PointerGetDatum(), as it implies things that > are quite possibly not the case. > > rewritten. >> +/* >> + * Fill in one attribute, either a data value or a bit in the null bitmask >> + */ > > Imo this isn't sufficient documentation to explain what this function > does. Even if you just add "per-attribute helper for heap_fill_tuple > and other routines building tuples" or such, I think it'd be clearer. Changed that way. > >> +static inline void >> +fill_val(Form_pg_attribute att, >> + bits8 **bit, >> + int *bitmask, >> + char **dataP, >> + uint16 *infomask, >> + Datum datum, >> + bool isnull) >> +{ >> + Size data_length; >> + char *data = *dataP; >> + >> + /* >> + * If we're building a null bitmap, set the appropriate bit for the >> + * current column value here. >> + */ >> + if (bit != NULL) >> + { >> + if (*bitmask != HIGHBIT) >> + *bitmask <<= 1; >> + else >> + { >> + *bit += 1; >> + **bit = 0x0; >> + *bitmask = 1; >> + } >> + >> + if (isnull) >> + { >> + *infomask |= HEAP_HASNULL; >> + return; >> + } >> + >> + **bit |= *bitmask; >> + } >> + >> + Assert(att); > > Why isn't this asserted at the beginning? > > I don't think it's worth very much anyway, so I just removed it. > >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc, >> * ---------------- >> */ >> bool >> -heap_attisnull(HeapTuple tup, int attnum) >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc) >> { >> + /* >> + * We allow a NULL tupledesc for relations not expected to have >> + * missing values, such as catalog relations and indexes. >> + */ >> + Assert(!tupleDesc || attnum <= tupleDesc->natts); > > This seems fairly likely to hide bugs. > How? There are plenty of cases where we simply expect quite reasonably that there won't be any missing attributes, and we don't need a tupleDesc at all in such cases. > >> if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data)) >> - return true; >> + { >> + return !(tupleDesc && >> + TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing); > > Brrr. > > rewritten. >> +/* >> + * Expand a tuple with missing values, or NULLs. The source tuple must have >> + * less attributes than the required number. Only one of targetHeapTuple >> + * and targetMinimalTuple may be supplied. The other argument must be NULL. >> + */ > > I can't quite make head or tails of this comment "with missing values, or NULLs"? > > Rewritten. 
>> +static void >> +expand_tuple(HeapTuple *targetHeapTuple, >> + MinimalTuple *targetMinimalTuple, >> + HeapTuple sourceTuple, >> + TupleDesc tupleDesc) >> +{ >> + AttrMissing *attrmiss = NULL; >> + int attnum; >> + int missingnum; >> + int firstmissingnum = 0; >> + int num_missing = 0; >> + bool hasNulls = HeapTupleHasNulls(sourceTuple); >> + HeapTupleHeader targetTHeader; >> + HeapTupleHeader sourceTHeader = sourceTuple->t_data; >> + int sourceNatts = HeapTupleHeaderGetNatts(sourceTHeader); >> + int natts = tupleDesc->natts; >> + int sourceIndicatorLen; >> + int targetIndicatorLen; >> + Size sourceDataLen = sourceTuple->t_len - sourceTHeader->t_hoff; >> + Size targetDataLen; >> + Size len; >> + int hoff; >> + bits8 *nullBits = NULL; >> + int bitMask = 0; >> + char *targetData; >> + uint16 *infoMask; >> + >> + Assert((targetHeapTuple && !targetMinimalTuple) >> + || (!targetHeapTuple && targetMinimalTuple)); >> + >> + Assert(sourceNatts < natts); >> + >> + sourceIndicatorLen = (hasNulls ? BITMAPLEN(sourceNatts) : 0); > > Maybe I'm just grumpy right now (did response work on 2016 taxes just > now, brrrrr), but I don't find "indicatorLen" a descriptive term. > > changed >> + targetDataLen = sourceDataLen; >> + >> + if (tupleDesc->constr && >> + tupleDesc->constr->num_missing) >> + { >> + /* >> + * If there are missing values we want to put them into the tuple. >> + * Before that we have to compute the extra length for the values >> + * array and the variable length data. >> + */ >> + num_missing = tupleDesc->constr->num_missing; >> + attrmiss = tupleDesc->constr->missing; > > You're accessing a hell of a lot of expensive stuff outside outside of > this if block. E.g. HeapTupleHeaderGetNatts specifically is the prime > source of cache misses in a lot of workloads. Now a smart compiler might > be able to optimize this away, but... > > >> + /* >> + * Find the first item in attrmiss for which we don't have a value in >> + * the source (i.e. where its amnum is > sourceNatts). We can ignore >> + * all the missing entries before that. >> + */ >> + for (firstmissingnum = 0; >> + firstmissingnum < num_missing >> + && attrmiss[firstmissingnum].amnum <= sourceNatts; >> + firstmissingnum++) >> + { /* empty */ >> + } >> + >> + /* >> + * If there are no more missing values everything else must be >> + * NULL >> + */ >> + if (firstmissingnum >= num_missing) >> + { >> + hasNulls = true; >> + } >> + >> + /* >> + * Now walk the missing attributes one by one. If there is a >> + * missing value make space for it. Otherwise, it's going to be NULL. >> + */ >> + for (attnum = sourceNatts + 1; >> + attnum <= natts; >> + attnum++) >> + { >> + Form_pg_attribute att = TupleDescAttr(tupleDesc, attnum - 1); >> + >> + for (missingnum = firstmissingnum; missingnum < num_missing; missingnum++) >> + { >> + >> + /* >> + * If there is a missing value for this attribute calculate >> + * the required space if it's not NULL. 
>> + */ >> + if (attrmiss[missingnum].amnum == attnum) >> + { >> + if (attrmiss[missingnum].ammissingNull) >> + hasNulls = true; >> + else >> + { >> + targetDataLen = att_align_datum(targetDataLen, >> + att->attalign, >> + att->attlen, >> + attrmiss[missingnum].ammissing); >> + >> + targetDataLen = att_addlength_pointer(targetDataLen, >> + att->attlen, >> + attrmiss[missingnum].ammissing); >> + } >> + break; >> + } >> + } >> + if (missingnum > num_missing) >> + { >> + /* there is no missing value for this attribute */ >> + hasNulls = true; >> + } >> + } >> + } /* end if have missing values */ >> + else >> + { >> + /* >> + * If there are no missing values at all then NULLS must be allowed, >> + * since some of the attributes are known to be absent. >> + */ >> + hasNulls = true; > > Why are we doing anything at all in this branch? ISTM we're reforming > the tuple for absolutely no gain here, just making it larger by > including additionall null bits? > > >> + /* >> + * Allocate and zero the space needed. Note that the tuple body and >> + * HeapTupleData management structure are allocated in one chunk. >> + */ >> + if (targetHeapTuple) >> + { >> + len += offsetof(HeapTupleHeaderData, t_bits); >> + hoff = len = MAXALIGN(len); /* align user data safely */ >> + len += targetDataLen; > > So len isn't the length of the tuple, but the null bitmap length? That > seems, uh, non-obvious. > > >> + *targetHeapTuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len); >> + (*targetHeapTuple)->t_data >> + = targetTHeader >> + = (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE); > > Yuck. > > >> + (*targetHeapTuple)->t_len = len; >> + (*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid; >> + ItemPointerSetInvalid(&((*targetHeapTuple)->t_self)); > > Seems like a lot of this code would be more readable by storing > *targetHeapTuple in a local var. > > >> + targetTHeader->t_infomask = sourceTHeader->t_infomask; >> + targetTHeader->t_hoff = hoff; >> + HeapTupleHeaderSetNatts(targetTHeader, natts); >> + HeapTupleHeaderSetDatumLength(targetTHeader, len); >> + HeapTupleHeaderSetTypeId(targetTHeader, tupleDesc->tdtypeid); >> + HeapTupleHeaderSetTypMod(targetTHeader, tupleDesc->tdtypmod); >> + /* We also make sure that t_ctid is invalid unless explicitly set */ >> + ItemPointerSetInvalid(&(targetTHeader->t_ctid)); > > This seems fairly expensive, why aren't we just memcpy'ing the old > header into the new one? > > > > >> + if (targetIndicatorLen > 0) >> + { >> + if (sourceIndicatorLen > 0) >> + { >> + /* if bitmap pre-existed copy in - all is set */ > > can't parse. > >> + memcpy(nullBits, >> + ((char *) sourceTHeader) >> + + offsetof(HeapTupleHeaderData, t_bits), >> + sourceIndicatorLen); >> + nullBits += sourceIndicatorLen - 1; >> + } >> + else >> + { >> + sourceIndicatorLen = BITMAPLEN(sourceNatts); >> + /* Set NOT NULL for all existing attributes */ >> + memset(nullBits, 0xff, sourceIndicatorLen); >> + >> + nullBits += sourceIndicatorLen - 1; >> + >> + if (sourceNatts & 0x07) >> + { >> + /* build the mask (inverted!) */ >> + bitMask = 0xff << (sourceNatts & 0x07); >> + /* Voila */ >> + *nullBits = ~bitMask; >> + } > > It's late, I'm tired. But how can this be correct? Also, this code is > massively under-commented. > > I'm going to revisit expand_tuple from scratch. 
> >> + >> +/* >> + * Fill in the missing values for an ordinary HeapTuple >> + */ >> +MinimalTuple >> +minimal_expand_tuple(HeapTuple sourceTuple, TupleDesc tupleDesc) >> +{ >> + MinimalTuple minimalTuple; >> + >> + expand_tuple(NULL, &minimalTuple, sourceTuple, tupleDesc); >> + return minimalTuple; >> +} >> + >> +/* >> + * Fill in the missing values for a minimal HeapTuple >> + */ >> +HeapTuple >> +heap_expand_tuple(HeapTuple sourceTuple, TupleDesc tupleDesc) >> +{ >> + HeapTuple heapTuple; >> + >> + expand_tuple(&heapTuple, NULL, sourceTuple, tupleDesc); >> + return heapTuple; >> +} > > Comments appear to be swapped. > > Fixed. > >> /* ---------------- >> * heap_copy_tuple_as_datum >> * >> @@ -1012,13 +1422,10 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc, >> >> /* >> * If tuple doesn't have all the atts indicated by tupleDesc, read the >> - * rest as null >> + * rest as null, or a missing value if there is one. >> */ > > "a missing value" isn't there quite possibly multiple ones? > > Rewritten >> /* >> @@ -1192,8 +1599,7 @@ slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull) >> tup = tuple->t_data; >> if (attnum > HeapTupleHeaderGetNatts(tup)) >> { >> - *isnull = true; >> - return (Datum) 0; >> + return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull); >> } > > This turned a ~4 instruction case into considerably more. I've recently > removed one of the major slot_getattr sources (hashagg comparisons), but > we still call this super frequently for hashagg's hash computations. I > suggest micro-benchmarking that case. > > I'll look at it. I have rewritten getmissingattr along the line you've suggested, so it should be more efficient, and most compilers will inline it. > >> Oid >> StoreAttrDefault(Relation rel, AttrNumber attnum, >> - Node *expr, bool is_internal) >> + Node *expr, bool is_internal, bool add_column_mode) >> { >> char *adbin; >> char *adsrc; >> @@ -1939,9 +1945,20 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, >> Relation attrrel; >> HeapTuple atttup; >> Form_pg_attribute attStruct; >> + Form_pg_attribute defAttStruct; >> Oid attrdefOid; >> ObjectAddress colobject, >> defobject; >> + ExprState *exprState; >> + Expr *expr2 = (Expr *) expr; >> + EState *estate = NULL; >> + ExprContext *econtext; >> + char *missingBuf = NULL; >> + Datum valuesAtt[Natts_pg_attribute]; >> + bool nullsAtt[Natts_pg_attribute]; >> + bool replacesAtt[Natts_pg_attribute]; >> + Datum missingval = (Datum) 0; >> + bool missingIsNull = true; >> >> /* >> * Flatten expression to string form for storage. 
>> @@ -1956,6 +1973,44 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, >> RelationGetRelid(rel)), >> false, false); >> >> + /* >> + * Compute the missing value >> + */ >> + expr2 = expression_planner(expr2); >> + >> + exprState = ExecInitExpr(expr2, NULL); > > This should instead use ExecPrepareExpr() > Fixed > >> + if (add_column_mode) >> + { >> + missingval = ExecEvalExpr(exprState, econtext, >> + &missingIsNull); >> + >> + defAttStruct = TupleDescAttr(rel->rd_att, attnum - 1); >> + >> + if (missingIsNull) >> + { >> + missingval = PointerGetDatum(NULL); > > (Datum) 0; > > >> + } >> + else if (defAttStruct->attbyval) >> + { >> + missingBuf = palloc(VARHDRSZ + sizeof(Datum)); >> + memcpy(VARDATA(missingBuf), &missingval, sizeof(Datum)); >> + SET_VARSIZE(missingBuf, VARHDRSZ + sizeof(Datum)); >> + missingval = PointerGetDatum(missingBuf); >> + } >> + else if (defAttStruct->attlen >= 0) >> + { >> + missingBuf = palloc(VARHDRSZ + defAttStruct->attlen); >> + memcpy(VARDATA(missingBuf), DatumGetPointer(missingval), >> + defAttStruct->attlen); >> + SET_VARSIZE(missingBuf, >> + VARHDRSZ + defAttStruct->attlen); >> + missingval = PointerGetDatum(missingBuf); >> + } > > What if we get an extended datum back in the varlena case? Even if we > wouldn't change the storage from bytea to something else, you really > would need to force detoasting here. > This is now rewritten. > >> + } >> + >> /* >> * Make the pg_attrdef entry. >> */ >> @@ -1996,7 +2051,22 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, >> attStruct = (Form_pg_attribute) GETSTRUCT(atttup); >> if (!attStruct->atthasdef) >> { >> - attStruct->atthasdef = true; >> + MemSet(valuesAtt, 0, sizeof(valuesAtt)); >> + MemSet(nullsAtt, false, sizeof(nullsAtt)); >> + MemSet(replacesAtt, false, sizeof(replacesAtt)); >> + valuesAtt[Anum_pg_attribute_atthasdef - 1] = true; >> + replacesAtt[Anum_pg_attribute_atthasdef - 1] = true; >> + if (add_column_mode) >> + { >> + valuesAtt[Anum_pg_attribute_atthasmissing - 1] = true; >> + replacesAtt[Anum_pg_attribute_atthasmissing - 1] = true; >> + valuesAtt[Anum_pg_attribute_attmissingval - 1] = missingval; >> + replacesAtt[Anum_pg_attribute_attmissingval - 1] = true; >> + nullsAtt[Anum_pg_attribute_attmissingval - 1] = missingIsNull; >> + } >> + atttup = heap_modify_tuple(atttup, RelationGetDescr(attrrel), >> + valuesAtt, nullsAtt, replacesAtt); >> + > > This seems weird. We've above formed a tuple, now we just modify it > again? > > I'll take a look at making it more straightforward. But It's not as simple as the atthasdef case, since attmissing is part of the variable portion of pg_attribute. >> @@ -2296,7 +2377,14 @@ AddRelationNewConstraints(Relation rel, >> (IsA(expr, Const) &&((Const *) expr)->constisnull)) >> continue; >> >> - defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal); >> + /* If the default is volatile we cannot use a missing value */ >> + if (contain_volatile_functions((Node *) expr)) >> + { >> + colDef->missingMode = false; >> + } >> + defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal, >> + colDef->missingMode); > > Doesn't this have to be immutable rather than just stable? I mean just > storing the value of now() doesn't seem like it'd do the trick? > > I don't see why. We've stated explicitly in the docs that the existing rows get the value of the default expression as evaluated at the time the column is created. I don't think there is any need to restrict that to an immutable expression. 
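The documented behaviour Andrew is referring to can be illustrated like this (a sketch of the semantics under discussion, assuming the patch takes the fast-default path for stable expressions and forces a rewrite for volatile ones):

CREATE TABLE t (id int);
INSERT INTO t VALUES (1), (2);

-- now() is stable, not immutable; it is evaluated once, at ALTER time,
-- and every pre-existing row then reports that same timestamp.
ALTER TABLE t ADD COLUMN created timestamptz DEFAULT now();

SELECT id, created FROM t;  -- both rows show the time of the ALTER

-- A volatile default such as random() cannot use a stored missing
-- value, so it forces a rewrite that evaluates the expression per row.
ALTER TABLE t ADD COLUMN r float8 DEFAULT random();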
> >> @@ -663,7 +671,23 @@ ExecFetchSlotTuple(TupleTableSlot *slot) >> * If we have a regular physical tuple then just return it. >> */ >> if (TTS_HAS_PHYSICAL_TUPLE(slot)) >> - return slot->tts_tuple; >> + { >> + if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) < >> + slot->tts_tupleDescriptor->natts) >> + { >> + MemoryContext oldContext = MemoryContextSwitchTo(slot->tts_mcxt); >> + >> + slot->tts_tuple = heap_expand_tuple(slot->tts_tuple, >> + slot->tts_tupleDescriptor); >> + slot->tts_shouldFree = true; >> + MemoryContextSwitchTo(oldContext); >> + return slot->tts_tuple; >> + } >> + else >> + { >> + return slot->tts_tuple; >> + } >> + } > > I'm extremely dubious about this. HeapTupleHeaderGetNatts() commonly > causes cachemisses and therefore causes slowdowns when unnecessarily > done. Additionally here we're expanding the tuple to include columns > that aren't even likely to be accessed. This is a fairly hot codepath, > and made hugely more expensive by this. Will look further. > > >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc >> return false; /* out of order */ >> if (att_tup->attisdropped) >> return false; /* table contains dropped columns */ >> + if (att_tup->atthasmissing) >> + return false; /* table contains cols with missing values */ > > Hm, so incrementally building a table definitions with defaults, even if > there aren't ever any rows, or if there's a subsequent table rewrite, > will cause slowdowns. If so, shouldn't a rewrite remove all > atthasmissing values? > > Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having had to make this change. Still, it looks like dropping a column has the same effect. > > >> @@ -493,7 +494,10 @@ RelationBuildTupleDesc(Relation relation) > >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation) >> MemoryContextAllocZero(CacheMemoryContext, >> relation->rd_rel->relnatts * >> sizeof(AttrDefault)); >> - attrdef[ndef].adnum = attp->attnum; >> + attrdef[ndef].adnum = attnum; >> attrdef[ndef].adbin = NULL; >> + >> ndef++; >> } >> + >> + /* Likewise for a missing value */ >> + if (attp->atthasmissing) >> + { > > As I've written somewhere above, I don't think it's a good idea to do > this preemptively. Tupledescs are very commonly copied, and the > missing attributes are quite likely never used. IOW, we should just > remember the attmissingattr value and fill in the corresponding > attmissingval on-demand. > > Defaults are frequently not used either, yet this function fills those in regardless. I don't see why we need to treat missing values differently. >> + Datum missingval; >> + bool missingNull; >> + >> + if (attrmiss == NULL) >> + attrmiss = (AttrMissing *) >> + MemoryContextAllocZero(CacheMemoryContext, >> + relation->rd_rel->relnatts * >> + sizeof(AttrMissing)); >> + >> + /* Do we have a missing value? */ >> + missingval = SysCacheGetAttr(ATTNUM, pg_attribute_tuple, >> + Anum_pg_attribute_attmissingval, >> + &missingNull); > > Shouldn't this be a normal heap_getattr rather than a SysCacheGetAttr > access? > > * SysCacheGetAttr > * > * Given a tuple previously fetched by SearchSysCache(), > * extract a specific attribute. > > Fixed. 
>> @@ -620,7 +691,19 @@ RelationBuildTupleDesc(Relation relation) >> AttrDefaultFetch(relation); >> } >> else >> + { >> constr->num_defval = 0; >> + } >> + >> + if (nmissing > 0) >> + { >> + if (nmissing < relation->rd_rel->relnatts) >> + constr->missing = (AttrMissing *) >> + repalloc(attrmiss, nmissing * sizeof(AttrMissing)); >> + else >> + constr->missing = attrmiss; >> + } >> + constr->num_missing = nmissing; > > Am I understanding correctly that you're *downsizing* the allocation with > repalloc here? That seems likely to commonly lead to some wastage... > This code is gone. But note the similar code nearby that is already there. >> @@ -4059,10 +4142,6 @@ AttrDefaultFetch(Relation relation) >> >> systable_endscan(adscan); >> heap_close(adrel, AccessShareLock); >> - >> - if (found != ndef) >> - elog(WARNING, "%d attrdef record(s) missing for rel %s", >> - ndef - found, RelationGetRelationName(relation)); >> } > > Hm, it's not obvious why this is a good thing? > > > >> diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h >> index 415efbab97..ed30f98e33 100644 >> --- a/src/include/access/tupdesc.h >> +++ b/src/include/access/tupdesc.h >> @@ -14,6 +14,7 @@ >> #ifndef TUPDESC_H >> #define TUPDESC_H >> >> +#include "postgres.h" >> #include "access/attnum.h" > > postgres.h should never be included in headers; why was it added here? > > Fixed. > This doesn't seem ready yet. > I hope we're a lot closer now. I'm working on rejigging expand_tuple and on clearing the missing attributes on a table rewrite. Attached is the current state of things, with the fixes indicated above, as well as some things pointed out by David Rowley. None of this seems to have had much effect on performance. Here's what oprofile reports from a run of pgbench with create-alter.sql and the query "select sum(c1000) from t":

Profiling through timer interrupt
samples  %        image name  symbol name
22584    28.5982  postgres    ExecInterpExpr
11950    15.1323  postgres    slot_getmissingattrs
 5471     6.9279  postgres    AllocSetAlloc
 3018     3.8217  postgres    TupleDescInitEntry
 2042     2.5858  postgres    MemoryContextAllocZeroAligned
 1807     2.2882  postgres    SearchCatCache1
 1638     2.0742  postgres    expression_tree_walker

That's very different from what we see on the master branch. The time spent in slot_getmissingattrs is perhaps not unexpected, but I don't (at least yet) understand why we're getting so much time spent in ExecInterpExpr, which doesn't show up at all when the same benchmark is run on master. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi, On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote: > This was debated quite some time ago. See for example > <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us> > The consensus then seemed to be to store them in pg_attribute. If not, > I think we'd need a new catalog - pg_attrdef doesn't seem right. They > would have to go on separate rows, and we'd be storing values rather > than expressions, so it would get somewhat ugly. I know that I'm concerned with storing a number of large values in pg_attribute, and I haven't seen that argument addressed much in a quick scan of the discussion. > >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc, > >> * ---------------- > >> */ > >> bool > >> -heap_attisnull(HeapTuple tup, int attnum) > >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc) > >> { > >> + /* > >> + * We allow a NULL tupledesc for relations not expected to have > >> + * missing values, such as catalog relations and indexes. > >> + */ > >> + Assert(!tupleDesc || attnum <= tupleDesc->natts); > > > > This seems fairly likely to hide bugs. > How? There are plenty of cases where we simply expect quite reasonably > that there won't be any missing attributes, and we don't need a > tupleDesc at all in such cases. Failing to pass a tupledesc for tables that can have defaults will not have directly visible issues; you'll just erroneously get back a null. > >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc > >> return false; /* out of order */ > >> if (att_tup->attisdropped) > >> return false; /* table contains dropped columns */ > >> + if (att_tup->atthasmissing) > >> + return false; /* table contains cols with missing values */ > > > > Hm, so incrementally building a table definitions with defaults, even if > > there aren't ever any rows, or if there's a subsequent table rewrite, > > will cause slowdowns. If so, shouldn't a rewrite remove all > > atthasmissing values? > Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having > had to make this change. Still, it looks like dropping a column has > the same effect. What exactly is mystifying you? That the new default stuff doesn't work when the physical tlist optimization is in use? > >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation) > >> MemoryContextAllocZero(CacheMemoryContext, > >> relation->rd_rel->relnatts * > >> sizeof(AttrDefault)); > >> - attrdef[ndef].adnum = attp->attnum; > >> + attrdef[ndef].adnum = attnum; > >> attrdef[ndef].adbin = NULL; > >> + > >> ndef++; > >> } > >> + > >> + /* Likewise for a missing value */ > >> + if (attp->atthasmissing) > >> + { > > > > As I've written somewhere above, I don't think it's a good idea to do > > this preemptively. Tupledescs are very commonly copied, and the > > missing attributes are quite likely never used. IOW, we should just > > remember the attmissingattr value and fill in the corresponding > > attmissingval on-demand. > Defaults are frequently not used either, yet this function fills those > in regardless. I don't see why we need to treat missing values > differently. It's already hurting us quite a bit in workloads with a lot of trivial queries. Adding more makes it worse. > Attached is the current state of things, with the fixes indicated > above, as well as some things pointed out by David Rowley. > > None of this seems to have had much effect on performance.
> > Here's what oprofile reports from a run of pgbench with > create-alter.sql and the query "select sum(c1000) from t":
>
> Profiling through timer interrupt
> samples  %        image name  symbol name
> 22584    28.5982  postgres    ExecInterpExpr
> 11950    15.1323  postgres    slot_getmissingattrs
>  5471     6.9279  postgres    AllocSetAlloc
>  3018     3.8217  postgres    TupleDescInitEntry
>  2042     2.5858  postgres    MemoryContextAllocZeroAligned
>  1807     2.2882  postgres    SearchCatCache1
>  1638     2.0742  postgres    expression_tree_walker
>
> That's very different from what we see on the master branch. The time > spent in slot_getmissingattrs is perhaps not unexpected, but I don't > (at least yet) understand why we're getting so much time spent in > ExecInterpExpr, which doesn't show up at all when the same benchmark > is run on master.

I'd guess that's because the physical tlist optimization is disabled. I assume you'd see something similar on master if you dropped a column. - Andres
On Wed, Feb 28, 2018 at 5:39 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote:
>> This was debated quite some time ago. See for example
>> <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us>
>> The consensus then seemed to be to store them in pg_attribute. If not,
>> I think we'd need a new catalog - pg_attrdef doesn't seem right. They
>> would have to go on separate rows, and we'd be storing values rather
>> than expressions, so it would get somewhat ugly.
>
> I know that I'm concerned with storing a number of large values in
> pg_attributes, and I haven't seen that argument addressed much in a
> quick scan of the discussion.

I'd like to have a consensus on this before I start making such a change as adding a new catalog table.

>> >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc,
>> >>   * ----------------
>> >>   */
>> >>  bool
>> >> -heap_attisnull(HeapTuple tup, int attnum)
>> >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc)
>> >>  {
>> >> +	/*
>> >> +	 * We allow a NULL tupledesc for relations not expected to have
>> >> +	 * missing values, such as catalog relations and indexes.
>> >> +	 */
>> >> +	Assert(!tupleDesc || attnum <= tupleDesc->natts);
>> >
>> > This seems fairly likely to hide bugs.
>
>> How? There are plenty of cases where we simply expect quite reasonably
>> that there won't be any missing attributes, and we don't need a
>> tupleDesc at all in such cases.
>
> Failing to pass a tupledesc for tables that can have defaults will not
> have directly visible issues; you'll just erroneously get back a null.

What would you prefer? Two versions of the function?

>> >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
>> >>  			return false;		/* out of order */
>> >>  		if (att_tup->attisdropped)
>> >>  			return false;		/* table contains dropped columns */
>> >> +		if (att_tup->atthasmissing)
>> >> +			return false;		/* table contains cols with missing values */
>> >
>> > Hm, so incrementally building a table definition with defaults, even if
>> > there aren't ever any rows, or if there's a subsequent table rewrite,
>> > will cause slowdowns. If so, shouldn't a rewrite remove all
>> > atthasmissing values?
>
>> Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having
>> had to make this change. Still, it looks like dropping a column has
>> the same effect.
>
> What exactly is mystifying you? That the new default stuff doesn't work
> when the physical tlist optimization is in use?

I think I have a better handle on that now. I'm trying to think of a way around it, no massive inspiration yet.

>> >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation)
>> >>  				MemoryContextAllocZero(CacheMemoryContext,
>> >>  									   relation->rd_rel->relnatts *
>> >>  									   sizeof(AttrDefault));
>> >> -			attrdef[ndef].adnum = attp->attnum;
>> >> +			attrdef[ndef].adnum = attnum;
>> >>  			attrdef[ndef].adbin = NULL;
>> >> +
>> >>  			ndef++;
>> >>  		}
>> >> +
>> >> +		/* Likewise for a missing value */
>> >> +		if (attp->atthasmissing)
>> >> +		{
>> >
>> > As I've written somewhere above, I don't think it's a good idea to do
>> > this preemptively. Tupledescs are very commonly copied, and the
>> > missing attributes are quite likely never used. IOW, we should just
>> > remember the attmissingattr value and fill in the corresponding
>> > attmissingval on-demand.
>
>> Defaults are frequently not used either, yet this function fills those
>> in regardless. I don't see why we need to treat missing values
>> differently.
>
> It's already hurting us quite a bit in workloads with a lot of trivial
> queries. Adding more makes it worse.

Please expand on your view of how we should "just remember the attmissingattr value and fill in the corresponding attmissingval on-demand." I don't mind changing how we do things, but I need a bit more detail than this. If copying defaults around has been hurting us so much, it seems surprising that nobody has put forward an improvement.

>> Attached is the current state of things, with the fixes indicated
>> above, as well as some things pointed out by David Rowley.
>>
>> None of this seems to have had much effect on performance.
>>
>> Here's what oprofile reports from a run of pgbench with
>> create-alter.sql and the query "select sum(c1000) from t":
>>
>> Profiling through timer interrupt
>> samples   %        image name  symbol name
>> 22584     28.5982  postgres    ExecInterpExpr
>> 11950     15.1323  postgres    slot_getmissingattrs
>> 5471       6.9279  postgres    AllocSetAlloc
>> 3018       3.8217  postgres    TupleDescInitEntry
>> 2042       2.5858  postgres    MemoryContextAllocZeroAligned
>> 1807       2.2882  postgres    SearchCatCache1
>> 1638       2.0742  postgres    expression_tree_walker
>>
>> That's very different from what we see on the master branch. The time
>> spent in slot_getmissingattrs is perhaps not unexpected, but I don't
>> (at least yet) understand why we're getting so much time spent in
>> ExecInterpExpr, which doesn't show up at all when the same benchmark
>> is run on master.
>
> I'd guess that's because the physical tlist optimization is disabled. I
> assume you'd see something similar on master if you dropped a column.

Yeah. That's kinda ugly in fact.

Here's a version of the patch which cleans up the missing attribute settings on a table rewrite.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 1 March 2018 at 11:49, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
> On Wed, Feb 28, 2018 at 5:39 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote:
>>> Profiling through timer interrupt
>>> samples   %        image name  symbol name
>>> 22584     28.5982  postgres    ExecInterpExpr
>>> 11950     15.1323  postgres    slot_getmissingattrs
>>> 5471       6.9279  postgres    AllocSetAlloc
>>> 3018       3.8217  postgres    TupleDescInitEntry
>>> 2042       2.5858  postgres    MemoryContextAllocZeroAligned
>>> 1807       2.2882  postgres    SearchCatCache1
>>> 1638       2.0742  postgres    expression_tree_walker
>>>
>>> That's very different from what we see on the master branch. The time
>>> spent in slot_getmissingattrs is perhaps not unexpected, but I don't
>>> (at least yet) understand why we're getting so much time spent in
>>> ExecInterpExpr, which doesn't show up at all when the same benchmark
>>> is run on master.
>>
>> I'd guess that's because the physical tlist optimization is disabled. I
>> assume you'd see something similar on master if you dropped a column.

I put that to the test and there's still something fishy going on. I created some benchmark scripts to help out a bit. Basically, they'll allow you to choose how many columns you want in the test tables and how many rows to put in them. The following test is a 60 second single-threaded pgbench test with 1000 columns, 400 rows, testing select sum(c10) from <table>:

*** Benchmarking normal table...
tps = 1972.985639 (excluding connections establishing)

*** Benchmarking missing table...
tps = 433.779008 (excluding connections establishing)

*** Benchmarking dropped table...
tps = 3661.063986 (excluding connections establishing)

The dropped table had 1001 columns, but the script drops the first. The missing table has all 1000 columns missing. In the normal table all tuples have all attrs.

I imagine it should be possible to make the missing table case faster than either of the others. The reason it's slower is down to slot_getsomeattrs being called to get all 1000 attributes in all but the 2nd call to the function, whereas only the 10th attr is deformed in the normal table case. Here's some perf output for each case:

Normal:
  15.88%  postgres  postgres           [.] expression_tree_walker
   7.82%  postgres  postgres           [.] fix_expr_common
   6.12%  postgres  [kernel.kallsyms]  [k] __lock_text_start
   5.45%  postgres  postgres           [.] SearchCatCache3
   3.98%  postgres  postgres           [.] TupleDescInitEntry

Missing:
  47.80%  postgres  postgres           [.] ExecInterpExpr
  25.65%  postgres  postgres           [.] slot_getmissingattrs
   6.86%  postgres  postgres           [.] build_tlist_index
   4.43%  postgres  libc-2.17.so       [.] __memset_sse2
   2.57%  postgres  postgres           [.] expression_tree_walker

Dropped:
  19.59%  postgres  postgres           [.] slot_deform_tuple
   6.63%  postgres  postgres           [.] hash_search_with_hash_value
   6.55%  postgres  postgres           [.] heap_getnext
   5.35%  postgres  postgres           [.] ExecInterpExpr
   4.79%  postgres  postgres           [.] heap_page_prune_opt

/me studies the code for a bit...

Okay, it looks like the patch should disable physical tlists when there's a missing column, the same way as we do for dropped columns. Patch attached. TPS looks much better now:

*** Benchmarking missing table...
tps = 5666.117627 (excluding connections establishing)

everybody's going to want missing columns in their tables now...

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
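The benchmark scripts themselves are in the attachment above. As a rough illustration of the setup they describe (a hypothetical sketch, not the attached scripts; the table and column names here are invented), a table whose columns all rely on fast defaults can be built by adding every column after the rows already exist:

    -- Sketch: build a 1000-column "missing" table. Every ALTER runs
    -- after the rows are inserted, so no existing tuple physically
    -- stores c1..c1000 and reads go through the missing-value path.
    CREATE TABLE missing (id serial PRIMARY KEY);
    INSERT INTO missing (id) SELECT generate_series(1, 400);

    DO $$
    BEGIN
        FOR i IN 1..1000 LOOP
            EXECUTE format('ALTER TABLE missing ADD COLUMN c%s int DEFAULT 1', i);
        END LOOP;
    END $$;

A "dropped" variant would be built with all columns in place and one column dropped afterwards, and a "normal" variant with all columns present before the rows are inserted.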
On 6 March 2018 at 22:40, David Rowley <david.rowley@2ndquadrant.com> wrote: > Okay, it looks like the patch should disable physical tlists when > there's a missing column the same way as we do for dropped columns. > Patch attached. Please disregard the previous patch in favour of the attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, Mar 6, 2018 at 8:15 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 6 March 2018 at 22:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> Okay, it looks like the patch should disable physical tlists when
>> there's a missing column the same way as we do for dropped columns.
>> Patch attached.
>
> Please disregard the previous patch in favour of the attached.

OK, nice work, thanks. We're making good progress. Two of the cases I was testing have gone from worse than master to better than master. Here are the results I got, all against Tomas' 1000-col 64-row table, using pgbench -T 100:

select sum(c1000) from t;
fastdef tps = 4724.988619
master  tps = 1590.843085
============
select c1000 from t;
fastdef tps = 5093.667203
master  tps = 2437.613368
============
select sum(c1000) from (select c1000 from t offset 0) x;
fastdef tps = 3315.900091
master  tps = 2067.574581
============
select * from t;
fastdef tps = 107.145811
master  tps = 150.207957
============
select sum(c1), count(c500), avg(c1000) from t;
fastdef tps = 2304.636410
master  tps = 1409.791975
============
select sum(c10) from t;
fastdef tps = 4332.625917
master  tps = 2208.757119

"select * from t" used to be about a wash, but with this patch it's got worse. The last two queries were worse and are now better, so that's a win.

I'm going to do a test to see if I can find the break-even point between the second query and the fourth. If it turns out to be at quite a large number of columns selected, then I think it might be something we can live with.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8 March 2018 at 18:40, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
> select * from t;
> fastdef tps = 107.145811
> master  tps = 150.207957
>
> "select * from t" used to be about a wash, but with this patch it's
> got worse. The last two queries were worse and are now better, so
> that's a win.

How does it compare to master if you drop a column out of the table? Physical tlists will be disabled in that case too. I imagine the performance of master will drop much lower than the all-columns-missing case.

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9 March 2018 at 02:11, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 8 March 2018 at 18:40, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>> select * from t;
>> fastdef tps = 107.145811
>> master  tps = 150.207957
>>
>> "select * from t" used to be about a wash, but with this patch it's
>> got worse. The last two queries were worse and are now better, so
>> that's a win.
>
> How does it compare to master if you drop a column out of the table?
> Physical tlists will be disabled in that case too. I imagine the
> performance of master will drop much lower than the all columns
> missing case.

I decided to test this for myself, and the missing version is still slightly slower than the dropped column version, but not by much. I'm not personally concerned about this.

The following results are with 1000-column tables with 64 rows each.

*** Benchmarking normal table...
tps = 232.794734 (excluding connections establishing)

*** Benchmarking missing table...
tps = 202.827754 (excluding connections establishing)

*** Benchmarking dropped table...
tps = 217.339255 (excluding connections establishing)

There's nothing particularly interesting in the profiles for the missing and normal case either:

-- Missing case for select * from missing;
  12.87%  postgres  postgres           [.] AllocSetAlloc
  10.09%  postgres  libc-2.17.so       [.] __strlen_sse2_pminub
   6.35%  postgres  postgres           [.] pq_sendcountedtext
   5.97%  postgres  postgres           [.] enlargeStringInfo
   5.47%  postgres  postgres           [.] ExecInterpExpr
   4.45%  postgres  libc-2.17.so       [.] __memcpy_ssse3_back
   4.39%  postgres  postgres           [.] palloc
   4.31%  postgres  postgres           [.] printtup
   3.81%  postgres  postgres           [.] pg_ltoa
   3.72%  postgres  [kernel.kallsyms]  [k] __lock_text_start
   3.38%  postgres  postgres           [.] FunctionCall1Coll
   3.11%  postgres  postgres           [.] int4out
   2.97%  postgres  postgres           [.] appendBinaryStringInfoNT
   2.49%  postgres  postgres           [.] slot_getmissingattrs
   1.66%  postgres  postgres           [.] pg_server_to_any
   0.99%  postgres  postgres           [.] MemoryContextAllocZeroAligned
   0.94%  postgres  postgres           [.] expression_tree_walker
   0.93%  postgres  postgres           [.] lappend
   0.89%  postgres  [kernel.kallsyms]  [k] __do_page_fault
   0.72%  postgres  [kernel.kallsyms]  [k] clear_page_c_e
   0.72%  postgres  postgres           [.] SendRowDescriptionMessage
   0.71%  postgres  postgres           [.] expandRelAttrs
   0.64%  postgres  [kernel.kallsyms]  [k] get_page_from_freelist
   0.64%  postgres  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
   0.62%  postgres  postgres           [.] pg_server_to_client
   0.59%  postgres  libc-2.17.so       [.] __memset_sse2
   0.58%  postgres  postgres           [.] set_pathtarget_cost_width
   0.56%  postgres  postgres           [.] OutputFunctionCall
   0.56%  postgres  postgres           [.] memcpy@plt
   0.54%  postgres  postgres           [.] makeTargetEntry
   0.50%  postgres  postgres           [.] SearchCatCache3

-- Normal case for select * from normal;
  12.57%  postgres  postgres           [.] AllocSetAlloc
  10.50%  postgres  libc-2.17.so       [.] __strlen_sse2_pminub
   7.65%  postgres  postgres           [.] slot_deform_tuple
   6.52%  postgres  postgres           [.] pq_sendcountedtext
   6.03%  postgres  postgres           [.] enlargeStringInfo
   4.51%  postgres  postgres           [.] printtup
   4.35%  postgres  postgres           [.] palloc
   4.34%  postgres  libc-2.17.so       [.] __memcpy_ssse3_back
   3.90%  postgres  postgres           [.] pg_ltoa
   3.49%  postgres  [kernel.kallsyms]  [k] __lock_text_start
   3.35%  postgres  postgres           [.] int4out
   3.31%  postgres  postgres           [.] FunctionCall1Coll
   2.82%  postgres  postgres           [.] appendBinaryStringInfoNT
   1.68%  postgres  postgres           [.] pg_server_to_any
   1.30%  postgres  postgres           [.] SearchCatCache3
   1.01%  postgres  postgres           [.] SendRowDescriptionMessage
   0.89%  postgres  postgres           [.] lappend
   0.88%  postgres  postgres           [.] MemoryContextAllocZeroAligned
   0.79%  postgres  postgres           [.] expression_tree_walker
   0.67%  postgres  postgres           [.] SearchCatCache1
   0.67%  postgres  postgres           [.] expandRelAttrs
   0.64%  postgres  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
   0.60%  postgres  postgres           [.] pg_server_to_client
   0.57%  postgres  [kernel.kallsyms]  [k] __do_page_fault
   0.56%  postgres  postgres           [.] OutputFunctionCall
   0.53%  postgres  postgres           [.] memcpy@plt
   0.53%  postgres  postgres           [.] makeTargetEntry
   0.51%  postgres  [kernel.kallsyms]  [k] get_page_from_freelist

The difference appears to be in ExecInterpExpr, although that only accounts for about 5.5%, and we're missing 15% in performance.

Going by the commitfest app, the patch still does appear to be waiting on Author. Nevertheless, I've made another pass over it and found a few mistakes and a couple of ways to improve things:

1. Should <structname> not be <structfield>?

   <entry>
    This column has a value which is used where the column is entirely
    missing from the row, as happens when a column is added after the
    row is created. The actual value used is stored in the
    <structname>attmissingval</structname> column.
   </entry>

There is also an additional surplus space before "column". The same mixup appears in:

   <structname>atthasmissing</structname> is true. If there is no value

2. Instead of slot_getmissingattrs() having a memset() to set all the missing attributes to NULL before resetting them to the missing value if one is found, would it not be better to just have the missing array store both the NULLs and the DEFAULTs together? With that, the code could become:

static void
slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum)
{
	AttrMissing *attrmiss;
	int			attnum;

	if (slot->tts_tupleDescriptor->constr &&
		slot->tts_tupleDescriptor->constr->missing)
		attrmiss = slot->tts_tupleDescriptor->constr->missing;
	else
		return;

	for (attnum = startAttNum; attnum < lastAttNum; attnum++)
	{
		slot->tts_values[attnum] = attrmiss[attnum].ammissing;
		slot->tts_isnull[attnum] = false;
	}
}

You may even be able to just:

	Assert(slot->tts_tupleDescriptor->constr);
	Assert(slot->tts_tupleDescriptor->constr->missing);

instead of the run-time check. This would also simplify expand_tuple() and getmissingattr().

I think this change makes sense, as slot_getmissingattrs() is getting NULLs and the missing values, not just the missing values as the name indicates. The name would become more fitting if you class a NULL as a missing value.

3. Not sure what amnum means. Is it meant to be adnum?

 * the source (i.e. where its amnum is > sourceNatts). We can ignore

4. In StoreAttrDefault(), is there any point in creating the executor state when add_column_mode == false? It's probably also worth commenting what add_column_mode does. If any caller mistakenly passes it as 'true' then you'll overwrite the original missing value, which would appear to update tuples that already exist in the table. It also looks like you could move the bulk of the new code in that function, including the new variables, into where you're modifying the pg_attribute tuple.

This function also incorrectly sets atthasmissing to true even when the DEFAULT expression evaluates to NULL. This seems to contradict what you're doing in RelationBuildTupleDesc(), which skips building the missing array unless it sees a non-null attmissingval. This will cause getmissingattr() to Assert fail with the following case:

drop table if exists t1;
create table t1 (a int);
insert into t1 values(1);
alter table t1 add column b text default 'test' || null;
alter table t1 add column c text null;
select attname,atthasmissing,attmissingval from pg_attribute where attrelid = 't1'::regclass and attnum>0;
select b from t1;

TRAP: FailedAssertion("!(tupleDesc->constr->missing)", File: "src/backend/access/common/heaptuple.c", Line: 98)

Probably another good reason to do the refactor work mentioned in #2. It might be worth adding a regression test to ensure atthasmissing remains false for a column where the default expression evaluated to NULL.

5. It's not really required, but in AddRelationNewConstraints() you could get away with only performing the volatile function check if missingMode is true. That would save calling contain_volatile_functions() needlessly.

6. The comment above RelationClearMissing() should probably mention that the caller must hold an AccessExclusiveLock on rel. There is also a surplus newline between the header comment and the function.

7. Can you pgindent the patch? There are quite a few places where the whitespace is messed up, and also quite a few places where you've changed lines so they're longer than 80 chars.

8. In RelationBuildTupleDesc() it looks like attnum can be declared in the scope of the while loop.

An updated patch should probably also include the patch I sent to disable the physical tlists when there are missing values mentioned in the TupleDesc, although I admit to not having properly updated the comments even in my v2 attempt.

 * NIL. Ideally we would like to handle the dropped-column case too. However

should read:

 * NIL. Ideally, we would like to handle these cases too. However

Thanks again for working on this feature. I hope we can get this into PG11.

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 12, 2018 at 1:29 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 9 March 2018 at 02:11, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> On 8 March 2018 at 18:40, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>>> select * from t;
>>> fastdef tps = 107.145811
>>> master  tps = 150.207957
>>>
>>> "select * from t" used to be about a wash, but with this patch it's
>>> got worse. The last two queries were worse and are now better, so
>>> that's a win.
>>
>> How does it compare to master if you drop a column out of the table?
>> Physical tlists will be disabled in that case too. I imagine the
>> performance of master will drop much lower than the all columns
>> missing case.
>
> I decided to test this for myself, and the missing version is still
> slightly slower than the dropped column version, but not by much. I'm
> not personally concerned about this.
>
> The following results are with 1000 column tables with 64 rows each.

I've done some more extensive benchmarking now. Here are some fairly typical results from pgbench runs done on standard scale 100 pgbench data:

[andrew@foo tests]$ PATH=$HOME/pg_fast_def/root/HEAD/inst/bin:$PATH pgbench -S -c 10 -j 5 -T 60 test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 5
duration: 60 s
number of transactions actually processed: 2235601
latency average = 0.268 ms
tps = 37256.886332 (including connections establishing)
tps = 37258.562925 (excluding connections establishing)

[andrew@foo tests]$ PATH=$HOME/pg_head/root/HEAD/inst/bin:$PATH pgbench -S -c 10 -j 5 -T 60 test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 5
duration: 60 s
number of transactions actually processed: 2230085
latency average = 0.269 ms
tps = 37164.696271 (including connections establishing)
tps = 37166.647971 (excluding connections establishing)

So generally the patched code and master are pretty much on a par.

I have also done some testing on cases meant to stress-test the feature a bit - two 1000-column tables, all columns having defaults, one with a dropped column. For the fast_default case I then also copied the tables (and again dropped a column) so that the data files and table definitions would match fairly closely what was being tested in the master branch. The scripts are in the attached tests.tgz. The test platform is an Amazon r4.2xlarge instance running RHEL7.

There are two sets of results attached, one for 64-row tables and one for 50k-row tables. The 50k-row results are fairly unambiguous: the patched code performs as well as or better (in some cases spectacularly better) than master. In a few cases the patched code performs slightly worse than master in the last (fdnmiss) case with the copied tables.

> Going by the commitfest app, the patch still does appear to be waiting
> on Author. Nevertheless, I've made another pass over it and found a
> few mistakes and a couple of ways to improve things:

Working on these. Should have a new patch tomorrow.

> Thanks again for working on this feature. I hope we can get this into PG11.

Thanks for your help. I hope so too.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Mar 13, 2018 at 2:40 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: >> >> Going by the commitfest app, the patch still does appear to be waiting >> on Author. Never-the-less, I've made another pass over it and found a >> few mistakes and a couple of ways to improve things: >> > > working on these. Should have a new patch tomorrow. > Here is a patch that attends to most of these, except that I haven't re-indented it. Your proposed changes to slot_getmissingattrs() wouldn't work, but I have rewritten it in a way that avoids the double work you disliked. I'll rerun the benchmark tests I posted upthread and let you know the results. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Mar 13, 2018 at 10:58 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Tue, Mar 13, 2018 at 2:40 PM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: > >>> >>> Going by the commitfest app, the patch still does appear to be waiting >>> on Author. Never-the-less, I've made another pass over it and found a >>> few mistakes and a couple of ways to improve things: >>> >> >> working on these. Should have a new patch tomorrow. >> > > > Here is a patch that attends to most of these, except that I haven't > re-indented it. > > Your proposed changes to slot_getmissingattrs() wouldn't work, but I > have rewritten it in a way that avoids the double work you disliked. > > I'll rerun the benchmark tests I posted upthread and let you know the results. > Here are the benchmark results from the v15 patch. Fairly similar to previous results. I'm going to run some profiling again to see if I can identify any glaring hotspots. I do suspect that the "physical tlist" optimization sometimes turns out not to be one. It seems perverse to be able to improve a query's performance by dropping a column. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 14 March 2018 at 11:36, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > Here are the benchmark results from the v15 patch. Fairly similar to > previous results. I'm going to run some profiling again to see if I > can identify any glaring hotspots. I do suspect that the "physical > tlist" optimization sometimes turns out not to be one. It seems > perverse to be able to improve a query's performance by dropping a > column. Can you explain what "fdnmiss" is that appears in the results? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On 2018-03-14 09:06:54 +1030, Andrew Dunstan wrote:
> I do suspect that the "physical tlist" optimization sometimes turns
> out not to be one. It seems perverse to be able to improve a query's
> performance by dropping a column.

Yea, it's definitely not always a boon, especially with the newer v10+ projection code. I suspect we should largely get rid of it in v12, which presumably will see a storage layer abstraction... It'll be even more worthwhile to get rid of it if we manage to avoid deforming columns that aren't needed but are lower than the currently required columns. I still think we should build bitmasks of required columns during planning.

Greetings,

Andres Freund
> On Mar 14, 2018, at 10:58 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > > On 14 March 2018 at 11:36, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> Here are the benchmark results from the v15 patch. Fairly similar to >> previous results. I'm going to run some profiling again to see if I >> can identify any glaring hotspots. I do suspect that the "physical >> tlist" optimization sometimes turns out not to be one. It seems >> perverse to be able to improve a query's performance by dropping a >> column. > > Can you explain what "fdnmiss" is that appears in the results? > > It’s the patched code run against a materialized version of the table, i.e. one with no missing attributes. Cheers Andrew
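In other words, the fdnmiss tables are built by rewriting the fast-default tables so that every attribute is physically present in every tuple. A sketch of what that materialization amounts to (hypothetical names; the actual scripts are in the tests.tgz attachment upthread):

    -- CREATE TABLE AS writes every tuple in full, so the copy has no
    -- missing attributes even though t was built with fast defaults.
    CREATE TABLE fdnmiss AS SELECT * FROM t;
    -- for the comparison against master's dropped-column shape:
    ALTER TABLE fdnmiss DROP COLUMN c1;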
On Tue, Mar 13, 2018 at 10:58 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Tue, Mar 13, 2018 at 2:40 PM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: > >>> >>> Going by the commitfest app, the patch still does appear to be waiting >>> on Author. Never-the-less, I've made another pass over it and found a >>> few mistakes and a couple of ways to improve things: >>> >> >> working on these. Should have a new patch tomorrow. >> > > > Here is a patch that attends to most of these, except that I haven't > re-indented it. > > Your proposed changes to slot_getmissingattrs() wouldn't work, but I > have rewritten it in a way that avoids the double work you disliked. > rebased and mostly indented patch version attached. It's actually moderately difficult to produce a nicely indented patch that is against non-pgindented code. We should look at that a bit. Another item for my TODO list. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 3/15/18 4:33 AM, Andrew Dunstan wrote: > > rebased and mostly indented patch version attached. It's actually > moderately difficult to produce a nicely indented patch that is > against non-pgindented code. We should look at that a bit. Another > item for my TODO list. It looks like this should be marked Needs Review so I have done so. If that's not right please change it back or let me know and I will. Regards, -- -David david@pgmasters.net
On 15 March 2018 at 21:33, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > rebased and mostly indented patch version attached. Thanks. I've attached a version of this which applies, builds and passes the regression tests on current master. Some conflicts were caused by 325f2ec555 and there was a new call to heap_attisnull which needed to be updated. I'll look over this now. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 25 March 2018 at 20:09, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 15 March 2018 at 21:33, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> rebased and mostly indented patch version attached.
>
> Thanks. I've attached a version of this which applies, builds and
> passes the regression tests on current master.
>
> Some conflicts were caused by 325f2ec555 and there was a new call to
> heap_attisnull which needed to be updated.
>
> I'll look over this now.

I've attached a delta patch against the v17 patch that I attached earlier. I didn't change much, but there did seem to be a few places where the patch was not properly setting atthasmissing to false. Most of the rest is just cosmetic stuff.

With the attached applied, I'm happy to mark the patch as ready for committer; however, Petr is also signed up to review, so I will defer to him to see if he has any comments before altering the commitfest app's state.

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Sun, Mar 25, 2018 at 9:13 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 25 March 2018 at 20:09, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> On 15 March 2018 at 21:33, Andrew Dunstan
>> <andrew.dunstan@2ndquadrant.com> wrote:
>>> rebased and mostly indented patch version attached.
>>
>> Thanks. I've attached a version of this which applies, builds and
>> passes the regression tests on current master.
>>
>> Some conflicts were caused by 325f2ec555 and there was a new call to
>> heap_attisnull which needed to be updated.
>>
>> I'll look over this now.
>
> I've attached a delta patch against the v17 patch that I attached
> earlier. I didn't change much, but there did seem to be a few places
> where the patch was not properly setting atthasmissing to false. Most
> of the rest is just cosmetic stuff.
>
> With the attached applied, I'm happy to mark the patch as ready for
> committer, however, Petr is also signed up to review, so will defer to
> him to see if he has any comments before altering the commitfest app's
> state.

Thanks for this, all looks good. Here is the consolidated patch rebased. If there are no further comments I propose to commit this in a few days time.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25 March 2018 at 23:43, David Rowley <david.rowley@2ndquadrant.com> wrote: > With the attached applied, I'm happy to mark the patch as ready for > committer, however, Petr is also signed up to review, so will defer to > him to see if he has any comments before altering the commitfest app's > state. Petr won't have time to look so I'll adjust the commitfest app's state now. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
> Thanks for this, all looks good. Here is the consolidated patch
> rebased. If there are no further comments I propose to commit this in
> a few days time.

Here's a bit of post-commit review:

@@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)

 	/*
 	 * If tuple doesn't have all the atts indicated by tupleDesc, read the
-	 * rest as null
+	 * rest as NULLs or missing values
 	 */
-	for (; attno < attnum; attno++)
-	{
-		slot->tts_values[attno] = (Datum) 0;
-		slot->tts_isnull[attno] = true;
-	}
+	if (attno < attnum)
+		slot_getmissingattrs(slot, attno, attnum);
+
 	slot->tts_nvalid = attnum;
 }

It's worthwhile to note that this'll re-process *all* missing values, even if they've already been deformed. I.e. if

  slot_getmissingattrs(.., first-missing)
  slot_getmissingattrs(.., first-missing + 1)
  slot_getmissingattrs(.., first-missing + 2)

is called, all three missing values will be copied every time. That's because tts_nvalid isn't taken into account. I wonder if slot_getmissingattrs could take tts_nvalid into account?

I also wonder if this doesn't deserve an unlikely(), to avoid the cost of spilling registers in the hot branch of not missing any values.

+	else
+	{
+		/* if there is a missing values array we must process them one by one */
+		for (missattnum = lastAttNum - 1;
+			 missattnum >= startAttNum;
+			 missattnum--)
+		{
+			slot->tts_values[missattnum] = attrmiss[missattnum].ammissing;
+			slot->tts_isnull[missattnum] =
+				!attrmiss[missattnum].ammissingPresent;
+		}
+	}
+}

Why is this done backwards? It's noticeably slower to walk arrays backwards on some CPU microarchitectures.

@@ -176,6 +179,23 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
 	}
 }

@@ -469,6 +503,29 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 			if (strcmp(defval1->adbin, defval2->adbin) != 0)
 				return false;
 		}
+		if (constr1->missing)
+		{
+			if (!constr2->missing)
+				return false;
+			for (i = 0; i < tupdesc1->natts; i++)
+			{
+				AttrMissing *missval1 = constr1->missing + i;
+				AttrMissing *missval2 = constr2->missing + i;

It's a bit odd to not use array indexing here?

@@ -625,7 +625,15 @@ ExecCopySlotMinimalTuple(TupleTableSlot *slot)
 	if (slot->tts_mintuple)
 		return heap_copy_minimal_tuple(slot->tts_mintuple);
 	if (slot->tts_tuple)
-		return minimal_tuple_from_heap_tuple(slot->tts_tuple);
+	{
+		if (TTS_HAS_PHYSICAL_TUPLE(slot) &&
+			HeapTupleHeaderGetNatts(slot->tts_tuple->t_data)
+			< slot->tts_tupleDescriptor->natts)
+			return minimal_expand_tuple(slot->tts_tuple,
+										slot->tts_tupleDescriptor);
+		else
+			return minimal_tuple_from_heap_tuple(slot->tts_tuple);
+	}

What's the TTS_HAS_PHYSICAL_TUPLE doing here? How can this be relevant given the previous tts_mintuple check? Am I missing something?

@@ -563,10 +569,63 @@ RelationBuildTupleDesc(Relation relation)
 				MemoryContextAllocZero(CacheMemoryContext,
 									   relation->rd_rel->relnatts *
 									   sizeof(AttrDefault));
-			attrdef[ndef].adnum = attp->attnum;
+			attrdef[ndef].adnum = attnum;
 			attrdef[ndef].adbin = NULL;
+
 			ndef++;
 		}
+
+		/* Likewise for a missing value */
+		if (attp->atthasmissing)
+		{
+			Datum		missingval;
+			bool		missingNull;
+
+			/* Do we have a missing value? */
+			missingval = heap_getattr(pg_attribute_tuple,
+									  Anum_pg_attribute_attmissingval,
+									  pg_attribute_desc->rd_att,
+									  &missingNull);
+			if (!missingNull)
+			{
+				/* Yes, fetch from the array */
+				MemoryContext oldcxt;
+				bool		is_null;
+				int			one = 1;
+				Datum		missval;
+
+				if (attrmiss == NULL)
+					attrmiss = (AttrMissing *)
+						MemoryContextAllocZero(CacheMemoryContext,
+											   relation->rd_rel->relnatts *
+											   sizeof(AttrMissing));
+
+				missval = array_get_element(missingval,
+											1,
+											&one,
+											-1,
+											attp->attlen,
+											attp->attbyval,
+											attp->attalign,
+											&is_null);
+				Assert(!is_null);
+				if (attp->attbyval)
+				{
+					/* for copy by val just copy the datum direct */
+					attrmiss[attnum - 1].ammissing = missval;
+				}
+				else
+				{
+					/* otherwise copy in the correct context */
+					oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+					attrmiss[attnum - 1].ammissing = datumCopy(missval,
+															   attp->attbyval,
+															   attp->attlen);
+					MemoryContextSwitchTo(oldcxt);
+				}
+				attrmiss[attnum - 1].ammissingPresent = true;
+			}
+		}
 		need--;
 		if (need == 0)
 			break;

I still strongly object to doing this unconditionally for queries that never need any of this. We can end up with a significant number of large tuples in memory here, and then copy that through dozens of tupledesc copies in queries.

@@ -167,6 +170,12 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK

 	/* Column-level FDW options */
 	text		attfdwoptions[1] BKI_DEFAULT(_null_);
+
+	/*
+	 * Missing value for added columns. This is a one element array which lets
+	 * us store a value of the attribute type here.
+	 */
+	anyarray	attmissingval BKI_DEFAULT(_null_);
 #endif
 } FormData_pg_attribute;

Still think this is a bad location, and it'll reduce cache hit ratio for catalog lookups.

Greetings,

Andres Freund
On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote: >> Thanks for this, all looks good. Here is the consolidate patch >> rebased. If there are no further comments I propose to commit this in >> a few days time. > > Here's a bit of post commit review: > > @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum) > > /* > * If tuple doesn't have all the atts indicated by tupleDesc, read the > - * rest as null > + * rest as NULLs or missing values > */ > - for (; attno < attnum; attno++) > - { > - slot->tts_values[attno] = (Datum) 0; > - slot->tts_isnull[attno] = true; > - } > + if (attno < attnum) > + slot_getmissingattrs(slot, attno, attnum); > + > slot->tts_nvalid = attnum; > } > > It's worthwhile to note that this'll re-process *all* missing values, > even if they've already been deformed. I.e. if > slot_getmissingattrs(.., first-missing) > slot_getmissingattrs(.., first-missing + 1) > slot_getmissingattrs(.., first-missing + 2) > is called, all three missing values will be copied every time. That's > because tts_nvalid isn't taken into account. I wonder if slot_getmissingattrs > could take tts_nvalid into account? > > I also wonder if this doesn't deserve an unlikely(), to avoid the cost > of spilling registers in the hot branch of not missing any values. > > One of us at least is very confused about this function. slot_getmissingattrs() gets called at most once by slot_getsomeattrs and never for any attribute that's already been deformed. > + else > + { > + /* if there is a missing values array we must process them one by one */ > + for (missattnum = lastAttNum - 1; > + missattnum >= startAttNum; > + missattnum--) > + { > + slot->tts_values[missattnum] = attrmiss[missattnum].ammissing; > + slot->tts_isnull[missattnum] = > + !attrmiss[missattnum].ammissingPresent; > + } > + } > +} > > Why is this done backwards? It's noticeably slower to walk arrays > backwards on some CPU microarchitectures. > > I'll change it. > > @@ -176,6 +179,23 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc) > } > } > > > > @@ -469,6 +503,29 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2) > if (strcmp(defval1->adbin, defval2->adbin) != 0) > return false; > } > + if (constr1->missing) > + { > + if (!constr2->missing) > + return false; > + for (i = 0; i < tupdesc1->natts; i++) > + { > + AttrMissing *missval1 = constr1->missing + i; > + AttrMissing *missval2 = constr2->missing + i; > > It's a bit odd to not use array indexing here? > *shrug* Maybe. I'll change it if you like, doesn't seem that important or odd. > > @@ -625,7 +625,15 @@ ExecCopySlotMinimalTuple(TupleTableSlot *slot) > if (slot->tts_mintuple) > return heap_copy_minimal_tuple(slot->tts_mintuple); > if (slot->tts_tuple) > - return minimal_tuple_from_heap_tuple(slot->tts_tuple); > + { > + if (TTS_HAS_PHYSICAL_TUPLE(slot) && > + HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) > + < slot->tts_tupleDescriptor->natts) > + return minimal_expand_tuple(slot->tts_tuple, > + slot->tts_tupleDescriptor); > + else > + return minimal_tuple_from_heap_tuple(slot->tts_tuple); > + } > > > What's the TTS_HAS_PHYSICAL_TUPLE doing here? How can this be relevant > given the previous tts_mintuple check? Am I missing something? > > Hmm, that dates back to the original patch. My bad, I should have picked that up. I'll just remove it. 
> > @@ -563,10 +569,63 @@ RelationBuildTupleDesc(Relation relation) > MemoryContextAllocZero(CacheMemoryContext, > relation->rd_rel->relnatts * > sizeof(AttrDefault)); > - attrdef[ndef].adnum = attp->attnum; > + attrdef[ndef].adnum = attnum; > attrdef[ndef].adbin = NULL; > + > ndef++; > } > + > + /* Likewise for a missing value */ > + if (attp->atthasmissing) > + { > + Datum missingval; > + bool missingNull; > + > + /* Do we have a missing value? */ > + missingval = heap_getattr(pg_attribute_tuple, > + Anum_pg_attribute_attmissingval, > + pg_attribute_desc->rd_att, > + &missingNull); > + if (!missingNull) > + { > + /* Yes, fetch from the array */ > + MemoryContext oldcxt; > + bool is_null; > + int one = 1; > + Datum missval; > + > + if (attrmiss == NULL) > + attrmiss = (AttrMissing *) > + MemoryContextAllocZero(CacheMemoryContext, > + relation->rd_rel->relnatts * > + sizeof(AttrMissing)); > + > + missval = array_get_element(missingval, > + 1, > + &one, > + -1, > + attp->attlen, > + attp->attbyval, > + attp->attalign, > + &is_null); > + Assert(!is_null); > + if (attp->attbyval) > + { > + /* for copy by val just copy the datum direct */ > + attrmiss[attnum - 1].ammissing = missval; > + } > + else > + { > + /* otherwise copy in the correct context */ > + oldcxt = MemoryContextSwitchTo(CacheMemoryContext); > + attrmiss[attnum - 1].ammissing = datumCopy(missval, > + attp->attbyval, > + attp->attlen); > + MemoryContextSwitchTo(oldcxt); > + } > + attrmiss[attnum - 1].ammissingPresent = true; > + } > + } > need--; > if (need == 0) > break; > > I'm still strongly object to do doing this unconditionally for queries > that never need any of this. We're can end up with a significant number > of large tuples in memory here, and then copy that through dozens of > tupledesc copies in queries. > > We're just doing the same thing we do for default values. None of the tests I did with large numbers of missing values seemed to show performance impacts of the kind you describe. Now, none of the queries were particularly complex, but the worst case was from actually using very large numbers of attributes with missing values, where we'd need this data anyway. If we just selected a few attributes performance went up rather than down. > @@ -167,6 +170,12 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK > > /* Column-level FDW options */ > text attfdwoptions[1] BKI_DEFAULT(_null_); > + > + /* > + * Missing value for added columns. This is a one element array which lets > + * us store a value of the attribute type here. > + */ > + anyarray attmissingval BKI_DEFAULT(_null_); > #endif > } FormData_pg_attribute; > > > Still think this is a bad location, and it'll reduce cache hit ratio for > catalog lookups. > As I think I mentioned before, this was discussed previously and as I understood it this was the consensus location for it. pg_attrdef isn't really a good place for it (what if they drop the default?). So the only alternative then would be a completely new catalog. I'd need a deal of convincing that that was justified. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
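The "drop the default" point is worth spelling out, since it is what makes pg_attrdef the wrong home for this data: the missing value has to survive removal of the default. A sketch (hypothetical table t with pre-existing rows):

    ALTER TABLE t ADD COLUMN b int DEFAULT 42;  -- no rewrite; 42 becomes
                                                -- the stored missing value
    ALTER TABLE t ALTER COLUMN b DROP DEFAULT;  -- removes the pg_attrdef row
    SELECT b FROM t;  -- rows from before the ADD COLUMN must still read 42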
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote: >> + /* >> + * Missing value for added columns. This is a one element array which lets >> + * us store a value of the attribute type here. >> + */ >> + anyarray attmissingval BKI_DEFAULT(_null_); >> #endif >> } FormData_pg_attribute; >> >> Still think this is a bad location, and it'll reduce cache hit ratio for >> catalog lookups. > As I think I mentioned before, this was discussed previously and as I > understood it this was the consensus location for it. I don't have a problem with putting that in pg_attribute (and I certainly agree with not putting it in pg_attrdef). But "anyarray" seems like a damn strange, and bulky, choice. Why not just make it a bytea holding the bits for the value, nothing more? regards, tom lane
On Thu, Mar 29, 2018 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:
>>> +	/*
>>> +	 * Missing value for added columns. This is a one element array which lets
>>> +	 * us store a value of the attribute type here.
>>> +	 */
>>> +	anyarray	attmissingval BKI_DEFAULT(_null_);
>>> #endif
>>> } FormData_pg_attribute;
>>>
>>> Still think this is a bad location, and it'll reduce cache hit ratio for
>>> catalog lookups.
>
>> As I think I mentioned before, this was discussed previously and as I
>> understood it this was the consensus location for it.
>
> I don't have a problem with putting that in pg_attribute (and I certainly
> agree with not putting it in pg_attrdef). But "anyarray" seems like a
> damn strange, and bulky, choice. Why not just make it a bytea holding the
> bits for the value, nothing more?

That's what we started with, and Andres suggested we change it. One virtue of the array is that it displays nicely when examining pg_attribute. But changing it back would be easy enough.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
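For what it's worth, the "displays nicely" point looks like this against the committed catalog columns (a sketch; the table name is invented):

    CREATE TABLE demo (a int);
    INSERT INTO demo VALUES (1);
    ALTER TABLE demo ADD COLUMN b int DEFAULT 42;

    SELECT attname, atthasmissing, attmissingval
    FROM pg_attribute
    WHERE attrelid = 'demo'::regclass AND attnum > 0;

    --  attname | atthasmissing | attmissingval
    -- ---------+---------------+---------------
    --  a       | f             |
    --  b       | t             | {42}

With a bytea column, the same query would instead show the raw datum bytes.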
On Mon, Mar 26, 2018 at 9:32 AM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
> Thanks for this, all looks good. Here is the consolidated patch
> rebased. If there are no further comments I propose to commit this in
> a few days time.

I have some comments on the committed patch.

@@ -663,7 +671,23 @@ ExecFetchSlotTuple(TupleTableSlot *slot)
 	 * If we have a regular physical tuple then just return it.
 	 */
 	if (TTS_HAS_PHYSICAL_TUPLE(slot))
-		return slot->tts_tuple;
+	{
+		if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) <
+			slot->tts_tupleDescriptor->natts)
+		{
+			MemoryContext oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
+
+			slot->tts_tuple = heap_expand_tuple(slot->tts_tuple,
+												slot->tts_tupleDescriptor);
+			slot->tts_shouldFree = true;
+			MemoryContextSwitchTo(oldContext);
+			return slot->tts_tuple;
+		}
+		else
+		{
+			return slot->tts_tuple;
+		}
+	}

In the above scenario, directly replacing slot->tts_tuple without freeing the existing tuple will unnecessarily increase the slot context memory size; this may lead to a problem if the same slot is used for many tuples. Better to use the ExecStoreTuple() function to update the slot tuple.

Regards,
Hari Babu
Fujitsu Australia
Hi,

On 2018-03-29 10:16:23 +1030, Andrew Dunstan wrote:
> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:
> > Hi,
> >
> > On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
> >> Thanks for this, all looks good. Here is the consolidated patch
> >> rebased. If there are no further comments I propose to commit this in
> >> a few days time.
> >
> > Here's a bit of post-commit review:
> >
> > @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
> >
> >  	/*
> >  	 * If tuple doesn't have all the atts indicated by tupleDesc, read the
> > -	 * rest as null
> > +	 * rest as NULLs or missing values
> >  	 */
> > -	for (; attno < attnum; attno++)
> > -	{
> > -		slot->tts_values[attno] = (Datum) 0;
> > -		slot->tts_isnull[attno] = true;
> > -	}
> > +	if (attno < attnum)
> > +		slot_getmissingattrs(slot, attno, attnum);
> > +
> >  	slot->tts_nvalid = attnum;
> > }
> >
> > It's worthwhile to note that this'll re-process *all* missing values,
> > even if they've already been deformed. I.e. if
> >   slot_getmissingattrs(.., first-missing)
> >   slot_getmissingattrs(.., first-missing + 1)
> >   slot_getmissingattrs(.., first-missing + 2)
> > is called, all three missing values will be copied every time. That's
> > because tts_nvalid isn't taken into account. I wonder if slot_getmissingattrs
> > could take tts_nvalid into account?
> >
> > I also wonder if this doesn't deserve an unlikely(), to avoid the cost
> > of spilling registers in the hot branch of not missing any values.

> One of us at least is very confused about this function.
> slot_getmissingattrs() gets called at most once by slot_getsomeattrs
> and never for any attribute that's already been deformed.

Imagine the same slot being used in two different expressions. The typical case for that is e.g. something like

  SELECT a,b,c,d FROM foo WHERE b = 1;

This'll trigger one slot_getsomeattrs(slot, 2) call from within qual evaluation, and then a slot_getsomeattrs(slot, 4) from within the projection code. If you then imagine a tuple where only the first column exists physically, we'd copy b twice, because attno is only going to be 1. I think we might just want to check tts_nvalid in getmissingattrs?

> > I still strongly object to doing this unconditionally for queries
> > that never need any of this. We can end up with a significant number
> > of large tuples in memory here, and then copy that through dozens of
> > tupledesc copies in queries.

> We're just doing the same thing we do for default values.

That's really not a defense. It's hurting us there too.

> None of the tests I did with large numbers of missing values seemed to
> show performance impacts of the kind you describe. Now, none of the
> queries were particularly complex, but the worst case was from
> actually using very large numbers of attributes with missing values,
> where we'd need this data anyway. If we just selected a few attributes
> performance went up rather than down.

Now imagine a partitioned workload with a few thousand partitions over a few tables. The additional memory is going to start being noticeable.

> pg_attrdef isn't really a good place for it (what if they drop the
> default?). So the only alternative then would be a completely new
> catalog. I'd need a deal of convincing that that was justified.

There's plenty of databases with pg_attribute being many gigabytes large, and this is going to make that even worse.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> There's plenty of databases with pg_attribute being many gigabytes large,
> and this is going to make that even worse.

Only if you imagine that a sizable fraction of the columns have fast default values, which seems somewhat unlikely.

			regards, tom lane
Hi,

On 2018-03-29 17:27:47 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> There's plenty of databases with pg_attribute being many gigabytes large,
>> and this is going to make that even worse.
>
> Only if you imagine that a sizable fraction of the columns have fast
> default values, which seems somewhat unlikely.

Why is that unlikely? In the field it's definitely not uncommon to define default values for just about every column. And in a lot of cases that'll mean we'll end up with pg_attribute containing default values for most columns but the ones defined at table creation. A lot of frameworks make it a habit to add columns near exclusively in incremental steps. You'd only get rid of them if you force an operation that does a full table rewrite, which often enough is impractical.

Greetings,

Andres Freund
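To put a rough number on that concern, here is a sketch of how one might measure it on a given cluster (assuming the committed atthasmissing/attmissingval columns):

    -- How many attributes carry a stored missing value, and what
    -- pg_attribute currently costs on disk.
    SELECT count(*) FILTER (WHERE atthasmissing) AS atts_with_missing,
           count(*) AS atts_total,
           pg_size_pretty(pg_total_relation_size('pg_attribute')) AS size
    FROM pg_attribute;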
On 03/29/2018 11:31 PM, Andres Freund wrote:
> Hi,
>
> On 2018-03-29 17:27:47 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> There's plenty of databases with pg_attribute being many gigabytes large,
>>> and this is going to make that even worse.
>>
>> Only if you imagine that a sizable fraction of the columns have fast
>> default values, which seems somewhat unlikely.
>
> Why is that unlikely? In the field it's definitely not uncommon to
> define default values for just about every column. And in a lot of cases
> that'll mean we'll end up with pg_attribute containing default values
> for most columns but the ones defined at table creation. A lot of
> frameworks make it a habit to add columns near exclusively in
> incremental steps. You'd only get rid of them if you force an operation
> that does a full table rewrite, which often enough is impractical.

I don't quite see how that gets solved by moving the info into a different catalog? We would need to fetch it whenever attribute metadata from pg_attribute are loaded.

cheers

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2018-03-30 00:28:43 +0200, Tomas Vondra wrote: > > Why is that unlikely? In the field it's definitely not uncommon to > > define default values for just about every column. And in a lot of cases > > that'll mean we'll end up with pg_attribute containing default values > > for most columns but the ones defined at table creation. A lot of > > frameworks make it a habit to add columns near exclusively in > > incremental steps. You'd only get rid of them if you force an operation > > that does a full table rewrite, which often enough is impractical. > > > > I don't quite see how moving that gets solved by moving the info into a > different catalog? We would need to fetch it whenever attribute > meta-data from pg_attribute are loaded. I'm also complaining nearby that the "stored default" values should only be loaded on demand. We very regularly build relcache entries that'll never have their default values queried. Even moreso with partitioned tables. Greetings, Andres Freund
On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2018-03-29 10:16:23 +1030, Andrew Dunstan wrote: >> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote: >> > Hi, >> > >> > On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote: >> >> Thanks for this, all looks good. Here is the consolidate patch >> >> rebased. If there are no further comments I propose to commit this in >> >> a few days time. >> > >> > Here's a bit of post commit review: >> > >> > @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum) >> > >> > /* >> > * If tuple doesn't have all the atts indicated by tupleDesc, read the >> > - * rest as null >> > + * rest as NULLs or missing values >> > */ >> > - for (; attno < attnum; attno++) >> > - { >> > - slot->tts_values[attno] = (Datum) 0; >> > - slot->tts_isnull[attno] = true; >> > - } >> > + if (attno < attnum) >> > + slot_getmissingattrs(slot, attno, attnum); >> > + >> > slot->tts_nvalid = attnum; >> > } >> > >> > It's worthwhile to note that this'll re-process *all* missing values, >> > even if they've already been deformed. I.e. if >> > slot_getmissingattrs(.., first-missing) >> > slot_getmissingattrs(.., first-missing + 1) >> > slot_getmissingattrs(.., first-missing + 2) >> > is called, all three missing values will be copied every time. That's >> > because tts_nvalid isn't taken into account. I wonder if slot_getmissingattrs >> > could take tts_nvalid into account? >> > >> > I also wonder if this doesn't deserve an unlikely(), to avoid the cost >> > of spilling registers in the hot branch of not missing any values. > >> One of us at least is very confused about this function. >> slot_getmissingattrs() gets called at most once by slot_getsomeattrs >> and never for any attribute that's already been deformed. > > Imagine the same slot being used in two different expressions. The > typical case for that is e.g. something like > SELECT a,b,c,d FROM foo WHERE b = 1; > > this'll trigger one slot_getsomeattrs(slot, 2) call from within qual > evaluation, and then a slot_getsomeattrs(slot, 4) from within the > projection code. If you then imagine a tuple where only the first > column exists physically, we'd copy b twice, because attno is only going > to be one 1. I think we might just want to check tts nvalid in > getmissingattrs? > > OK. so add something like this to the top of slot_getmissingattrs()? startAttNum = Max(startAttNum, slot->tts_nvalid); >> > I'm still strongly object to do doing this unconditionally for queries >> > that never need any of this. We're can end up with a significant number >> > of large tuples in memory here, and then copy that through dozens of >> > tupledesc copies in queries. > >> We're just doing the same thing we do for default values. > > That's really not a defense. It's hurting us there too. > I will take a look and see if it can be avoided easily. > >> None of the tests I did with large numbers of missing values seemed to >> show performance impacts of the kind you describe. Now, none of the >> queries were particularly complex, but the worst case was from >> actually using very large numbers of attributes with missing values, >> where we'd need this data anyway. If we just selected a few attributes >> performance went up rather than down. > > Now imagine a partitioned workload with a few thousand partitions over a > few tables. The additional memory is going to start being noticeable. 
> > >> pg_attrdef isn't really a good place for it (what if they drop the >> default?). So the only alternative then would be a completely new >> catalog. I'd need a deal of convincing that that was justified. > > There's plenty databases with pg_attribute being many gigabytes large, > and this is going to make that even worse. > I'll change it if there's a consensus about it, but so far the only other opinion has been from Tom who doesn't apparently see much of a problem. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2018-03-30 10:08:54 +1030, Andrew Dunstan wrote: > On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres@anarazel.de> wrote: > > this'll trigger one slot_getsomeattrs(slot, 2) call from within qual > > evaluation, and then a slot_getsomeattrs(slot, 4) from within the > > projection code. If you then imagine a tuple where only the first > > column exists physically, we'd copy b twice, because attno is only going > > to be one 1. I think we might just want to check tts nvalid in > > getmissingattrs? > OK. so add something like this to the top of slot_getmissingattrs()? > > startAttNum = Max(startAttNum, slot->tts_nvalid); Yea, I think that should do the trick. Greetings, Andres Freund
On 2018-03-29 16:40:29 -0700, Andres Freund wrote: > Hi, > > On 2018-03-30 10:08:54 +1030, Andrew Dunstan wrote: > > On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres@anarazel.de> wrote: > > > this'll trigger one slot_getsomeattrs(slot, 2) call from within qual > > > evaluation, and then a slot_getsomeattrs(slot, 4) from within the > > > projection code. If you then imagine a tuple where only the first > > > column exists physically, we'd copy b twice, because attno is only going > > > to be one 1. I think we might just want to check tts nvalid in > > > getmissingattrs? > > > OK. so add something like this to the top of slot_getmissingattrs()? > > > > startAttNum = Max(startAttNum, slot->tts_nvalid); > > Yea, I think that should do the trick. Hm, or if you want to microoptimize ;): if (startAttNum < slot->tts_nvalid) startAttNum = slot->tts_nvalid I think that can use cmov (i.e. no visible branch), which Max() probably can't usefully. Don't think the compiler can figure out that slot->tts_nvalid cannot be smaller than startAttNum. Greetings, Andres Freund
On Fri, Mar 30, 2018 at 10:08 AM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres@anarazel.de> wrote: [ missing values are loaded in the TupleDesc by RelationBuildTupleDesc ] >>> > I'm still strongly object to do doing this unconditionally for queries >>> > that never need any of this. We're can end up with a significant number >>> > of large tuples in memory here, and then copy that through dozens of >>> > tupledesc copies in queries. >> >>> We're just doing the same thing we do for default values. >> >> That's really not a defense. It's hurting us there too. >> > > > I will take a look and see if it can be avoided easily. > > For missing values there are three places where we would need to load them on demand: getmissingattr(), slot_getmissingattrs() and expand_tuple(). However, none of those functions have obvious access to the information required to load the missing values. We could probably do something very ugly like getting the relid from the TupleDesc's first attribute. Maybe someone else has a better option. But I can't see a simple and clean way to do this. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[warning, reviving a thread from 2018] >>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On Wed, Feb 21, 2018 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: >> Hi, Andrew> Comments interspersed. >>> @@ -4059,10 +4142,6 @@ AttrDefaultFetch(Relation relation) >>> >>> systable_endscan(adscan); >>> heap_close(adrel, AccessShareLock); >>> - >>> - if (found != ndef) >>> - elog(WARNING, "%d attrdef record(s) missing for rel %s", >>> - ndef - found, RelationGetRelationName(relation)); >>> } >> >> Hm, it's not obvious why this is a good thing? I didn't find an answer to this in the thread archive, and the above change apparently did make it into the final patch. I just got through diagnosing a SEGV crash with someone on IRC, and the cause turned out to be exactly this - a table had (for some reason we could not determine within the available resources) lost its pg_attrdef record for the one column it had with a default (which was a serial column, so none of the fast-default code is actually implicated). Any attempt to alter the table resulted in a crash in equalTupleDesc on this line: if (strcmp(defval1->adbin, defval2->adbin) != 0) due to trying to compare adbin values which were NULL pointers. So, questions: why was the check removed in the first place? (Why was it previously only a warning when it causes a crash further down the line on any alteration?) Does equalTupleDesc need to be more defensive about this, or does the above check need to be reinstated? (The immediate issue was fixed by "update pg_attribute set atthasdef=false ..." for the offending attribute and then adding it back with ALTER TABLE, which seems to have cured the crash.) -- Andrew (irc:RhodiumToad)
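For anyone following along, the crash mechanism is easy to reproduce in miniature outside the backend. This standalone sketch mirrors the AttrDefault struct from tupdesc.h and the strcmp that equalTupleDesc performs, assuming an entry whose adbin was left as a NULL pointer (the state described above); the sample adbin string is invented:

#include <stdint.h>
#include <string.h>

/* Mirrors AttrDefault from src/include/access/tupdesc.h */
typedef struct AttrDefault
{
    int16_t     adnum;          /* attribute number */
    char       *adbin;          /* nodeToString of the default expression */
} AttrDefault;

int
main(void)
{
    AttrDefault defval1 = {1, NULL};    /* pg_attrdef row went missing */
    AttrDefault defval2 = {1, "{FUNCEXPR ...}"};

    /* the comparison equalTupleDesc makes: strcmp on NULL segfaults */
    return strcmp(defval1.adbin, defval2.adbin) != 0;
}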
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > I just got through diagnosing a SEGV crash with someone on IRC, and the > cause turned out to be exactly this - a table had (for some reason we > could not determine within the available resources) lost its pg_attrdef > record for the one column it had with a default (which was a serial > column, so none of the fast-default code is actually implicated). Any > attempt to alter the table resulted in a crash in equalTupleDesc on this > line: > if (strcmp(defval1->adbin, defval2->adbin) != 0) > due to trying to compare adbin values which were NULL pointers. Ouch. > Does equalTupleDesc need to be more defensive about this, or does the > above check need to be reinstated? Looking around at the other touches of AttrDefault.adbin in the backend (of which there are not that many), some of them are prepared for it to be NULL and some are not. I don't immediately have a strong opinion whether that should be an allowed state; but if it is not allowed then it's not okay for AttrDefaultFetch to leave such a state behind. Alternatively, if it is allowed then equalTupleDesc is in the wrong. We should choose one definition and make all the relevant code match it. regards, tom lane
It reminded me of this thread, but nothing ever came of it. https://www.postgresql.org/message-id/20200328223052.GK20103%40telsasoft.com https://www.postgresql.org/message-id/87pmzaq4gx.fsf%40news-spur.riddles.org.uk
On 4/3/21 10:09 PM, Tom Lane wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >> I just got through diagnosing a SEGV crash with someone on IRC, and the >> cause turned out to be exactly this - a table had (for some reason we >> could not determine within the available resources) lost its pg_attrdef >> record for the one column it had with a default (which was a serial >> column, so none of the fast-default code is actually implicated). I don't recall why the check was removed. In general the fast default code doesn't touch adbin. I'd be curious to know how this state came about. From the description it sounds like something removed the pg_attrdef entry without adjusting pg_attribute, which shouldn't happen. >> Any >> attempt to alter the table resulted in a crash in equalTupleDesc on this >> line: >> if (strcmp(defval1->adbin, defval2->adbin) != 0) >> due to trying to compare adbin values which were NULL pointers. > Ouch. > >> Does equalTupleDesc need to be more defensive about this, or does the >> above check need to be reinstated? > Looking around at the other touches of AttrDefault.adbin in the backend > (of which there are not that many), some of them are prepared for it to be > NULL and some are not. I don't immediately have a strong opinion whether > that should be an allowed state; but if it is not allowed then it's not > okay for AttrDefaultFetch to leave such a state behind. Alternatively, > if it is allowed then equalTupleDesc is in the wrong. We should choose > one definition and make all the relevant code match it. > > There's already a warning if it sets an explicit NULL value, so I'm inclined to say your suggestion "it's not okay for AttrDefaultFetch to leave such a state behind" is what we should go with. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 4/4/21 9:19 AM, Justin Pryzby wrote: > It reminded me of this thread, but nothing ever came of it. > https://www.postgresql.org/message-id/20200328223052.GK20103%40telsasoft.com > > > https://www.postgresql.org/message-id/87pmzaq4gx.fsf%40news-spur.riddles.org.uk > > I don't recall seeing this. Around that time I was busy returning from Australia at the start of the pandemic, so I might have missed it in the hubbub. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > On 4/3/21 10:09 PM, Tom Lane wrote: >> Looking around at the other touches of AttrDefault.adbin in the backend >> (of which there are not that many), some of them are prepared for it to be >> NULL and some are not. I don't immediately have a strong opinion whether >> that should be an allowed state; but if it is not allowed then it's not >> okay for AttrDefaultFetch to leave such a state behind. Alternatively, >> if it is allowed then equalTupleDesc is in the wrong. We should choose >> one definition and make all the relevant code match it. > There's already a warning if it sets an explicit NULL value, so I'm > inclined to say your suggestion "it's not okay for AttrDefaultFetch to > leave such a state behind" is what we should go with. Yeah, I came to the same conclusion after looking around a bit more. There are two places in tupdesc.c that defend against adbin being NULL, which seems a bit silly when their sibling equalTupleDesc does not. Every other touch in the backend will segfault on a NULL value, if it doesn't Assert first :-( Another thing that annoyed me while I was looking at this is the potentially O(N^2) behavior in equalTupleDesc due to not wanting to assume that the AttrDefault arrays are in the same order. It seems far smarter to have AttrDefaultFetch sort the arrays. So attached is a proposed patch to clean all this up. * Don't link the AttrDefault array into the relcache entry until it's fully built and valid. * elog, don't just Assert, if we fail to find an expected default value. (Perhaps there's a case for ereport instead.) * Remove now-useless null checks in tupdesc.c. * Sort the AttrDefault array, remove double loops in equalTupleDesc. I made CheckConstraintFetch likewise not install its array until it's done. I notice that it is throwing elog(ERROR) not WARNING for the equivalent cases of not finding the right number of entries. I wonder if we ought to back that off to a WARNING too. elog(ERROR) during relcache entry load is kinda nasty, because it prevents essentially *all* use of the relation. On the other hand, failing to enforce check constraints isn't lovely either. Anybody have opinions about that? 
regards, tom lane

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index cb76465050..d81f6b8a60 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -174,10 +174,7 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
         cpy->defval = (AttrDefault *) palloc(cpy->num_defval * sizeof(AttrDefault));
         memcpy(cpy->defval, constr->defval, cpy->num_defval * sizeof(AttrDefault));
         for (i = cpy->num_defval - 1; i >= 0; i--)
-        {
-            if (constr->defval[i].adbin)
-                cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
-        }
+            cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
     }
 
     if (constr->missing)
@@ -328,10 +325,7 @@ FreeTupleDesc(TupleDesc tupdesc)
             AttrDefault *attrdef = tupdesc->constr->defval;
 
             for (i = tupdesc->constr->num_defval - 1; i >= 0; i--)
-            {
-                if (attrdef[i].adbin)
-                    pfree(attrdef[i].adbin);
-            }
+                pfree(attrdef[i].adbin);
             pfree(attrdef);
         }
         if (tupdesc->constr->missing)
@@ -412,7 +406,6 @@ bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 {
     int         i,
-                j,
                 n;
 
     if (tupdesc1->natts != tupdesc2->natts)
@@ -488,22 +481,13 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_defval;
         if (n != (int) constr2->num_defval)
             return false;
+        /* We assume here that both AttrDefault arrays are in adnum order */
         for (i = 0; i < n; i++)
         {
             AttrDefault *defval1 = constr1->defval + i;
-            AttrDefault *defval2 = constr2->defval;
-
-            /*
-             * We can't assume that the items are always read from the system
-             * catalogs in the same order; so use the adnum field to identify
-             * the matching item to compare.
-             */
-            for (j = 0; j < n; defval2++, j++)
-            {
-                if (defval1->adnum == defval2->adnum)
-                    break;
-            }
-            if (j >= n)
+            AttrDefault *defval2 = constr2->defval + i;
+
+            if (defval1->adnum != defval2->adnum)
                 return false;
             if (strcmp(defval1->adbin, defval2->adbin) != 0)
                 return false;
@@ -534,25 +518,21 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_check;
         if (n != (int) constr2->num_check)
             return false;
+
+        /*
+         * Similarly, we rely here on the ConstrCheck entries being sorted by
+         * name.  If there are duplicate names, the outcome of the comparison
+         * is uncertain, but that should not happen.
+         */
         for (i = 0; i < n; i++)
         {
             ConstrCheck *check1 = constr1->check + i;
-            ConstrCheck *check2 = constr2->check;
-
-            /*
-             * Similarly, don't assume that the checks are always read in the
-             * same order; match them up by name and contents.  (The name
-             * *should* be unique, but...)
-             */
-            for (j = 0; j < n; check2++, j++)
-            {
-                if (strcmp(check1->ccname, check2->ccname) == 0 &&
-                    strcmp(check1->ccbin, check2->ccbin) == 0 &&
-                    check1->ccvalid == check2->ccvalid &&
-                    check1->ccnoinherit == check2->ccnoinherit)
-                    break;
-            }
-            if (j >= n)
+            ConstrCheck *check2 = constr2->check + i;
+
+            if (!(strcmp(check1->ccname, check2->ccname) == 0 &&
+                  strcmp(check1->ccbin, check2->ccbin) == 0 &&
+                  check1->ccvalid == check2->ccvalid &&
+                  check1->ccnoinherit == check2->ccnoinherit))
                 return false;
         }
     }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 88a68a4697..87f9bdaef0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2501,21 +2501,24 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
             if (attribute->atthasdef)
             {
                 Node       *this_default = NULL;
-                AttrDefault *attrdef;
-                int         i;
 
                 /* Find default in constraint structure */
-                Assert(constr != NULL);
-                attrdef = constr->defval;
-                for (i = 0; i < constr->num_defval; i++)
+                if (constr != NULL)
                 {
-                    if (attrdef[i].adnum == parent_attno)
+                    AttrDefault *attrdef = constr->defval;
+
+                    for (int i = 0; i < constr->num_defval; i++)
                     {
-                        this_default = stringToNode(attrdef[i].adbin);
-                        break;
+                        if (attrdef[i].adnum == parent_attno)
+                        {
+                            this_default = stringToNode(attrdef[i].adbin);
+                            break;
+                        }
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));
 
                 /*
                  * If it's a GENERATED default, it might contain Vars that
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index b968c25dd6..2c57b200f8 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1276,7 +1276,9 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
                         break;
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));
 
                 atsubcmd = makeNode(AlterTableCmd);
                 atsubcmd->subtype = AT_CookedColumnDefault;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 92661abae2..531edaacdb 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1247,6 +1247,9 @@ build_column_default(Relation rel, int attrno)
                 break;
             }
         }
+        if (expr == NULL)
+            elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                 attrno, RelationGetRelationName(rel));
     }
 
     /*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index ff7395c85b..b05026f1c5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -282,7 +282,8 @@ static void RelationInitPhysicalAddr(Relation relation);
 static void load_critical_index(Oid indexoid, Oid heapoid);
 static TupleDesc GetPgClassDescriptor(void);
 static TupleDesc GetPgIndexDescriptor(void);
-static void AttrDefaultFetch(Relation relation);
+static void AttrDefaultFetch(Relation relation, int ndef);
+static int  AttrDefaultCmp(const void *a, const void *b);
 static void CheckConstraintFetch(Relation relation);
 static int  CheckConstraintCmp(const void *a, const void *b);
 static void InitIndexAmRoutine(Relation relation);
@@ -503,7 +504,6 @@ RelationBuildTupleDesc(Relation relation)
     ScanKeyData skey[2];
     int         need;
     TupleConstr *constr;
-    AttrDefault *attrdef = NULL;
     AttrMissing *attrmiss = NULL;
     int         ndef = 0;
 
@@ -512,8 +512,8 @@ RelationBuildTupleDesc(Relation relation)
         relation->rd_rel->reltype ? relation->rd_rel->reltype : RECORDOID;
     relation->rd_att->tdtypmod = -1;    /* just to be sure */
 
-    constr = (TupleConstr *) MemoryContextAlloc(CacheMemoryContext,
-                                                sizeof(TupleConstr));
+    constr = (TupleConstr *) MemoryContextAllocZero(CacheMemoryContext,
+                                                    sizeof(TupleConstr));
     constr->has_not_null = false;
     constr->has_generated_stored = false;
 
@@ -560,7 +560,6 @@ RelationBuildTupleDesc(Relation relation)
             elog(ERROR, "invalid attribute number %d for %s",
                  attp->attnum, RelationGetRelationName(relation));
 
-
         memcpy(TupleDescAttr(relation->rd_att, attnum - 1),
                attp,
                ATTRIBUTE_FIXED_PART_SIZE);
@@ -570,22 +569,10 @@ RelationBuildTupleDesc(Relation relation)
             constr->has_not_null = true;
         if (attp->attgenerated == ATTRIBUTE_GENERATED_STORED)
             constr->has_generated_stored = true;
-
-        /* If the column has a default, fill it into the attrdef array */
         if (attp->atthasdef)
-        {
-            if (attrdef == NULL)
-                attrdef = (AttrDefault *)
-                    MemoryContextAllocZero(CacheMemoryContext,
-                                           RelationGetNumberOfAttributes(relation) *
-                                           sizeof(AttrDefault));
-            attrdef[ndef].adnum = attnum;
-            attrdef[ndef].adbin = NULL;
             ndef++;
-        }
-        /* Likewise for a missing value */
+        /* If the column has a "missing" value, put it in the attrmiss array */
         if (attp->atthasmissing)
         {
             Datum       missingval;
@@ -680,33 +667,19 @@ RelationBuildTupleDesc(Relation relation)
         constr->has_generated_stored ||
         ndef > 0 ||
         attrmiss ||
-        relation->rd_rel->relchecks)
+        relation->rd_rel->relchecks > 0)
     {
         relation->rd_att->constr = constr;
 
         if (ndef > 0)           /* DEFAULTs */
-        {
-            if (ndef < RelationGetNumberOfAttributes(relation))
-                constr->defval = (AttrDefault *)
-                    repalloc(attrdef, ndef * sizeof(AttrDefault));
-            else
-                constr->defval = attrdef;
-            constr->num_defval = ndef;
-            AttrDefaultFetch(relation);
-        }
+            AttrDefaultFetch(relation, ndef);
         else
             constr->num_defval = 0;
 
         constr->missing = attrmiss;
 
         if (relation->rd_rel->relchecks > 0)    /* CHECKs */
-        {
-            constr->num_check = relation->rd_rel->relchecks;
-            constr->check = (ConstrCheck *)
-                MemoryContextAllocZero(CacheMemoryContext,
-                                       constr->num_check * sizeof(ConstrCheck));
             CheckConstraintFetch(relation);
-        }
         else
             constr->num_check = 0;
     }
@@ -4251,21 +4224,28 @@ GetPgIndexDescriptor(void)
 
 /*
  * Load any default attribute value definitions for the relation.
+ *
+ * ndef is the number of attributes that were marked atthasdef.
+ *
+ * Note: we don't make it a hard error to be missing some pg_attrdef records.
+ * We can limp along as long as no INSERT or UPDATE needs to use the entry.
  */
 static void
-AttrDefaultFetch(Relation relation)
+AttrDefaultFetch(Relation relation, int ndef)
 {
-    AttrDefault *attrdef = relation->rd_att->constr->defval;
-    int         ndef = relation->rd_att->constr->num_defval;
+    AttrDefault *attrdef;
     Relation    adrel;
     SysScanDesc adscan;
    ScanKeyData skey;
     HeapTuple   htup;
-    Datum       val;
-    bool        isnull;
-    int         found;
-    int         i;
+    int         found = 0;
+
+    /* Allocate array with room for as many entries as expected */
+    attrdef = (AttrDefault *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ndef * sizeof(AttrDefault));
 
+    /* Search pg_attrdef for relevant entries */
     ScanKeyInit(&skey,
                 Anum_pg_attrdef_adrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4274,49 +4254,69 @@ AttrDefaultFetch(Relation relation)
     adrel = table_open(AttrDefaultRelationId, AccessShareLock);
     adscan = systable_beginscan(adrel, AttrDefaultIndexId, true,
                                 NULL, 1, &skey);
-    found = 0;
 
     while (HeapTupleIsValid(htup = systable_getnext(adscan)))
     {
         Form_pg_attrdef adform = (Form_pg_attrdef) GETSTRUCT(htup);
-        Form_pg_attribute attr = TupleDescAttr(relation->rd_att, adform->adnum - 1);
+        Datum       val;
+        bool        isnull;
 
-        for (i = 0; i < ndef; i++)
+        /* protect limited size of array */
+        if (found >= ndef)
         {
-            if (adform->adnum != attrdef[i].adnum)
-                continue;
-            if (attrdef[i].adbin != NULL)
-                elog(WARNING, "multiple attrdef records found for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-                found++;
-
-            val = fastgetattr(htup,
-                              Anum_pg_attrdef_adbin,
-                              adrel->rd_att, &isnull);
-            if (isnull)
-                elog(WARNING, "null adbin for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-            {
-                /* detoast and convert to cstring in caller's context */
-                char       *s = TextDatumGetCString(val);
-
-                attrdef[i].adbin = MemoryContextStrdup(CacheMemoryContext, s);
-                pfree(s);
-            }
+            elog(WARNING, "unexpected attrdef record found for attr %d of rel %s",
+                 adform->adnum, RelationGetRelationName(relation));
             break;
         }
 
-        if (i >= ndef)
-            elog(WARNING, "unexpected attrdef record found for attr %d of rel %s",
+        val = fastgetattr(htup,
+                          Anum_pg_attrdef_adbin,
+                          adrel->rd_att, &isnull);
+        if (isnull)
+            elog(WARNING, "null adbin for attr %d of rel %s",
                  adform->adnum, RelationGetRelationName(relation));
+        else
+        {
+            /* detoast and convert to cstring in caller's context */
+            char       *s = TextDatumGetCString(val);
+
+            attrdef[found].adnum = adform->adnum;
+            attrdef[found].adbin = MemoryContextStrdup(CacheMemoryContext, s);
+            found++;
+            pfree(s);
+        }
     }
 
     systable_endscan(adscan);
     table_close(adrel, AccessShareLock);
+
+    if (found != ndef)
+        elog(WARNING, "%d attrdef record(s) missing for rel %s",
+             ndef - found, RelationGetRelationName(relation));
+
+    /*
+     * Sort the AttrDefault entries by adnum, for the convenience of
+     * equalTupleDescs().  (Usually, they already will be in order, but this
+     * might not be so if systable_getnext isn't using an index.)
+     */
+    if (found > 1)
+        qsort(attrdef, found, sizeof(AttrDefault), AttrDefaultCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->defval = attrdef;
+    relation->rd_att->constr->num_defval = found;
+}
+
+/*
+ * qsort comparator to sort AttrDefault entries by adnum
+ */
+static int
+AttrDefaultCmp(const void *a, const void *b)
+{
+    const AttrDefault *ada = (const AttrDefault *) a;
+    const AttrDefault *adb = (const AttrDefault *) b;
+
+    return ada->adnum - adb->adnum;
 }
 
 /*
@@ -4325,14 +4325,20 @@ AttrDefaultFetch(Relation relation)
 static void
 CheckConstraintFetch(Relation relation)
 {
-    ConstrCheck *check = relation->rd_att->constr->check;
-    int         ncheck = relation->rd_att->constr->num_check;
+    ConstrCheck *check;
+    int         ncheck = relation->rd_rel->relchecks;
     Relation    conrel;
     SysScanDesc conscan;
     ScanKeyData skey[1];
     HeapTuple   htup;
     int         found = 0;
 
+    /* Allocate array with room for as many entries as expected */
+    check = (ConstrCheck *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ncheck * sizeof(ConstrCheck));
+
+    /* Search pg_constraint for relevant entries */
     ScanKeyInit(&skey[0],
                 Anum_pg_constraint_conrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4353,6 +4359,7 @@ CheckConstraintFetch(Relation relation)
         if (conform->contype != CONSTRAINT_CHECK)
             continue;
 
+        /* protect limited size of array */
         if (found >= ncheck)
             elog(ERROR, "unexpected constraint record found for rel %s",
                  RelationGetRelationName(relation));
@@ -4385,9 +4392,16 @@ CheckConstraintFetch(Relation relation)
         elog(ERROR, "%d constraint record(s) missing for rel %s",
             ncheck - found, RelationGetRelationName(relation));
 
-    /* Sort the records so that CHECKs are applied in a deterministic order */
-    if (ncheck > 1)
-        qsort(check, ncheck, sizeof(ConstrCheck), CheckConstraintCmp);
+    /*
+     * Sort the records by name.  This ensures that CHECKs are applied in a
+     * deterministic order, and it also makes equalTupleDescs() faster.
+     */
+    if (found > 1)
+        qsort(check, found, sizeof(ConstrCheck), CheckConstraintCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->check = check;
+    relation->rd_att->constr->num_check = found;
 }
 
 /*
On 4/4/21 12:05 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 4/3/21 10:09 PM, Tom Lane wrote: >>> Looking around at the other touches of AttrDefault.adbin in the backend >>> (of which there are not that many), some of them are prepared for it to be >>> NULL and some are not. I don't immediately have a strong opinion whether >>> that should be an allowed state; but if it is not allowed then it's not >>> okay for AttrDefaultFetch to leave such a state behind. Alternatively, >>> if it is allowed then equalTupleDesc is in the wrong. We should choose >>> one definition and make all the relevant code match it. >> There's already a warning if it sets an explicit NULL value, so I'm >> inclined to say your suggestion "it's not okay for AttrDefaultFetch to >> leave such a state behind" is what we should go with. > Yeah, I came to the same conclusion after looking around a bit more. > There are two places in tupdesc.c that defend against adbin being NULL, > which seems a bit silly when their sibling equalTupleDesc does not. > Every other touch in the backend will segfault on a NULL value, > if it doesn't Assert first :-( > > Another thing that annoyed me while I was looking at this is the > potentially O(N^2) behavior in equalTupleDesc due to not wanting > to assume that the AttrDefault arrays are in the same order. > It seems far smarter to have AttrDefaultFetch sort the arrays. +1 The O(N^2) loops also bothered me. > > So attached is a proposed patch to clean all this up. > > * Don't link the AttrDefault array into the relcache entry until > it's fully built and valid. > > * elog, don't just Assert, if we fail to find an expected default > value. (Perhaps there's a case for ereport instead.) > > * Remove now-useless null checks in tupdesc.c. > > * Sort the AttrDefault array, remove double loops in equalTupleDesc. This all looks a lot cleaner and simpler to follow. I like not allocating the array until AttrDefaultFetch. > > I made CheckConstraintFetch likewise not install its array until > it's done. I notice that it is throwing elog(ERROR) not WARNING > for the equivalent cases of not finding the right number of > entries. I wonder if we ought to back that off to a WARNING too. > elog(ERROR) during relcache entry load is kinda nasty, because > it prevents essentially *all* use of the relation. On the other > hand, failing to enforce check constraints isn't lovely either. > Anybody have opinions about that? None of this is supposed to happen, is it? If you can't fetch the constraints (and I think of a default as almost a kind of constraint) your database is already somehow corrupted. So maybe if any of these things happen we should ERROR out. Anything else seems likely to corrupt the database further. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi,
Thanks for the cleanup.
    if (found != ncheck)
        elog(ERROR, "%d constraint record(s) missing for rel %s",
             ncheck - found, RelationGetRelationName(relation));
Since there is a check on found being smaller than ncheck inside the loop, the if condition can be written as:
    if (found < ncheck)
Actually, the above is implied by the expression ncheck - found.
One minor comment w.r.t. the variable name: found is normally a bool variable.
Maybe rename the variable to nfound, which would be clearer in its meaning.
+   if (found != ndef)
+       elog(WARNING, "%d attrdef record(s) missing for rel %s",
+            ndef - found, RelationGetRelationName(relation));
Since only a warning is logged, there seems to be some wasted space in attrdef. Would such space accumulate, resulting in some memory leak?
I think the attrdef should be copied to another array of the proper size so that there is no chance of a memory leak.
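For example, something like the following untested sketch, placed before the array is installed:

    /* untested sketch: trim the array to the entries actually found */
    if (found > 0 && found < ndef)
        attrdef = (AttrDefault *) repalloc(attrdef, found * sizeof(AttrDefault));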
Thanks
elog(ERROR, "%d constraint record(s) missing for rel %s",
ncheck - found, RelationGetRelationName(relation));
Since there is check on found being smaller than ncheck inside the loop, the if condition can be written as:
if (found < ncheck)
Actually the above is implied by the expression, ncheck - found.
One minor comment w.r.t. the variable name is that, found is normally a bool variable.
Maybe rename the variable nfound which would be clearer in its meaning.
+ if (found != ndef)
+ elog(WARNING, "%d attrdef record(s) missing for rel %s",
+ ndef - found, RelationGetRelationName(relation));
Since only warning is logged, there seems to be some wasted space in attrdef. Would such space accumulate, resulting in some memory leak ?
I think the attrdef should be copied to another array of the proper size so that there is no chance of memory leak.
Thanks
On Sun, Apr 4, 2021 at 9:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > On 4/3/21 10:09 PM, Tom Lane wrote:
> >> Looking around at the other touches of AttrDefault.adbin in the backend
> >> (of which there are not that many), some of them are prepared for it to be
> >> NULL and some are not. I don't immediately have a strong opinion whether
> >> that should be an allowed state; but if it is not allowed then it's not
> >> okay for AttrDefaultFetch to leave such a state behind. Alternatively,
> >> if it is allowed then equalTupleDesc is in the wrong. We should choose
> >> one definition and make all the relevant code match it.
> > There's already a warning if it sets an explicit NULL value, so I'm
> > inclined to say your suggestion "it's not okay for AttrDefaultFetch to
> > leave such a state behind" is what we should go with.
> Yeah, I came to the same conclusion after looking around a bit more.
> There are two places in tupdesc.c that defend against adbin being NULL,
> which seems a bit silly when their sibling equalTupleDesc does not.
> Every other touch in the backend will segfault on a NULL value,
> if it doesn't Assert first :-(
> Another thing that annoyed me while I was looking at this is the
> potentially O(N^2) behavior in equalTupleDesc due to not wanting
> to assume that the AttrDefault arrays are in the same order.
> It seems far smarter to have AttrDefaultFetch sort the arrays.
> So attached is a proposed patch to clean all this up.
> * Don't link the AttrDefault array into the relcache entry until
> it's fully built and valid.
> * elog, don't just Assert, if we fail to find an expected default
> value. (Perhaps there's a case for ereport instead.)
> * Remove now-useless null checks in tupdesc.c.
> * Sort the AttrDefault array, remove double loops in equalTupleDesc.
> I made CheckConstraintFetch likewise not install its array until
> it's done. I notice that it is throwing elog(ERROR) not WARNING
> for the equivalent cases of not finding the right number of
> entries. I wonder if we ought to back that off to a WARNING too.
> elog(ERROR) during relcache entry load is kinda nasty, because
> it prevents essentially *all* use of the relation. On the other
> hand, failing to enforce check constraints isn't lovely either.
> Anybody have opinions about that?
> regards, tom lane
On 4/4/21 11:21 AM, Andrew Dunstan wrote: > On 4/4/21 9:19 AM, Justin Pryzby wrote: >> It reminded me of this thread, but nothing ever came of it. >> https://www.postgresql.org/message-id/20200328223052.GK20103%40telsasoft.com >> >> >> https://www.postgresql.org/message-id/87pmzaq4gx.fsf%40news-spur.riddles.org.uk >> >> > I don't recall seeing this. Around that time I was busy returning from > Australia at the start of the pandemic, so I might have missed it in > the hubbub. > > If this is still an issue, is it possible to get a self-contained test to reproduce it? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > On 4/4/21 12:05 PM, Tom Lane wrote: >> I made CheckConstraintFetch likewise not install its array until >> it's done. I notice that it is throwing elog(ERROR) not WARNING >> for the equivalent cases of not finding the right number of >> entries. I wonder if we ought to back that off to a WARNING too. >> elog(ERROR) during relcache entry load is kinda nasty, because >> it prevents essentially *all* use of the relation. On the other >> hand, failing to enforce check constraints isn't lovely either. >> Anybody have opinions about that? > None of this is supposed to happen, is it? If you can't fetch the > constraints (and I think of a default as almost a kind of constraint) > your database is already somehow corrupted. So maybe if any of these > things happen we should ERROR out. Anything else seems likely to corrupt > the database further. Meh. "pg_class.relchecks is inconsistent with the number of entries in pg_constraint" does not seem to me like a scary enough situation to justify a panic response. Maybe there's an argument for failing at the point where we'd need to actually apply the CHECK constraints (similarly to what my patch is doing for missing defaults). But preventing the user from, say, dumping the data in the table seems to me to be making the situation worse not better. regards, tom lane
On 4/4/21 6:50 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 4/4/21 12:05 PM, Tom Lane wrote: >>> I made CheckConstraintFetch likewise not install its array until >>> it's done. I notice that it is throwing elog(ERROR) not WARNING >>> for the equivalent cases of not finding the right number of >>> entries. I wonder if we ought to back that off to a WARNING too. >>> elog(ERROR) during relcache entry load is kinda nasty, because >>> it prevents essentially *all* use of the relation. On the other >>> hand, failing to enforce check constraints isn't lovely either. >>> Anybody have opinions about that? >> None of this is supposed to happen, is it? If you can't fetch the >> constraints (and I think of a default as almost a kind of constraint) >> your database is already somehow corrupted. So maybe if any of these >> things happen we should ERROR out. Anything else seems likely to corrupt >> the database further. > Meh. "pg_class.relchecks is inconsistent with the number of entries > in pg_constraint" does not seem to me like a scary enough situation > to justify a panic response. Maybe there's an argument for failing > at the point where we'd need to actually apply the CHECK constraints > (similarly to what my patch is doing for missing defaults). > But preventing the user from, say, dumping the data in the table > seems to me to be making the situation worse not better. > > OK, fair argument. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > On 4/4/21 6:50 PM, Tom Lane wrote: >> Meh. "pg_class.relchecks is inconsistent with the number of entries >> in pg_constraint" does not seem to me like a scary enough situation >> to justify a panic response. Maybe there's an argument for failing >> at the point where we'd need to actually apply the CHECK constraints >> (similarly to what my patch is doing for missing defaults). >> But preventing the user from, say, dumping the data in the table >> seems to me to be making the situation worse not better. > OK, fair argument.

Here's a v2 that applies the same principles to struct ConstrCheck as AttrDefault, ie create only valid entries (no NULL strings), and if there's not the right number, complain at point of use rather than failing the relcache load.

There is a hole in this, which is that if pg_class.relchecks = 0 then we won't even look in pg_constraint, so we won't realize if there is an inconsistency. But that was true before, and I don't want to expend an almost-always-useless catalog search to see if relchecks = 0 is a lie.

I also tried to bring the related message texts up to something approaching project standards, though I didn't convert them into translatable ereports.

regards, tom lane

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index cb76465050..ffb275afbe 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -174,10 +174,7 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
         cpy->defval = (AttrDefault *) palloc(cpy->num_defval * sizeof(AttrDefault));
         memcpy(cpy->defval, constr->defval, cpy->num_defval * sizeof(AttrDefault));
         for (i = cpy->num_defval - 1; i >= 0; i--)
-        {
-            if (constr->defval[i].adbin)
-                cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
-        }
+            cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
     }
 
     if (constr->missing)
@@ -203,10 +200,8 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
         memcpy(cpy->check, constr->check, cpy->num_check * sizeof(ConstrCheck));
         for (i = cpy->num_check - 1; i >= 0; i--)
         {
-            if (constr->check[i].ccname)
-                cpy->check[i].ccname = pstrdup(constr->check[i].ccname);
-            if (constr->check[i].ccbin)
-                cpy->check[i].ccbin = pstrdup(constr->check[i].ccbin);
+            cpy->check[i].ccname = pstrdup(constr->check[i].ccname);
+            cpy->check[i].ccbin = pstrdup(constr->check[i].ccbin);
             cpy->check[i].ccvalid = constr->check[i].ccvalid;
             cpy->check[i].ccnoinherit = constr->check[i].ccnoinherit;
         }
@@ -328,10 +323,7 @@ FreeTupleDesc(TupleDesc tupdesc)
             AttrDefault *attrdef = tupdesc->constr->defval;
 
             for (i = tupdesc->constr->num_defval - 1; i >= 0; i--)
-            {
-                if (attrdef[i].adbin)
-                    pfree(attrdef[i].adbin);
-            }
+                pfree(attrdef[i].adbin);
             pfree(attrdef);
         }
         if (tupdesc->constr->missing)
@@ -352,10 +344,8 @@ FreeTupleDesc(TupleDesc tupdesc)
 
             for (i = tupdesc->constr->num_check - 1; i >= 0; i--)
             {
-                if (check[i].ccname)
-                    pfree(check[i].ccname);
-                if (check[i].ccbin)
-                    pfree(check[i].ccbin);
+                pfree(check[i].ccname);
+                pfree(check[i].ccbin);
             }
             pfree(check);
         }
@@ -412,7 +402,6 @@ bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 {
     int         i,
-                j,
                 n;
 
     if (tupdesc1->natts != tupdesc2->natts)
@@ -488,22 +477,13 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_defval;
         if (n != (int) constr2->num_defval)
             return false;
+        /* We assume here that both AttrDefault arrays are in adnum order */
         for (i = 0; i < n; i++)
         {
             AttrDefault *defval1 = constr1->defval + i;
-            AttrDefault *defval2 = constr2->defval;
-
-            /*
-             * We can't assume that the items are always read from the system
-             * catalogs in the same order; so use the adnum field to identify
-             * the matching item to compare.
-             */
-            for (j = 0; j < n; defval2++, j++)
-            {
-                if (defval1->adnum == defval2->adnum)
-                    break;
-            }
-            if (j >= n)
+            AttrDefault *defval2 = constr2->defval + i;
+
+            if (defval1->adnum != defval2->adnum)
                 return false;
             if (strcmp(defval1->adbin, defval2->adbin) != 0)
                 return false;
@@ -534,25 +514,21 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_check;
         if (n != (int) constr2->num_check)
             return false;
+
+        /*
+         * Similarly, we rely here on the ConstrCheck entries being sorted by
+         * name.  If there are duplicate names, the outcome of the comparison
+         * is uncertain, but that should not happen.
+         */
         for (i = 0; i < n; i++)
         {
             ConstrCheck *check1 = constr1->check + i;
-            ConstrCheck *check2 = constr2->check;
-
-            /*
-             * Similarly, don't assume that the checks are always read in the
-             * same order; match them up by name and contents.  (The name
-             * *should* be unique, but...)
-             */
-            for (j = 0; j < n; check2++, j++)
-            {
-                if (strcmp(check1->ccname, check2->ccname) == 0 &&
-                    strcmp(check1->ccbin, check2->ccbin) == 0 &&
-                    check1->ccvalid == check2->ccvalid &&
-                    check1->ccnoinherit == check2->ccnoinherit)
-                    break;
-            }
-            if (j >= n)
+            ConstrCheck *check2 = constr2->check + i;
+
+            if (!(strcmp(check1->ccname, check2->ccname) == 0 &&
+                  strcmp(check1->ccbin, check2->ccbin) == 0 &&
+                  check1->ccvalid == check2->ccvalid &&
+                  check1->ccnoinherit == check2->ccnoinherit))
                 return false;
         }
     }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 88a68a4697..87f9bdaef0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2501,21 +2501,24 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
             if (attribute->atthasdef)
             {
                 Node       *this_default = NULL;
-                AttrDefault *attrdef;
-                int         i;
 
                 /* Find default in constraint structure */
-                Assert(constr != NULL);
-                attrdef = constr->defval;
-                for (i = 0; i < constr->num_defval; i++)
+                if (constr != NULL)
                 {
-                    if (attrdef[i].adnum == parent_attno)
+                    AttrDefault *attrdef = constr->defval;
+
+                    for (int i = 0; i < constr->num_defval; i++)
                     {
-                        this_default = stringToNode(attrdef[i].adbin);
-                        break;
+                        if (attrdef[i].adnum == parent_attno)
+                        {
+                            this_default = stringToNode(attrdef[i].adbin);
+                            break;
+                        }
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));
 
                 /*
                  * If it's a GENERATED default, it might contain Vars that
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 163242f54e..9f86910a6b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1609,6 +1609,15 @@ ExecRelCheck(ResultRelInfo *resultRelInfo,
     MemoryContext oldContext;
     int         i;
 
+    /*
+     * CheckConstraintFetch let this pass with only a warning, but now we
+     * should fail rather than possibly failing to enforce an important
+     * constraint.
+     */
+    if (ncheck != rel->rd_rel->relchecks)
+        elog(ERROR, "%d pg_constraint record(s) missing for relation \"%s\"",
+             rel->rd_rel->relchecks - ncheck, RelationGetRelationName(rel));
+
     /*
      * If first time through for this result relation, build expression
      * nodetrees for rel's constraint expressions.  Keep them in the per-query
@@ -1862,7 +1871,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
         }
     }
 
-    if (constr->num_check > 0)
+    if (rel->rd_rel->relchecks > 0)
     {
         const char *failed;
 
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index b968c25dd6..9dd30370da 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1239,8 +1239,6 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
          (CREATE_TABLE_LIKE_DEFAULTS | CREATE_TABLE_LIKE_GENERATED)) &&
         constr != NULL)
     {
-        AttrDefault *attrdef = constr->defval;
-
         for (parent_attno = 1; parent_attno <= tupleDesc->natts;
              parent_attno++)
         {
@@ -1264,6 +1262,7 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
                  (table_like_clause->options & CREATE_TABLE_LIKE_DEFAULTS)))
             {
                 Node       *this_default = NULL;
+                AttrDefault *attrdef = constr->defval;
                 AlterTableCmd *atsubcmd;
                 bool        found_whole_row;
 
@@ -1276,7 +1275,9 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
                         break;
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));
 
                 atsubcmd = makeNode(AlterTableCmd);
                 atsubcmd->subtype = AT_CookedColumnDefault;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 92661abae2..da78f02775 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1228,25 +1228,28 @@ build_column_default(Relation rel, int attrno)
     }
 
     /*
-     * Scan to see if relation has a default for this column.
+     * If relation has a default for this column, fetch that expression.
      */
-    if (att_tup->atthasdef && rd_att->constr &&
-        rd_att->constr->num_defval > 0)
+    if (att_tup->atthasdef)
     {
-        AttrDefault *defval = rd_att->constr->defval;
-        int         ndef = rd_att->constr->num_defval;
-
-        while (--ndef >= 0)
+        if (rd_att->constr && rd_att->constr->num_defval > 0)
         {
-            if (attrno == defval[ndef].adnum)
+            AttrDefault *defval = rd_att->constr->defval;
+            int         ndef = rd_att->constr->num_defval;
+
+            while (--ndef >= 0)
             {
-                /*
-                 * Found it, convert string representation to node tree.
-                 */
-                expr = stringToNode(defval[ndef].adbin);
-                break;
+                if (attrno == defval[ndef].adnum)
+                {
+                    /* Found it, convert string representation to node tree. */
+                    expr = stringToNode(defval[ndef].adbin);
+                    break;
+                }
             }
         }
+        if (expr == NULL)
+            elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                 attrno, RelationGetRelationName(rel));
     }
 
     /*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index ff7395c85b..29702d6eab 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -282,7 +282,8 @@ static void RelationInitPhysicalAddr(Relation relation);
 static void load_critical_index(Oid indexoid, Oid heapoid);
 static TupleDesc GetPgClassDescriptor(void);
 static TupleDesc GetPgIndexDescriptor(void);
-static void AttrDefaultFetch(Relation relation);
+static void AttrDefaultFetch(Relation relation, int ndef);
+static int  AttrDefaultCmp(const void *a, const void *b);
 static void CheckConstraintFetch(Relation relation);
 static int  CheckConstraintCmp(const void *a, const void *b);
 static void InitIndexAmRoutine(Relation relation);
@@ -503,7 +504,6 @@ RelationBuildTupleDesc(Relation relation)
     ScanKeyData skey[2];
     int         need;
     TupleConstr *constr;
-    AttrDefault *attrdef = NULL;
     AttrMissing *attrmiss = NULL;
     int         ndef = 0;
 
@@ -512,8 +512,8 @@ RelationBuildTupleDesc(Relation relation)
         relation->rd_rel->reltype ? relation->rd_rel->reltype : RECORDOID;
     relation->rd_att->tdtypmod = -1;    /* just to be sure */
 
-    constr = (TupleConstr *) MemoryContextAlloc(CacheMemoryContext,
-                                                sizeof(TupleConstr));
+    constr = (TupleConstr *) MemoryContextAllocZero(CacheMemoryContext,
+                                                    sizeof(TupleConstr));
     constr->has_not_null = false;
     constr->has_generated_stored = false;
 
@@ -557,10 +557,9 @@ RelationBuildTupleDesc(Relation relation)
         attnum = attp->attnum;
 
         if (attnum <= 0 || attnum > RelationGetNumberOfAttributes(relation))
-            elog(ERROR, "invalid attribute number %d for %s",
+            elog(ERROR, "invalid attribute number %d for relation \"%s\"",
                  attp->attnum, RelationGetRelationName(relation));
 
-
         memcpy(TupleDescAttr(relation->rd_att, attnum - 1),
                attp,
                ATTRIBUTE_FIXED_PART_SIZE);
@@ -570,22 +569,10 @@ RelationBuildTupleDesc(Relation relation)
             constr->has_not_null = true;
         if (attp->attgenerated == ATTRIBUTE_GENERATED_STORED)
             constr->has_generated_stored = true;
-
-        /* If the column has a default, fill it into the attrdef array */
         if (attp->atthasdef)
-        {
-            if (attrdef == NULL)
-                attrdef = (AttrDefault *)
-                    MemoryContextAllocZero(CacheMemoryContext,
-                                           RelationGetNumberOfAttributes(relation) *
-                                           sizeof(AttrDefault));
-            attrdef[ndef].adnum = attnum;
-            attrdef[ndef].adbin = NULL;
             ndef++;
-        }
-        /* Likewise for a missing value */
+        /* If the column has a "missing" value, put it in the attrmiss array */
         if (attp->atthasmissing)
         {
             Datum       missingval;
@@ -648,7 +635,7 @@ RelationBuildTupleDesc(Relation relation)
     table_close(pg_attribute_desc, AccessShareLock);
 
     if (need != 0)
-        elog(ERROR, "catalog is missing %d attribute(s) for relid %u",
+        elog(ERROR, "pg_attribute catalog is missing %d attribute(s) for relation OID %u",
             need, RelationGetRelid(relation));
 
     /*
@@ -680,33 +667,19 @@ RelationBuildTupleDesc(Relation relation)
         constr->has_generated_stored ||
         ndef > 0 ||
         attrmiss ||
-        relation->rd_rel->relchecks)
+        relation->rd_rel->relchecks > 0)
     {
         relation->rd_att->constr = constr;
 
         if (ndef > 0)           /* DEFAULTs */
-        {
-            if (ndef < RelationGetNumberOfAttributes(relation))
-                constr->defval = (AttrDefault *)
-                    repalloc(attrdef, ndef * sizeof(AttrDefault));
-            else
-                constr->defval = attrdef;
-            constr->num_defval = ndef;
-            AttrDefaultFetch(relation);
-        }
+            AttrDefaultFetch(relation, ndef);
         else
            constr->num_defval = 0;
 
         constr->missing = attrmiss;
 
         if (relation->rd_rel->relchecks > 0)    /* CHECKs */
-        {
-            constr->num_check = relation->rd_rel->relchecks;
-            constr->check = (ConstrCheck *)
-                MemoryContextAllocZero(CacheMemoryContext,
-                                       constr->num_check * sizeof(ConstrCheck));
             CheckConstraintFetch(relation);
-        }
         else
             constr->num_check = 0;
     }
@@ -4251,21 +4224,29 @@ GetPgIndexDescriptor(void)
 
 /*
  * Load any default attribute value definitions for the relation.
+ *
+ * ndef is the number of attributes that were marked atthasdef.
+ *
+ * Note: we don't make it a hard error to be missing some pg_attrdef records.
+ * We can limp along as long as nothing needs to use the default value.  Code
+ * that fails to find an expected AttrDefault record should throw an error.
  */
 static void
-AttrDefaultFetch(Relation relation)
+AttrDefaultFetch(Relation relation, int ndef)
 {
-    AttrDefault *attrdef = relation->rd_att->constr->defval;
-    int         ndef = relation->rd_att->constr->num_defval;
+    AttrDefault *attrdef;
     Relation    adrel;
     SysScanDesc adscan;
     ScanKeyData skey;
     HeapTuple   htup;
-    Datum       val;
-    bool        isnull;
-    int         found;
-    int         i;
+    int         found = 0;
+
+    /* Allocate array with room for as many entries as expected */
+    attrdef = (AttrDefault *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ndef * sizeof(AttrDefault));
 
+    /* Search pg_attrdef for relevant entries */
     ScanKeyInit(&skey,
                 Anum_pg_attrdef_adrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4274,65 +4255,94 @@ AttrDefaultFetch(Relation relation)
     adrel = table_open(AttrDefaultRelationId, AccessShareLock);
     adscan = systable_beginscan(adrel, AttrDefaultIndexId, true,
                                 NULL, 1, &skey);
-    found = 0;
 
     while (HeapTupleIsValid(htup = systable_getnext(adscan)))
     {
         Form_pg_attrdef adform = (Form_pg_attrdef) GETSTRUCT(htup);
-        Form_pg_attribute attr = TupleDescAttr(relation->rd_att, adform->adnum - 1);
+        Datum       val;
+        bool        isnull;
 
-        for (i = 0; i < ndef; i++)
+        /* protect limited size of array */
+        if (found >= ndef)
         {
-            if (adform->adnum != attrdef[i].adnum)
-                continue;
-            if (attrdef[i].adbin != NULL)
-                elog(WARNING, "multiple attrdef records found for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-                found++;
-
-            val = fastgetattr(htup,
-                              Anum_pg_attrdef_adbin,
-                              adrel->rd_att, &isnull);
-            if (isnull)
-                elog(WARNING, "null adbin for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-            {
-                /* detoast and convert to cstring in caller's context */
-                char       *s = TextDatumGetCString(val);
-
-                attrdef[i].adbin = MemoryContextStrdup(CacheMemoryContext, s);
-                pfree(s);
-            }
+            elog(WARNING, "unexpected pg_attrdef record found for attribute %d of relation \"%s\"",
+                 adform->adnum, RelationGetRelationName(relation));
             break;
         }
 
-        if (i >= ndef)
-            elog(WARNING, "unexpected attrdef record found for attr %d of rel %s",
+        val = fastgetattr(htup,
+                          Anum_pg_attrdef_adbin,
+                          adrel->rd_att, &isnull);
+        if (isnull)
+            elog(WARNING, "null adbin for attribute %d of relation \"%s\"",
                 adform->adnum, RelationGetRelationName(relation));
+        else
+        {
+            /* detoast and convert to cstring in caller's context */
+            char       *s = TextDatumGetCString(val);
+
+            attrdef[found].adnum = adform->adnum;
+            attrdef[found].adbin = MemoryContextStrdup(CacheMemoryContext, s);
+            pfree(s);
+            found++;
+        }
     }
 
     systable_endscan(adscan);
     table_close(adrel, AccessShareLock);
+
+    if (found != ndef)
+        elog(WARNING, "%d pg_attrdef record(s) missing for relation \"%s\"",
+             ndef - found, RelationGetRelationName(relation));
+
+    /*
+     * Sort the AttrDefault entries by adnum, for the convenience of
+     * equalTupleDescs().  (Usually, they already will be in order, but this
+     * might not be so if systable_getnext isn't using an index.)
+     */
+    if (found > 1)
+        qsort(attrdef, found, sizeof(AttrDefault), AttrDefaultCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->defval = attrdef;
+    relation->rd_att->constr->num_defval = found;
+}
+
+/*
+ * qsort comparator to sort AttrDefault entries by adnum
+ */
+static int
+AttrDefaultCmp(const void *a, const void *b)
+{
+    const AttrDefault *ada = (const AttrDefault *) a;
+    const AttrDefault *adb = (const AttrDefault *) b;
+
+    return ada->adnum - adb->adnum;
 }
 
 /*
  * Load any check constraints for the relation.
+ *
+ * As with defaults, if we don't find the expected number of them, just warn
+ * here.  The executor should throw an error if an INSERT/UPDATE is attempted.
  */
 static void
 CheckConstraintFetch(Relation relation)
 {
-    ConstrCheck *check = relation->rd_att->constr->check;
-    int         ncheck = relation->rd_att->constr->num_check;
+    ConstrCheck *check;
+    int         ncheck = relation->rd_rel->relchecks;
     Relation    conrel;
     SysScanDesc conscan;
     ScanKeyData skey[1];
     HeapTuple   htup;
     int         found = 0;
 
+    /* Allocate array with room for as many entries as expected */
+    check = (ConstrCheck *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ncheck * sizeof(ConstrCheck));
+
+    /* Search pg_constraint for relevant entries */
     ScanKeyInit(&skey[0],
                 Anum_pg_constraint_conrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4347,15 +4357,18 @@ CheckConstraintFetch(Relation relation)
         Form_pg_constraint conform = (Form_pg_constraint) GETSTRUCT(htup);
         Datum       val;
         bool        isnull;
-        char       *s;
 
         /* We want check constraints only */
         if (conform->contype != CONSTRAINT_CHECK)
             continue;
 
+        /* protect limited size of array */
         if (found >= ncheck)
-            elog(ERROR, "unexpected constraint record found for rel %s",
+        {
+            elog(WARNING, "unexpected pg_constraint record found for relation \"%s\"",
                 RelationGetRelationName(relation));
+            break;
+        }
 
         check[found].ccvalid = conform->convalidated;
         check[found].ccnoinherit = conform->connoinherit;
@@ -4367,27 +4380,36 @@ CheckConstraintFetch(Relation relation)
                           Anum_pg_constraint_conbin,
                           conrel->rd_att, &isnull);
         if (isnull)
-            elog(ERROR, "null conbin for rel %s",
+            elog(WARNING, "null conbin for relation \"%s\"",
                 RelationGetRelationName(relation));
+        else
+        {
+            /* detoast and convert to cstring in caller's context */
+            char       *s = TextDatumGetCString(val);
 
-        /* detoast and convert to cstring in caller's context */
-        s = TextDatumGetCString(val);
-        check[found].ccbin = MemoryContextStrdup(CacheMemoryContext, s);
-        pfree(s);
-
-        found++;
+            check[found].ccbin = MemoryContextStrdup(CacheMemoryContext, s);
+            pfree(s);
+            found++;
+        }
     }
 
     systable_endscan(conscan);
     table_close(conrel, AccessShareLock);
 
     if (found != ncheck)
-        elog(ERROR, "%d constraint record(s) missing for rel %s",
+        elog(WARNING, "%d pg_constraint record(s) missing for relation \"%s\"",
             ncheck - found, RelationGetRelationName(relation));
 
-    /* Sort the records so that CHECKs are applied in a deterministic order */
-    if (ncheck > 1)
-        qsort(check, ncheck, sizeof(ConstrCheck), CheckConstraintCmp);
+    /*
+     * Sort the records by name.  This ensures that CHECKs are applied in a
+     * deterministic order, and it also makes equalTupleDescs() faster.
+     */
+    if (found > 1)
+        qsort(check, found, sizeof(ConstrCheck), CheckConstraintCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->check = check;
+    relation->rd_att->constr->num_check = found;
 }
 
 /*
Zhihong Yu <zyu@yugabyte.com> writes: >> if (found != ncheck) > Since there is check on found being smaller than ncheck inside the loop, > the if condition can be written as: > if (found < ncheck) Doesn't really seem like an improvement? Nor am I excited about renaming these "found" variables, considering that those names can be traced back more than twenty years. > + if (found != ndef) > + elog(WARNING, "%d attrdef record(s) missing for rel %s", > + ndef - found, RelationGetRelationName(relation)); > Since only warning is logged, there seems to be some wasted space in > attrdef. Would such space accumulate, resulting in some memory leak ? No, it's just one palloc chunk either way. I do not think there is any value in adding more code to reclaim the unused array slots a bit sooner. (Because of aset.c's power-of-two allocation practices, it's likely there would be zero actual space savings anyway.) regards, tom lane
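For what it's worth, the rounding behavior behind that last point is easy to observe with a hypothetical scrap of backend test code (show_chunk_rounding is invented for illustration; GetMemoryChunkSpace reports the space a chunk actually occupies, overhead included):

#include "postgres.h"

#include "access/tupdesc.h"
#include "utils/memutils.h"

/*
 * Hypothetical illustration: aset.c rounds small requests up to a
 * power-of-two chunk, so a 20-entry and a 28-entry AttrDefault array
 * (16 bytes per entry -> 320 and 448 bytes requested) typically land in
 * the same 512-byte chunk; shrinking with repalloc would free nothing.
 */
static void
show_chunk_rounding(void)
{
    void       *a = palloc(28 * sizeof(AttrDefault));
    void       *b = palloc(20 * sizeof(AttrDefault));

    elog(NOTICE, "28 entries occupy %zu bytes, 20 entries occupy %zu bytes",
         GetMemoryChunkSpace(a), GetMemoryChunkSpace(b));

    pfree(a);
    pfree(b);
}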
>>>>> "Andrew" == Andrew Dunstan <andrew@dunslane.net> writes: Andrew> I'd be curious to know how this state came about. Me too, but available information is fairly sparse: PG 12.5, in a container, backing a (personal) instance of Gitlab; the only database admin operations should have been those done by Gitlab itself, but I have not audited that codebase. No information on any history of crashes. The missing pg_attrdef row appeared to be missing or not visible in the heap, not just missing from indexes; it did not show up in queries whether seqscan or indexscan was used. Available time did not permit trying to use pageinspect on pg_attrdef. This gitlab ticket refers to the same incident: https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6047 (which actually contains a new relevant fact that hadn't been mentioned in the IRC discussion, which is that the problem affected multiple tables, not just one.) -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > This gitlab ticket refers to the same incident: > https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6047 > (which actually contains a new relevant fact that hadn't been mentioned > in the IRC discussion, which is that the problem affected multiple > tables, not just one.) Hmm. If you assume that the problem is either a corrupted index on pg_attrdef, or some block(s) of pg_attrdef got zeroed out, then it would be entirely unsurprising for multiple tables to be affected. I've pushed the v2 patch to HEAD. Not sure if we want to consider back-patching. regards, tom lane