Re: Document atthasmissing default optimization avoids verification table scan - Mailing list pgsql-hackers

From James Coleman
Subject Re: Document atthasmissing default optimization avoids verification table scan
Date
Msg-id CAAaqYe9dNfYUObu4aZ=zoDsWmvtcJDFngYZ52NndysT=fM8uvA@mail.gmail.com
Whole thread Raw
In response to Re: Document atthasmissing default optimization avoids verification table scan  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Document atthasmissing default optimization avoids verification table scan
List pgsql-hackers
On Wed, Jan 19, 2022 at 7:51 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>>
>> On 9/24/21, 7:30 AM, "James Coleman" <jtc331@gmail.com> wrote:
>> > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
>> > default value without rewriting the table the doc changes did not note
>> > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL.
>> > Previously such a new column required a verification table scan to
>> > ensure no values were null. That scan happens under an exclusive lock on
>> > the table, so it can have a meaningful impact on database "accessible
>> > uptime".
>>
>> I'm likely misunderstanding, but are you saying that adding a new
>> column with a default value and a NOT NULL constraint used to require
>> a verification scan?
>
>
> As a side-effect of rewriting every live record in the table and indexes to brand new files, yes.  I doubt an actual
independentscan was performed since the only way for the newly written tuples to not have the default value inserted
wouldbe a severe server bug. 

I've confirmed it wasn't a separate scan, but it does evaluate all
constraints (it doesn't have any optimizations for skipping ones
probably true by virtue of the new default).

>>
>> +     Additionally adding a column with a constant default value avoids a
>> +     a table scan to verify no <literal>NULL</literal> values are present.
>>
>> Should this clarify that it's referring to NOT NULL constraints?
>>
>
> This doesn't seem like relevant material to comment on.  It's an implementation detail that is sufficiently covered
by"making the ALTER TABLE very fast even on large tables". 
>
> Also, the idea of performing that scan seems ludicrous.  I just added the column and told it to populate with default
values- why do you need to check that your server didn't miss any? 

I'm open to the idea of wordsmithing here, of course, but I strongly
disagree that this is irrelevant data. There are plenty of
optimizations Postgres could theoretically implement but doesn't, so
measuring what should happen by what you think is obvious ("told it to
populate with default values - why do you need to check") is clearly
not valid.

This patch actually came out of our specifically needing to verify
that this is true before an op precisely because docs aren't actually
clear and because we can't risk a large table scan under an exclusive
lock. We're clearly not the only ones with that question; it came up
in a comment on this blog post announcing the newly committed feature
[1].

I realize that most users aren't as worried about this kind of
specific detail about DDL as we are (requiring absolutely zero slow
DDL while under an exclusive lock), but it is relevant to high uptime
systems.

Thanks,
James Coleman

1: https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/



pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: row filtering for logical replication
Next
From: Tom Lane
Date:
Subject: Re: Replace uses of deprecated Python module distutils.sysconfig