Re: Document atthasmissing default optimization avoids verification table scan - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: Document atthasmissing default optimization avoids verification table scan |
Date | |
Msg-id | CAAaqYe9dNfYUObu4aZ=zoDsWmvtcJDFngYZ52NndysT=fM8uvA@mail.gmail.com Whole thread Raw |
In response to | Re: Document atthasmissing default optimization avoids verification table scan ("David G. Johnston" <david.g.johnston@gmail.com>) |
Responses |
Re: Document atthasmissing default optimization avoids verification table scan
|
List | pgsql-hackers |
On Wed, Jan 19, 2022 at 7:51 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan <bossartn@amazon.com> wrote: >> >> On 9/24/21, 7:30 AM, "James Coleman" <jtc331@gmail.com> wrote: >> > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant >> > default value without rewriting the table the doc changes did not note >> > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL. >> > Previously such a new column required a verification table scan to >> > ensure no values were null. That scan happens under an exclusive lock on >> > the table, so it can have a meaningful impact on database "accessible >> > uptime". >> >> I'm likely misunderstanding, but are you saying that adding a new >> column with a default value and a NOT NULL constraint used to require >> a verification scan? > > > As a side-effect of rewriting every live record in the table and indexes to brand new files, yes. I doubt an actual independentscan was performed since the only way for the newly written tuples to not have the default value inserted wouldbe a severe server bug. I've confirmed it wasn't a separate scan, but it does evaluate all constraints (it doesn't have any optimizations for skipping ones probably true by virtue of the new default). >> >> + Additionally adding a column with a constant default value avoids a >> + a table scan to verify no <literal>NULL</literal> values are present. >> >> Should this clarify that it's referring to NOT NULL constraints? >> > > This doesn't seem like relevant material to comment on. It's an implementation detail that is sufficiently covered by"making the ALTER TABLE very fast even on large tables". > > Also, the idea of performing that scan seems ludicrous. I just added the column and told it to populate with default values- why do you need to check that your server didn't miss any? I'm open to the idea of wordsmithing here, of course, but I strongly disagree that this is irrelevant data. There are plenty of optimizations Postgres could theoretically implement but doesn't, so measuring what should happen by what you think is obvious ("told it to populate with default values - why do you need to check") is clearly not valid. This patch actually came out of our specifically needing to verify that this is true before an op precisely because docs aren't actually clear and because we can't risk a large table scan under an exclusive lock. We're clearly not the only ones with that question; it came up in a comment on this blog post announcing the newly committed feature [1]. I realize that most users aren't as worried about this kind of specific detail about DDL as we are (requiring absolutely zero slow DDL while under an exclusive lock), but it is relevant to high uptime systems. Thanks, James Coleman 1: https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/
pgsql-hackers by date: