Re: Document atthasmissing default optimization avoids verification table scan - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Document atthasmissing default optimization avoids verification table scan
Date
Msg-id CAKFQuwZ7OzjvtEx_iBm1xhuzi5CzDOH9L=EoiZusbqtV7xcbKQ@mail.gmail.com
Whole thread Raw
In response to Re: Document atthasmissing default optimization avoids verification table scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Document atthasmissing default optimization avoids verification table scan
List pgsql-hackers
On Fri, Jan 21, 2022 at 2:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Fri, Jan 21, 2022 at 2:08 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>> I know what it's replacing refers to release 11, but let's stop doing
>> that. How about something like this?
>>
>> Adding a new column can sometimes require rewriting the table,
>> making it a very slow operation. However in many cases this rewrite
>> and related verification scans can be optimized away by using an
>> appropriate default value. See the notes in <command>ALTER
>> TABLE</command> for details.

> I think it is a virtue, and am supported in that feeling by the existing
> wording, to be explicit about the release before which these optimizations
> can not happen.  The docs generally use this to good effect without
> overdoing it.  This is a prime example.

The fact of the matter is that optimizations of this sort have existed
for years.  (For example, I think we've optimized away the rewrite
when the new column is DEFAULT NULL since the very beginning.)  So it
does not help to write the text as if there were no such optimizations
before version N and they were all there in N.

Fair point, and indeed the v10 docs do mention the NULL (or no default) optimization.


I agree that Andrew's text could stand a pass of "omit needless words".
But I also think that we could be a bit more explicit about what "slow"
means.  Maybe like

Adding a new column can require rewriting the whole table,
making it slow for large tables.  However the rewrite can be optimized
away in some cases, depending on what default value is given to the
column.  See <command>ALTER TABLE</command> for details.


Comma needed after however.
You've removed the "constraint verification scan" portion of this. Maybe:
"""
...
column.  The same applies for the NOT NULL constraint verification scan.
See <command>ALTER TABLE</command> for details.
"""


Re-reading this, the recommendation:

-     However, if the default value is volatile (e.g.,
-     <function>clock_timestamp()</function>)
-     each row will need to be updated with the value calculated at the time
-     <command>ALTER TABLE</command> is executed. To avoid a potentially
-     lengthy update operation, particularly if you intend to fill the column
-     with mostly nondefault values anyway, it may be preferable to add the
-     column with no default, insert the correct values using
-     <command>UPDATE</command>, and then add any desired default as described
-     below.

has now been completely removed from the documentation.  I suggest having this remain as the Tip and turning the optimization stuff into a Note.
 
(the ALTER TABLE reference should be a link, too)

Yeah, the page does have a link already (fairly close by...) but with these changes putting one here seems to make sense.

David J.

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: fairywren is generating bogus BASE_BACKUP commands
Next
From: Andrew Dunstan
Date:
Subject: Re: fairywren is generating bogus BASE_BACKUP commands