Re: Fast AT ADD COLUMN with DEFAULTs - Mailing list pgsql-hackers

From Pantelis Theodosiou
Subject Re: Fast AT ADD COLUMN with DEFAULTs
Date
Msg-id CAE3TBxwZkVMQnuwXEAEJp0dQ08UqboKb_TtheX57Vb+uOW-U8g@mail.gmail.com
Whole thread Raw
In response to Re: Fast AT ADD COLUMN with DEFAULTs  (Serge Rielau <serge@rielau.com>)
List pgsql-hackers


On Thu, Oct 6, 2016 at 12:05 AM, Serge Rielau <serge@rielau.com> wrote:


On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.

> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows. And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.

You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added. But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases.
Explained so much better than I could do it :-)

I want to point out as a minor “extension” that there is no need for the default to be immutable. It is merely required that the default is evaluate at time of ADD COLUMN
and then we remember the actual value for the exist default, rather than the parsed expression as we do for the “current” default.
 
Need a better name for the concept, since evidently this name isn't
conveying the idea.
By all means. Got anything in mind?



For comparison, SQL Server's implementation. They have a similar feature (in their Enterprise only edition).
From https://msdn.microsoft.com/en-us/library/ms190273.aspx :

Adding NOT NULL Columns as an Online Operation

Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column with a default value is an online operation when the default value is a runtime constant. This means that the operation is completed almost instantaneously regardless of the number of rows in the table. This is because the existing rows in the table are not updated during the operation; instead, the default value is stored only in the metadata of the table and the value is looked up as needed in queries that access these rows. This behavior is automatic; no additional syntax is required to implement the online operation beyond the ADD COLUMN syntax. A runtime constant is an expression that produces the same value at runtime for each row in the table regardless of its determinism. For example, the constant expression "My temporary data", or the system function GETUTCDATETIME() are runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID() are not runtime constants because a unique value is produced for each row in the table. Adding a NOT NULL column with a default value that is not a runtime constant is always performed offline and an exclusive (SCH-M) lock is acquired for the duration of the operation.

While the existing rows reference the value stored in metadata, the default value is stored on the row for any new rows that are inserted and do not specify another value for the column. The default value stored in metadata is moved to an existing row when the row is updated (even if the actual column is not specified in the UPDATE statement), or if the table or clustered index is rebuilt. 


pgsql-hackers by date:

Previous
From: Serge Rielau
Date:
Subject: Re: Fast AT ADD COLUMN with DEFAULTs
Next
From: Vitaly Burovoy
Date:
Subject: Re: Fast AT ADD COLUMN with DEFAULTs