Re: Fast AT ADD COLUMN with DEFAULTs - Mailing list pgsql-hackers

From Vitaly Burovoy
Subject Re: Fast AT ADD COLUMN with DEFAULTs
Msg-id CAKOSWNnWBu-Q3Xh38pMmvESB4KN2=uj1gaFav2EjzTFL=i9Krw@mail.gmail.com
In response to Fast AT ADD COLUMN with DEFAULTs  (Serge Rielau <serge@rielau.com>)
Responses Re: Fast AT ADD COLUMN with DEFAULTs  (Serge Rielau <serge@rielau.com>)
List pgsql-hackers
On 10/5/16, Serge Rielau <serge@rielau.com> wrote:
> Dear Hackers,
>
> I’m working on a patch that expands PG’s ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the
> nullable-without-default to a more general case. E.g.
>
> CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);
> INSERT INTO T VALUES (1), (2), (3);
> ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
> INSERT INTO T VALUES (4, DEFAULT);
> ALTER TABLE T ALTER COLUMN c1 SET DEFAULT 6;
> INSERT INTO T VALUES (5, DEFAULT);
> SELECT * FROM T ORDER BY pk;
> =>
> (1, 5),
> (2, 5),
> (3, 5),
> (4, 5),
> (5, 6);
>
> Rows 1-3 have never been updated, yet they know that their value of c1 is
> 5.
>
> The requirement is driven by large tables for which add column takes too
> much time and/or produces too large a transaction for comfort.
>
> In simplified terms:
>
> * a second “exist default” is computed and stored in
> the catalogs at time of AT ADD COLUMN
>
> * The exist default is cached in the tuple descriptor (e.g. in attrdef)
>
> * When one of the getAttr or copytuple related routines is invoked
> the exist default is filled in instead of simply NULL padding if the
> tuple is shorter than the requested attribute number.
>
> Is there an interest in principle in the community for this functionality?
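
To make the three quoted steps concrete, here is a minimal Python sketch of the idea. It is a toy model, not PostgreSQL code: the names Attribute, exist_default, insert_default, get_attr, expand and insert are all hypothetical and chosen only for illustration. It assumes that a stored tuple is simply shorter than the descriptor when it predates a column, and that the read path substitutes the cached exist default for the missing attribute.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class Attribute:
    """One column in a toy tuple descriptor (hypothetical, not a PostgreSQL struct)."""
    name: str
    exist_default: Any = None   # captured once at ADD COLUMN time, never changed
    insert_default: Any = None  # the column's current DEFAULT, used for new rows


def get_attr(stored: Sequence[Any], desc: Sequence[Attribute], attnum: int) -> Any:
    """Return attribute `attnum` (1-based) of a stored tuple.

    If the tuple is shorter than the requested attribute number, fall back
    to the column's exist default instead of padding with NULL (None).
    """
    if attnum <= len(stored):
        return stored[attnum - 1]
    return desc[attnum - 1].exist_default


def expand(stored: Sequence[Any], desc: Sequence[Attribute]) -> Tuple[Any, ...]:
    """Materialize a full-width row from a possibly short stored tuple."""
    return tuple(get_attr(stored, desc, n) for n in range(1, len(desc) + 1))


def insert(heap: List[Tuple[Any, ...]], desc: Sequence[Attribute], *values: Any) -> None:
    """Store the given values, filling omitted columns from the current defaults."""
    missing = tuple(a.insert_default for a in desc[len(values):])
    heap.append(tuple(values) + missing)


# Replay the quoted example.
desc = [Attribute("pk")]
heap: List[Tuple[Any, ...]] = []
for pk in (1, 2, 3):
    insert(heap, desc, pk)              # rows stored with one attribute only

# ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5: only the descriptor changes.
desc = desc + [Attribute("c1", exist_default=5, insert_default=5)]
insert(heap, desc, 4)                   # stored as (4, 5)

# ALTER COLUMN c1 SET DEFAULT 6: the exist default stays 5.
desc[1] = replace(desc[1], insert_default=6)
insert(heap, desc, 5)                   # stored as (5, 6)

result = [expand(row, desc) for row in heap]
```

In this model the first three stored tuples are never touched, yet result reads back as (1, 5), (2, 5), (3, 5), (4, 5), (5, 6), matching the output in the quoted example. Keeping the exist default separate from the current default is what lets the later SET DEFAULT 6 affect only new rows.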

Wow! I think it would be great! It would also avoid the heavy vacuuming
that follows rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed.

But I'd like to see a solution for more important cases, such as:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk;

P.S.: I really think it is a good idea; it just needs some research
and coverage of the corner cases.

--
Best regards,
Vitaly Burovoy


