Thread: Tip box on Adding a Column
Hi folks, We have this Tip box under the "Adding a Column" header here: - https://www.postgresql.org/docs/current/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN That says: > From PostgreSQL 11, adding a column with a constant default value no longer > means that each row of the table needs to be updated when the ALTER TABLE > statement is executed. Instead, the default value will be returned the next > time the row is accessed, and applied when the table is rewritten, making the > ALTER TABLE very fast even on large tables. I'm just seeking clarification if this advice is true **even for** new columns declared with NOT NULL? Historically, I've had to add new fields on existing big tables with a NULL to avoid downtime, but it may be different when a DEFAULT is provided? I have used perf to profile the call-chain for adding a NOT NULL column with a default versus just an ordinary NULL with a default, and they are fairly similar. However, I see these functions being called in both cases: - ATRewriteTables find_composite_type_dependencies systable_beginscan index_rescan btrescan And the names raised my eyebrow... I don't have a deep understanding of the internals here, so it would be great if someone could clarify this for me. Thanks, Marcelo.
On Fri, 1 Nov 2024 at 22:06, Marcelo Fernandes <marcefern7@gmail.com> wrote: > - https://www.postgresql.org/docs/current/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN > > That says: > > > From PostgreSQL 11, adding a column with a constant default value no longer > > means that each row of the table needs to be updated when the ALTER TABLE > > statement is executed. Instead, the default value will be returned the next > > time the row is accessed, and applied when the table is rewritten, making the > > ALTER TABLE very fast even on large tables. > > I'm just seeking clarification if this advice is true **even for** new columns > declared with NOT NULL? Yes, providing the default expression is immutable. > Historically, I've had to add new fields on existing big tables with a NULL to > avoid downtime, but it may be different when a DEFAULT is provided? We have the ability to store immutable defaults in the catalogue tables and "fill in the blanks" from there for any tuples that don't have the new column. Since we only can store 1 value per column, it must be a constant, i.e. the same for all rows. > I have used perf to profile the call-chain for adding a NOT NULL column with > a default versus just an ordinary NULL with a default, and they are fairly > similar. > > However, I see these functions being called in both cases: > > - ATRewriteTables > find_composite_type_dependencies > systable_beginscan > index_rescan > btrescan > > And the names raised my eyebrow... I don't have a deep understanding of the > internals here, so it would be great if someone could clarify this for me. I think you're better off staying in userland here and just doing some experiments and using the performance numbers to help give you an idea of what's going on. Using psql: \timing on create table t1 as select a from generate_Series(1,1000000)a; alter table t1 add column b int null; -- Time: 1.721 ms alter table t1 add column c int not null default 1234; -- Time: 5.450 ms alter table t1 add column d int not null default random(1,10); -- Time: 436.735 ms!! only adding column d required a rewrite, that's because the default expression isn't immutable. If timing isn't confirmation enough, try doing the above with: SET client_min_messages=debug1; Only adding column "d" gives you: DEBUG: rewriting table "t1" David
On 2024-11-01 09:25:24, David Rowley <dgrowleyml(at)gmail(dot)com> wrote: > We have the ability to store immutable defaults in the catalogue tables and > "fill in the blanks" from there for any tuples that don't have the new > column. Since we only can store 1 value per column, it must be a constant, > i.e. the same for all rows. Thanks for clarifying that David. Profiling can be deceiving if you don't have experience in the code base. > If timing isn't confirmation enough, try doing the above with: SET > client_min_messages=debug1; > > Only adding column "d" gives you: DEBUG: rewriting table "t1" Oh wow, that's a very useful setting to have, thanks for pointing that out too. Best, Marcelo
On Fri, Nov 1, 2024 at 5:06 PM Marcelo Fernandes <marcefern7@gmail.com> wrote: > > Hi folks, > > We have this Tip box under the "Adding a Column" header here: > > - https://www.postgresql.org/docs/current/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN > > That says: > > > From PostgreSQL 11, adding a column with a constant default value no longer > > means that each row of the table needs to be updated when the ALTER TABLE > > statement is executed. Instead, the default value will be returned the next > > time the row is accessed, and applied when the table is rewritten, making the > > ALTER TABLE very fast even on large tables. > > I'm just seeking clarification if this advice is true **even for** new columns > declared with NOT NULL? > you can use event_trigger to test it. https://www.postgresql.org/docs/current/event-trigger-definition.html The table_rewrite event occurs just before a table is rewritten by some actions of the commands ALTER TABLE and ALTER TYPE. While other control statements are available to rewrite a table, like CLUSTER and VACUUM, the table_rewrite event is not triggered by them. --- following tests copy from https://github.com/postgres/postgres/blob/d893a299ce68f56522073a1b43d65764a552ae0d/src/test/regress/sql/fast_default.sql#L47 CREATE OR REPLACE FUNCTION log_rewrite() RETURNS event_trigger LANGUAGE plpgsql as $func$ declare this_schema text; begin RAISE NOTICE 'rewriting table % for reason %', pg_event_trigger_table_rewrite_oid()::regclass, pg_event_trigger_table_rewrite_reason(); end; $func$; CREATE EVENT TRIGGER has_rewrite ON table_rewrite EXECUTE PROCEDURE log_rewrite(); create table t1(a int); insert into t1 select 1; alter table t1 add column b double precision not null default 11; alter table t1 add column c double precision not null default random(); alter table t1 add column d timestamptz not null default now(); alter table t1 add column e timestamptz not null default current_timestamp; alter table t1 ADD column f int DEFAULT (random() * 10000)::int; alter table t1 ADD column g int not null DEFAULT (random() * 10000)::int;