Thread: Adding a non-null column without noticeable downtime
Hi all,

I'm sure this has been answered somewhere, but I was not able to find anything in the list archives.

I'm conceptually trying to do

ALTER TABLE "foo" ADD COLUMN "bar" boolean NOT NULL DEFAULT False;

without taking any noticeable downtime. I know I can divide the query up like so:

ALTER TABLE "foo" ADD COLUMN "bar" boolean;
UPDATE foo SET bar = False; -- Done in batches
ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;

The first 3 queries shouldn't impact other concurrent queries on the system. My question is about the sequential scan that occurs when setting the column NOT NULL. Will that sequential scan block other inserts or selects on the table? If so, can it be sped up by using an index (which would be created concurrently)?

Thanks,
Zev
To be clear, this is with PostgreSQL 9.1. Also, if there is some other way of doing this, I'd be interested in other methodologies as well.

Zev

On 02/24/2014 10:41 PM, Zev Benjamin wrote:
> [...]
I think an index should help.
Why don't you try it out and check the explain plan of it?
If you are planning to break it down as below:
1. ALTER TABLE "foo" ADD COLUMN "bar" boolean;
2. UPDATE foo SET bar = False; -- Done in batches
3. ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
4. ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
I would suggest swapping operations 2 and 3 in the sequence.
EXPLAIN does not appear to work on ALTER TABLE statements:

=> EXPLAIN ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
ERROR:  syntax error at or near "ALTER"
LINE 1: explain ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
                ^

Zev

On 02/25/2014 01:56 PM, Sameer Kumar wrote:
> I think index should help.
> Why don't you try it out and check the explain plan of it?
> [...]
On Mon, Feb 24, 2014 at 7:41 PM, Zev Benjamin <zev-pgsql@strangersgate.com> wrote:
[...]
> ALTER TABLE "foo" ADD COLUMN "bar" boolean;
> UPDATE foo SET bar = False; -- Done in batches
> ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
> ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;

You should set the default before performing the updates; otherwise rows inserted in the meantime will have NULLs in this column.

A template snippet for your case is below.

ALTER TABLE foo ADD bar boolean;
ALTER TABLE foo ALTER bar SET DEFAULT false;

CREATE INDEX CONCURRENTLY foo_migration_tmp ON foo (id)
    WHERE bar IS NULL;

/*
PSQL=/usr/local/bin/psql
total_updated=0
updated=1
time (
    while [ $updated -gt 0 ]; do
        # psql prints "UPDATE <n>"; cut keeps the row count
        updated=$( ($PSQL -X Game2 <<EOF
UPDATE foo SET bar = false
WHERE id IN (
    SELECT id FROM foo WHERE bar IS NULL LIMIT 100);
EOF
        ) | cut -d ' ' -f 2)
        (( total_updated+=updated ))
        echo -ne "\r$total_updated"
    done
) 2>&1
*/

DROP INDEX foo_migration_tmp;
ANALYZE foo;

ALTER TABLE foo ALTER bar SET NOT NULL;

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com
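The batched backfill loop in the snippet above can be sketched in Python as follows. This is an illustration under stated assumptions, not the poster's script: SQLite's in-memory database stands in for PostgreSQL so the example runs on its own, the helper name backfill_in_batches and its batch_size parameter are hypothetical, and 0 stands in for false. The surrounding steps (SET DEFAULT first, the temporary partial index, the final SET NOT NULL) are not shown.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=100):
    """Set bar = 0 wherever bar IS NULL, committing after every batch.

    Hypothetical helper for illustration. Returns the total rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE foo SET bar = 0 WHERE id IN ("
            " SELECT id FROM foo WHERE bar IS NULL LIMIT ?)",
            (batch_size,),
        )
        # Commit per batch so each transaction stays short.
        conn.commit()
        if cur.rowcount == 0:
            # No NULL rows were left to update: the backfill is complete.
            return total
        total += cur.rowcount


# Assumption: SQLite in memory stands in for the PostgreSQL table "foo".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, bar BOOLEAN)")
conn.executemany("INSERT INTO foo (id) VALUES (?)", [(i,) for i in range(250)])
conn.commit()

updated = backfill_in_batches(conn, batch_size=100)
remaining = conn.execute(
    "SELECT count(*) FROM foo WHERE bar IS NULL"
).fetchone()[0]
print(updated, remaining)
```

The loop stops on the first batch that touches zero rows, which mirrors the `while [ $updated -gt 0 ]` test in the shell version. Against PostgreSQL the same statements would be sent through a database driver instead of sqlite3.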
On 02/25/2014 04:41 AM, Zev Benjamin wrote:
> I'm conceptually trying to do
> ALTER TABLE "foo" ADD COLUMN "bar" boolean NOT NULL DEFAULT False;
>
> without taking any noticeable downtime. I know I can divide the query
> up like so:
>
> ALTER TABLE "foo" ADD COLUMN "bar" boolean;
> UPDATE foo SET bar = False; -- Done in batches
> ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
> ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;

You need to set the default before doing the update. Also, make sure the update is in its own transaction.

> The first 3 queries shouldn't impact other concurrent queries on the
> system. My question is about the sequential scan that occurs when
> setting the column NOT NULL. Will that sequential scan block other
> inserts or selects on the table?

Yes, because ALTER TABLE will have taken an AccessExclusiveLock.

> If so, can it be sped up by using an index (which would be created
> concurrently)?

Unfortunately not.

--
Vik