Thread: Adding a non-null column without noticeable downtime

Adding a non-null column without noticeable downtime

From
Zev Benjamin
Date:
Hi all,

I'm sure this has been answered somewhere, but I was not able to find
anything in the list archives.

I'm conceptually trying to do
ALTER TABLE "foo" ADD COLUMN "bar" boolean NOT NULL DEFAULT False;

without taking any noticeable downtime.  I know I can divide the query
up like so:

ALTER TABLE "foo" ADD COLUMN "bar" boolean;
UPDATE foo SET bar = False; -- Done in batches
ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;

The first 3 queries shouldn't impact other concurrent queries on the
system.  My question is about the sequential scan that occurs when
setting the column NOT NULL.  Will that sequential scan block other
inserts or selects on the table?  If so, can it be sped up by using an
index (which would be created concurrently)?


Thanks,
Zev


Re: Adding a non-null column without noticeable downtime

From
Zev Benjamin
Date:
To be clear, this is with PostgreSQL 9.1.  Also, if there is some other
way of doing this, I'd be interested in other methodologies as well.


Zev

On 02/24/2014 10:41 PM, Zev Benjamin wrote:
> Hi all,
>
> I'm sure this has been answered somewhere, but I was not able to find
> anything in the list archives.
>
> I'm conceptually trying to do
> ALTER TABLE "foo" ADD COLUMN "bar" boolean NOT NULL DEFAULT False;
>
> without taking any noticeable downtime.  I know I can divide the query
> up like so:
>
> ALTER TABLE "foo" ADD COLUMN "bar" boolean;
> UPDATE foo SET bar = False; -- Done in batches
> ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
> ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
>
> The first 3 queries shouldn't impact other concurrent queries on the
> system.  My question is about the sequential scan that occurs when
> setting the column NOT NULL.  Will that sequential scan block other
> inserts or selects on the table?  If so, can it be sped up by using an
> index (which would be created concurrently)?
>
>
> Thanks,
> Zev
>
>


Re: Adding a non-null column without noticeable downtime

From
Sameer Kumar
Date:
I think index should help.
Why don't you try it out and check the explain plan of it?
If you are planning to break it down as below:
1. ALTER TABLE "foo" ADD COLUMN "bar" boolean;
2. UPDATE foo SET bar = False; -- Done in batches
3. ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
4. ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;


I would suggest on interchanging operation 2 and 3 in sequence

Re: Adding a non-null column without noticeable downtime

From
Zev Benjamin
Date:
EXPLAIN does not appear to work on ALTER TABLE statements:

=> EXPLAIN ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
ERROR:  syntax error at or near "ALTER"
LINE 1: explain ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
                 ^


Zev

On 02/25/2014 01:56 PM, Sameer Kumar wrote:
> I think index should help.
> Why don't you try it out and check the explain plan of it?
> If you are planning to break it down as below:
> 1. ALTER TABLE "foo" ADD COLUMN "bar" boolean;
> 2. UPDATE foo SET bar = False; -- Done in batches
> 3. ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
> 4. ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
>
>
> I would suggest on interchanging operation 2 and 3 in sequence


Re: Adding a non-null column without noticeable downtime

From
Sergey Konoplev
Date:
On Mon, Feb 24, 2014 at 7:41 PM, Zev Benjamin
<zev-pgsql@strangersgate.com> wrote:
[...]
> ALTER TABLE "foo" ADD COLUMN "bar" boolean;
> UPDATE foo SET bar = False; -- Done in batches
> ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
> ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;

You should set default before performing updates, otherwise new rows
will be with nulls in this column. The template sniplet for your case
is below.

ALTER TABLE foo ADD bar boolean;

ALTER TABLE foo ALTER bar SET DEFAULT false;

CREATE INDEX CONCURRENTLY foo_migration_tmp
ON foo (id) WHERE bar IS NULL;

/*

PSQL=/usr/local/bin/psql

total_updated=0
updated=1
time (
    while [ $updated -gt 0 ]; do
        updated=$(($PSQL -X Game2 <<EOF
UPDATE foo SET bar = false
WHERE id IN (
    SELECT id FROM foo
    WHERE bar IS NULL LIMIT 100);
EOF
        )  | cut -d ' ' -f 2)
        (( total_updated+=updated ))
        echo -ne "\r$total_updated"
    done
) 2>&1

*/

DROP INDEX foo_migration_tmp;

ANALYZE foo;

ALTER TABLE foo ALTER bar SET NOT NULL;

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com


Re: Adding a non-null column without noticeable downtime

From
Vik Fearing
Date:
On 02/25/2014 04:41 AM, Zev Benjamin wrote:
> I'm conceptually trying to do
> ALTER TABLE "foo" ADD COLUMN "bar" boolean NOT NULL DEFAULT False;
>
> without taking any noticeable downtime.  I know I can divide the query
> up like so:
>
> ALTER TABLE "foo" ADD COLUMN "bar" boolean;
> UPDATE foo SET bar = False; -- Done in batches
> ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
> ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;

You need to set the default before doing the update.  Also, make sure
the update is in its own transaction.

> The first 3 queries shouldn't impact other concurrent queries on the
> system.  My question is about the sequential scan that occurs when
> setting the column NOT NULL.  Will that sequential scan block other
> inserts or selects on the table?

Yes, because ALTER TABLE will have taken an AccessExclusiveLock.

> If so, can it be sped up by using an index (which would be created
> concurrently)?

Unfortunately not.

--
Vik