Thread: Add column with default value in big table - splitting of updates can help?

Add column with default value in big table - splitting of updates can help?

From
Durumdara
Date:
Dear Members!

I've read this article, but I need your experience in this theme.

https://leopard.in.ua/2016/09/20/safe-and-unsafe-operations-postgresql#.XjL3fcqYXDc

alter table tk
  add column field1 text default 'MUCH';

The table tk has 200 million rows. Autovacuum is no problem; only the long update is.

But as I read, the ALTER takes a table lock, so this update locks the table for a long time.

The article says we should do this instead (see the sketch below):

1. add the column without a default - fast.
2. set the default on the column.
3. update the rows where it is null.
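
In SQL the three steps would look roughly like this (a sketch, assuming
field1 is a text column and pre-11 behaviour):

   alter table tk add column field1 text;                  -- metadata change only, fast
   alter table tk alter column field1 set default 'MUCH';  -- metadata change only, fast
   update tk set field1 = 'MUCH' where field1 is null;     -- long: rewrites every row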

What do we gain with this method?

As I suppose, the table lock is just replaced by a long update (row locks on the whole table).

The article suggested replacing the one long update with shorter batches (10,000-100,000 records per cycle).

We normally execute these SQL statements (alter, default, update) in one transaction,
so I can't commit in the middle.

What is the difference between a "full update" and "updates in batches of 10,000 records" when I'm in a single transaction?

Is it possible that this pseudo code takes fewer locks WITHOUT AN INNER COMMIT?

Pseudo:
----------------------
begin
  loop
     update tk set field1 = 'MUCH'
        where field1 is null and id in
           (select id from tk where field1 is null limit 100000);
     exit when not found;
  end loop;
end
----------------------

Is there any advantage when I split the updates? I'm in one transaction.

TR. START
----
1. alter add col 
2. set default
3. updates
---
TR. COMMIT

Or doesn't it help me?

Because the whole transaction locks out the other users anyway, just like the single "alter ... add column ... default" statement would?

Thank you for your help!

Best regards
   dd

Re: Add column with default value in big table - splitting of updates can help?

From
Adrian Klaver
Date:
On 1/30/20 7:51 AM, Durumdara wrote:
> Dear Members!
> 
> I've read this article, but I need your experience in this theme.
> 
> https://leopard.in.ua/2016/09/20/safe-and-unsafe-operations-postgresql#.XjL3fcqYXDc
> 
> alter table tk
>    add column field1 text default 'MUCH';
> 
> The table tk has 200 million rows. Autovacuum is no problem; only
> the long update is.
> 
> But as I read, the ALTER takes a table lock, so this update locks the
> table for a long time.

What version of Postgres are you using?

I ask because:

https://www.postgresql.org/docs/11/release-11.html

"Many other useful performance improvements, including the ability to 
avoid a table rewrite for ALTER TABLE ... ADD COLUMN with a non-null 
column default"



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Add column with default value in big table - splitting of updates can help?

From
hubert depesz lubaczewski
Date:
On Thu, Jan 30, 2020 at 04:51:02PM +0100, Durumdara wrote:
> Is there any advantage when I split the updates? I'm in one transaction.

In this case - no. The benefit of split updates comes when you're not in a
single transaction.
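
The win looks like this (a sketch; assumes id is the primary key): run each
batch as its own transaction from the client, repeating until it reports
UPDATE 0:

   update tk set field1 = 'MUCH'
      where id in (select id from tk where field1 is null limit 10000);
   -- client commits here, then runs the same statement again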

Why would you want to have it all done in a single transaction?

Best regards,

depesz




Hello!

a.)
PG version is mainly 9.6, but some of the servers are 10.x or 11.x.

b.) 
We have a semi-automatic tool which collects the new modifications for the databases and executes them in one go, database by database.
So: one SQL script per database, in one transaction - whole or nothing. If something fails, we know where to start again by hand. This way it can't happen that only the first half executed and we don't know which statements ran and which didn't.

The main problem is that sometimes we have to modify tables which have too many records in some customer databases.

---

I think the best way to solve this is:

1.) 
Future goal: upgrade to PG 11.

2.)
We need to write a preprocessor. If it finds an "alter table" with "add column" and a "default", and it is a "critical database", the whole operation halts at that point and warns us to "do it yourself"... :-)

---

After the same problem earlier on the tk table, I tried to write a client app which updates records 10,000 at a time with a commit - but it was very slow.

   update tk set field1 = 'MUCH' where id in (
      select id from tk where field1 is NULL limit 10000
   )

I think the cause is that the target field has no index (because many times fields with default values are not searched, just row-level fields), and that the client/server round trips are slower than I'd like.
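
A quick way to check that guess (a sketch of the same batch under EXPLAIN):

   explain
   update tk set field1 = 'MUCH' where id in (
      select id from tk where field1 is null limit 10000
   );
   -- without an index covering field1, the subselect seq-scans tk for
   -- NULL rows on every batch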

Formerly I thought I could speed this up with a stored proc - but as I read, stored procs can't use inner transactions - so I must make a client programme use begin/commit... (pgAdmin?).

Thanks for the info so far!

Best regards
   dd



Re: Add column with default value in big table - splitting of updates can help?

From
Adrian Klaver
Date:
On 1/31/20 5:43 AM, Durumdara wrote:
> [...]
> 
> Formerly I thought I could speed this up with a stored proc - but as I
> read, stored procs can't use inner transactions - so I must make a
> client programme use begin/commit... (pgAdmin?)


Before version 11, Postgres only had stored functions. With 11+ there are 
stored procedures and transaction control inside them:

https://www.postgresql.org/docs/11/plpgsql-transactions.html

In stored functions the best you can do is:

https://www.postgresql.org/docs/11/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
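
On 11+ the batched backfill can then live server-side; a minimal sketch
(hypothetical procedure name; COMMIT works only when it is CALLed outside
an explicit transaction block):

   create procedure backfill_field1()
   language plpgsql
   as $$
   begin
      loop
         update tk set field1 = 'MUCH'
            where id in (select id from tk where field1 is null limit 10000);
         exit when not found;
         commit;  -- ends this batch's transaction and releases its row locks
      end loop;
   end;
   $$;

   call backfill_field1();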



-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Add column with default value in big table - splitting of updates can help?

From
Kevin Brannen
Date:
>From: Durumdara <durumdara@gmail.com>
>
>a.)
>PG version is mainly 9.6, but some of the servers are 10.x or 11.x.
>b.)
>We have a semi-automatic tool which collects the new modifications for the databases and executes them in one go,
>database by database. So: one SQL script per database, in one transaction - whole or nothing. If something fails, we
>know where to start again by hand. This way it can't happen that only the first half executed and we don't know
>which statements ran and which didn't.
 

Unless you have some special requirement, you don't have to do it all or
nothing. As depesz points out, you can do it in multiple transactions just
fine. We do it that way here all the time. :)

What it means is that you have to write guards or checks into your upgrade
script. In some instances it's very easy, because some statements have IF NOT
EXISTS to help you. For those places where doing the same statement twice
would cause an error, put a check around it: a plpgsql DO block lets you run
the work only when a condition check shows it hasn't been done yet (the
information_schema and pg_catalog tables are your friends for this).
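
For example, a rerunnable guard could look like this (a sketch; on 9.6+ this
particular case could also just use ADD COLUMN IF NOT EXISTS):

   do $$
   begin
      if not exists (select 1 from information_schema.columns
                     where table_name = 'tk' and column_name = 'field1') then
         alter table tk add column field1 text;
      end if;
   end;
   $$;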

The point is that you can run your upgrade script as many times as needed, should
something happen and it stops. Rerunning the upgrade script should never
cause an error if you've coded it correctly.

>
>The main problem is that sometimes we have to modify tables which have too many records in some customer databases.

I'm going to go with everyone else here because it works. We tend to do updates
in blocks of 10K records at a time. Do some tests and figure out what works best
for your setup (maybe you have enough memory to do 100K chunks). Whatever you
do, make sure that the column you use to divide the work has an index on it! Use
the primary key if it's an INT. If you have no index, find a column you can
create an index on for this work, then drop it at the end; that will be far
faster than having to do a bunch of table scans.
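
For this thread's case, a partial index would do it (a sketch; hypothetical
index name, and CREATE INDEX CONCURRENTLY cannot run inside a transaction
block):

   create index concurrently tk_field1_null_idx
      on tk (id) where field1 is null;
   -- updated rows drop out of the index, so each batch's subselect
   -- stays cheap; remove it when the backfill is done:
   drop index tk_field1_null_idx;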

Yes, it's probably a very good idea to upgrade to a newer version if you can,
as performance improvements come with each new version.

HTH,
Kevin