Thread: Updating a large table

Updating a large table

From
Timokhin Maxim
Date:
Hello! We have a large table, 11 GB (about 37 million records), and we need to alter it: add a new column with
default value FALSE. 'NOT NULL' is also required.
 
So, first I've done:

ALTER TABLE clusters ALTER COLUMN "is_paid";

after that:

UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE;

Everything went OK. Then I tried to run it again for an interval of 1 year, and I got what nobody wants to see: there was
no available space left on disk. The WAL files had eaten everything.
 
The master server couldn't send some WAL files to its replicas; the bandwidth wasn't enough.

Well, I'm searching for a better way to update the table. Solutions I found:
1. Separate my UPDATE by chunks.
2. Alter the table using a new temporary table, but that's not convenient for me because there are a lot of foreign keys
and indexes.
3. Hot-update. This is the most interesting case for me.
Speaking of HOT updates:
https://www.dbrnd.com/2016/03/postgresql-the-awesome-table-fillfactor-to-speedup-update-and-select-statement/
The article says it might be useful for tables that change often, and moreover that it would be the best way to increase
the speed of UPDATEs.
 
So, my question is: will it work for all tuples? The article
https://www.dbrnd.com/2016/03/postgresql-alter-table-to-change-fillfactor-value/ says no, but I could not find
confirmation in the official PostgreSQL documentation.
 
Why do I need to run VACUUM after updating?
How should I choose a better fillfactor?
What will happen with the WAL files in this case?
Thank you!



PostgreSQL 9.6

-- 
Timokhin 'maf' Maxim


Re: Updating a large table

From
salah jubeh
Date:
Hello,

Does the table have foreign keys? If not, you could create another table, possibly unlogged, do the changes you want via INSERT ... SELECT, and finally convert the unlogged table to a logged table.

In addition, there are several ways to increase data-writing performance; for example, the following configuration settings have an impact on write performance: synchronous_commit, commit_delay, max_wal_size, wal_buffers and maintenance_work_mem.
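For example, in postgresql.conf; the values below are purely illustrative and need to be sized to your hardware and durability requirements:

```ini
# Illustrative values only -- tune for your hardware and durability needs.
synchronous_commit = off       # don't wait for WAL flush at each commit
commit_delay = 1000            # microseconds; enables group commit
max_wal_size = 4GB             # fewer, larger checkpoints
wal_buffers = 16MB             # larger WAL write buffer
maintenance_work_mem = 1GB     # faster index builds and VACUUM
```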

Regards 

On Friday, December 22, 2017, 6:59:43 PM GMT+1, Timokhin Maxim <ncx2@yandex.com> wrote:




Re: Updating a large table

From
Tomas Vondra
Date:
On 12/22/2017 05:46 PM, Timokhin Maxim wrote:
> 
> Hello! We have a large table, 11 GB (about 37 million records), and we
> need to alter it: add a new column with
> default value FALSE. Also 'NOT NULL' is required.
>
> So, first I've done:
>
> ALTER TABLE clusters ALTER COLUMN "is_paid";
> 

That seems somewhat incomplete ... what exactly did the ALTER do?

> after that:
> 
> UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE;
> 
> Everything went OK. Then I tried to run it again for an interval of 1
> year, and I got what nobody wants to see: there was no available space
> on disk. The WAL files had eaten everything.
> The master server couldn't send some WAL files to its replicas; the bandwidth wasn't enough.
> 

Well, then perhaps the best solution is to add more disk space and/or
make sure the network bandwidth is sufficient?

In any case, don't forget this may also need to update all indexes on
the table, because the new row versions will end up on different pages.
So while the table has 11GB, this update may need much more WAL space
than that.
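If replica bandwidth is the bottleneck, it's worth watching the lag while the update runs. On 9.6 something like this on the master shows per-replica replay lag in bytes (assuming streaming replication; on later versions the functions are named pg_current_wal_lsn etc.):

```sql
-- Per-replica replication lag in bytes, as seen from the 9.6 master.
SELECT application_name,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             replay_location) AS replay_lag_bytes
  FROM pg_stat_replication;
```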

> Well, I'm searching for a better idea to update the table.
> Solutions I found.
> 1. Separate my UPDATE by chunks.

If this is a one-time change, this is probably the best option.

> 2. Alter a table using a new temporary table, but it's not convenient
> for me because there is a lot of foreign keys and indexes.
Right.

> 3. Hot-update. This is the most interesting case for me.
> Speaking of HOT updates:
> https://www.dbrnd.com/2016/03/postgresql-the-awesome-table-fillfactor-to-speedup-update-and-select-statement/
> The article says it might be useful for tables that change often, and moreover that it would be the best way to
> increase the speed of UPDATEs.
 

First of all, to make HOT possible there would have to be enough free
space on the pages. As you need to update the whole table, that means
each page would have to be only 50% full. That's unlikely to be true,
and you can't fix that at this point.

> So, my question is: will it work for all tuples? The article
> https://www.dbrnd.com/2016/03/postgresql-alter-table-to-change-fillfactor-value/
> says no, but I could not find confirmation in the official
> PostgreSQL documentation.
Not sure I understand your question, but HOT can only happen when two
conditions are met:

1) the update does not change any indexed column

This is likely met, assuming you don't have an index on is_paid.

2) there's enough space on the same page for the new row version

This is unlikely to be true, because the default fillfactor for tables
is 90%. You may change fillfactor using ALTER TABLE, but that only
applies to new data.

Moreover, as the article says - this is useful for tables that change
often. Which is not quite what one-time table rewrite does.

So HOT is not the solution you're looking for.
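As a sanity check, pg_stat_user_tables shows how many of a table's updates were actually HOT:

```sql
-- Comparing n_tup_hot_upd to n_tup_upd gives the fraction of HOT updates.
SELECT relname, n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables
 WHERE relname = 'clusters';
```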

> Why do I need to launch vacuum after updating?

You don't need to launch vacuum - autovacuum will take care of that
eventually. But you may do that, to do the cleanup when it's convenient
for you.

> How should I choose a better fillfactor?

For example to change fillfactor to 75% (i.e. 25% free space):

ALTER TABLE t SET (fillfactor = 75);

But as I said, it's not a solution for you.

> What will happen with the WAL files in this case?

Not sure what you mean.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Updating a large table

From
Timokhin Maxim
Date:
Hello Tomas! Thank you for the useful answer!
 

23.12.2017, 23:58, "Tomas Vondra" <tomas.vondra@2ndquadrant.com>:
> On 12/22/2017 05:46 PM, Timokhin Maxim wrote:
>>  Hello! We have a large table, 11 GB (about 37 million records), and we
>>  need to alter it: add a new column with
>>  default value FALSE. Also 'NOT NULL' is required.
>>
>>  So, first I've done:
>>
>>  ALTER TABLE clusters ALTER COLUMN "is_paid";
>
> That seems somewhat incomplete ... what exactly did the ALTER do?

I'll try to explain what exactly I meant. What I need is:
ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN NOT NULL DEFAULT FALSE;
But that query would lock the whole table for about 40 minutes, so I decided to split it up:
1. ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN DEFAULT FALSE;
2. UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE;  (this part was
needed as soon as possible)
3. UPDATE the remaining part in chunks.
4. Set NOT NULL on the column.

I was thinking about how to optimize the third step. My solution was to write a script that runs two threads. The first
one UPDATEs "is_paid" in chunks; the other one checks my metrics. If something starts to go wrong, the first thread
pauses until the metrics are good again.
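For example, one chunk of step 3 could look roughly like this (the id window is illustrative and assumes an indexed primary key; the script commits after every chunk):

```sql
-- One iteration of the chunked backfill; advance the id window and
-- repeat until no rows remain, committing between chunks.
UPDATE clusters
   SET is_paid = DEFAULT
 WHERE id >= 1 AND id < 100001          -- illustrative 100k-row window
   AND is_paid IS DISTINCT FROM FALSE;
```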
 

Thank you, Tomas.

>
>>  after that:
>>
>>  UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE;
>>
>>  Everything went OK. Then I tried to run it again for an interval of 1
>>  year, and I got what nobody wants to see: there was no available space
>>  on disk. The WAL files had eaten everything.
>>  The master server couldn't send some WAL files to its replicas; the bandwidth wasn't enough.
>
> Well, then perhaps the best solution is to add more disk space and/or
> make sure the network bandwidth is sufficient?
>
> In any case, don't forget this may also need to update all indexes on
> the table, because the new row versions will end up on different pages.
> So while the table has 11GB, this update may need much more WAL space
> than that.
>
Got it, thank you!
>>  Well, I'm searching for a better idea to update the table.
>>  Solutions I found.
>>  1. Separate my UPDATE by chunks.
>
> If this is a one-time change, this is probably the best option.
>
Exactly, thank you!

>>  2. Alter a table using a new temporary table, but it's not convenient
>>  for me because there is a lot of foreign keys and indexes.
>
> Right.
>
>>  3. Hot-update. This is the most interesting case for me.
>>  Speaking of HOT updates:
>>  https://www.dbrnd.com/2016/03/postgresql-the-awesome-table-fillfactor-to-speedup-update-and-select-statement/
>>  The article says it might be useful for tables that change often, and moreover that it would be the best way to
>>  increase the speed of UPDATEs.
>
> First of all, to make HOT possible there would have to be enough free
> space on the pages. As you need to update the whole table, that means
> each page would have to be only 50% full. That's unlikely to be true,
> and you can't fix that at this point.
>
>>  So, my question is: will it work for all tuples? The article
>>  https://www.dbrnd.com/2016/03/postgresql-alter-table-to-change-fillfactor-value/
>>  says no, but I could not find confirmation in the official
>>  PostgreSQL documentation.
>
> Not sure I understand your question, but HOT can only happen when two
> conditions are met:
>
> 1) the update does not change any indexed column
>
> This is likely met, assuming you don't have an index on is_paid.
>
> 2) there's enough space on the same page for the new row version
>
> This is unlikely to be true, because the default fillfactor for tables
> is 90%. You may change fillfactor using ALTER TABLE, but that only
> applies to new data.
>
> Moreover, as the article says - this is useful for tables that change
> often. Which is not quite what one-time table rewrite does.
>
> So HOT is not the solution you're looking for.
>
>>  Why do I need to launch vacuum after updating?
>
> You don't need to launch vacuum - autovacuum will take care of that
> eventually. But you may do that, to do the cleanup when it's convenient
> for you.
>
>>  How should I choose a better fillfactor?
>
> For example to change fillfactor to 75% (i.e. 25% free space):
>
> ALTER TABLE t SET (fillfactor = 75);
>
> But as I said, it's not a solution for you.
>
>>  What will happen with the WAL files in this case?
>
> Not sure what you mean.
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Updating a large table

From
Sergei Kornilov
Date:
Hello

> 1. ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN DEFAULT FALSE;
this is wrong. To avoid a long table lock you need DEFAULT NULL:
ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN DEFAULT NULL;
A NULL default changes only the system catalog; a default with any non-null value will rewrite all rows. After adding
the column you can set the default value - it applies only to future inserts:

ALTER TABLE clusters ALTER COLUMN "is_paid" SET DEFAULT FALSE;

Then you can update all old rows in the table in small chunks. Finally, when there are no NULL values left, you can set
NOT NULL:
ALTER TABLE clusters ALTER COLUMN "is_paid" SET NOT NULL;
Unfortunately this locks the table for some time - shorter than a rewrite, but still the time of a full seqscan. I hope
my patch [1] will be merged, so that in the future NOT NULL can be set by temporarily adding a check constraint
(NOT VALID, then VALIDATE), which does not require a long table lock.
 

[1] https://www.postgresql.org/message-id/flat/81911511895540@web58j.yandex.ru#81911511895540@web58j.yandex.ru
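For reference, on 9.6 a similar effect can already be achieved manually, if a validated CHECK constraint is acceptable in place of the real NOT NULL column attribute (the constraint name is illustrative):

```sql
-- Created without scanning the table; only a brief lock is needed.
ALTER TABLE clusters
  ADD CONSTRAINT clusters_is_paid_not_null
  CHECK (is_paid IS NOT NULL) NOT VALID;

-- Scans the table, but holds only SHARE UPDATE EXCLUSIVE, so normal
-- reads and writes continue while it runs.
ALTER TABLE clusters VALIDATE CONSTRAINT clusters_is_paid_not_null;
```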

Regards, Sergei


Re: Updating a large table

From
Timokhin Maxim
Date:
Hello, Sergei!

09.01.2018, 15:53, "Sergei Kornilov" <sk@zsrv.org>:
> Hello
>
>>  1. ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN DEFAULT FALSE;
>
> this is wrong. To avoid a long table lock you need DEFAULT NULL:
> ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN DEFAULT NULL;
> A NULL default changes only the system catalog; a default with any non-null value will rewrite all rows. After adding
> the column you can set the default value - it applies only to future inserts:
> ALTER TABLE clusters ALTER COLUMN "is_paid" SET DEFAULT FALSE;
>
> Then you can update all old rows in the table in small chunks. Finally, when there are no NULL values left, you can
> set NOT NULL:
What you wrote is exactly what I'm doing. Moreover, I'm checking current metrics to avoid the earlier problems.

> ALTER TABLE clusters ALTER COLUMN "is_paid" SET NOT NULL;
> Unfortunately this locks the table for some time - shorter than a rewrite, but still the time of a full seqscan. I
> hope my patch [1] will be merged, so that in the future NOT NULL can be set by temporarily adding a check constraint
> (NOT VALID, then VALIDATE), which does not require a long table lock.

Hope your patch will be merged. It will be really useful.
>
> [1] https://www.postgresql.org/message-id/flat/81911511895540@web58j.yandex.ru#81911511895540@web58j.yandex.ru
>
> Regards, Sergei

-- 
Timokhin 'maf' Maxim