Re: Updating a large table - Mailing list pgsql-performance

From Timokhin Maxim
Subject Re: Updating a large table
Msg-id 4917721515500328@web47j.yandex.ru
In response to Updating a large table  (Timokhin Maxim <ncx2@yandex.com>)
List pgsql-performance
Hello Tomas! Thank you for the useful answer!

23.12.2017, 23:58, "Tomas Vondra" <tomas.vondra@2ndquadrant.com>:
> On 12/22/2017 05:46 PM, Timokhin Maxim wrote:
>>  Hello! We have a large table, 11GB (about 37 million records), and we
>>  need to alter it - add a new column with default value
>>  false. 'NOT NULL' is also required.
>>
>>  So, first I've done:
>>
>>  ALTER TABLE clusters ALTER COLUMN "is_paid";
>
> That seems somewhat incomplete ... what exactly did the ALTER do?

Let me explain what I meant. What I actually need is:
ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN NOT NULL DEFAULT FALSE;
But that single query would lock the whole table for about 40 minutes, so I decided to split it up:
1. ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN DEFAULT FALSE;
2. UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE;  (this part was needed as soon as possible)
3. UPDATE the remaining rows in chunks.
4. SET NOT NULL on the column.
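
A sketch of step 3 in SQL (the id-range batching, the column name "id", and the chunk size are my illustrative assumptions, not something fixed in the plan above):

```sql
-- Backfill the remaining rows in small primary-key ranges, so each
-- transaction stays short and generates a bounded amount of WAL.
-- "id" and the 100000-row chunk size are illustrative assumptions.
UPDATE clusters SET is_paid = FALSE
WHERE id >= 1 AND id < 100001 AND is_paid IS NULL;
-- ...repeat for successive id ranges until no NULLs remain, then step 4:
ALTER TABLE clusters ALTER COLUMN is_paid SET NOT NULL;
```

Running VACUUM between batches gives the dead row versions a chance to be cleaned up before the next chunk starts.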

I was thinking about how to optimize the 3rd step.
My solution was to write a script that runs two threads. The first one UPDATEs "is_paid" in chunks; the other one
watches my metrics. If a metric goes bad, the first thread pauses until the metrics recover.
Thank you, Tomas.

>
>>  after that:
>>
>>  UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE;
>>
>>  Everything went ok. Then I tried to run it again for an interval of 1
>>  year. And I got something no one wants to see - there was no space
>>  left on the disk. The reason was that WAL files ate everything:
>>  the master couldn't ship some WAL files to its replicas - the bandwidth wasn't enough.
>
> Well, then perhaps the best solution is to add more disk space and/or
> make sure the network bandwidth is sufficient?
>
> In any case, don't forget this may also need to update all indexes on
> the table, because the new row versions will end up on different pages.
> So while the table has 11GB, this update may need much more WAL space
> than that.
>
Got it, thank you!
>>  Well, I'm searching for a better idea to update the table.
>>  Solutions I found.
>>  1. Separate my UPDATE by chunks.
>
> If this is a one-time change, this is probably the best option.
>
Exactly, thank you!

>>  2. Alter a table using a new temporary table, but it's not convenient
>>  for me because there is a lot of foreign keys and indexes.
>
> Right.
>
>>  3. HOT update. This is the most interesting case for me.
>>  Speaking of HOT updates:
>>  https://www.dbrnd.com/2016/03/postgresql-the-awesome-table-fillfactor-to-speedup-update-and-select-statement/
>>  The article says it might be useful for tables that change often, and moreover that it would be the best way to
>>  increase the speed of UPDATE.
>
> First of all, to make HOT possible there would have to be enough free
> space on the pages. As you need to update the whole table, that means
> each page would have to be only 50% full. That's unlikely to be true,
> and you can't fix that at this point.
>
>>  So, my questions are: will it work for all tuples? It says that it won't:
>>  https://www.dbrnd.com/2016/03/postgresql-alter-table-to-change-fillfactor-value/
>>  but I could not find a confirmation in the official
>>  PostgreSQL documentation.
>
> Not sure I understand your question, but HOT can only happen when two
> conditions are met:
>
> 1) the update does not change any indexed column
>
> This is likely met, assuming you don't have an index on is_paid.
>
> 2) there's enough space on the same page for the new row version
>
> This is unlikely to be true, because the default fillfactor for tables
> is 90%. You may change fillfactor using ALTER TABLE, but that only
> applies to new data.
>
> Moreover, as the article says - this is useful for tables that change
> often. Which is not quite what one-time table rewrite does.
>
> So HOT is not the solution you're looking for.
>
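
By the way, whether updates actually take the HOT path can be checked in the statistics views (a standard query against pg_stat_user_tables; only the table name is from this thread):

```sql
-- n_tup_hot_upd counts row updates that took the HOT path,
-- n_tup_upd counts all row updates
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'clusters';
```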
>>  Why do I need to launch vacuum after updating?
>
> You don't need to launch vacuum - autovacuum will take care of that
> eventually. But you may do that, to do the cleanup when it's convenient
> for you.
>
>>  How should I reduce the fillfactor for the better?
>
> For example to change fillfactor to 75% (i.e. 25% free space):
>
> ALTER TABLE t SET (fillfactor = 75);
>
> But as I said, it's not a solution for you.
>
>>  What will happen with the WAL files in this case?
>
> Not sure what you mean.
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



