Re: Updating a large table - Mailing list pgsql-performance
From | Timokhin Maxim |
---|---|
Subject | Re: Updating a large table |
Date | |
Msg-id | 4917721515500328@web47j.yandex.ru Whole thread Raw |
In response to | Updating a large table (Timokhin Maxim <ncx2@yandex.com>) |
Responses |
Re: Updating a large table
|
List | pgsql-performance |
Hello Tomas! Thank you for the useful answer! 23.12.2017, 23:58, "Tomas Vondra" <tomas.vondra@2ndquadrant.com>: > On 12/22/2017 05:46 PM, Timokhin Maxim wrote: >> Hello! We have a large table 11GB ( about 37 million records ) and we >> need to alter a table - add a new column with default values is >> false. Also 'NOT NULL' is required. >> >> So, first I've done: >> >> ALTER TABLE clusters ALTER COLUMN "is_paid"; > > That seems somewhat incomplete ... what exactly did the ALTER do? I'll try to explain what exactly I meant. ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN NOT NULL DEFAULT FALSE; What exactly I need. But that query would lock the whole table for about 40 minutes. I decided to separate it like: 1. ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN DEFAULT FALSE; 2. UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE; ( This was neededas soon as possible ) 3. UPDATE another part by chunks 4. set NOT NULL for the table. I was thinking about how to optimize the 3th step. Well, my solution was to write a script which runs two threads. The first one UPDATE "is_paid" by chunks, another one checksmy metrics. If something is becoming wrong first thread stops until metrics are good. Thank you, Tomas. > >> after that: >> >> UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE; >> >> Everything went ok. Then I tried to run it again for an interval of 1 >> years. And I got that no one can't see - the was no available space >> on a disk. The reason was WAL-files ate everything. >> Master-server couldn't send some WAL-file to their replicas. Bandwidth wasn't enough. > > Well, then perhaps the best solution is to add more disk space and/or > make sure the network bandwidth is sufficient? > > In any case, don't forget this may also need to update all indexes on > the table, because the new row versions will end up on different pages. > So while the table has 11GB, this update may need much more WAL space > than that. > Got it, thank you! >> Well, I'm searching for a better idea to update the table. >> Solutions I found. >> 1. Separate my UPDATE by chunks. > > If this is a one-time change, this is probably the best option. > Exactly, thank you! >> 2. Alter a table using a new temporary table, but it's not convenient >> for me because there is a lot of foreign keys and indexes. > > Right. > >> 3. Hot-update. This is the most interesting case for me. >> Speaking of HOT-update https://www.dbrnd.com/2016/03/postgresql-the-awesome-table-fillfactor-to-speedup-update-and-select-statement/ >> The article says: it might be useful for tables that change often and moreover It would be the best way to increase thespeed of UPDATE. > > First of all, to make HOT possible there would have to be enough free > space on the pages. As you need to update the whole table, that means > each table would have to be only 50% full. That's unlikely to be true, > and you can't fix that at this point. > >> So, my questions are will it work for all tuples? It says that - no >> https://www.dbrnd.com/2016/03/postgresql-alter-table-to-change- >> fillfactor-value/, but I could not find a confirmation in official >> postresql's documentation. > > Not sure I understand your question, but HOT can only happen when two > conditions are met: > > 1) the update does not change any indexed column > > This is likely met, assuming you don't have an index on is_paid. > > 2) there's enough space on the same page for the new row version > > This is unlikely to be true, because the default fillfactor for tables > is 90%. You may change fillfactor using ALTER TABLE, but that only > applies to new data. > > Moreover, as the article says - this is useful for tables that change > often. Which is not quite what one-time table rewrite does. > > So HOT is not the solution you're looking for. > >> Why do I need to launch vacuum after updating? > > You don't need to launch vacuum - autovacuum will take care of that > eventually. But you may do that, to do the cleanup when it's convenient > for you. > >> How should I reduce the better fillfactor? > > For example to change fillfactor to 75% (i.e. 25% free space): > > ALTER TABLE t SET (fillfactor = 75); > > But as I said, it's not a solution for you. > >> What will be with WAL-files it this case? > > Not sure what you mean. > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-performance by date: