Thread: Need to update all my 60 million rows at once without transactional integrity
From
christian_behrens@gmx.net
Date:
Hi!

How can I make an UPDATE of a column in a very large table for all rows, without using double the amount of disk space and without any need for an atomic operation?

I have a very large table with about 60 million rows. I sometimes need to do a simple update to ALL rows that resets a status flag to zero.

I don't need transactional integrity for this (but of course if the system crashes, there should be no data corruption; a separate flag in the file system can record the fact that the bulk update was in progress). I don't care about, or want, an abort or the "all or nothing" behaviour that SQL mandates. The idea is basically that either this update succeeds or it succeeds; there is no "not". It must succeed. It must be retried until it works. If the update is halfway finished, that's okay.

If I just do an

  UPDATE table SET flag=0;

then Pg will make a copy of every row, which must be cleaned up by vacuum. I understand (and, for this specific problem, don't like) that PG is an MVCC database and will keep the old versions of the rows from before the update alive. This is very bad.

If I do a batched loop like this:

  UPDATE table SET flag=0 WHERE id>=0   AND id<200;
  UPDATE table SET flag=0 WHERE id>=200 AND id<400;
  UPDATE table SET flag=0 WHERE id>=400 AND id<600;
  ...

then PG will seek all over my hard drive, I think.

It would be much better if it could just start at the beginning of the table and work its way towards the end. But which sort criterion would satisfy this? I don't think there is any SQL that does something like that.

Another idea (and I think it's quite a good one) would be to drop the column and recreate it with a new default value.

But the problem is that this is not actually MY database but an appliance (whose hard drive, by the way, does not have double the amount of space available), and it has to work reliably whenever something special happens.
And I don't think I should create new columns all the time (the old ones would be hidden and their internal column IDs lost, I think); that might have a limit. Can I do it maybe every day?

Is there any other way to go?

I would really like to kind of "skip transactions". Of course I want basic integrity of the on-disk structures, but no atomicity for this bulk update!

I read that PG has an internal command language that is used to build up a database when all the tables for e.g. table names are not present yet.

Could I use that to hack my way around transactions?

Basically I can do everything to this PG installation; this is an extremely controlled, locked-down environment. I don't need to care about anyone else, it's always the same schema, the same version, same OS, etc., and I could happily apply any hack that solves the problem.

Many thanks for any pointers or ideas on how to solve this problem!
Christian
--
Is your browser Vista-compatible? Download the latest browser versions now: http://www.gmx.net/de/go/browser
Re: Need to update all my 60 million rows at once without transactional integrity
From
"A. Kretschmer"
Date:
On Mon, 21 Apr 2008 at 00:19:34 +0200, christian_behrens@gmx.net wrote:
> Hi!
>
> How can I make an UPDATE of a column in a very large table for all rows,
> without using double the amount of disk space and without any need for an
> atomic operation?
>
> I have a very large table with about 60 million rows. I sometimes need to
> do a simple update to ALL rows that resets a status flag to zero.
>
> If I do a batched loop like this:
> UPDATE table SET flag=0 where id>=0 and id <200;
> UPDATE table SET flag=0 where id>=200 and id <400;
> UPDATE table SET flag=0 where id>=400 and id <600;

Don't forget to VACUUM after every UPDATE...

> Is there any other way to go?

Update to 8.3 and profit from the new HOT feature (wild guess: there is no index on this flag column).

Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
Re: Need to update all my 60 million rows at once without transactional integrity
From
"Pavan Deolasee"
Date:
On Wed, Apr 23, 2008 at 1:52 PM, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:
> On Mon, 21 Apr 2008 at 00:19:34 +0200, christian_behrens@gmx.net wrote:
>
> > If I do a batched loop like this:
> > UPDATE table SET flag=0 where id>=0 and id <200;
> > UPDATE table SET flag=0 where id>=200 and id <400;
> > UPDATE table SET flag=0 where id>=400 and id <600;
>
> Don't forget to VACUUM after every Update...

VACUUMing a large table so often could be a problem. But if disk space is the only limitation and you don't care much about I/O and CPU usage, it's not a bad idea.

> > Is there any other way to go?
>
> Update to 8.3 and profit from the new HOT feature (wild guess: there is
> no index on this flag-column)

HOT may not help a lot in this case. HOT needs free space in the same block to put the new version in. It can recycle previously updated rows and thus free up space, but only if those rows were updated in an older (now committed) transaction.

Now, if you are doing batch updates, there is a chance that HOT may be able to recycle rows updated in one of the previous batches. But if the updates happen sequentially, the blocks that were updated earlier will never be touched again, and hence no space will be freed. If you are updating one row at a time (each in a separate transaction), or if the batch updates are somewhat scattered, then HOT can reuse the dead tuples and limit the bloat.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com
Re: Need to update all my 60 million rows at once without transactional integrity
From
Tomasz Ostrowski
Date:
On 2008-04-21 00:19, christian_behrens@gmx.net wrote:
> I have a very large table with about 60 million rows. I sometimes
> need to do a simple update to ALL rows that resets a status-flag to
> zero.
> UPDATE table SET flag=0;

First optimization:

  UPDATE table SET flag=0 WHERE flag!=0;

Second optimization:

> If I do a batched loop like this:
> UPDATE table SET flag=0 where id>=0 and id <200;
> UPDATE table SET flag=0 where id>=200 and id <400;
> UPDATE table SET flag=0 where id>=400 and id <600;
> then PG will seek all over my harddrive I think.

Loop like this (in pseudo-code):

  non0 = select count(*) from table where flag!=0;
  batchsize = 1000000;
  for ( i=0; i<non0; i+=batchsize ) {
        update table set flag=0
              where id in (select id from table
                           where flag!=0 limit batchsize);
        commit;
        vacuum table;
        analyze table;
  }

You'll use only 1/60 of the space. It will need about 3*60 table scans, but it should not seek too much.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh
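[Editor's note: Tometzky's pseudo-code maps onto a small driver script. The following is a minimal sketch using Python's standard-library sqlite3 as a stand-in for PostgreSQL; the table name, column names, and the tiny batch size are invented for the illustration. Against Pg you would run this through a client library and also issue VACUUM/ANALYZE between batches, which SQLite has no equivalent for.]

```python
import sqlite3

# Build a toy table: 10 rows, all flags set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, flag INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, 1)", [(i,) for i in range(10)])
conn.commit()

BATCH = 3  # tiny for illustration; Tometzky suggests ~1,000,000
batches = 0
while True:
    cur = conn.execute(
        "UPDATE t SET flag = 0 "
        "WHERE id IN (SELECT id FROM t WHERE flag != 0 LIMIT ?)",
        (BATCH,),
    )
    conn.commit()  # commit each batch so dead row versions become reclaimable
    # against PostgreSQL you would issue VACUUM t; ANALYZE t; here
    if cur.rowcount == 0:  # nothing left to reset
        break
    batches += 1

remaining = conn.execute("SELECT count(*) FROM t WHERE flag != 0").fetchone()[0]
```

Committing per batch is what bounds the bloat: only one batch's worth of dead row versions exists between vacuums.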
Re: Need to update all my 60 million rows at once without transactional integrity
From
Alvaro Herrera
Date:
christian_behrens@gmx.net wrote:
> How can I make a Update of a column in a very large table for all rows
> without using the double amount of disc space and without any need for
> atomic operation?
>
> I have a very large table with about 60 million rows. I sometimes need
> to do a simple update to ALL rows that resets a status-flag to zero.

Perhaps you should rethink your data model.

> I would really like to kind of "skip transactions". Of course basic
> data integrity in on-disc structures, but not atomicity for this bulk
> update!

There is no such thing on Postgres as "not a transaction".

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: Need to update all my 60 million rows at once without transactional integrity
From
"Merlin Moncure"
Date:
On Wed, Apr 23, 2008 at 9:04 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> christian_behrens@gmx.net wrote:
> > I have a very large table with about 60 million rows. I sometimes need
> > to do a simple update to ALL rows that resets a status-flag to zero.
>
> Perhaps you should rethink your data model.

for example:
*) change the status flag to a timestamp
*) make a new table (status_reset or something) with one column, one
   row... and
*) make a view (perhaps swapping names with your original table) that
   joins the two tables and preserves the 1/0 status-flag column
   appearance, so you don't have to change the app.

Consider the fundamental tenet: 'If PostgreSQL can't do it well, you are probably doing it the wrong way' :-)

merlin
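[Editor's note: Merlin's three steps can be sketched end to end. This illustration uses Python's standard-library sqlite3 as a stand-in for PostgreSQL; the names items, status_reset, and flag_set_at are invented for the example, and the view syntax would be essentially the same in Pg.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, flag_set_at INTEGER NOT NULL);
CREATE TABLE status_reset (reset_at INTEGER NOT NULL);  -- one column, one row
INSERT INTO status_reset VALUES (0);

-- The view preserves the 1/0 flag appearance for the application.
CREATE VIEW items_v AS
  SELECT i.id,
         CASE WHEN i.flag_set_at > r.reset_at THEN 1 ELSE 0 END AS flag
  FROM items i CROSS JOIN status_reset r;
""")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 5), (2, 7), (3, 2)])

flags_before = [f for _, f in conn.execute("SELECT * FROM items_v ORDER BY id")]

# Resetting ALL flags touches exactly one row instead of 60 million.
conn.execute("UPDATE status_reset SET reset_at = 10")
flags_after = [f for _, f in conn.execute("SELECT * FROM items_v ORDER BY id")]

# Setting a flag again later just stamps a newer value on that row.
conn.execute("UPDATE items SET flag_set_at = 11 WHERE id = 2")
flags_later = [f for _, f in conn.execute("SELECT * FROM items_v ORDER BY id")]
```

The one-row update is also trivially crash-safe: either the reset committed or it didn't, and no bulk rewrite ever happens.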
Re: Need to update all my 60 million rows at once without transactional integrity
From
"Pavan Deolasee"
Date:
On Mon, Apr 21, 2008 at 3:49 AM, <christian_behrens@gmx.net> wrote:
>
> Could I use that to hack my way around transactions?

Since you are asking for trouble, maybe there is something you can do with BEFORE UPDATE triggers and heap_inplace_update().

Before you try this out: I must say, *I have no idea if this would work in all scenarios and I don't take any guarantee of data consistency*. So do it at your own risk :-)

Obviously, transaction integrity and MVCC are compromised. But I think crash recovery should work fine, because heap_inplace_update() takes care of WAL logging.

Write a BEFORE UPDATE trigger in C, something like this:

  #include "postgres.h"
  #include "fmgr.h"
  #include "access/heapam.h"
  #include "commands/trigger.h"

  PG_MODULE_MAGIC;

  PG_FUNCTION_INFO_V1(inplace_update_trigger);
  extern Datum inplace_update_trigger(PG_FUNCTION_ARGS);

  Datum
  inplace_update_trigger(PG_FUNCTION_ARGS)
  {
      TriggerData *trigdata = (TriggerData *) fcinfo->context;

      /* overwrite the old tuple in place, then skip the real UPDATE */
      trigdata->tg_newtuple->t_self = trigdata->tg_trigtuple->t_self;
      heap_inplace_update(trigdata->tg_relation, trigdata->tg_newtuple);
      return PointerGetDatum(NULL);
  }

  CREATE OR REPLACE FUNCTION inplace_update_trigger() RETURNS TRIGGER
  AS 'trigger.so', 'inplace_update_trigger' LANGUAGE C STRICT;

  CREATE TRIGGER inplace_update_trigger BEFORE UPDATE ON <tblname>
  FOR EACH ROW EXECUTE PROCEDURE inplace_update_trigger();

Now whenever you update a row in the table, the BEFORE UPDATE trigger will update the old tuple in place and return NULL. That ensures that the actual UPDATE operation is not performed, but the changes are permanently recorded on the old tuple. In case of a crash or transaction abort, the updates cannot be rolled back.

Also, you may want to take an exclusive lock on the relation before you start the update.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com
Re: Need to update all my 60 million rows at once without transactional integrity
From
Steve Crawford
Date:
christian_behrens@gmx.net wrote:
> How can I make a Update of a column in a very large table for all rows
> without using the double amount of disc space and without any need for
> atomic operation?
>
> I have a very large table with about 60 million rows. I sometimes need
> to do a simple update to ALL rows that resets a status-flag to zero.

Without knowing details of your app, here are some random thoughts:

1. Use a WHERE clause. If the non-zero status flags are a small portion of the table, then the table will only grow by the number of flags that need to be reset, not by the whole table.

2. Split the flag into a separate table. You will have index overhead, but:

2a. You can reset by a simple TRUNCATE followed by an INSERT INTO flagtable (id, flag) SELECT rowid, 0 FROM yourmaintable.

2b. Even if (not recommended) you did a full update of the flagtable, you would only be growing the usage by the size of the flagtable.

2c. You may be able to have the flagtable store only non-zero flags, in which case you could use coalesce((SELECT flag FROM flagtable WHERE flagtable.id = maintable.id), 0) to fetch the flag. Then a reset is just a near-instantaneous TRUNCATE.

3. Partition your table: use inheritance to create a main table consisting of many children containing the data. Depending on your app, there may be other benefits to partitioning. But in any case, you can update one child table at a time. Follow the update of each sub-table with a CLUSTER, which is far faster than VACUUM FULL.

Cheers,
Steve
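[Editor's note: Steve's option 2c can be sketched concretely. This uses Python's standard-library sqlite3 as a stand-in for PostgreSQL; the table names are invented for the example, and a plain DELETE stands in for Pg's near-instantaneous TRUNCATE.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maintable (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE flagtable (id INTEGER PRIMARY KEY, flag INTEGER NOT NULL);
INSERT INTO maintable VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO flagtable VALUES (2, 1);  -- only non-zero flags are stored
""")

QUERY = """
  SELECT m.id,
         coalesce((SELECT f.flag FROM flagtable f WHERE f.id = m.id), 0)
  FROM maintable m ORDER BY m.id
"""
flags = conn.execute(QUERY).fetchall()   # rows missing from flagtable read as 0

# Resetting ALL flags just empties the small side table (TRUNCATE in Pg).
conn.execute("DELETE FROM flagtable")
flags_reset = conn.execute(QUERY).fetchall()
```

The big table is never rewritten at all; the reset cost is proportional to the number of flags actually set.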
Re: Need to update all my 60 million rows at once without transactional integrity
From
Simon Riggs
Date:
On Mon, 2008-04-21 at 00:19 +0200, christian_behrens@gmx.net wrote:
> How can I make a Update of a column in a very large table for all rows
> without using the double amount of disc space and without any need for
> atomic operation?
>
> I have a very large table with about 60 million rows. I sometimes need
> to do a simple update to ALL rows that resets a status-flag to zero.
>
> I don't need to have transactional integrity (but of course if the
> system crashes, there should be no data corruption.

No such thing. Without transactions you have no sensible definition of what constitutes data corruption.

> A separate flag in the file system can well save the fact that that
> bulk update was in progress) for this, I don't care or want a abort or
> "all or nothing" like SQL mandates. The idea is basically that either
> this update succeeds or it succeeds or - there is no "not". It must
> succeed. It must be tried until it works. If the update is halfway
> finished, that's okay.

Don't reset them to zero; just redefine the meaning of the counter. Take the max value in the table and then have the app understand that anything <= the previous max value means the same thing as whatever "status = 0" means now. The app would need to read the new baseline value before performing any work.

This is roughly the technique used by Slony to avoid needing to update every row in the log table to show that it has successfully replicated it. It's also the technique MVCC relies upon internally.

It's less work and crash-safe in all cases.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
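[Editor's note: the baseline trick can be shown as plain logic, independent of SQL. A minimal in-memory sketch, with all names invented for the illustration: each "set flag" stamps a monotonically increasing counter onto the row, and the mass reset becomes a single assignment to the baseline.]

```python
class FlagStore:
    """Rows never store a literal 0/1 flag; a row's flag reads as 1 only
    if its stamp is newer than the shared baseline."""

    def __init__(self):
        self.counter = 0    # monotonically increasing "set" stamp
        self.baseline = 0   # everything <= baseline means flag == 0
        self.rows = {}      # row id -> stamp of the last "set"

    def set_flag(self, row_id):
        self.counter += 1
        self.rows[row_id] = self.counter

    def flag(self, row_id):
        return 1 if self.rows.get(row_id, 0) > self.baseline else 0

    def reset_all(self):
        # the "update all 60 million rows" becomes one assignment,
        # trivially atomic and crash-safe
        self.baseline = self.counter


s = FlagStore()
s.set_flag(1)
s.set_flag(2)
assert (s.flag(1), s.flag(2), s.flag(3)) == (1, 1, 0)

s.reset_all()            # all flags now read as zero; no row was touched
assert (s.flag(1), s.flag(2)) == (0, 0)

s.set_flag(2)            # a later "set" is newer than the baseline
assert s.flag(2) == 1
```

In a real schema the stamp would be a column on the big table and the baseline a single row elsewhere, which is exactly the shape of Merlin's view proposal above.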
Re: Need to update all my 60 million rows at once without transactional integrity
From
Tom Allison
Date:
Far from being an expert on Postgres, but here are two ideas, assuming that you cannot afford the time it would take to simply UPDATE and wait.

Write a script to update all the rows, one at a time. Lowest impact on operations, but it would take a very long time.

Or, assuming you have a sequential primary key value on each row, update by ID blocks on the order of 10,000s or 100,000s at a time (or more). This is a balancing act between time to complete and immediate load on the server.

I've used both options, but I don't think I've exceeded 15 million rows.

Sent from my iPhone.

On Apr 23, 2008, at 2:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2008-04-21 at 00:19 +0200, christian_behrens@gmx.net wrote:
> [snip]
>
> Don't reset them to zero, just redefine the meaning of the counter. Take
> the max value in the table and then have the app understand that
> anything <= the previous max value means the same thing as whatever
> "status = 0" means now. The app would need to read the new baseline
> value before performing any work.
>
> This is roughly the technique used by Slony to avoid needing to update
> every row in the log table to show that it has successfully replicated
> it. It's also the technique MVCC relies upon internally.
>
> It's less work and crash safe in all cases.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
Re: Need to update all my 60 million rows at once without transactional integrity
From
Alban Hertroys
Date:
On Apr 21, 2008, at 12:19 AM, christian_behrens@gmx.net wrote:
> How can I make a Update of a column in a very large table for all
> rows without using the double amount of disc space and without any
> need for atomic operation?
>
> I have a very large table with about 60 million rows. I sometimes
> need to do a simple update to ALL rows that resets a status-flag to
> zero.

I'll assume they're wide rows then.

> If I just do an
> UPDATE table SET flag=0;
> then Pg will make a copy of every row which must be cleaned up by
> vacuum.

I'm not sure what you're trying to do exactly, but updating a flag on every row you changed will double your data size, as an update means an insert/delete combination under MVCC. If there was no reason to update the row except for changing the flag, that's probably not what you want.

You could instead use referential integrity to do this job for you. Create a (temporary) table that has a foreign key to your records (define the column as both PRIMARY KEY and FOREIGN KEY to keep a 1-to-1 relation). Setting the flag is done by inserting a record into the temp table referencing the record you "updated". Existence of the record in the temp table signifies that the row was changed.

This way you'll only have inserts of small rows (into the temp table), and fewer than 60 million of them if you didn't succeed. That still leaves the problem of disabling atomicity of that update, of course.

I think you actually mean to also update data in the rows you want to set those flags for, in which case my suggestion is of little help...

> If I do a batched loop like this:
> UPDATE table SET flag=0 where id>=0 and id <200;
> ...
> then PG will seek all over my harddrive I think.
>
> It would be much better if it could just start in the beginning of
> the table and work it's way towards the end. But which sort-
> criteria would satisfy this?

If that table is clustered over an index, that would be a good candidate, as clustering means the data is mostly ordered on disk according to that index.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
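[Editor's note: Alban's PK-plus-FK side table can be sketched briefly. Again Python's standard-library sqlite3 stands in for PostgreSQL; the names big and changed are invented for the example, and in Pg the reset would be a TRUNCATE of the side table.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE big (id INTEGER PRIMARY KEY, payload TEXT);
-- PRIMARY KEY + FOREIGN KEY keeps a strict 1-to-1 relation to big;
-- the mere existence of a row here means "this row's flag is set"
CREATE TABLE changed (
  id INTEGER PRIMARY KEY REFERENCES big(id)
);
INSERT INTO big VALUES (1, 'a'), (2, 'b');
""")

conn.execute("INSERT INTO changed VALUES (1)")   # mark row 1 as changed

# the app can still read a 1/0 flag via an outer join
flags = conn.execute("""
  SELECT b.id, c.id IS NOT NULL FROM big b
  LEFT JOIN changed c ON c.id = b.id ORDER BY b.id
""").fetchall()
```

Only small rows are ever inserted, and the foreign key guarantees a "changed" marker can never point at a nonexistent row.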
Re: Need to update all my 60 million rows at once without transactional integrity
From
Chris Browne
Date:
christian_behrens@gmx.net writes:
> How can I make a Update of a column in a very large table for all
> rows without using the double amount of disc space and without any
> need for atomic operation?

You may need to redefine the problem.

> I have a very large table with about 60 million rows. I sometimes
> need to do a simple update to ALL rows that resets a status-flag to
> zero.

That seems like a pretty useless update to me...

Why not instead redefine the "status-we-need-to-have" to be zero? [e.g. - don't try to turn off the Sun; instead, define "brightness" as the standard ;-)]

> If I do a batched loop like this:
> UPDATE table SET flag=0 where id>=0 and id <200;
> UPDATE table SET flag=0 where id>=200 and id <400;
> UPDATE table SET flag=0 where id>=400 and id <600;
> ...
>
> then PG will seek all over my harddrive I think.

If I *were* going to do this, I'd want to do:

  update table set flag=0
   where flag <> 0
     and id in (select id from table where flag <> 0 limit 50000);

which will go through the table 50K rows at a time.

> It would be much better if it could just start in the beginning of
> the table and work it's way towards the end. But which sort-criteria
> would satisfy this? I don't think that there is any SQL that does
> something like that.

You couldn't guarantee physical ordering anyway, so that seems futile.

> Another idea (and I think it's a quite good idea) would be to
> drop the column and recreate it with a new default value.

The "column drop" would be virtually instantaneous; the column would essentially be hidden from view. But the addition of the column would rewrite the table, doubling its size :-(.

> And I don't think I should create new columns (the old ones would be
> hidden and their internal column ids lost I think) all the time,
> that might have a limit.

That's correct. It will eventually cause a problem.

> Is there any other way to go?

Step back to what value you're putting into that column, and why. Perhaps you never need to change this value.

In the Slony-I replication system, we have something sort of similar: we collect a table of entries that need to get applied to another database (a table called sl_log_1). We *NEVER* alter its contents; that is not necessary to indicate that data has been replicated (which is the status of interest).

Instead of updating tuples to mark that they are processed, we store information in another table that indicates up to which point in time we have finished replicating. (It's more complex and indirect than that, but it's still a fair characterization...)

So, perhaps you should be storing a per-day value in the table, and store, somewhere else, what point you're "done up to."

--
let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/emacs.html
Last night I played a blank tape at full blast.  The mime next door went nuts.
Re: Need to update all my 60 million rows at once without transactional integrity
From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Chris Browne
> Sent: Wednesday, April 23, 2008 3:20 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Need to update all my 60 million rows at once
> without transactional integrity
>
> [snip]
>
> So, perhaps you should be storing a per-day value in the table, and
> store, somewhere else, what point you're "done up to."

I have a notion here...

If there is a table which gets some subset of its columns completely rewritten every day, then I suggest taking the completely rewritten columns and making them into a child table. Take the primary key of the current table with the problem and clone that primary key for the child table.

E.g.:

Table foo has columns: a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
We update p,t,y every day, 100%.
The primary key of foo is (a,b).

I suggest creating bar (a,b,p,t,y), removing p,t,y from the main table foo, and renaming table foo to foo_1.

Create a view that has the same name as foo and the columns of the original foo table, by joining tables foo_1 and bar.

Every time you repopulate the table, drop and recreate bar, COPY INTO it, and then recreate bar's index on (a,b).

Your applications will work the same because the view foo has the same properties as the original table foo.

You could even populate the bar table offline: rename the old bar table to bar_old and rename the newly populated table from bar_new to bar, so even the apparent downtime would be very small.
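[Editor's note: Dann's table-split-behind-a-view can be sketched end to end. Python's standard-library sqlite3 stands in for PostgreSQL here; the schema (foo_1, bar, and the view foo) follows his naming, with tiny invented data, and DROP/CREATE stands in for Pg's COPY-based repopulation.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- stable columns stay in foo_1; the daily-rewritten ones move to bar
CREATE TABLE foo_1 (a INTEGER, b INTEGER, c TEXT, PRIMARY KEY (a, b));
CREATE TABLE bar   (a INTEGER, b INTEGER, p TEXT, PRIMARY KEY (a, b));

-- the view gives applications the original table shape back
CREATE VIEW foo AS
  SELECT f.a, f.b, f.c, r.p
  FROM foo_1 f JOIN bar r ON f.a = r.a AND f.b = r.b;

INSERT INTO foo_1 VALUES (1, 1, 'keep');
INSERT INTO bar   VALUES (1, 1, 'old');
""")

# daily repopulation rewrites only the small bar table, never foo_1
conn.executescript("""
DROP TABLE bar;
CREATE TABLE bar (a INTEGER, b INTEGER, p TEXT, PRIMARY KEY (a, b));
INSERT INTO bar VALUES (1, 1, 'new');
""")

rows = conn.execute("SELECT * FROM foo").fetchall()
```

Because the view is just stored SQL, swapping the underlying bar table is invisible to the application, which keeps selecting from foo.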