Need to update all my 60 million rows at once without transactional integrity - Mailing list pgsql-general

From christian_behrens@gmx.net
Subject Need to update all my 60 million rows at once without transactional integrity
Date
Msg-id 20080420221934.13270@gmx.net
Whole thread Raw
Responses Re: Need to update all my 60 million rows at once without transactional integrity  ("A. Kretschmer" <andreas.kretschmer@schollglas.com>)
Re: Need to update all my 60 million rows at once without transactional integrity  (Tomasz Ostrowski <tometzky@batory.org.pl>)
Re: Need to update all my 60 million rows at once without transactional integrity  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Need to update all my 60 million rows at once without transactional integrity  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Re: Need to update all my 60 million rows at once without transactional integrity  (Steve Crawford <scrawford@pinpointresearch.com>)
Re: Need to update all my 60 million rows at once without transactional integrity  (Simon Riggs <simon@2ndquadrant.com>)
Re: Need to update all my 60 million rows at once without transactional integrity  (Alban Hertroys <dalroi@solfertje.student.utwente.nl>)
List pgsql-general
Hi!

How can I make a Update of a column in a very large table for all rows without using the double amount of disc space
andwithout any need for atomic operation? 

I have a very large table with about 60 million rows. I sometimes need to do a simple update to ALL rows that resets a
status-flagto zero. 

I don't need to have transactional integrity (but of course if the system crashes, there should be no data corruption.
Aseparate flag in the file system can well save the fact that that bulk update was in progress) for this, I don't care
orwant a abort or "all or nothing" like SQL mandates. The idea is basically that either this update succeeds or it
succeedsor - there is no "not". It must succeed. It must be tried until it works. If the update is halfway finished,
that'sokay. 

If I just do an
UPDATE table SET flag=0;
then Pg will make a copy of every row which must be cleaned up by vaccuum. I understand - and don't like during this
specificproblem - that PG is a MVCC database and will keep my old versions of the rows before the update alive. This is
verybad. 

If I do a batched loop like this:
UPDATE table SET flag=0 where id>=0 and id <200;
UPDATE table SET flag=0 where id>=200 and id <400;
UPDATE table SET flag=0 where id>=400 and id <600;
...

then PG will seek all over my harddrive I think.

It would be much better if it could just start in the beginning of the table and work it's way towards the end. But
whichsort-criteria would satisfy this? I don't think that there is any SQL that does something like that. 


Another ideas (and I think it's a quite good idea) would be to
drop the column and recreate it with a new default value.

But the problem is that this is not actually MY database, but an appliance (which has a harddrive that does not have
thedouble amount of space available btw) and it has to work reliably whenever something special happens. 

And I don't think I should create new columns (the old one would be hidden and their internal column ids lost I think)
allthe time, that might have a limit. 

Can I do it maybe every day??


Is there any other way to go?

I would really like to kind of "skip transactions". Of course basic data integrity in on disc structures, but not
atomicyfor this bulk update! 

I read that PG has an internal command language that is used to build up a database when all the tables for e.g.
table-namesare not present yet. 

Could I use that to hack my way around transactions?

Basically I can do everything to this PG installation, this is an extremly controlled, locked down environment. I don't
needto care for anyone else, it's always the same schema, the same version, same OS, etc. and I could happily apply any
hackthat solves the problem. 

Many thanks for any pointers or ideas how to solve this problem!
 Christian
--
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten
Browser-Versionen downloaden: http://www.gmx.net/de/go/browser

pgsql-general by date:

Previous
From: Monalee Bhandge
Date:
Subject: Postgresql Help
Next
From: BLazeD
Date:
Subject: Re: Changed Hosts, Lots of Errors in PostgreSQL - Help Please!