Thread: Adding and filling new column on big table
I have a table of about 500,000 rows.
I need to add a new column and populate it.
So, I have tried to run the following command. The command never finishes (I gave up after about and hour and a half!).
Note that none of the columns have indexes.
Update mytable set new_column =
case when column_1 = column_2 then 1
when column_1+column_3= column_2 and column_3 > 0 then 2
when column_1+column_3+column_4 = column_2 and column_4 > 0 then 3
when column_1+column_3+column_4+column_5 = column_2 and column_5 > 0 then 4
else 0
end
My computer is a Pentium 4 – 2.4 GHZ and 1G RAM – so it should be fast enough.
Any ideas?
Jonathan Blitz
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.392 / Virus Database: 268.5.6/340 - Release Date: 05/15/2006
On Wed, May 17, 2006 at 03:19:26AM +0200, Jonathan Blitz wrote: > I have a table of about 500,000 rows. > > I need to add a new column and populate it. > > So, I have tried to run the following command. The command never finishes (I > gave up after about and hour and a half!). If you install contrib/pgstattuple you can figure out how fast the update is running. Run "SELECT * FROM pgstattuple('mytable')" a few times and note the rate at which dead_tuple_count is increasing. If it's not increasing at all then query pg_locks and look for locks where "granted" is false. I created a test table, populated it with 500,000 rows of random data, and ran the update you posted. On a 500MHz Pentium III with 512M RAM and a SCSI drive from the mid-to-late 90s, running PostgreSQL 8.1.3 on FreeBSD 6.1, the update finished in just over two minutes. The table had one index (the primary key). > Note that none of the columns have indexes. Do you mean that no columns in the table have indexes? Or that the columns referenced in the update don't have indexes but that other columns do? What does "\d mytable" show? Do other tables have foreign key references to this table? What non-default settings do you have in postgresql.conf? What version of PostgreSQL are you running and on what platform? How busy is the system? What's the output of "EXPLAIN UPDATE mytable ..."? -- Michael Fuhr
Jonathan Blitz writes: > So, I have tried to run the following command. The command never finishes > (I gave up after about and hour and a half!). Did you ever find what was the problem? Perhaps you needed to run a vacuum full on the table?
> > So, I have tried to run the following command. The command never finishes > > (I gave up after about and hour and a half!). > > Did you ever find what was the problem? > Perhaps you needed to run a vacuum full on the table? Nope. I just gave up in the end and left it with NULL as the default value. There were, in fact, over 2 million rows in the table rather than 1/4 of a million so that was part of the problem. Jonathan -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006
Jonathan Blitz writes: > I just gave up in the end and left it with NULL as the default value. Could you do the updates in batches instead of trying to do them all at once? Have you done a vacuum full on this table ever? > There were, in fact, over 2 million rows in the table rather than 1/4 of a > million so that was part of the problem. What hardware? I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA). Doing an update on a 5 million record table took quite a while, but it did fininish. :-) I just did vacuum full before and after though.. That many updates tend to slow down operations on the table aftewards unless you vacuum the table. Based on what you wrote it sounded as if you tried a few times and may have killed the process.. this would certainly slow down the operations on that table unless you did a vacuum full. I wonder if running vacuum analyze against the table as the updates are running would be of any help.
> > > Could you do the updates in batches instead of trying to do them all at > once? Nope. Didn't think it would make any difference. > > Have you done a vacuum full on this table ever? Many times > > What hardware? > I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA). > Doing an update on a 5 million record table took quite a while, but it did > fininish. :-) I am using a laptop :). Pentium 4 (not 4M) with 1GB of memory - 2 MHZ Must do it on that since the program is aimed for use at home. Jonathan -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006
Jonathan Blitz writes: > Nope. Didn't think it would make any difference. May be worth a try. > I am using a laptop :). > Pentium 4 (not 4M) with 1GB of memory - 2 MHZ Most laptop drives are only 5,400 RPM which would make a transaction like you are describing likely take a while. > Must do it on that since the program is aimed for use at home. No desktop at home you could try it on? I think the problem with the laptop is likely it's drive.
> > Most laptop drives are only 5,400 RPM which would make a transaction like > you are describing likely take a while. Not sure what my one is but it is new(ish). > > No desktop at home you could try it on? > I think the problem with the laptop is likely it's drive. I suppose I could do but I need to install PostgreSQL there and then copy over the database. Maybe I will give it a try. Jonathan -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006
On Tue, 2006-05-30 at 16:04, Jonathan Blitz wrote: > > > > Most laptop drives are only 5,400 RPM which would make a transaction like > > you are describing likely take a while. > > Not sure what my one is but it is new(ish). > > > > > No desktop at home you could try it on? > > I think the problem with the laptop is likely it's drive. > > I suppose I could do but I need to install PostgreSQL there and then copy > over the database. Keep in mind, most, if not all IDE drives lie about fsync, so the speed of the drive is a limit only if you're actually writing a lot of data. If you're doing a lot of little transactions, the drive should be lying and holding the data in cache on board, so the speed should be OK.
>> Most laptop drives are only 5,400 RPM which would make a transaction >> like >> you are describing likely take a while. > > Not sure what my one is but it is new(ish). If you're doing data intensive operations (like a big update which looks like what you're doing) it will write many megabytes to the harddrive... my laptop HDD (5400 rpm) does about 15 MB/s throughput while a standard desktop 7200rpm drive does 55-60 MB/s throughput. Plus, seek times on a laptop drive are horrendous.
Jonathan Blitz writes: > I suppose I could do but I need to install PostgreSQL there and then copy > over the database. > Maybe I will give it a try. I really think that is your best bet. If for whatever reason that will not be an option perhaps you can just let the process run over the weekend.. possibly monitor the process from the OS to make sure it is not frozen. Don't recall if you mentioned the OS.. is it any unix like os? If so there are several ways you could check to make sure the process is not frozen such as iostats, top, vmstats(these from FreeBSD, but most unix like os should have tools like those if not some with the same name).