Thread: Adding and filling new column on big table

Adding and filling new column on big table

From
"Jonathan Blitz"
Date:

I have a table of about 500,000 rows.

 

I need to add a new column and populate it.

 

So, I have tried to run the following command. The command never finishes (I gave up after about and hour and a half!).

Note that none of the columns have indexes.

 

Update mytable set new_column =   

case when column_1 = column_2                                                                        then 1  

when column_1+column_3= column_2 and column_3 > 0                                      then 2  

when column_1+column_3+column_4 = column_2 and column_4 > 0                     then 3  

when column_1+column_3+column_4+column_5 = column_2 and column_5 > 0     then 4  

else                                                                                                                         0

end

 

 

My computer is a Pentium 4 – 2.4 GHZ and 1G RAM – so it should be fast enough.

 

Any ideas?

 

Jonathan Blitz

 


--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.392 / Virus Database: 268.5.6/340 - Release Date: 05/15/2006

Re: Adding and filling new column on big table

From
Michael Fuhr
Date:
On Wed, May 17, 2006 at 03:19:26AM +0200, Jonathan Blitz wrote:
> I have a table of about 500,000 rows.
>
> I need to add a new column and populate it.
>
> So, I have tried to run the following command. The command never finishes (I
> gave up after about and hour and a half!).

If you install contrib/pgstattuple you can figure out how fast the
update is running.  Run "SELECT * FROM pgstattuple('mytable')" a
few times and note the rate at which dead_tuple_count is increasing.
If it's not increasing at all then query pg_locks and look for locks
where "granted" is false.

I created a test table, populated it with 500,000 rows of random
data, and ran the update you posted.  On a 500MHz Pentium III with
512M RAM and a SCSI drive from the mid-to-late 90s, running PostgreSQL
8.1.3 on FreeBSD 6.1, the update finished in just over two minutes.
The table had one index (the primary key).

> Note that none of the columns have indexes.

Do you mean that no columns in the table have indexes?  Or that the
columns referenced in the update don't have indexes but that other
columns do?  What does "\d mytable" show?  Do other tables have
foreign key references to this table?  What non-default settings
do you have in postgresql.conf?  What version of PostgreSQL are you
running and on what platform?  How busy is the system?  What's the
output of "EXPLAIN UPDATE mytable ..."?

--
Michael Fuhr

Re: Adding and filling new column on big table

From
Francisco Reyes
Date:
Jonathan Blitz writes:

> So, I have tried to run the following command. The command never finishes
> (I gave up after about and hour and a half!).

Did you ever find what was the problem?
Perhaps you needed to run a vacuum full on the table?


Re: Adding and filling new column on big table

From
"Jonathan Blitz"
Date:
> > So, I have tried to run the following command. The command never
finishes
> > (I gave up after about and hour and a half!).
>
> Did you ever find what was the problem?
> Perhaps you needed to run a vacuum full on the table?

Nope.
I just gave up in the end and left it with NULL as the default value.
There were, in fact, over 2 million rows in the table rather than 1/4 of a
million so that was part of the problem.

Jonathan


--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006



Re: Adding and filling new column on big table

From
Francisco Reyes
Date:
Jonathan Blitz writes:

> I just gave up in the end and left it with NULL as the default value.


Could you do the updates in batches instead of trying to do them all at
once?

Have you done a vacuum full on this table ever?

> There were, in fact, over 2 million rows in the table rather than 1/4 of a
> million so that was part of the problem.

What hardware?
I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA).
Doing an update on a 5 million record table took quite a while, but it did
fininish. :-)

I just did vacuum full before and after though.. That many updates tend to
slow down operations on the table aftewards unless you vacuum the table.
Based on what you wrote it sounded as if you tried a few times and may have
killed the process.. this would certainly slow down the operations on that
table unless you did a vacuum full.

I wonder if running vacuum analyze against the table as the updates are
running would be of any help.


Re: Adding and filling new column on big table

From
"Jonathan Blitz"
Date:
>
>
> Could you do the updates in batches instead of trying to do them all at
> once?

Nope. Didn't think it would make any difference.
>
> Have you done a vacuum full on this table ever?

Many times

>
> What hardware?
> I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA).
> Doing an update on a 5 million record table took quite a while, but it did
> fininish. :-)

I am using a laptop :).
Pentium 4 (not 4M) with 1GB of memory - 2 MHZ

Must do it on  that since the program is aimed for use at home.

Jonathan

--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006



Re: Adding and filling new column on big table

From
Francisco Reyes
Date:
Jonathan Blitz writes:

> Nope. Didn't think it would make any difference.

May be worth a try.

> I am using a laptop :).
> Pentium 4 (not 4M) with 1GB of memory - 2 MHZ

Most laptop drives are only 5,400 RPM which would make a transaction like
you are describing likely take a while.

> Must do it on  that since the program is aimed for use at home.

No desktop at home you could try it on?
I think the problem with the laptop is likely it's drive.


Re: Adding and filling new column on big table

From
"Jonathan Blitz"
Date:
>
> Most laptop drives are only 5,400 RPM which would make a transaction like
> you are describing likely take a while.

Not sure what my one is but it is new(ish).

>
> No desktop at home you could try it on?
> I think the problem with the laptop is likely it's drive.

I suppose I could do but I need to install PostgreSQL there and then copy
over the database.
Maybe I will give it a try.

Jonathan

--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006



Re: Adding and filling new column on big table

From
Scott Marlowe
Date:
On Tue, 2006-05-30 at 16:04, Jonathan Blitz wrote:
> >
> > Most laptop drives are only 5,400 RPM which would make a transaction like
> > you are describing likely take a while.
>
> Not sure what my one is but it is new(ish).
>
> >
> > No desktop at home you could try it on?
> > I think the problem with the laptop is likely it's drive.
>
> I suppose I could do but I need to install PostgreSQL there and then copy
> over the database.

Keep in mind, most, if not all IDE drives lie about fsync, so the speed
of the drive is a limit only if you're actually writing a lot of data.
If you're doing a lot of little transactions, the drive should be lying
and holding the data in cache on board, so the speed should be OK.

Re: Adding and filling new column on big table

From
PFC
Date:
>> Most laptop drives are only 5,400 RPM which would make a transaction
>> like
>> you are describing likely take a while.
>
> Not sure what my one is but it is new(ish).

    If you're doing data intensive operations (like a big update which looks
like what you're doing) it will write many megabytes to the harddrive...
my laptop HDD (5400 rpm) does about 15 MB/s throughput while a standard
desktop 7200rpm drive does 55-60 MB/s throughput. Plus, seek times on a
laptop drive are horrendous.

Re: Adding and filling new column on big table

From
Francisco Reyes
Date:
Jonathan Blitz writes:

> I suppose I could do but I need to install PostgreSQL there and then copy
> over the database.
> Maybe I will give it a try.

I really think that is your best bet.
If for whatever reason that will not be an option perhaps you can just let
the process run over the weekend.. possibly monitor the process from the OS
to make sure it is not frozen.

Don't recall if you mentioned the OS.. is it any unix like os?
If so there are several ways you could check to make sure the process is not
frozen such as iostats, top, vmstats(these from FreeBSD, but most unix like
os should have tools like those if not some with the same name).