Thread: update query taking 24+ hours

update query taking 24+ hours

From

Ken

Date:

14 January 2007, 03:35:35

Hello,
I have postgres 8.1 on a linux box: 2.6Ghz P4, 1.5GB ram, 320GB hard 
drive.  I'm performing an update between two large tables and so far 
it's been running for 24+ hours.

I have two tables:
Master:
x int4
y int4
val1 int2
val2 int2

Import:
x int4
y int4
val int2

Each table has about 100 million rows.  I want to populate val2 in 
Master with val from Import where the two tables match on x and y.
So, my query looks like:
UPDATE Master SET val2=Import.val WHERE Master.x=Import.x AND 
Master.y=Import.y;

Both tables have indexes on the x and y columns.  Will that help?

Is there a better way to do this?  In each table x,y are unique, does 
that make a difference?  ie: would it be faster to run some kind of 
query, or loop, that just goes through each row in Import and updates 
Master (val2=val) where x=x and y=y?
If this approach would be better how to construct such a SQL statement?

The other weird thing is that when I monitor the system with xload it 
shows two bars of load, and the hard drive is going nuts,  so far my 
database directory has grown by 25GB,  however when I run "top" the 
system shows 98% idle and the postmaster process is usually only between 
1-2% CPU, although it is using 50% (750MB) ram.  Also the process shows 
up with a "D" status in the "S" column. 

Not sure what is going on.  If the size of the tables makes what I'm 
trying to do insane, or if I just have a bad SQL approach, or if 
something is wrong with my postgres configuration.

Really appreciate any help!
Thanks!
Ken

Re: update query taking 24+ hours

From

Tom Lane

Date:

14 January 2007, 15:30:50

Ken <postgres@kwasnicki.com> writes:
> I have postgres 8.1 on a linux box: 2.6Ghz P4, 1.5GB ram, 320GB hard 
> drive.  I'm performing an update between two large tables and so far 
> it's been running for 24+ hours.
> UPDATE Master SET val2=Import.val WHERE Master.x=Import.x AND 
> Master.y=Import.y;

What does EXPLAIN say about that?  (Don't try EXPLAIN ANALYZE,
but a plain EXPLAIN should be quick enough.)

> Both tables have indexes on the x and y columns.  Will that help?

A two-column index would have helped a lot more, probably, although
with so many rows to process I'm not sure that indexes are useful
anyway.  For really bulk updates a hashjoin or sort-and-mergejoin
plan is probably the best bet.

BTW, both of those would require plenty of work_mem to run fast
... what have you got work_mem set to?

And possibly even more to the point, do you have any foreign key
constraints leading into or out of the Master table?
        regards, tom lane

Re: update query taking 24+ hours

From

Ken

Date:

15 January 2007, 00:48:34

Thanks Tom!
explain output:Merge Join  (cost=60454519.54..70701005.93 rows=682951183 width=22)  Merge Cond: (("outer".y =
"inner".y)AND ("outer".x = "inner".x))  ->  Sort  (cost=41812068.08..42304601.78 rows=197013479 width=20)        Sort
Key:Master.y, Master.x        ->  Seq Scan on Master  (cost=0.00..3129037.79 rows=197013479 
 
width=20)  ->  Sort  (cost=18642451.46..18879400.92 rows=94779784 width=10)        Sort Key: Import.y, Import.x
-> Seq Scan on Import  (cost=0.00..1460121.84 rows=94779784 
 
width=10)

Don't really understand all those numbers but they look big, to me.

work_mem is set to 262144.  should it be bigger?  i have 1.5GB ram on 
the system.  also i set /proc/sys/kernel/shmmax to 256000000.  too big, 
too small?

There are no foreign key constraints on either table. 

I don't know what hashjoin or sort-and-mergejoin are but I will look 
into them.

Thanks!
Ken

Tom Lane wrote:
> Ken <postgres@kwasnicki.com> writes:
>   
>> I have postgres 8.1 on a linux box: 2.6Ghz P4, 1.5GB ram, 320GB hard 
>> drive.  I'm performing an update between two large tables and so far 
>> it's been running for 24+ hours.
>> UPDATE Master SET val2=Import.val WHERE Master.x=Import.x AND 
>> Master.y=Import.y;
>>     
>
> What does EXPLAIN say about that?  (Don't try EXPLAIN ANALYZE,
> but a plain EXPLAIN should be quick enough.)
>
>   
>> Both tables have indexes on the x and y columns.  Will that help?
>>     
>
> A two-column index would have helped a lot more, probably, although
> with so many rows to process I'm not sure that indexes are useful
> anyway.  For really bulk updates a hashjoin or sort-and-mergejoin
> plan is probably the best bet.
>
> BTW, both of those would require plenty of work_mem to run fast
> ... what have you got work_mem set to?
>
> And possibly even more to the point, do you have any foreign key
> constraints leading into or out of the Master table?
>
>             regards, tom lane
>
>