Thread: update query taking 24+ hours
Hello, I have postgres 8.1 on a linux box: 2.6Ghz P4, 1.5GB ram, 320GB hard drive. I'm performing an update between two large tables and so far it's been running for 24+ hours. I have two tables: Master: x int4 y int4 val1 int2 val2 int2 Import: x int4 y int4 val int2 Each table has about 100 million rows. I want to populate val2 in Master with val from Import where the two tables match on x and y. So, my query looks like: UPDATE Master SET val2=Import.val WHERE Master.x=Import.x AND Master.y=Import.y; Both tables have indexes on the x and y columns. Will that help? Is there a better way to do this? In each table x,y are unique, does that make a difference? ie: would it be faster to run some kind of query, or loop, that just goes through each row in Import and updates Master (val2=val) where x=x and y=y? If this approach would be better how to construct such a SQL statement? The other weird thing is that when I monitor the system with xload it shows two bars of load, and the hard drive is going nuts, so far my database directory has grown by 25GB, however when I run "top" the system shows 98% idle and the postmaster process is usually only between 1-2% CPU, although it is using 50% (750MB) ram. Also the process shows up with a "D" status in the "S" column. Not sure what is going on. If the size of the tables makes what I'm trying to do insane, or if I just have a bad SQL approach, or if something is wrong with my postgres configuration. Really appreciate any help! Thanks! Ken
Ken <postgres@kwasnicki.com> writes: > I have postgres 8.1 on a linux box: 2.6Ghz P4, 1.5GB ram, 320GB hard > drive. I'm performing an update between two large tables and so far > it's been running for 24+ hours. > UPDATE Master SET val2=Import.val WHERE Master.x=Import.x AND > Master.y=Import.y; What does EXPLAIN say about that? (Don't try EXPLAIN ANALYZE, but a plain EXPLAIN should be quick enough.) > Both tables have indexes on the x and y columns. Will that help? A two-column index would have helped a lot more, probably, although with so many rows to process I'm not sure that indexes are useful anyway. For really bulk updates a hashjoin or sort-and-mergejoin plan is probably the best bet. BTW, both of those would require plenty of work_mem to run fast ... what have you got work_mem set to? And possibly even more to the point, do you have any foreign key constraints leading into or out of the Master table? regards, tom lane
Thanks Tom! explain output:Merge Join (cost=60454519.54..70701005.93 rows=682951183 width=22) Merge Cond: (("outer".y = "inner".y)AND ("outer".x = "inner".x)) -> Sort (cost=41812068.08..42304601.78 rows=197013479 width=20) Sort Key:Master.y, Master.x -> Seq Scan on Master (cost=0.00..3129037.79 rows=197013479 width=20) -> Sort (cost=18642451.46..18879400.92 rows=94779784 width=10) Sort Key: Import.y, Import.x -> Seq Scan on Import (cost=0.00..1460121.84 rows=94779784 width=10) Don't really understand all those numbers but they look big, to me. work_mem is set to 262144. should it be bigger? i have 1.5GB ram on the system. also i set /proc/sys/kernel/shmmax to 256000000. too big, too small? There are no foreign key constraints on either table. I don't know what hashjoin or sort-and-mergejoin are but I will look into them. Thanks! Ken Tom Lane wrote: > Ken <postgres@kwasnicki.com> writes: > >> I have postgres 8.1 on a linux box: 2.6Ghz P4, 1.5GB ram, 320GB hard >> drive. I'm performing an update between two large tables and so far >> it's been running for 24+ hours. >> UPDATE Master SET val2=Import.val WHERE Master.x=Import.x AND >> Master.y=Import.y; >> > > What does EXPLAIN say about that? (Don't try EXPLAIN ANALYZE, > but a plain EXPLAIN should be quick enough.) > > >> Both tables have indexes on the x and y columns. Will that help? >> > > A two-column index would have helped a lot more, probably, although > with so many rows to process I'm not sure that indexes are useful > anyway. For really bulk updates a hashjoin or sort-and-mergejoin > plan is probably the best bet. > > BTW, both of those would require plenty of work_mem to run fast > ... what have you got work_mem set to? > > And possibly even more to the point, do you have any foreign key > constraints leading into or out of the Master table? > > regards, tom lane > >