Speed up UPDATE query? - Mailing list pgsql-sql

From Lee Hachadoorian
Subject Speed up UPDATE query?
Date
Msg-id 5ab13580910291050v26bea7e5j83d1a07f11956d39@mail.gmail.com
Whole thread Raw
Responses Re: Speed up UPDATE query?  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-sql
I'm trying to update several tables (all child tables of the same
parent), and as the number of records increases, the length of time it
takes to run the update is shooting up exponentially. I have imported
the new data to an import table, and then join the import table to the
update table. The update statement looks like:

UPDATEhousehold_2000 h
SEThhincome = new_hhincome
FROM (SELECT    serial, new_hhincomeFROM    importWHERE    year = 2000) r
WHEREh.serial = r.serial

household_2000 is a child table of a household table that, as you
might guess, only contains records from the year 2000. I am putting a
year = 2000 restriction on the import table and then linking on the
unique identifier.

For different child tables, this is how long the update takes to run
(numbers are approximate):

Records         Cost (via EXPLAIN)            Actual time
460,000         300,000                            23 seconds
510,000         320,000                            26 seconds
1.2 million      670,000                            3:16
1.3 million      820,000                            3:25
6.2 million      2.7 million                         ~2.5 hours

So, the cost estimate given by EXPLAIN seems to be roughly
proportional to the number of records in the dataset, but the actual
time it takes to run seems to increase faster than the cost, even for
the small and medium tables, and shoots through the roof for the large
tables. Since I need to run this on additional child tables that are
larger (the largest is 14 million records), I want to know what I can
do to speed up the query.

Here's the EXPLAIN for the query. Note that the query plan is the same
for the small, medium, and large tables.

Hash Join  (cost=1268532.36..2379787.06 rows=5465837 width=1128) Hash Cond: (import_6_17_rev_hh.serial = h.serial) ->
BitmapHeap Scan on import_6_17_rev_hh
 
(cost=126551.72..308495.69 rows=5465837 width=8)       Recheck Cond: (year = 1990)       ->  Bitmap Index Scan on
import_6_17_rev_hh_pkey
(cost=0.00..125185.26 rows=5465837 width=0)             Index Cond: (year = 1990) ->  Hash  (cost=295596.06..295596.06
rows=5527406width=1124)       ->  Seq Scan on household_1990 h  (cost=0.00..295596.06
 
rows=5527406 width=1124)

Any help will be appreciated. Thanks,

--Lee

-- 
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center


pgsql-sql by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: Using information_schema to find about function parameters?
Next
From: Grant Masan
Date:
Subject: slow count(CASE) query