On Thu, Oct 29, 2009 at 11:50 AM, Lee Hachadoorian
<lee.hachadoorian@gmail.com> wrote:
> I'm trying to update several tables (all child tables of the same
> parent), and as the number of records increases, the length of time it
> takes to run the update is shooting up exponentially. I have imported
> the new data to an import table, and then join the import table to the
> update table. The update statement looks like:
>
> UPDATE
> household_2000 h
> SET
> hhincome = new_hhincome
> FROM (
> SELECT
> serial, new_hhincome
> FROM
> import
> WHERE
> year = 2000
> ) r
> WHERE
> h.serial = r.serial
>
> household_2000 is a child table of a household table that, as you
> might guess, only contains records from the year 2000. I am putting a
> year = 2000 restriction on the import table and then linking on the
> unique identifier.
>
> For different child tables, this is how long the update takes to run
> (numbers are approximate):
>
> Records Cost (via EXPLAIN) Actual time
> 460,000 300,000 23 seconds
> 510,000 320,000 26 seconds
> 1.2 million 670,000 3:16
> 1.3 million 820,000 3:25
> 6.2 million 2.7 million ~2.5 hours
>
> So, the cost estimate given by EXPLAIN seems to be roughly
> proportional to the number of records in the dataset, but the actual
> time it takes to run seems to increase faster than the cost, even for
> the small and medium tables, and shoots through the roof for the large
> tables. Since I need to run this on additional child tables that are
> larger (the largest is 14 million records), I want to know what I can
> do to speed up the query.
>
> Here's the EXPLAIN for the query. Note that the query plan is the same
> for the small, medium, and large tables.
>
> Hash Join (cost=1268532.36..2379787.06 rows=5465837 width=1128)
> Hash Cond: (import_6_17_rev_hh.serial = h.serial)
> -> Bitmap Heap Scan on import_6_17_rev_hh
> (cost=126551.72..308495.69 rows=5465837 width=8)
> Recheck Cond: (year = 1990)
> -> Bitmap Index Scan on import_6_17_rev_hh_pkey
> (cost=0.00..125185.26 rows=5465837 width=0)
> Index Cond: (year = 1990)
> -> Hash (cost=295596.06..295596.06 rows=5527406 width=1124)
> -> Seq Scan on household_1990 h (cost=0.00..295596.06
> rows=5527406 width=1124)
Any chance of getting the output of explain analyze for a fast and a
slow run of this query?