Re: Speed up UPDATE query? - Mailing list pgsql-sql

From Scott Marlowe
Subject Re: Speed up UPDATE query?
Date
Msg-id dcc563d10910311708s281a81deg81a3968cbc4073ab@mail.gmail.com
In response to Speed up UPDATE query?  (Lee Hachadoorian <lee.hachadoorian@gmail.com>)
List pgsql-sql
On Thu, Oct 29, 2009 at 11:50 AM, Lee Hachadoorian
<lee.hachadoorian@gmail.com> wrote:
> I'm trying to update several tables (all child tables of the same
> parent), and as the number of records increases, the time it takes to
> run the update grows far faster than linearly. I have imported the new
> data into an import table, which I then join to the table being
> updated. The update statement looks like:
>
> UPDATE
>        household_2000 h
> SET
>        hhincome = new_hhincome
> FROM (
>        SELECT
>                serial, new_hhincome
>        FROM
>                import
>        WHERE
>                year = 2000
>        ) r
> WHERE
>        h.serial = r.serial
>
> household_2000 is a child of the parent household table and, as you
> might guess, contains only records from the year 2000. I put a
> year = 2000 restriction on the import table and then join on the
> unique identifier, serial.
>
> For different child tables, this is how long the update takes to run
> (numbers are approximate):
>
> Records        Cost (via EXPLAIN)    Actual time
> 460,000        300,000               23 seconds
> 510,000        320,000               26 seconds
> 1.2 million    670,000               3 min 16 sec
> 1.3 million    820,000               3 min 25 sec
> 6.2 million    2.7 million           ~2.5 hours
>
> So, the cost estimate given by EXPLAIN seems to be roughly
> proportional to the number of records in the dataset, but the actual
> time it takes to run seems to increase faster than the cost, even for
> the small and medium tables, and shoots through the roof for the large
> tables. Since I need to run this on additional child tables that are
> larger (the largest is 14 million records), I want to know what I can
> do to speed up the query.
>
> Here's the EXPLAIN for the query (this particular plan is from the
> 1990 child table, hence year = 1990 below). Note that the query plan
> is the same for the small, medium, and large tables.
>
> Hash Join  (cost=1268532.36..2379787.06 rows=5465837 width=1128)
>   Hash Cond: (import_6_17_rev_hh.serial = h.serial)
>   ->  Bitmap Heap Scan on import_6_17_rev_hh  (cost=126551.72..308495.69 rows=5465837 width=8)
>         Recheck Cond: (year = 1990)
>         ->  Bitmap Index Scan on import_6_17_rev_hh_pkey  (cost=0.00..125185.26 rows=5465837 width=0)
>               Index Cond: (year = 1990)
>   ->  Hash  (cost=295596.06..295596.06 rows=5527406 width=1124)
>         ->  Seq Scan on household_1990 h  (cost=0.00..295596.06 rows=5527406 width=1124)

Any chance of getting the output of explain analyze for a fast and a
slow run of this query?
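
A quick way to capture EXPLAIN ANALYZE output for an UPDATE like the one
quoted above, without keeping the changes, is to run it inside a
transaction and roll it back. This is a minimal sketch (it assumes you can
afford to hold the row locks for the duration of the run):

BEGIN;

EXPLAIN ANALYZE
UPDATE household_2000 h
SET hhincome = new_hhincome
FROM (
    SELECT serial, new_hhincome
    FROM import
    WHERE year = 2000
) r
WHERE h.serial = r.serial;

ROLLBACK;  -- discard the updated rows; only the plan and timing output are kept

Note that EXPLAIN ANALYZE actually executes the statement, so on the
6.2 million row table it will take as long as the real update.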


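As an aside (not something suggested in this thread), the subselect in the
quoted UPDATE can also be written as a plain join against the import
table. Assuming serial identifies one import row per year, as the original
query already assumes, the sketch below is equivalent; the planner usually
flattens the subselect anyway, so treat it as a readability change rather
than a guaranteed speedup:

UPDATE household_2000 h
SET hhincome = i.new_hhincome
FROM import i
WHERE i.year = 2000
  AND h.serial = i.serial;
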