Re: Slow update statement

From: Patrick Hatcher
Subject: Re: Slow update statement
Msg-id: 42F6BCD3.4070600@comcast.net
In response to: Re: Slow update statement  (John A Meinel)
Responses: Re: Slow update statement  (Tom Lane)
List: pgsql-performance


Sorry, I went out of town for the weekend.  The update did occur, but I
have no idea when it finished.

Here's the actual query and the EXPLAIN output.  Approximate row counts:
cdm.bcp_ddw_ck_cus = 12.7M rows
cdm.cdm_ddw_customer = 12.8M rows

explain
update cdm.cdm_ddw_customer
   set indiv_fkey = b.indiv_fkey
  from cdm.bcp_ddw_ck_cus b
 where cdm.cdm_ddw_customer.cus_nbr = b.cus_num;


 Hash Join  (cost=1246688.42..4127248.31 rows=12702676 width=200)
   Hash Cond: ("outer".cus_num = "inner".cus_nbr)
   ->  Seq Scan on bcp_ddw_ck_cus b  (cost=0.00..195690.76 rows=12702676 width=16)
   ->  Hash  (cost=874854.34..874854.34 rows=12880834 width=192)
         ->  Seq Scan on cdm_ddw_customer  (cost=0.00..874854.34 rows=12880834 width=192)
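
Since TIP 9 below notes that the pre-8.0 planner ignores indexes when the
joining columns' datatypes don't match, it may be worth confirming that
cus_nbr and cus_num actually share a type.  One way to check (an untested
sketch against information_schema, which is available as of 7.4):

```sql
-- Compare the declared types of the two join columns; a mismatch
-- (e.g. numeric vs int8) can make the join far more expensive.
SELECT table_name, column_name, data_type, numeric_precision
FROM information_schema.columns
WHERE table_schema = 'cdm'
  AND ((table_name = 'cdm_ddw_customer' AND column_name = 'cus_nbr')
    OR (table_name = 'bcp_ddw_ck_cus'   AND column_name = 'cus_num'));
```

(`\d cdm.cdm_ddw_customer` and `\d cdm.bcp_ddw_ck_cus` in psql would show
the same thing.)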


John A Meinel wrote:

>Patrick Hatcher wrote:
>
>
>>[Reposted from General section with updated information]
>>Pg 7.4.5
>>
>>I'm running an update statement on about 12 million records using the
>>following query:
>>
>>Update table_A
>>set F1 = b.new_data
>>from table_B b
>>where b.keyfield = table_A.keyfield
>>
>>Both keyfields are indexed, and all other indexes on table_A were
>>dropped, yet this job has been running for over 15 hours.  Is this
>>normal?
>>
>>
>
>Can you do an EXPLAIN UPDATE so that we can have an idea what the
>planner is trying to do?
>
>My personal concern is whether it is doing something like pulling in all
>rows from b and then updating table_A one by one.  As it goes, it can't
>retire any dead rows, because you are still in a transaction, so it keeps
>encountering old row versions that it has to pull in just to discover
>they are dead.
>
>How many rows are in table_B?
>
>Doing it in smaller chunks might well be faster, as might inserting into
>another table.  But I would run a test first and see what happens.
>
>John
>=:->
>
>
>
>>I stopped the process the first time after 3 hours of running due to
>>excessive log rotation and reset the conf file to these settings:
>>
>>
>>wal_buffers = 64                # min 4, 8KB each
>>
>># - Checkpoints -
>>
>>checkpoint_segments = 128       # in logfile segments, min 1, 16MB each
>>checkpoint_timeout = 1800       # range 30-3600, in seconds
>>#checkpoint_warning = 30        # 0 is off, in seconds
>>#commit_delay = 0               # range 0-100000, in microseconds
>>#commit_siblings = 5            # range 1-1000
>>
>>
>>Would it just be quicker to run a JOIN statement to a temp file and then
>>reinsert?
>>TIA Patrick
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 9: In versions below 8.0, the planner will ignore your desire to
>>      choose an index scan if your joining column's datatypes do not
>>      match
>>
>>
>>
>
>
>
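
For the record, the insert-into-another-table idea John raises could be
sketched like this (untested; the column list and the assumption that every
customer row has a match in the staging table are mine):

```sql
BEGIN;
-- Build the updated rows in one pass rather than updating 12.8M rows
-- in place, which leaves behind a dead version of every row.
CREATE TABLE cdm.cdm_ddw_customer_new AS
SELECT c.cus_nbr,
       b.indiv_fkey                 -- new value from the staging table
       -- , c.<other columns> ...   -- list the remaining columns here
FROM cdm.cdm_ddw_customer c
JOIN cdm.bcp_ddw_ck_cus b
  ON b.cus_num = c.cus_nbr;         -- inner join: assumes every row matches
DROP TABLE cdm.cdm_ddw_customer;
ALTER TABLE cdm.cdm_ddw_customer_new RENAME TO cdm_ddw_customer;
-- recreate indexes here before committing
COMMIT;
```

DDL is transactional in PostgreSQL, so the swap either completes or rolls
back as a unit; rows without a match in the staging table would be dropped
by the inner join, so a LEFT JOIN with COALESCE would be needed to keep them.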

