Re: [PERFORM] update from performance question - Mailing list pgsql-performance

From Armand Pirvu (home)
Subject Re: [PERFORM] update from performance question
Date
Msg-id CAB35198-1D1F-4FB0-A574-338B6DE574E9@gmail.com
Whole thread Raw
In response to Re: [PERFORM] update from performance question  (Albe Laurenz <laurenz.albe@wien.gv.at>)
List pgsql-performance
Hi Albe


Thank you for your reply

The query changed a bit


explain (analyze, buffers)
UPDATE
    csischema.tf_transaction_item_person TRANS
SET
    general_ledger_code = PURCH.general_ledger_code,
    general_ledger_code_desc = PURCH.general_ledger_code_desc,
    update_datetime = now()::timestamp(0)
FROM
    csischema.tf_purchases_person PURCH
WHERE
    PURCH.general_ledger_code IS NOT NULL AND
    TRANS.purchased_log_id = PURCH.purchased_log_id AND
    TRANS.general_ledger_code IS NULL
;

                                                             ^
select count(*) from csischema.tf_transaction_item_person where general_ledger_code is null;
  count
---------
 1393515

select count(*) from csischema.tf_transaction_item_person ;
  count
---------
 3408380

select count(*) from csischema.tf_purchases_person;
  count
----------
 20760731

select count(*) from csischema.tf_purchases_person where general_ledger_code IS NOT NULL;
  count
---------
 6909204


But the kicker is this

A select count to see how many records will be used for update gets me zero


select count(trans.purchased_log_id) from
    csischema.tf_transaction_item_person TRANS,
    csischema.tf_purchases_person PURCH
WHERE
    PURCH.general_ledger_code IS NOT NULL AND
    TRANS.purchased_log_id = PURCH.purchased_log_id AND
    TRANS.general_ledger_code IS NULL
;
 count
-------
     0
(1 row)

Considering this , I wonder if an index on csischema.tf_purchases_person (purchased_log_id, general_ledger_code) and
oneon tf_transaction_item_person (purchased_log_id, general_ledger_code) would not help ? 

This is what bugs me.

I got the explain out


without indexes


                                                                          QUERY PLAN
                                      

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on tf_transaction_item_person trans  (cost=1164684.43..1572235.51 rows=507748 width=227) (actual
time=230320.060..230320.060rows=0 loops=1) 
   Buffers: shared hit=120188 read=876478, temp read=93661 written=93631
   ->  Hash Join  (cost=1164684.43..1572235.51 rows=507748 width=227) (actual time=230320.054..230320.054 rows=0
loops=1)
         Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
         Buffers: shared hit=120188 read=876478, temp read=93661 written=93631
         ->  Seq Scan on tf_transaction_item_person trans  (cost=0.00..228945.93 rows=1542683 width=199) (actual
time=13.312..52046.689rows=1393515 loops=1) 
               Filter: (general_ledger_code IS NULL)
               Rows Removed by Filter: 2014865
               Buffers: shared read=191731
         ->  Hash  (cost=1012542.32..1012542.32 rows=6833049 width=52) (actual time=152339.000..152339.000 rows=6909204
loops=1)
               Buckets: 524288  Batches: 16  Memory Usage: 39882kB
               Buffers: shared hit=120188 read=684747, temp written=57588
               ->  Seq Scan on tf_purchases_person purch  (cost=0.00..1012542.32 rows=6833049 width=52) (actual
time=8.252..140992.716rows=6909204 loops=1) 
                     Filter: (general_ledger_code IS NOT NULL)
                     Rows Removed by Filter: 13851527
                     Buffers: shared hit=120188 read=684747
 Planning time: 0.867 ms
 Execution time: 230328.223 ms
(18 rows)



with indexes



                                                                          QUERY PLAN
                                      

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on tf_transaction_item_person trans  (cost=1161742.22..1567806.87 rows=497927 width=228) (actual
time=155171.388..155171.388rows=0 loops=1) 
   Buffers: shared hit=88095 read=908571, temp read=93661 written=93631
   ->  Hash Join  (cost=1161742.22..1567806.87 rows=497927 width=228) (actual time=155171.358..155171.358 rows=0
loops=1)
         Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
         Buffers: shared hit=88095 read=908571, temp read=93661 written=93631
         ->  Seq Scan on tf_transaction_item_person trans  (cost=0.00..228945.93 rows=1542683 width=199) (actual
time=16.801..31016.221rows=1393515 loops=1) 
               Filter: (general_ledger_code IS NULL)
               Rows Removed by Filter: 2014865
               Buffers: shared read=191731
         ->  Hash  (cost=1012542.32..1012542.32 rows=6700872 width=53) (actual time=105101.946..105101.946 rows=6909204
loops=1)
               Buckets: 524288  Batches: 16  Memory Usage: 39882kB
               Buffers: shared hit=88095 read=716840, temp written=57588
               ->  Seq Scan on tf_purchases_person purch  (cost=0.00..1012542.32 rows=6700872 width=53) (actual
time=13.823..95970.776rows=6909204 loops=1) 
                     Filter: (general_ledger_code IS NOT NULL)
                     Rows Removed by Filter: 13851527
                     Buffers: shared hit=88095 read=716840
 Planning time: 90.409 ms
 Execution time: 155179.181 ms
(18 rows)

Thanks
Armand



On Apr 19, 2017, at 3:06 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

> Armand Pirvu wrote:
>> Running 9.5.2
>>
>> I have the following update and run into a bit of a trouble . I realize the tables
>> involved have quite some data but here goes
>>
>>
>> UPDATE
>>    tf_transaction_item_person TRANS
>> SET
>>    general_ledger_code = PURCH.general_ledger_code,
>>    general_ledger_code_desc = PURCH.general_ledger_code_desc,
>>    update_datetime = now()::timestamp(0)
>> FROM
>>   tf_purchases_person PURCH
>> WHERE
>>    PURCH.general_ledger_code != '' AND
>>    TRANS.purchased_log_id = PURCH.purchased_log_id AND
>>    TRANS.general_ledger_code != PURCH.general_ledger_code
>> ;
> [...]
>>                              Table "tf_transaction_item_person"
> [...]
>> Indexes:
>>    "tf_transaction_item_person_pkey" PRIMARY KEY, btree (person_transaction_item_id)
>>    "tf_tip_idx" btree (client_id, update_datetime)
>>    "tf_tip_isdel_idx" btree (show_id, person_transaction_item_id)
>
> You don't show EXPLAIN (ANALYZE, BUFFERS) output for the problematic query,
> so it is difficult to say where the time is spent.
>
> But since you say that the same query without the UPDATE also takes more than
> a minute, the duration for the UPDATE is not outrageous.
> It may well be that much of the time is spent updating the index
> entries for the 3.5 million affected rows.
>
> I don't know if dropping indexes for the duration of the query and recreating
> them afterwards would be a net win, but you should consider it.
>
> It may be that the only ways to improve performance would be general
> things like faster I/O, higher max_wal_size setting, and, most of all,
> enough RAM in the machine to contain the whole database.
>
> Yours,
> Laurenz Albe



pgsql-performance by date:

Previous
From: Albe Laurenz
Date:
Subject: Re: [PERFORM] update from performance question
Next
From: Marco Renzi
Date:
Subject: [PERFORM] Query with no result set, really really slow adding ORBDER BY / LIMIT clause