Thread: [PERFORM] update from performance question

[PERFORM] update from performance question

From
"Armand Pirvu (home)"
Date:
Hi


Running 9.5.2

I have the following update and run into a bit of a trouble . I realize the tables involved have quite some data but
heregoes 


UPDATE
    tf_transaction_item_person TRANS
SET
    general_ledger_code = PURCH.general_ledger_code,
    general_ledger_code_desc = PURCH.general_ledger_code_desc,
    update_datetime = now()::timestamp(0)
FROM
   tf_purchases_person PURCH
WHERE
    PURCH.general_ledger_code != '' AND
    TRANS.purchased_log_id = PURCH.purchased_log_id AND
    TRANS.general_ledger_code != PURCH.general_ledger_code
;


                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Update on tf_transaction_item_person trans  (cost=1432701.45..2209776.18 rows=3405170 width=231)
   ->  Hash Join  (cost=1432701.45..2209776.18 rows=3405170 width=231)
         Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
         Join Filter: ((trans.general_ledger_code)::text <> (purch.general_ledger_code)::text)
         ->  Seq Scan on tf_transaction_item_person trans  (cost=0.00..160488.20 rows=3405920 width=257)
         ->  Hash  (cost=970842.28..970842.28 rows=20743134 width=56)
               ->  Seq Scan on tf_purchases_person purch  (cost=0.00..970842.28 rows=20743134 width=56)
                     Filter: ((general_ledger_code)::text <> ''::text)





                              Table "tf_transaction_item_person"
             Column              |            Type             |               Modifiers
---------------------------------+-----------------------------+----------------------------------------
 person_transaction_item_id      | character varying(100)      | not null
 person_transaction_id           | character varying(100)      | not null
 transaction_id                  | character varying(100)      |
 show_id                         | character varying(100)      | not null
 client_id                       | integer                     | not null
 company_id                      | integer                     | not null
 person_id                       | integer                     | not null
 badge_id                        | character varying(100)      | not null
 transaction_type_code           | character varying(100)      | not null
 payment_type_code               | character varying(100)      | not null
 purchased_log_id                | character varying(100)      | not null
 item_id                         | character varying(100)      | not null
 transaction_amount              | double precision            | not null
 add_by_user_id                  | character varying(100)      | not null
 add_date                        | timestamp without time zone | not null
 transaction_items_person_source | character varying(1)        | not null
 update_datetime                 | timestamp without time zone |
 is_deleted                      | character varying(5)        |
 reg_is_deleted                  | character varying(5)        | not null default ''::character varying
 birst_is_deleted                | character varying(5)        | not null default ''::character varying
 general_ledger_code             | character varying(20)       |
 general_ledger_code_desc        | character varying(50)       |
Indexes:
    "tf_transaction_item_person_pkey" PRIMARY KEY, btree (person_transaction_item_id)
    "tf_tip_idx" btree (client_id, update_datetime)
    "tf_tip_isdel_idx" btree (show_id, person_transaction_item_id)


                               Table "tf_purchases_person"
           Column            |            Type             |               Modifiers
-----------------------------+-----------------------------+----------------------------------------
 purchased_log_id            | character varying(100)      | not null
 show_id                     | character varying(100)      |
 client_id                   | integer                     |
 company_id                  | integer                     |
 person_id                   | integer                     |
 badge_id                    | character varying(100)      |
 item_id                     | character varying(100)      |
 general_ledger_code         | character varying(100)      |
 purchase_status             | character varying(100)      |
 purchase_quantity           | integer                     |
 purchase_rate               | double precision            |
 purchase_total              | double precision            |
 tax_rate                    | double precision            |
 tax_total                   | double precision            |
 final_total                 | double precision            |
 add_by_user_id              | character varying(100)      |
 add_date                    | timestamp without time zone |
 purchase_item_person_source | character varying(1)        |
 is_deleted                  | character varying(5)        |
 update_datetime             | timestamp without time zone |
 reg_is_deleted              | character varying(5)        | not null default ''::character varying
 birst_is_deleted            | character varying(5)        | not null default ''::character varying
 general_ledger_code_desc    | character varying(50)       |
Indexes:
    "tf_purchases_person_pkey" PRIMARY KEY, btree (purchased_log_id)
    "foo1" btree (general_ledger_code, show_id, purchased_log_id)
    "tf_pp_genl_idx" btree (show_id, general_ledger_code, general_ledger_code_desc)
    "tf_pp_idx" btree (client_id, update_datetime)
    "tf_pp_isdel_idx" btree (show_id, purchased_log_id)



I looked at the counts to see which conditions are getting me the least amount of records relative to the tables’
countsand attempt some indexing 


birstdb=# select count(*) from tf_transaction_item_person;
  count
---------
 3405920
(1 row)
birstdb=# select count(*) from tf_purchases_person;
  count
----------
 20747702
(1 row)
select count(TRANS.purchased_log_id)
from

    tf_transaction_item_person TRANS,
    tf_purchases_person PURCH
WHERE
    PURCH.general_ledger_code != '' AND
    TRANS.show_id = PURCH.show_id AND
    TRANS.purchased_log_id = PURCH.purchased_log_id AND
    TRANS.general_ledger_code != PURCH.general_ledger_code
;
 count
-------
     0

select count(TRANS.purchased_log_id)
from

    tf_transaction_item_person TRANS,
    tf_purchases_person PURCH
WHERE
    TRANS.show_id = PURCH.show_id AND
    TRANS.purchased_log_id = PURCH.purchased_log_id AND
    TRANS.general_ledger_code != PURCH.general_ledger_code
;
 count
-------
     0




create index foo1 on tf_purchases_person (general_ledger_code, show_id, purchased_log_id);
create index foo2 on tf_transaction_item_person (general_ledger_code, show_id, purchased_log_id);



No real improvement

I went even this route


UPDATE
     tf_transaction_item_person TRANS
SET
    general_ledger_code = PURCH.general_ledger_code,
    general_ledger_code_desc = PURCH.general_ledger_code_desc,
    update_datetime = now()::timestamp(0)
FROM
(
select a.show_id ,a.general_ledger_code, a.purchased_log_id, a.general_ledger_code_desc
from
tf_transaction_item_person a left join tf_purchases_person b
on
    b.general_ledger_code != '' AND
    b.show_id=a.show_id AND
    b.purchased_log_id = a.purchased_log_id AND
    b.general_ledger_code = a.general_ledger_code
where b.general_ledger_code is null
) PURCH
WHERE
    TRANS.purchased_log_id = PURCH.purchased_log_id AND
    TRANS.show_id = PURCH.show_id
;

                                                                                           QUERY PLAN
                                                        


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
 Update on tf_transaction_item_person trans  (cost=19194432.16..19467044.63 rows=34859 width=387)
   ->  Nested Loop Anti Join  (cost=19194432.16..19467044.63 rows=34859 width=387)
         ->  Merge Join  (cost=19194431.59..19254383.78 rows=34859 width=415)
               Merge Cond: (((trans.show_id)::text = (a.show_id)::text) AND ((trans.purchased_log_id)::text =
(a.purchased_log_id)::text))
               ->  Sort  (cost=9603638.01..9612152.81 rows=3405920 width=199)
                     Sort Key: trans.show_id, trans.purchased_log_id
                     ->  Index Scan using tf_tip_isdel_idx on tf_transaction_item_person trans  (cost=0.56..8908143.78
rows=3405920width=199) 
               ->  Materialize  (cost=9590793.59..9607823.19 rows=3405920 width=216)
                     ->  Sort  (cost=9590793.59..9599308.39 rows=3405920 width=216)
                           Sort Key: a.show_id, a.purchased_log_id
                           ->  Index Scan using foo2 on tf_transaction_item_person a  (cost=0.56..8872017.35
rows=3405920width=216) 
         ->  Index Scan using foo1 on tf_purchases_person b  (cost=0.56..6.09 rows=1 width=46)
               Index Cond: (((general_ledger_code)::text = (a.general_ledger_code)::text) AND ((show_id)::text =
(a.show_id)::text)AND ((purchased_log_id)::text = (a.purchased 
_log_id)::text))
               Filter: ((general_ledger_code)::text <> ''::text)
(14 rows)


explain analyze took well in excess of 10 minutes

The idea is an update needs to find the records to update to begin with.
The inner select with the above mentioned indexes runs in

                                                                              QUERY PLAN
                                               

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Anti Join  (cost=1.12..15466467.80 rows=3405920 width=176) (actual time=245.940..63987.645 rows=3405920 loops=1)
   Merge Cond: ((trans.general_ledger_code)::text = (purch.general_ledger_code)::text)
   Join Filter: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
   ->  Index Scan using foo2 on tf_transaction_item_person trans  (cost=0.56..8162817.35 rows=3405920 width=200)
(actualtime=245.928..59480.444 rows=3405920 loops=1) 
   ->  Index Only Scan using foo1 on tf_purchases_person purch  (cost=0.56..7243277.80 rows=20743134 width=30) (never
executed)
         Filter: ((general_ledger_code)::text <> ''::text)
         Heap Fetches: 0
 Planning time: 216.738 ms
 Execution time: 64901.139 ms


as opposed to a good 5 minutes

The update itself




I am at a bit of a loss.

Any ideas / pointers as to what I could do to make things better ?



Thanks in advance


- Armand



Re: [PERFORM] update from performance question

From
Albe Laurenz
Date:
Armand Pirvu wrote:
> Running 9.5.2
> 
> I have the following update and run into a bit of a trouble . I realize the tables
> involved have quite some data but here goes
> 
> 
> UPDATE
>     tf_transaction_item_person TRANS
> SET
>     general_ledger_code = PURCH.general_ledger_code,
>     general_ledger_code_desc = PURCH.general_ledger_code_desc,
>     update_datetime = now()::timestamp(0)
> FROM
>    tf_purchases_person PURCH
> WHERE
>     PURCH.general_ledger_code != '' AND
>     TRANS.purchased_log_id = PURCH.purchased_log_id AND
>     TRANS.general_ledger_code != PURCH.general_ledger_code
> ;
[...]
>                               Table "tf_transaction_item_person"
[...]
> Indexes:
>     "tf_transaction_item_person_pkey" PRIMARY KEY, btree (person_transaction_item_id)
>     "tf_tip_idx" btree (client_id, update_datetime)
>     "tf_tip_isdel_idx" btree (show_id, person_transaction_item_id)

You don't show EXPLAIN (ANALYZE, BUFFERS) output for the problematic query,
so it is difficult to say where the time is spent.

But since you say that the same query without the UPDATE also takes more than
a minute, the duration for the UPDATE is not outrageous.
It may well be that much of the time is spent updating the index
entries for the 3.5 million affected rows.

I don't know if dropping indexes for the duration of the query and recreating
them afterwards would be a net win, but you should consider it.

It may be that the only ways to improve performance would be general
things like faster I/O, higher max_wal_size setting, and, most of all,
enough RAM in the machine to contain the whole database.

Yours,
Laurenz Albe

Re: [PERFORM] update from performance question

From
"Armand Pirvu (home)"
Date:
Hi Albe


Thank you for your reply

The query changed a bit


explain (analyze, buffers)
UPDATE
    csischema.tf_transaction_item_person TRANS
SET
    general_ledger_code = PURCH.general_ledger_code,
    general_ledger_code_desc = PURCH.general_ledger_code_desc,
    update_datetime = now()::timestamp(0)
FROM
    csischema.tf_purchases_person PURCH
WHERE
    PURCH.general_ledger_code IS NOT NULL AND
    TRANS.purchased_log_id = PURCH.purchased_log_id AND
    TRANS.general_ledger_code IS NULL
;

                                                             ^
select count(*) from csischema.tf_transaction_item_person where general_ledger_code is null;
  count
---------
 1393515

select count(*) from csischema.tf_transaction_item_person ;
  count
---------
 3408380

select count(*) from csischema.tf_purchases_person;
  count
----------
 20760731

select count(*) from csischema.tf_purchases_person where general_ledger_code IS NOT NULL;
  count
---------
 6909204


But the kicker is this

A select count to see how many records will be used for update gets me zero


select count(trans.purchased_log_id) from
    csischema.tf_transaction_item_person TRANS,
    csischema.tf_purchases_person PURCH
WHERE
    PURCH.general_ledger_code IS NOT NULL AND
    TRANS.purchased_log_id = PURCH.purchased_log_id AND
    TRANS.general_ledger_code IS NULL
;
 count
-------
     0
(1 row)

Considering this , I wonder if an index on csischema.tf_purchases_person (purchased_log_id, general_ledger_code) and
oneon tf_transaction_item_person (purchased_log_id, general_ledger_code) would not help ? 

This is what bugs me.

I got the explain out


without indexes


                                                                          QUERY PLAN
                                      

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on tf_transaction_item_person trans  (cost=1164684.43..1572235.51 rows=507748 width=227) (actual
time=230320.060..230320.060rows=0 loops=1) 
   Buffers: shared hit=120188 read=876478, temp read=93661 written=93631
   ->  Hash Join  (cost=1164684.43..1572235.51 rows=507748 width=227) (actual time=230320.054..230320.054 rows=0
loops=1)
         Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
         Buffers: shared hit=120188 read=876478, temp read=93661 written=93631
         ->  Seq Scan on tf_transaction_item_person trans  (cost=0.00..228945.93 rows=1542683 width=199) (actual
time=13.312..52046.689rows=1393515 loops=1) 
               Filter: (general_ledger_code IS NULL)
               Rows Removed by Filter: 2014865
               Buffers: shared read=191731
         ->  Hash  (cost=1012542.32..1012542.32 rows=6833049 width=52) (actual time=152339.000..152339.000 rows=6909204
loops=1)
               Buckets: 524288  Batches: 16  Memory Usage: 39882kB
               Buffers: shared hit=120188 read=684747, temp written=57588
               ->  Seq Scan on tf_purchases_person purch  (cost=0.00..1012542.32 rows=6833049 width=52) (actual
time=8.252..140992.716rows=6909204 loops=1) 
                     Filter: (general_ledger_code IS NOT NULL)
                     Rows Removed by Filter: 13851527
                     Buffers: shared hit=120188 read=684747
 Planning time: 0.867 ms
 Execution time: 230328.223 ms
(18 rows)



with indexes



                                                                          QUERY PLAN
                                      

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on tf_transaction_item_person trans  (cost=1161742.22..1567806.87 rows=497927 width=228) (actual
time=155171.388..155171.388rows=0 loops=1) 
   Buffers: shared hit=88095 read=908571, temp read=93661 written=93631
   ->  Hash Join  (cost=1161742.22..1567806.87 rows=497927 width=228) (actual time=155171.358..155171.358 rows=0
loops=1)
         Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
         Buffers: shared hit=88095 read=908571, temp read=93661 written=93631
         ->  Seq Scan on tf_transaction_item_person trans  (cost=0.00..228945.93 rows=1542683 width=199) (actual
time=16.801..31016.221rows=1393515 loops=1) 
               Filter: (general_ledger_code IS NULL)
               Rows Removed by Filter: 2014865
               Buffers: shared read=191731
         ->  Hash  (cost=1012542.32..1012542.32 rows=6700872 width=53) (actual time=105101.946..105101.946 rows=6909204
loops=1)
               Buckets: 524288  Batches: 16  Memory Usage: 39882kB
               Buffers: shared hit=88095 read=716840, temp written=57588
               ->  Seq Scan on tf_purchases_person purch  (cost=0.00..1012542.32 rows=6700872 width=53) (actual
time=13.823..95970.776rows=6909204 loops=1) 
                     Filter: (general_ledger_code IS NOT NULL)
                     Rows Removed by Filter: 13851527
                     Buffers: shared hit=88095 read=716840
 Planning time: 90.409 ms
 Execution time: 155179.181 ms
(18 rows)

Thanks
Armand



On Apr 19, 2017, at 3:06 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

> Armand Pirvu wrote:
>> Running 9.5.2
>>
>> I have the following update and run into a bit of a trouble . I realize the tables
>> involved have quite some data but here goes
>>
>>
>> UPDATE
>>    tf_transaction_item_person TRANS
>> SET
>>    general_ledger_code = PURCH.general_ledger_code,
>>    general_ledger_code_desc = PURCH.general_ledger_code_desc,
>>    update_datetime = now()::timestamp(0)
>> FROM
>>   tf_purchases_person PURCH
>> WHERE
>>    PURCH.general_ledger_code != '' AND
>>    TRANS.purchased_log_id = PURCH.purchased_log_id AND
>>    TRANS.general_ledger_code != PURCH.general_ledger_code
>> ;
> [...]
>>                              Table "tf_transaction_item_person"
> [...]
>> Indexes:
>>    "tf_transaction_item_person_pkey" PRIMARY KEY, btree (person_transaction_item_id)
>>    "tf_tip_idx" btree (client_id, update_datetime)
>>    "tf_tip_isdel_idx" btree (show_id, person_transaction_item_id)
>
> You don't show EXPLAIN (ANALYZE, BUFFERS) output for the problematic query,
> so it is difficult to say where the time is spent.
>
> But since you say that the same query without the UPDATE also takes more than
> a minute, the duration for the UPDATE is not outrageous.
> It may well be that much of the time is spent updating the index
> entries for the 3.5 million affected rows.
>
> I don't know if dropping indexes for the duration of the query and recreating
> them afterwards would be a net win, but you should consider it.
>
> It may be that the only ways to improve performance would be general
> things like faster I/O, higher max_wal_size setting, and, most of all,
> enough RAM in the machine to contain the whole database.
>
> Yours,
> Laurenz Albe