Improving Query - Mailing list pgsql-performance

From Ketema
Subject Improving Query
Date
Msg-id 1193746737.222960.206150@50g2000hsm.googlegroups.com
Whole thread Raw
Responses Re: Improving Query
Re: Improving Query
List pgsql-performance
I have the following query that is a part of a function:

select order_details.tpv_success_id, order_details.tpv_id,
order_details.ver_code, order_details.app_id,
        order_details.acct_id, order_details.first_name || ' ' ||
order_details.last_name as customer_name,
        order_details.order_date as creation_date,
verification_success.checked_out, order_details.csr_name,
        order_details.products,    order_details.promotion,
order_details.division_id, order_details.has_dish_billing_info,
             (select
array_upper(acct_product_data_requirements_details.acct_prod_ids, 1))
as num_prods,
        count(distinct case when provision_issues.account_product_id is not
null and provision_issues.resolved_date is null then
provision_issues.account_product_id end ) as num_open_issues,
        count(case when reconciliations.rec_id is not null and
reconciliation_cancels.rec_cancel_id is null then
reconciliations.rec_id end ) as num_provisioned,
        count(case when reconciliation_cancels.rec_cancel_id is not null
then reconciliation_cancels.rec_cancel_id end ) as num_canceled
        from frontier.order_details
       inner join frontier.verification_success
        on order_details.tpv_success_id =
verification_success.tpv_success_id
        inner join frontier.acct_product_data_requirements_details
            left outer join frontier.provision_issues
            on provision_issues.account_product_id =
any(acct_product_data_requirements_details.acct_prod_ids) and
provision_issues.resolved_date is null
            left outer join frontier.reconciliations
                left outer join frontier.reconciliation_cancels
                on reconciliations.rec_id =
reconciliation_cancels.rec_id
            on reconciliations.account_product_id =
any(acct_product_data_requirements_details.acct_prod_ids)
        on order_details.acct_id =
acct_product_data_requirements_details.acct_id
        where verification_success.checked_out is null
        group by order_details.tpv_success_id, order_details.tpv_id,
order_details.ver_code, order_details.app_id,
        order_details.acct_id, order_details.first_name || ' ' ||
order_details.last_name, order_details.order_date,
verification_success.checked_out, order_details.csr_name,
        order_details.products,    order_details.promotion, num_prods,
order_details.division_id, order_details.has_dish_billing_info
having ( count(case when reconciliations.rec_id is not null and
reconciliation_cancels.rec_cancel_id is null then
reconciliations.rec_id end ) <  (select
array_upper(acct_product_data_requirements_details.acct_prod_ids,
1)) ) and
        (
        count(distinct case when provision_issues.account_product_id is not
null and provision_issues.resolved_date is null then
provision_issues.account_product_id end ) +
        count(case when reconciliations.rec_id is not null and
reconciliation_cancels.rec_cancel_id is null then
reconciliations.rec_id end ) +
        count(case when reconciliation_cancels.rec_cancel_id is not null
then reconciliation_cancels.rec_cancel_id end )
        ) < (select
array_upper(acct_product_data_requirements_details.acct_prod_ids, 1))
    and order_details.division_id =
any('{79,38,70,66,35,40,37,36,67,41,65,39}') --this array here
varies.  indexes are present for the different variations
order by order_details.order_date

here is the execution plan:

Sort  (cost=1350779962.18..1350969875.28 rows=75965240 width=290)
(actual time=16591.711..16591.723 rows=110 loops=1)
  Sort Key: frontier.order_details.order_date
  ->  GroupAggregate  (cost=1235567017.53..1295217399.34 rows=75965240
width=290) (actual time=16583.383..16591.420 rows=110 loops=1)
        Filter: ((count(CASE WHEN ((rec_id IS NOT NULL) AND
(rec_cancel_id IS NULL)) THEN rec_id ELSE NULL::integer END) <
(subplan)) AND (((count(DISTINCT CASE WHEN ((account_product_id IS NOT
NULL) AND (resolved_date IS NULL)) THEN account_product_id ELSE
NULL::integer END) + count(CASE WHEN ((rec_id IS NOT NULL) AND
(rec_cancel_id IS NULL)) THEN rec_id ELSE NULL::integer END)) +
count(CASE WHEN (rec_cancel_id IS NOT NULL) THEN rec_cancel_id ELSE
NULL::integer END)) < (subplan)))
        ->  Sort  (cost=1235567017.53..1238002161.29 rows=974057502
width=290) (actual time=16576.997..16577.513 rows=3366 loops=1)
              Sort Key: frontier.order_details.tpv_success_id,
frontier.order_details.tpv_id, frontier.order_details.ver_code,
frontier.order_details.app_id, frontier.order_details.acct_id,
(((frontier.order_details.first_name)::text || ' '::text) ||
(frontier.order_details.last_name)::text),
frontier.order_details.order_date, verification_success.checked_out,
frontier.order_details.csr_name, frontier.order_details.products,
frontier.order_details.promotion, (subplan),
frontier.order_details.division_id,
frontier.order_details.has_dish_billing_info
              ->  Merge Join  (cost=100001383.41..310142000.26
rows=974057502 width=290) (actual time=1055.584..16560.634 rows=3366
loops=1)
                    Merge Cond: ("outer".acct_id = "inner".acct_id)
                    ->  Nested Loop Left Join
(cost=328.94..173666048.48 rows=1928122302 width=53) (actual
time=0.236..16499.771 rows=7192 loops=1)
                          Join Filter: ("inner".account_product_id =
ANY ("outer".acct_prod_ids))
                          ->  Nested Loop Left Join
(cost=2.21..134714.57 rows=564852 width=45) (actual
time=0.215..1021.209 rows=5523 loops=1)
                                Join Filter:
("inner".account_product_id = ANY ("outer".acct_prod_ids))
                                ->  Index Scan using "FKI_acct_id" on
acct_product_data_requirements_details  (cost=0.00..488.19 rows=5484
width=33) (actual time=0.011..15.502 rows=5484 loops=1)
                                ->  Bitmap Heap Scan on
provision_issues  (cost=2.21..17.27 rows=206 width=12) (actual
time=0.035..0.106 rows=206 loops=5484)
                                      Recheck Cond: (resolved_date IS
NULL)
                                      ->  Bitmap Index Scan on
"IX_resolved_date_null"  (cost=0.00..2.21 rows=206 width=0) (actual
time=0.032..0.032 rows=206 loops=5484)
                          ->  Materialize  (cost=326.74..395.01
rows=6827 width=12) (actual time=0.000..0.852 rows=6827 loops=5523)
                                ->  Merge Left Join
(cost=0.00..319.91 rows=6827 width=12) (actual time=0.016..13.426
rows=6827 loops=1)
                                      Merge Cond: ("outer".rec_id =
"inner".rec_id)
                                      ->  Index Scan using
reconciliation_pkey on reconciliations  (cost=0.00..215.59 rows=6827
width=8) (actual time=0.004..4.209 rows=6827 loops=1)
                                      ->  Index Scan using
"FKI_rec_id" on reconciliation_cancels  (cost=0.00..56.80 rows=2436
width=8) (actual time=0.004..1.534 rows=2436 loops=1)
                    ->  Sort  (cost=100001054.46..100001061.39
rows=2770 width=241) (actual time=18.984..19.937 rows=3366 loops=1)
                          Sort Key: frontier.order_details.acct_id
                          ->  Hash Join
(cost=100000131.90..100000896.08 rows=2770 width=241) (actual
time=6.525..13.644 rows=2459 loops=1)
                                Hash Cond: ("outer".tpv_success_id =
"inner".tpv_success_id)
                                ->  Append
(cost=100000000.00..100000694.84 rows=2776 width=199) (actual
time=0.092..4.627 rows=2459 loops=1)
                                      ->  Seq Scan on order_details
(cost=100000000.00..100000012.45 rows=35 width=199) (actual
time=0.001..0.001 rows=0 loops=1)
                                            Filter: (division_id = ANY
('{79,38,70,66,35,40,37,36,67,41,65,39}'::integer[]))
                                      ->  Bitmap Heap Scan on
order_details_august_2007 order_details  (cost=2.33..88.63 rows=380
width=158) (actual time=0.089..0.557 rows=330 loops=1)
                                            Recheck Cond: (division_id
= ANY ('{79,38,70,66,35,40,37,36,67,41,65,39}'::integer[]))
                                            ->  Bitmap Index Scan on
"IX_august_2007_provision_list2"  (cost=0.00..2.33 rows=380 width=0)
(actual time=0.075..0.075 rows=330 loops=1)
                                      ->  Bitmap Heap Scan on
order_details_july_2007 order_details  (cost=2.31..71.39 rows=288
width=159) (actual time=0.082..0.521 rows=314 loops=1)
                                            Recheck Cond: (division_id
= ANY ('{79,38,70,66,35,40,37,36,67,41,65,39}'::integer[]))
                                            ->  Bitmap Index Scan on
"IX_july_2007_provision_list2"  (cost=0.00..2.31 rows=288 width=0)
(actual time=0.069..0.069 rows=314 loops=1)
                                      ->  Bitmap Heap Scan on
order_details_june_2007 order_details  (cost=2.05..71.82 rows=279
width=148) (actual time=0.029..0.106 rows=51 loops=1)
                                            Recheck Cond: (division_id
= ANY ('{79,38,70,66,35,40,37,36,67,41,65,39}'::integer[]))
                                            ->  Bitmap Index Scan on
"IX_june_2007_provision_list2"  (cost=0.00..2.05 rows=279 width=0)
(actual time=0.022..0.022 rows=51 loops=1)
                                      ->  Bitmap Heap Scan on
order_details_october_2007 order_details  (cost=7.24..279.34 rows=1060
width=159) (actual time=0.285..2.035 rows=1244 loops=1)
                                            Recheck Cond: (division_id
= ANY ('{79,38,70,66,35,40,37,36,67,41,65,39}'::integer[]))
                                            ->  Bitmap Index Scan on
"IX_october_2007_provision_list2"  (cost=0.00..7.24 rows=1060 width=0)
(actual time=0.239..0.239 rows=1244 loops=1)
                                      ->  Bitmap Heap Scan on
order_details_september_2007 order_details  (cost=4.52..171.21
rows=734 width=150) (actual time=0.130..0.856 rows=520 loops=1)
                                            Recheck Cond: (division_id
= ANY ('{79,38,70,66,35,40,37,36,67,41,65,39}'::integer[]))
                                            ->  Bitmap Index Scan on
"IX_september_2007_provision_list2"  (cost=0.00..4.52 rows=734
width=0) (actual time=0.110..0.110 rows=520 loops=1)
                                ->  Hash  (cost=118.21..118.21
rows=5473 width=8) (actual time=6.414..6.414 rows=5484 loops=1)
                                      ->  Bitmap Heap Scan on
verification_success  (cost=22.48..118.21 rows=5473 width=8) (actual
time=0.731..3.311 rows=5484 loops=1)
                                            Recheck Cond: (checked_out
IS NULL)
                                            ->  Bitmap Index Scan on
"IX_checked_out_isnull"  (cost=0.00..22.48 rows=5473 width=0) (actual
time=0.722..0.722 rows=5485 loops=1)
                    SubPlan
                      ->  Result  (cost=0.00..0.01 rows=1 width=0)
(actual time=0.001..0.001 rows=1 loops=3366)
        SubPlan
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.000..0.000 rows=1 loops=780)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.000..0.000 rows=1 loops=2459)
Total runtime: 16593.353 ms


I have attached an erd of the tables used in this query.  If it is
stripped out it can be viewed here: http://www.ketema.net/provision_list_tables_erd.jpg

My concern is with the sort step that takes 15 seconds by itself:

->  Sort  (cost=1235567017.53..1238002161.29 rows=974057502 width=290)
(actual time=16576.997..16577.513 rows=3366 loops=1)
              Sort Key: frontier.order_details.tpv_success_id,
frontier.order_details.tpv_id, frontier.order_details.ver_code,
frontier.order_details.app_id, frontier.order_details.acct_id,
(((frontier.order_details.first_name)::text || ' '::text) ||
(frontier.order_details.last_name)::text),
frontier.order_details.order_date, verification_success.checked_out,
frontier.order_details.csr_name, frontier.order_details.products,
frontier.order_details.promotion, (subplan),
frontier.order_details.division_id,
frontier.order_details.has_dish_billing_info

I believe this is due to the aggregate done in the select clause as
well as the sub select:

(select
array_upper(acct_product_data_requirements_details.acct_prod_ids, 1))
as num_prods,
        count(distinct case when provision_issues.account_product_id is not
null and provision_issues.resolved_date is null then
provision_issues.account_product_id end ) as num_open_issues,
        count(case when reconciliations.rec_id is not null and
reconciliation_cancels.rec_cancel_id is null then
reconciliations.rec_id end ) as num_provisioned,
        count(case when reconciliation_cancels.rec_cancel_id is not null
then reconciliation_cancels.rec_cancel_id end ) as num_canceled
I believe the counts and sub select cause the sort to be performed
multiple times?

How can I improve this step?

Things I have thought about:
1)Creating indexes on the aggregates...Found out this can't be done.
2)Create Views of the counts and the sub select...is this any faster
as the view is executed at run time anyway?
3)Create actual tables of the sub select and aggregates...How would
this be maintained to ensure it was always accurate?
4)Increasing hardware resources.  Currently box is on a single
processor amd64 with 8Gb of RAM.  below are the settings for resource
usage.
shared_buffers = 65536
temp_buffers = 5000
max_prepared_transactions = 2000
work_mem = 131072
maintenance_work_mem = 512000
max_stack_depth = 7168
max_fsm_pages = 160000
max_fsm_relations = 4000
The only function of this box if for Pg, so I do not mind it using
every last drop of ram and resources that it can.
5)Upgrade version of pg..currently is running 8.1.4

Would appreciate any suggestions.

Thanks

http://pgsql.privatepaste.com/7ffDdPQvIN


pgsql-performance by date:

Previous
From: Gregory Stark
Date:
Subject: Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1
Next
From: Richard Huxton
Date:
Subject: Re: Improving Query