Re: Is there any good optimization solution to improve the query efficiency? - Mailing list pgsql-general

From Oliver Kohll
Subject Re: Is there any good optimization solution to improve the query efficiency?
Date
Msg-id CAMS=m5LtOO0NscRYr_7y0HVUw9T+LGog-RvV+T9p-_o=jrRYCQ@mail.gmail.com
Whole thread Raw
In response to Is there any good optimization solution to improve the query efficiency?  (gzh <gzhcoder@126.com>)
List pgsql-general
On Mon, 5 Jun 2023 at 07:56, gzh <gzhcoder@126.com> wrote:

Hi everyone,

I'm running into some performance issues with my SQL query.
The following SQL query is taking a long time to execute.

Execution Plan:
explain analyse
select * from TBL_RES
left outer join(select T_CUST.RSNO RSNO2 ,
T_CUST.KNO ,
T_CUST.AGE ,
T_CUST.GST
from TBL_CUST T_CUST ,
(select T_CUST.RSNO ,
T_CUST.KNO ,
MIN(T_CUST.GSTSEQ) GSTSEQ
from TBL_CUST T_CUST ,
TBL_POV T_POV ,
TBL_RES T_RES
where T_CUST.STSFLG = 'T'
and T_CUST.DISPSEQ <> 9999
AND T_CUST.KFIX = '0'
and T_POV.CRSNO = T_RES.CRSNO
and T_RES.RSNO = T_CUST.RSNO
group by T_CUST.RSNO , T_CUST.KNO) T_POV2
where T_POV2.RSNO = T_CUST.RSNO
and T_POV2.KNO = T_CUST.KNO
and T_POV2.GSTSEQ = T_CUST.GSTSEQ) T_POV3 on TBL_RES.RSNO = T_POV3.RSNO2
and TBL_RES.KNO = T_POV3.KNO
where TBL_RES.CID >= to_date('2022/07/01', 'YYYY/MM/DD')
and TBL_RES.CID <= to_date('2022/07/31', 'YYYY/MM/DD')
and TBL_RES.COD >= to_date('2022/07/01', 'YYYY/MM/DD')
and TBL_RES.COD <= to_date('2022/07/31', 'YYYY/MM/DD')
----- Execution Plan -----
Nested Loop Left Join  (cost=254388.44..452544.70 rows=473 width=3545) (actual time=3077.312..996048.714 rows=15123 loops=1)
  Join Filter: ((TBL_RES.RSNO = T_CUST.RSNO) AND ((TBL_RES.KNO)::text = (T_CUST.KNO)::text))
  Rows Removed by Join Filter: 4992268642
  ->  Gather  (cost=1000.00..58424.35 rows=473 width=3489) (actual time=0.684..14.158 rows=15123 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Parallel Seq Scan on TBL_RES  (cost=0.00..57377.05 rows=197 width=3489) (actual time=0.096..279.504 rows=5041 loops=3)
              Filter: ((CID >= to_date('2022/07/01'::text, 'YYYY/MM/DD'::text)) AND (CID <= to_date('2022/07/31'::text, 'YYYY/MM/DD'::text)) AND (COD >= to_date('2022/07/01'::text, 'YYYY/MM/DD'::text)) AND (COD <= to_date('2022/07/31'::text, 'YYYY/MM/DD'::text)))
              Rows Removed by Filter: 161714
  ->  Materialize  (cost=253388.44..394112.08 rows=1 width=56) (actual time=0.081..26.426 rows=330111 loops=15123)
        ->  Hash Join  (cost=253388.44..394112.07 rows=1 width=56) (actual time=1197.484..2954.084 rows=330111 loops=1)
              Hash Cond: ((T_CUST.RSNO = T_CUST_1.RSNO) AND ((T_CUST.KNO)::text = (T_CUST_1.KNO)::text) AND (T_CUST.gstseq = (min(T_CUST_1.gstseq))))
              ->  Seq Scan on TBL_CUST T_CUST  (cost=0.00..79431.15 rows=2000315 width=61) (actual time=0.015..561.005 rows=2000752 loops=1)
              ->  Hash  (cost=246230.90..246230.90 rows=262488 width=50) (actual time=1197.025..1209.957 rows=330111 loops=1)
                    Buckets: 65536  Batches: 8  Memory Usage: 2773kB
                    ->  Finalize GroupAggregate  (cost=205244.84..243606.02 rows=262488 width=50) (actual time=788.552..1116.074 rows=330111 loops=1)
                          Group Key: T_CUST_1.RSNO, T_CUST_1.KNO
                          ->  Gather Merge  (cost=205244.84..238964.80 rows=268846 width=50) (actual time=788.547..982.479 rows=330111 loops=1)
                                Workers Planned: 2
                                Workers Launched: 1
                                ->  Partial GroupAggregate  (cost=204244.81..206933.27 rows=134423 width=50) (actual time=784.032..900.979 rows=165056 loops=2)
                                      Group Key: T_CUST_1.RSNO, T_CUST_1.KNO
                                      ->  Sort  (cost=204244.81..204580.87 rows=134423 width=23) (actual time=784.019..833.791 rows=165061 loops=2)
                                            Sort Key: T_CUST_1.RSNO, T_CUST_1.KNO
                                            Sort Method: external merge  Disk: 5480kB
                                            Worker 0:  Sort Method: external merge  Disk: 5520kB
                                            ->  Parallel Hash Join  (cost=111758.80..190036.38 rows=134423 width=23) (actual time=645.302..716.247 rows=165061 loops=2)
                                                  Hash Cond: (T_CUST_1.RSNO = T_RES.RSNO)
                                                  ->  Parallel Seq Scan on TBL_CUST T_CUST_1  (cost=0.00..74013.63 rows=204760 width=23) (actual time=0.018..264.390 rows=165058 loops=2)
                                                        Filter: ((dispseq <> '9999'::numeric) AND ((stsflg)::text = 'T'::text) AND ((KFIX)::text = '0'::text))
                                                        Rows Removed by Filter: 835318
                                                  ->  Parallel Hash  (cost=109508.52..109508.52 rows=137142 width=8) (actual time=343.593..343.896 rows=165058 loops=2)
                                                        Buckets: 131072  Batches: 8  Memory Usage: 3008kB
                                                        ->  Parallel Hash Join  (cost=51834.70..109508.52 rows=137142 width=8) (actual time=256.732..314.368 rows=165058 loops=2)
                                                              Hash Cond: ((T_RES.crsno)::text = (T_POV.crsno)::text)
                                                              ->  Parallel Seq Scan on TBL_RES T_RES  (cost=0.00..53199.02 rows=208902 width=17) (actual time=0.007..100.510 rows=250132 loops=2)
                                                              ->  Parallel Hash  (cost=49450.42..49450.42 rows=137142 width=9) (actual time=122.308..122.309 rows=165054 loops=2)
                                                                    Buckets: 131072  Batches: 8  Memory Usage: 2976kB
                                                                    ->  Parallel Seq Scan on TBL_POV T_POV  (cost=0.00..49450.42 rows=137142 width=9) (actual time=0.037..89.470 rows=165054 loops=2)
Planning Time: 1.064 ms
Execution Time: 996062.382 ms
--------------------------------------------------------------------------------

The amount of data in the table is as follows.
TBL_RES    500265
TBL_CUST   2000752
TBL_POV    330109

Any suggestions for improving the performance of the query would be greatly appreciated.

Thanks in advance!


Welcome to query optimisation, I hope you'll enjoy working on this problem, it's one of the things I and many others love.

From a quick glance, it looks like there are opportunities for index use there but the plan doesn't contain any. Do any indexes exist?

You could also start looking at it in parts, e.g. separate out the inner T_POV2 subselect. It looks like that contains a cross join, which is then whittled down with WHERE clauses. You could try adding ON to the FROM instead. https://www.postgresql.org/docs/15/sql-select.html#SQL-FROM

Oliver

pgsql-general by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: [Question]What will happen if the server active close the connection?
Next
From: Ron
Date:
Subject: Re: Is there any good optimization solution to improve the query efficiency?