Re: [EXT] YNT: Need help tuning a query - Mailing list pgsql-general

From Vladimir Sitnikov
Subject Re: [EXT] YNT: Need help tuning a query
Date
Msg-id CAB=Je-G7j5yaXdAOe6pxzMsVf-Mcxg77tfWBQM=9eV=rDJ6yRQ@mail.gmail.com
Whole thread Raw
In response to RE: [EXT] YNT: Need help tuning a query  ("Wong, Kam Fook (TR Technology)" <kamfook.wong@thomsonreuters.com>)
Responses Re: [EXT] YNT: Need help tuning a query  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-general
Steve,

It looks like PostgreSQL was not able to push the join condition into group by subquery.

The problematic bits in the PostgreSQL plan are

         ->  Nested Loop Left Join  (cost=3423317.32..4522882.01 rows=4 width=261) (actual time=30839.132..35117.682 rows=39 loops=1)
               Join Filter: (sub2.judge_id = judg1.judge_id)
               Rows Removed by Join Filter: 858300

               ->  Nested Loop Left Join  (cost=1504656.29..2084303.10 rows=2 width=205) (actual time=19156.871..21775.159 rows=3 loops=1)
                     Join Filter: (sub4.judge_id = judg1.judge_id)
                     Rows Removed by Join Filter: 55263

In other words, it did not use sub2.judge_id = judg1.judge_id condition for efficient data retrieval, and it selected all the data and then filtered.

I believe you could workaround the issue by using LATERAL subqueries, so you could manually push join conditions into the group by subqueries.
See https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL

Frankly speaking, it looks strange that you have "group by .. limit 11" at the very end of the query.
There's no "order by", so it effectively means "return 11 random rows".
I would suggest adding order by if the order is important, or remove "limit 11" and apply the same "lateral" trick for the third subquery as well.
Technically speaking, you can have limit within lateral.

----

I would suggest trying the following:

--- original.sql 2023-09-29 09:20:09
+++ tuned.sql 2023-09-29 09:25:35
@@ -83,6 +83,7 @@
                                AND (judg1.jrt_opt_out_flag <> 'Y' OR judg1.jrt_opt_out_flag IS NULL)
                     ) sub0
                LEFT OUTER JOIN
+                    LATERAL
                    (  SELECT sub4.case_year_number,
                                  sub4.judge_wld_id,
                                  sub4.judge_id,
@@ -94,6 +95,7 @@
                                  'A' AS grouping_flg,
                                  jrtf1.case_document_id AS subcount
                               FROM wln_mart.jrt_fact jrtf1
+                             WHERE jrtf1.judge_id = sub0.judge_id
                            GROUP BY jrtf1.case_year_number,
                                  jrtf1.higher_judge_wld_id,
                                  jrtf1.higher_judge_id,
@@ -106,6 +108,7 @@
                   ) sub1
                ON sub1.judge_id = sub0.judge_id
              LEFT OUTER JOIN
+                  LATERAL
                  (SELECT sub5.case_year_number,
                                sub5.judge_wld_id,
                                sub5.judge_id,
@@ -121,6 +124,7 @@
                            FROM wln_mart.jrt_fact jrtf2
                           WHERE jrtf2.lower_judge_court_level_id > 1000
                            AND jrtf2.lower_judge_court_level_id <= 1004
+                           AND jrtf2.judge_id = sub0.judge_id
                        GROUP BY jrtf2.case_year_number,
                                jrtf2.lower_judge_wld_id,
                                jrtf2.lower_judge_id,



Could you please share explain (analyze, buffers) for the tuned query?

SELECT agg_sub.judge_id,
       agg_sub.display_name,
       agg_sub.active_flag,
       agg_sub.judge_court_level,
       agg_sub.jrt_fact_first_year_trial,
       agg_sub.jrt_fact_last_year_trial,
       agg_sub.jrt_fact_totalcount_trial,
       agg_sub.filtered_first_year_trial,
       agg_sub.filtered_last_year_trial,
       agg_sub.jrt_fact_count_trial,
       agg_sub.jrt_fact_first_year_appeal,
       agg_sub.jrt_fact_last_year_appeal,
       agg_sub.jrt_fact_totalcount_appeal,
       agg_sub.filtered_first_year_appeal,
       agg_sub.filtered_last_year_appeal,
       agg_sub.jrt_fact_count_appeal,
       appellate_flag_sub.appellate_flag
   FROM (SELECT sub3.judge_id,
             sub3.display_name,
             sub3.active_flag,
             sub3.judge_court_level,
             (MIN(sub3.trial_unfilt_case_year_number)) AS jrt_fact_first_year_trial,
             (MAX(sub3.trial_unfilt_case_year_number)) AS jrt_fact_last_year_trial,
             (SUM(sub3.trial_unfilt_subcount)) AS jrt_fact_totalcount_trial,
             (MIN(sub3.trial_filt_case_year_number)) AS filtered_first_year_trial,
             (MAX(sub3.trial_filt_case_year_number)) AS filtered_last_year_trial,
             (SUM(sub3.trial_filt_subcount)) AS jrt_fact_count_trial,
             (MIN(sub3.appeal_unfilt_case_year_number)) AS jrt_fact_first_year_appeal,
             (MAX(sub3.appeal_unfilt_case_year_number)) AS jrt_fact_last_year_appeal,
             (SUM(sub3.appeal_unfilt_subcount)) AS jrt_fact_totalcount_appeal,
             (MIN(sub3.appeal_filt_case_year_number)) AS filtered_first_year_appeal,
             (MAX(sub3.appeal_filt_case_year_number)) AS filtered_last_year_appeal,
             (SUM(sub3.appeal_filt_subcount)) AS jrt_fact_count_appeal
           FROM (SELECT sub0.judge_id,
                    sub0.display_name,
                    sub0.active_flag,
                    sub0.judge_court_level,
                    (CASE WHEN sub2.grouping_flg = 'T' AND sub2.judge_wld_id = sub0.judge_wld_id THEN sub2.case_year_number ELSE NULL END) AS trial_unfilt_case_year_number,
                    (CASE WHEN sub2.grouping_flg = 'T'AND sub2.judge_wld_id = sub0.judge_wld_id THEN sub2.subcount ELSE NULL END                       ) AS trial_unfilt_subcount,
                    (CASE WHEN sub2.grouping_flg = 'T' AND sub2.judge_wld_id = sub0.judge_wld_id THEN (CASE WHEN sub2.case_year_number BETWEEN sub0.low_case_year_number AND sub0.high_case_year_number
                           THEN sub2.case_year_number ELSE NULL END) ELSE NULL END) AS trial_filt_case_year_number,
                    (CASE WHEN sub2.grouping_flg = 'T' AND sub2.judge_wld_id = sub0.judge_wld_id THEN (CASE WHEN sub2.case_year_number BETWEEN sub0.low_case_year_number AND sub0.high_case_year_number
                           THEN sub2.subcount ELSE NULL END       ) ELSE NULL END        ) AS trial_filt_subcount,
                    (CASE WHEN sub1.grouping_flg = 'A'AND sub1.judge_wld_id = sub0.judge_wld_id THEN sub1.case_year_number WHEN sub2.grouping_flg = 'A'   AND sub2.judge_wld_id = sub0.judge_wld_id
                           THEN sub2.case_year_number ELSE NULL END         ) AS appeal_unfilt_case_year_number,
                    (
                    CASE WHEN sub1.grouping_flg = 'A'
                    AND sub1.judge_wld_id = sub0.judge_wld_id THEN sub1.subcount WHEN sub2.grouping_flg = 'A'
                    AND sub2.judge_wld_id = sub0.judge_wld_id THEN sub2.subcount ELSE NULL END
                    ) AS appeal_unfilt_subcount,
                    (
                    CASE WHEN sub1.grouping_flg = 'A'
                    AND sub1.judge_wld_id = sub0.judge_wld_id THEN (
                          CASE WHEN sub1.case_year_number BETWEEN sub0.low_case_year_number
                           AND sub0.high_case_year_number THEN sub1.case_year_number ELSE NULL END
                    ) WHEN sub2.grouping_flg = 'A'
                    AND sub2.judge_wld_id = sub0.judge_wld_id THEN (
                           CASE WHEN sub2.case_year_number BETWEEN sub0.low_case_year_number
                           AND sub0.high_case_year_number THEN sub2.case_year_number ELSE NULL END
                    ) ELSE NULL END
                    ) AS appeal_filt_case_year_number,
                    (
                    CASE WHEN sub1.grouping_flg = 'A'
                    AND sub1.judge_wld_id = sub0.judge_wld_id THEN (
                           CASE WHEN sub1.case_year_number BETWEEN sub0.low_case_year_number
                           AND sub0.high_case_year_number THEN sub1.subcount ELSE NULL END
                    ) WHEN sub2.grouping_flg = 'A'
                    AND sub2.judge_wld_id = sub0.judge_wld_id THEN (
                           CASE WHEN sub2.case_year_number BETWEEN sub0.low_case_year_number
                           AND sub0.high_case_year_number THEN sub2.subcount ELSE NULL END
                    ) ELSE NULL END
                    ) AS appeal_filt_subcount
               FROM ( SELECT 0104119201 AS judge_wld_id,
                                 2013 AS low_case_year_number,
                                 2023 AS high_case_year_number,
                                 judg1.judge_id,
                                 judg1.display_name,
                                 judg1.active_flag,
                                 judg1.judge_court_level
                       FROM wln_mart.judge judg1
                      WHERE judg1.wld_id = 01041192
                               AND judg1.profile_id = 01
                               AND (judg1.jrt_opt_out_flag <> 'Y' OR judg1.jrt_opt_out_flag IS NULL)
                    ) sub0
               LEFT OUTER JOIN
                    LATERAL
                   (  SELECT sub4.case_year_number,
                                 sub4.judge_wld_id,
                                 sub4.judge_id,
                                 sub4.grouping_flg,
                                 COUNT(*) AS subcount
                      FROM  (SELECT jrtf1.case_year_number,
                                 jrtf1.higher_judge_wld_id AS judge_wld_id,
                                 jrtf1.higher_judge_id AS judge_id,
                                 'A' AS grouping_flg,
                                 jrtf1.case_document_id AS subcount
                              FROM wln_mart.jrt_fact jrtf1
                             WHERE jrtf1.judge_id = sub0.judge_id
                           GROUP BY jrtf1.case_year_number,
                                 jrtf1.higher_judge_wld_id,
                                 jrtf1.higher_judge_id,
                                 jrtf1.case_document_id
                            ) sub4
                GROUP BY sub4.case_year_number,
                                sub4.judge_wld_id,
                                sub4.judge_id,
                                sub4.grouping_flg
                  ) sub1
               ON sub1.judge_id = sub0.judge_id
             LEFT OUTER JOIN
                  LATERAL
                 (SELECT sub5.case_year_number,
                               sub5.judge_wld_id,
                               sub5.judge_id,
                               sub5.grouping_flg,
                               COUNT(*) AS subcount
                   FROM (SELECT jrtf2.case_year_number,
                               jrtf2.lower_judge_wld_id AS judge_wld_id,
                               jrtf2.lower_judge_id AS judge_id,
                               (
                                        CASE WHEN jrtf2.lower_judge_court_level_id = 1004 THEN 'T' ELSE 'A' END
                               ) AS grouping_flg,
                               jrtf2.case_document_id AS subcount
                           FROM wln_mart.jrt_fact jrtf2
                          WHERE jrtf2.lower_judge_court_level_id > 1000
                           AND jrtf2.lower_judge_court_level_id <= 1004
                           AND jrtf2.judge_id = sub0.judge_id
                       GROUP BY jrtf2.case_year_number,                
                               jrtf2.lower_judge_wld_id,
                               jrtf2.lower_judge_id,
                               (
                                        CASE WHEN jrtf2.lower_judge_court_level_id = 1004 THEN 'T' ELSE 'A' END
                               ),
                               jrtf2.case_document_id
                              ) sub5
              GROUP BY sub5.case_year_number,
                              sub5.judge_wld_id,
                              sub5.judge_id,
                              sub5.grouping_flg
                 ) sub2
             ON sub2.judge_id = sub0.judge_id
      ) sub3
    GROUP BY sub3.judge_id,
           sub3.display_name,
                  sub3.active_flag,
           sub3.judge_court_level
  ) agg_sub,
  (SELECT judge_id,
        (CASE WHEN (SUM (appellate_flag) < (COUNT(*) / 2)) THEN 0 ELSE 1 END    ) AS appellate_flag
     FROM (SELECT DISTINCT jrtf.case_year_number,
               jrtf.case_document_id,
              jrtf.lower_judge_id,
               jrtf.higher_judge_id,
               (
                    CASE WHEN (
                           lower_judge_wld_id = 0104119201
                           AND lower_judge_court_level_id = 1004
              ) THEN 0 ELSE 1 END      ) AS appellate_flag,
               (
                    CASE WHEN lower_judge_wld_id = 0104119201 THEN lower_judge_id ELSE higher_judge_id END
               ) AS judge_id
            FROM wln_mart.JRT_FACT jrtf
           WHERE LOWER_JUDGE_WLD_ID = 0104119201
              OR HIGHER_JUDGE_WLD_ID = 0104119201
        ORDER BY case_year_number DESC
      ) sub0
  GROUP BY judge_id
  LIMIT 11
  ) appellate_flag_sub
WHERE
  Agg_sub.judge_id = appellate_flag_sub.judge_id;



Vladimir

pgsql-general by date:

Previous
From: Raivo Rebane
Date:
Subject: Re: Right version of jdbc
Next
From: Vladimir Sitnikov
Date:
Subject: Re: [EXT] YNT: Need help tuning a query