Re: SET or STRICT modifiers on function affect planner row estimates - Mailing list pgsql-hackers

From Michał Kłeczek
Subject Re: SET or STRICT modifiers on function affect planner row estimates
Date
Msg-id 52BD3E3B-1514-4983-BEC9-1A8742A726DB@kleczek.org
Whole thread Raw
In response to Re: SET or STRICT modifiers on function affect planner row estimates  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: SET or STRICT modifiers on function affect planner row estimates
List pgsql-hackers
Hi,
Thanks for taking a look.

On 30 Sep 2024, at 14:14, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

Hi Michal,
It is difficult to understand the exact problem from your description.
Can you please provide EXPLAIN outputs showing the expected plan and
the unexpected plan; plans on the node where the query is run and
where the partitions are located.

The table structure is as follows:

CREATE TABLE tbl (…) PARTITION BY RANGE year(col02_date)

CREATE TABLE tbl_2015 PARTITION OF tbl FOR VALUES BETWEEN (2023) AND (2024) PARTITION BY HASH (col01_no)
… subsequent years


CREATE TABLE tbl_2021_32_9 PARTITION OF tbl_2021 FOR VALUES WITH (MODULUS 32 REMAINDER 9)

CREATE FOREIGN TABLE tbl_2022_16_9 PARTITION OF tbl_2022 FOR VALUES WITH (MODULUS 32 REMAINDER 9)

All tables are freshly ANALYZEd.

I have a function:

CREATE FUNCTION report(col01 text, year_from, month_from, year_to, month_to) RETURNS … LANGUAGE sql
$$
SELECT
  col01_no, year(col02_date), month(col02_date), sum(col03) FROM tbl WHERE col02_no = col02 AND (col02_date conditions) GROUP BY 1, 2, 3
$$

EXPLAIN (…) SELECT * FROM report(…);

gives

Plan 1 (performs pushdown of aggregates):

 

Append  (cost=9.33..76322501.41 rows=3 width=58) (actual time=5.051..6.414 rows=21 loops=1)

  InitPlan 1 (returns $0)

    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)

          Output: '2021-01-01'::date

  InitPlan 2 (returns $1)

    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)

          Output: '2023-12-31'::date

  ->  GroupAggregate  (cost=9.31..9.85 rows=1 width=58) (actual time=0.073..0.074 rows=0 loops=1)

        Output: op.col01_no, (year(op.col02_date)), (month(op.col02_date)), sum(op.col03) FILTER (WHERE (op.debit_flag = '0'::bpchar)), sum(op.col03) FILTER (WHERE (op.debit_flag <> '0'::bpchar)), count(1)

        Group Key: (year(op.col02_date)), (month(op.col02_date))

        ->  Sort  (cost=9.31..9.32 rows=1 width=44) (actual time=0.072..0.073 rows=0 loops=1)

              Output: (year(op.col02_date)), (month(op.col02_date)), op.col01_no, op.col03, op.debit_flag

              Sort Key: (year(op.col02_date)), (month(op.col02_date))

              Sort Method: quicksort  Memory: 25kB

              ->  Index Only Scan using tbl_2021_32_9_universal_gist_idx_3a2df25af5bc48a on cbt.tbl_2021_32_9 op  (cost=0.28..9.30 rows=1 width=44) (actual time=0.063..0.063 rows=0 loops=1)

                    Output: year(op.col02_date), month(op.col02_date), op.col01_no, op.col03, op.debit_flag

                    Index Cond: ((op.col01_no = '22109020660000000110831697'::text) AND (op.col02_date >= $0) AND (op.col02_date <= $1))

                    Filter: ((year(op.col02_date) >= 2021) AND (year(op.col02_date) <= 2023))

                    Heap Fetches: 0

  ->  Async Foreign Scan  (cost=100.02..76322480.36 rows=1 width=58) (actual time=0.753..0.755 rows=11 loops=1)

        Output: op_1.col01_no, (year(op_1.col02_date)), (month(op_1.col02_date)), (sum(op_1.col03) FILTER (WHERE (op_1.debit_flag = '0'::bpchar))), (sum(op_1.col03) FILTER (WHERE (op_1.debit_flag <> '0'::bpchar))), ((count(1))::double precision)

        Relations: Aggregate on (cbt_c61d467d1b5fd1d218d9e6e7dd44a333.tbl_2022_16_9 op_1)

        Remote SQL: SELECT col01_no, cbt.year(col02_date), cbt.month(col02_date), sum(col03) FILTER (WHERE (debit_flag = '0')), sum(col03) FILTER (WHERE (debit_flag <> '0')), count(1) FROM cbt.tbl_2022_16_9 WHERE ((cbt.year(col02_date) >= 2021)) AND ((cbt.year(col02_date) <= 2023)) AND ((col02_date >= $1::date)) AND ((col02_date <= $2::date)) AND ((col01_no = '22109020660000000110831697')) GROUP BY 1, 2, 3

  ->  GroupAggregate  (cost=10.63..11.17 rows=1 width=58) (actual time=4.266..4.423 rows=10 loops=1)

        Output: op_2.col01_no, (year(op_2.col02_date)), (month(op_2.col02_date)), sum(op_2.col03) FILTER (WHERE (op_2.debit_flag = '0'::bpchar)), sum(op_2.col03) FILTER (WHERE (op_2.debit_flag <> '0'::bpchar)), count(1)

        Group Key: (year(op_2.col02_date)), (month(op_2.col02_date))

        ->  Sort  (cost=10.63..10.64 rows=1 width=44) (actual time=4.238..4.273 rows=735 loops=1)

              Output: (year(op_2.col02_date)), (month(op_2.col02_date)), op_2.col01_no, op_2.col03, op_2.debit_flag

              Sort Key: (year(op_2.col02_date)), (month(op_2.col02_date))

              Sort Method: quicksort  Memory: 82kB

              ->  Index Only Scan using tbl_2023_128_9_universal_gist_idx_3a2df25af5bc48a on cbt.tbl_2023_128_9 op_2  (cost=0.54..10.62 rows=1 width=44) (actual time=0.295..4.059 rows=735 loops=1)

                    Output: year(op_2.col02_date), month(op_2.col02_date), op_2.col01_no, op_2.col03, op_2.debit_flag

                    Index Cond: ((op_2.col01_no = '22109020660000000110831697'::text) AND (op_2.col02_date >= $0) AND (op_2.col02_date <= $1))

                    Filter: ((year(op_2.col02_date) >= 2021) AND (year(op_2.col02_date) <= 2023))

                    Heap Fetches: 0

 

 


BUT after I perform

CREATE OR REPLACE report(…)

STRICT TO … AS … (same code)


The plan (as reported by auto_explain) changes to:


Plan 2 (no pushdown):


GroupAggregate  (cost=1781983216.68..1781983324.62 rows=200 width=58) (actual time=16.065..16.432 rows=21 loops=1)

  Output: op.col01_no, (year(op.col02_date)), (month(op.col02_date)), sum(op.col03) FILTER (WHERE (op.debit_flag = '0'::bpchar)), sum(op.col03) FILTER (WHERE (op.debit_flag <> '0'::bpchar)), count(1)

  Group Key: (year(op.col02_date)), (month(op.col02_date))

  InitPlan 1 (returns $0)

    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)

          Output: make_date($2, $3, 1)

  InitPlan 2 (returns $1)

    ->  Result  (cost=0.00..0.02 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=1)

          Output: (((make_date($4, $5, 1) + '1 mon'::interval) - '1 day'::interval))::date

  ->  Sort  (cost=1781983216.65..1781983217.33 rows=272 width=44) (actual time=16.041..16.117 rows=1564 loops=1)

        Output: (year(op.col02_date)), (month(op.col02_date)), op.col01_no, op.col03, op.debit_flag

        Sort Key: (year(op.col02_date)), (month(op.col02_date))

        Sort Method: quicksort  Memory: 171kB

        ->  Append  (cost=0.55..1781983205.65 rows=272 width=44) (actual time=1.013..15.445 rows=1564 loops=1)

              Subplans Removed: 269

              ->  Index Only Scan using accoper_2021_32_9_universal_gist_idx_3a2df25af5bc48a on cbt.accoper_2021_32_9 op_1  (cost=0.28..9.30 rows=1 width=44) (actual time=0.084..0.084 rows=0 loops=1)

                    Output: year(op_1.col02_date), month(op_1.col02_date), op_1.col01_no, op_1.col03, op_1.debit_flag

                    Index Cond: ((op_1.col01_no = $1) AND (op_1.col02_date >= $0) AND (op_1.col02_date <= $1))

                    Filter: ((year(op_1.col02_date) >= $2) AND (year(op_1.col02_date) <= $4))

                    Heap Fetches: 0

              ->  Async Foreign Scan on cbt_c61d467d1b5fd1d218d9e6e7dd44a333.accoper_2022_16_9 op_2  (cost=100.00..76322479.83 rows=1 width=44) (actual time=0.658..3.870 rows=829 loops=1)

                    Output: year(op_2.col02_date), month(op_2.col02_date), op_2.col01_no, op_2.col03, op_2.debit_flag

                    Remote SQL: SELECT col03, col02_date, debit_flag, col01_no FROM cbt.accoper_2022_16_9 WHERE ((col02_date >= $1::date)) AND ((col02_date <= $2::date)) AND ((col01_no = $3::text)) AND ((cbt.year(col02_date) >= $4::integer)) AND ((cbt.year(col02_date) <= $5::integer))

              ->  Index Only Scan using accoper_2023_128_9_universal_gist_idx_3a2df25af5bc48a on cbt.accoper_2023_128_9 op_3  (cost=0.54..10.62 rows=1 width=44) (actual time=0.361..4.043 rows=735 loops=1)

                    Output: year(op_3.col02_date), month(op_3.col02_date), op_3.col01_no, op_3.col03, op_3.debit_flag

                    Index Cond: ((op_3.col01_no = $1) AND (op_3.col02_date >= $0) AND (op_3.col02_date <= $1))

                    Filter: ((year(op_3.col02_date) >= $2) AND (year(op_3.col02_date) <= $4))

                    Heap Fetches: 0


I understand wrong rows estimates ( =1 ) are due to missing statistics on expressions year() and month().


But why plans are different?



BTW. I tried to add extended statistics on the above expressions but ANALYZE didn’t seem to update any values (as seen in pg_stat_ext) - ndistinct is NULL for example.


Thanks,


--

Michal



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Inconsistency in reporting checkpointer stats
Next
From: Noah Misch
Date:
Subject: Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation