Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included) - Mailing list pgsql-performance

From Tom Lane
Subject Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included)
Date
Msg-id 14531.1772747078@sss.pgh.pa.us
Whole thread Raw
In response to Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included)  (Andrei Lepikhov <lepihov@gmail.com>)
Responses Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included)
List pgsql-performance
Andrei Lepikhov <lepihov@gmail.com> writes:
> On 5/3/26 17:25, Mauro Gatti wrote:
>> ## Questions for the community

> Thanks for stable reproduction!

Yes, we appreciate that much effort being put into trouble reports.
Makes it a lot easier to see what's going wrong.

> Your case is typical for 'never executed' nodes. As you can see, the
> costs of your query plans are very close, and the estimation error is
> large due to multiple clauses in your filter. As I see, for the planner,
> there is no difference in which version of the plan to choose - it is
> just a game of chance.

Right.  Given the very-far-off rowcount estimates for some of the
index scans, it'd be surprising if the planner arrived at a good
join order.  It's a "garbage in, garbage out" situation.  As Andrei
suggested, you can often improve bad rowcount estimates by creating
custom statistics.

I found it was sufficient to do

=# create statistics on brand_id,line_code,model_year,model_code,version_code,pricelist_id from pricelist_options;
CREATE STATISTICS
=# analyze pricelist_options;
ANALYZE

That doesn't result in fully accurate estimates:

=# explain analyze select * from pricelist_options pl where ((pl.brand_id = 10) AND ((pl.line_code)::text =
'ABC'::text)AND ((pl.model_year)::text = 'YR'::text) AND ((pl.model_code)::text = 'ABC'::text) AND
((pl.version_code)::text= 'VER-001'::text) AND (pl.pricelist_id = 100)); 
                                                                                                        QUERY PLAN
                                                                                                  

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_pricelist_options_1 on pricelist_options pl  (cost=0.43..117.12 rows=28 width=106) (actual
time=0.015..0.037rows=165.00 loops=1) 
   Index Cond: ((brand_id = 10) AND ((line_code)::text = 'ABC'::text) AND ((model_year)::text = 'YR'::text) AND
((model_code)::text= 'ABC'::text) AND ((version_code)::text = 'VER-001'::text) AND (pricelist_id = 100)) 

but "28 rows" is a lot closer to 165 than "1 row", and it's enough
to push the planner to choose the plan you want.

I do concur with Andrei's recommendation to create stats matching
your other multicolumn indexes, though.

            regards, tom lane



pgsql-performance by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included)
Next
From: Mauro Gatti
Date:
Subject: Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included)