Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included)

| From | Mauro Gatti |
|---|---|
| Subject | Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included) |
| Date | |
| Msg-id | CAGWVher6_MmYHUWYZ+=Yu_-4udnw+Jwif_uxQ8NpPXMjhgpy4g@mail.gmail.com |
| In response to | Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included) (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included) |
| List | pgsql-performance |
Andrei, Tom, thank you for the explanation.
I had already tried using extended statistics but without success; I probably did something wrong.
After creating statistics on all the tables, I’m now getting optimal timings in production.
This is not an ideal solution, because the issue will delay our PostgreSQL version upgrade:
our application will also need to be tested for response times, and we will need to
create dedicated statistics for the specific cases involved. In addition, some cases may
slip through testing and only surface in production due to a different data distribution.
Based on your explanation, my takeaway is that the planner cannot produce adequate estimates
when there are filters on many columns; with millions of rows involved this becomes
noticeable, and extended statistics may be needed. One or more commits in PG16 changed the context
slightly and caused the unfortunate path to be chosen.
In any case, I don’t think there are other workable solutions that avoid code changes.
I do get good results using CTEs or LATERAL with OFFSET 0, but the point was to find
a solution that doesn’t require modifying the queries.
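For context, the workaround mentioned above relies on PostgreSQL's optimization fences. The sketch below is illustrative only: the table and filter columns are borrowed from the reproducer, while the joined table and join column are hypothetical placeholders.

```sql
-- OFFSET 0 prevents the subquery from being flattened into the outer
-- query, so the planner scans pricelist_options with these filters first.
-- "other_table" and "option_id" are hypothetical names for illustration.
SELECT *
FROM (
    SELECT *
    FROM pricelist_options pl
    WHERE pl.brand_id = 10
      AND pl.pricelist_id = 100
    OFFSET 0
) fenced
JOIN other_table o ON o.option_id = fenced.pricelist_id;

-- Since PostgreSQL 12, a CTE acts as a fence only when marked MATERIALIZED:
WITH fenced AS MATERIALIZED (
    SELECT *
    FROM pricelist_options
    WHERE brand_id = 10
      AND pricelist_id = 100
)
SELECT *
FROM fenced
JOIN other_table o ON o.option_id = fenced.pricelist_id;
```

Both forms force the fenced subquery to be planned in isolation, which is why they sidestep the join-order misestimate, at the cost of editing application SQL.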
Thanks again, regards.
Mauro Gatti
On Thu, 5 Mar 2026 at 22:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
> On 5/3/26 17:25, Mauro Gatti wrote:
>> ## Questions for the community
> Thanks for stable reproduction!
Yes, we appreciate that much effort being put into trouble reports.
Makes it a lot easier to see what's going wrong.
> Your case is typical for 'never executed' nodes. As you can see, the
> costs of your query plans are very close, and the estimation error is
> large due to multiple clauses in your filter. As I see, for the planner,
> there is no difference in which version of the plan to choose - it is
> just a game of chance.
Right. Given the very-far-off rowcount estimates for some of the
index scans, it'd be surprising if the planner arrived at a good
join order. It's a "garbage in, garbage out" situation. As Andrei
suggested, you can often improve bad rowcount estimates by creating
custom statistics.
I found it was sufficient to do
=# create statistics on brand_id,line_code,model_year,model_code,version_code,pricelist_id from pricelist_options;
CREATE STATISTICS
=# analyze pricelist_options;
ANALYZE
That doesn't result in fully accurate estimates:
=# explain analyze select * from pricelist_options pl where ((pl.brand_id = 10) AND ((pl.line_code)::text = 'ABC'::text) AND ((pl.model_year)::text = 'YR'::text) AND ((pl.model_code)::text = 'ABC'::text) AND ((pl.version_code)::text = 'VER-001'::text) AND (pl.pricelist_id = 100));
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_pricelist_options_1 on pricelist_options pl (cost=0.43..117.12 rows=28 width=106) (actual time=0.015..0.037 rows=165.00 loops=1)
Index Cond: ((brand_id = 10) AND ((line_code)::text = 'ABC'::text) AND ((model_year)::text = 'YR'::text) AND ((model_code)::text = 'ABC'::text) AND ((version_code)::text = 'VER-001'::text) AND (pricelist_id = 100))
but "28 rows" is a lot closer to 165 than "1 row", and it's enough
to push the planner to choose the plan you want.
I do concur with Andrei's recommendation to create stats matching
your other multicolumn indexes, though.
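As a practical aid for that recommendation, one can enumerate the multicolumn indexes whose key columns could be mirrored with CREATE STATISTICS. This is a hedged sketch using the standard system catalogs; it deliberately skips expression indexes (indexprs IS NULL) and system schemas.

```sql
-- List multicolumn plain-column indexes in user schemas; each result's key
-- columns are a candidate column list for CREATE STATISTICS ... ON ... FROM.
SELECT n.nspname                    AS schema_name,
       c.relname                    AS table_name,
       ic.relname                   AS index_name,
       pg_get_indexdef(i.indexrelid) AS definition
FROM pg_index i
JOIN pg_class c      ON c.oid = i.indrelid
JOIN pg_class ic     ON ic.oid = i.indexrelid
JOIN pg_namespace n  ON n.oid = c.relnamespace
WHERE i.indnkeyatts > 1            -- more than one key column
  AND i.indexprs IS NULL           -- skip expression indexes
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');
```

The output can then be used to write one CREATE STATISTICS per index, in the same style as the pricelist_options example above.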
regards, tom lane