Re: unstable query plan on pg 16,17,18 - Mailing list pgsql-performance
| From | Attila Soki |
|---|---|
| Subject | Re: unstable query plan on pg 16,17,18 |
| Date | |
| Msg-id | A92040E8-9AFC-4B41-AB19-88E09ACB71AF@gmx.net |
| In response to | Re: unstable query plan on pg 16,17,18 (Laurenz Albe <laurenz.albe@cybertec.at>) |
| Responses | Re: unstable query plan on pg 16,17,18 |
| List | pgsql-performance |
On 23 Feb 2026, at 20:59, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Mon, 2026-02-23 at 16:10 +0100, Attila Soki wrote:
>> On 23 Feb 2026, at 10:41, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>>> On Mon, 2026-02-23 at 10:37 +0100, Attila Soki wrote:
>>>> When upgrading from PostgreSQL 14.4, I noticed that one of my somewhat complex
>>>> analytical queries sometimes gets an inefficient plan under PostgreSQL 16, 17, and 18.
>>>> Under 14.4, the query runs with a stable plan and completes in 19 to 22 seconds.
>>>> In newer versions, the plan seems to be unstable: sometimes the query completes
>>>> in 17 to 20 seconds, sometimes it runs for 5 to 18 minutes with the inefficient plan.
>>>> This happens even if the data has not changed significantly.
>>> This is very likely owing to a bad estimate.
>>> Could you turn on "track_io_timing" and send us the EXPLAIN (ANALYZE, BUFFERS) output
>>> for both the good and the bad plan?
>> Thank you for your reply. Here are the two explains.
>> To be able to publish the plans here, I have obfuscated the table and field names,
>> but this is reversible, so I can provide more information if needed.
>>
>> plan-ok:
>> https://explain.depesz.com/s/hQvM
>>
>> plan-wrong:
>> https://explain.depesz.com/s/uLvl
>>
>> Thanks.
> The difference in the plans is under the "Subquery Scan on odg", starting with
> plan node 50 (everything under the "Sort"). I suspect that the mis-estimate
> that is at the root of the problem is here:
>
>   -> Index Scan using table_k_late_spec_dp_end_dat_key on schema1.table_k kal (... rows=196053 ...) (... rows=471.00 ...)
>        Index Cond: (kal.dp_end_dat < ('now'::cstring)::date)
>        Index Searches: 1
>        Buffers: shared hit=230 read=49
>        I/O Timings: shared read=0.142
>
> PostgreSQL overestimates the row count by a factor of over 400.
> Try to fix that estimate and see if that gets PostgreSQL to do the right thing.
> Perhaps a simple ANALYZE on the table can do the trick.
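For reference, a minimal sketch of that ANALYZE step (the table name is taken from the quoted plan):

```sql
-- Refresh the planner statistics for the misestimated table;
-- the row-count estimate for dp_end_dat comes from these statistics.
ANALYZE schema1.table_k;
```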
In my tests I used table_k to flip the plan, with
vacuumdb -U postgres -vZ -t schema1.tbl_used_in_query db1
(in the explain output, schema1.tbl_used_in_query is table_k).
> The right side of the comparison looks awkward, as if you wrote 'now'::text::date.
> My experiments show that PostgreSQL v18 estimates well even with such a weird
> condition, but perhaps if you write "current_date" instead, you'd get better results.
I didn't realize that made a difference. I will replace all occurrences. It also looks cleaner with current_date.
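Roughly like this (a sketch; the exact original spelling of the cast is inferred from the plan):

```sql
-- before: the comparison value is built by casting the string 'now'
SELECT * FROM schema1.table_k WHERE dp_end_dat < ('now'::text)::date;

-- after: current_date is the idiomatic, planner-friendly spelling
SELECT * FROM schema1.table_k WHERE dp_end_dat < current_date;
```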
> I'd play with just a query like
>
>   EXPLAIN (ANALYZE)
>   SELECT * FROM schema1.table_k AS kal
>   WHERE dp_end_dat < current_date;
>
> until I get a good estimate.
I will try to set custom statistics for dp_end_dat and the fields used by the table_k_late_spec_dp_end_dat_key index.
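Something like this (a sketch; 1000 is just an illustrative value, the default comes from default_statistics_target, normally 100):

```sql
-- Increase the per-column statistics target so ANALYZE samples more rows
-- and builds a finer histogram for dp_end_dat:
ALTER TABLE schema1.table_k
    ALTER COLUMN dp_end_dat SET STATISTICS 1000;

-- The new target only takes effect after the next ANALYZE:
ANALYZE schema1.table_k;
```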
Let’s see if that helps.
I am on UTC+1. I will try all of this tomorrow and get back to you with the results later.
Thank you
regards,
Attila