On 9/25/25, 2:46 PM, "Peter Geoghegan" <pg@bowt.ie> wrote:
> What does EXPLAIN ANALYZE actually show you on 18, compared to 16,
> with the same real world (non-minimal) test case? Are the joins and
> scan nodes all the same as before (i.e. is the Postgres 18 plan
> *identical* to the "bad" Postgres 17 plan)?
>
> Do you see any "Heap Fetches:", particularly with the problematic
> index-only scan? Could it just be that you made sure to run VACUUM
> ahead of the test this time, allowing the index-only scan seen on
> Postgres 17 and 18 to avoid heap accesses?
>
> Recall that the 15 and 16 plan had a plain index scan on another
> index, and that the 17 plan you showed a few weeks back had "Heap
> Fetches: 598,916" (meaning that the index-only scan was completely
> ineffective at avoiding heap accesses). If you're now able to get the
> most out of index-only access, it could be enough to flip things in
> favor of the new plan -- in spite of the fact that there is definitely
> still a regression tied to needlessly sorting the scan's SAOP array
> many times.
The tests I reported on are application-level load tests that last about
10 hours and generate ~4 million queries, so I don't have any of that
per-query info. We generate a lot of queries with "IN (constant_list)"
expressions, so the repeated SAOP-sorting regression seemed like a
logical explanation, but maybe something else is going on too? Nothing
really stands out when crawling through pg_stat_statements, so maybe
it's a small per-query effect that is multiplied by repetition?
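(In case it's useful to anyone following along: when I'm hunting for a
"small effect multiplied by repetition" I sort pg_stat_statements by
total time rather than mean time, so cheap-but-frequent statements float
to the top. A rough sketch, assuming the extension is installed and
using the PG 13+ column names:

```sql
-- Statements whose aggregate cost is large even if each call is cheap.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 3)  AS mean_ms,
       left(query, 80)                    AS query
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  20;
```

Running it before and after the load test, with a pg_stat_statements_reset()
in between, makes the 16-vs-17 comparison less noisy.)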
FWIW, except for large multi-join queries with "IN (constant_list)"
expressions, nearly every individual query I run is as fast or faster in
17 than in 16. The ones that are slower are the ones that were
previously munged around to coax a specific plan; those that I've
unmunged have so far been much faster on 17.
It just occurred to me while typing this that I should go count joins to
see if we're exceeding join_collapse_limit. Could something have
changed in 17 that would affect how such queries are planned?
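(My plan, roughly, is just to check the setting and then re-EXPLAIN a
suspect query with a higher session-local limit; the default of 8 is
from the docs, and 12 below is an arbitrary experimental value:

```sql
-- Default join_collapse_limit is 8; queries joining more relations
-- than this are not fully reordered by the planner.
SHOW join_collapse_limit;

-- Session-local experiment against one suspect query:
SET join_collapse_limit = 12;
EXPLAIN (ANALYZE, BUFFERS)
  -- <problem query here>
  SELECT 1;
RESET join_collapse_limit;
```

If the plan changes with the higher limit, that would at least tell me
the join count is the lever.)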
-- todd