Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours) - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
Date
Msg-id bbaf15a4-d743-47df-92fe-a1c5e94165ba@vondra.me
Whole thread Raw
In response to Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)  (Tomas Vondra <tomas@vondra.me>)
Responses Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
List pgsql-bugs
On 4/2/26 21:00, Tomas Vondra wrote:
> ...
> FWIW I think that's what's happening. If I add an elog(WARNING) into
> ExecParallelHashJoinSetUpBatches, I see this:
> 
>     WARNING:  0x55dbe375a5e8 initializing 16 batches
>     WARNING:  0x7f3868a3a978 initializing 32 batches
>     WARNING:  0x7f3868a3ab80 initializing 4 batches
>     WARNING:  0x55dbe36148c0 initializing 4 batches
>     WARNING:  0x7f3868a3b230 initializing 16 batches
>     WARNING:  0x7f3868a3a978 initializing 64 batches
>     WARNING:  0x55dbe36144b0 initializing 128 batches
>     WARNING:  0x55dbe36144b0 initializing 256 batches
>     WARNING:  0x55dbe36144b0 initializing 512 batches
>     WARNING:  0x55dbe36144b0 initializing 1024 batches
>     WARNING:  0x7f3868a3a978 initializing 2048 batches
>     WARNING:  0x7f3868a3a978 initializing 4096 batches
>     WARNING:  0x55dbe36144b0 initializing 8192 batches
>     WARNING:  0x55dbe36144b0 initializing 16384 batches
>     WARNING:  0x55dbe36144b0 initializing 32768 batches
>     WARNING:  0x7f3868a3a978 initializing 65536 batches
>     WARNING:  0x55dbe36144b0 initializing 131072 batches
>     WARNING:  0x7f3868a3a978 initializing 262144 batches
> 
> so we're ending with 256k batches, for this one join. I'm not sure how
> exactly this maps to the 2M files from pg_stat_database, but it means
> ~0.5M tuplestores and ~10GB virtual memory (at lest per top).
> 
> I don't know what triggers the batch increase, but I still suspect it's
> similar to the explosion we fixed (or mitigated) in PG18, but only for
> serial (non-parallel) joins.
> 

An interesting question is "What changed in PG16?" causing the query to
fail, when it worked OK on earlier versions. I guess the main suspect is
this item from release notes

  Allow parallelization of FULL and internal right OUTER hash joins

So I guess it might be interesting to flip the joins to inner, see if it
still fails like that, and then see if that crashes on PG15 too.

Although the query has only inner and left outer joins, which seems
unrelated to the change. It might be simply a consequence of the planner
picking a different join tree (due to some general optimizer changes).

It might be interesting to try forcing the same join tree (which might
be possible with join_collapse_limit=1) on PG15. Maybe it'll crash the
same way?

Maybe it'd be easier to try reducing the query first, before doing any
of this. Start removing the joins one by one from the "top" (per the
explain), until it stops failing. That might leave a much smaller query.


regards

-- 
Tomas Vondra




pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
Next
From: Tom Lane
Date:
Subject: Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)