Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours) - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
Date
Msg-id e43f543b-fac2-46da-9a4c-951c038ac0bc@vondra.me
Whole thread Raw
In response to Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)  (Tomas Vondra <tomas@vondra.me>)
Responses Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
List pgsql-bugs

On 4/2/26 20:12, Tomas Vondra wrote:
> Hi,
> 
> I can reproduce the performance getting much worse in 16, using the
> provided SQL scripts. This is what I see:
> 
>   14: 1551.363 ms
>   15: 1385.414 ms
>   16: 161571.400 ms
>   17: 156434.543 ms
>   18: 159095.001 ms
> 
> I'm attaching the explains for 15+16. I don't know what's causing it,
> but I have a couple interesting observations.
> 
> 1) If I disable parallel query, the timings change to
> 
>   14: 3990.439 ms
>   15: 3518.453 ms
>   16: 3606.460 ms
>   17: 3591.039 ms
>   18: 3617.872 ms
> 
> So no regression in this case. It seems to be related to parallelism.
> 
> 
> 2) There seems to be an explosion of temporary files. We don't have that
> in explain, but I queried pg_stat_database before/after the query, and
> there's huge difference. Both start at
> 
>   temp_files               | 112
>   temp_bytes               | 1942275280
> 
> so 112 files, ~2GB disk space. But after the query, 15 says
> 
>   temp_files               | 721
>   temp_bytes               | 2755839184
> 
> while 16 has
> 
>   temp_files               | 2078995
>   temp_bytes               | 70607906000
> 
> 2M files and 70GB? Wow!
> 
> 
> 3) Indeed, before the query completes the pgsql_tmp directory has this:
> 
>   63M    pgsql_tmp3499395.0.fileset
>   63G    pgsql_tmp3499395.1.fileset
>   95M    pgsql_tmp3499395.2.fileset
>   95M    pgsql_tmp3499395.3.fileset
>   127M    pgsql_tmp3499395.4.fileset
> 
> So I guess that's one of the parallel hash joins doing something, and
> consuming 63GB of disk space? I don't see anything suspicious in the
> plan, but I assume parallel HJ may not report the relevant stats.
> 
> FWIW bumping up work_mem (to 64MB) solved this with the sample data.
> 
> I suspect this is going to be something like the hash join explosion,
> where we just happen to add more and more batches. I don't have time to
> investigate this more at the moment.
> 

FWIW I think that's what's happening. If I add an elog(WARNING) into
ExecParallelHashJoinSetUpBatches, I see this:

    WARNING:  0x55dbe375a5e8 initializing 16 batches
    WARNING:  0x7f3868a3a978 initializing 32 batches
    WARNING:  0x7f3868a3ab80 initializing 4 batches
    WARNING:  0x55dbe36148c0 initializing 4 batches
    WARNING:  0x7f3868a3b230 initializing 16 batches
    WARNING:  0x7f3868a3a978 initializing 64 batches
    WARNING:  0x55dbe36144b0 initializing 128 batches
    WARNING:  0x55dbe36144b0 initializing 256 batches
    WARNING:  0x55dbe36144b0 initializing 512 batches
    WARNING:  0x55dbe36144b0 initializing 1024 batches
    WARNING:  0x7f3868a3a978 initializing 2048 batches
    WARNING:  0x7f3868a3a978 initializing 4096 batches
    WARNING:  0x55dbe36144b0 initializing 8192 batches
    WARNING:  0x55dbe36144b0 initializing 16384 batches
    WARNING:  0x55dbe36144b0 initializing 32768 batches
    WARNING:  0x7f3868a3a978 initializing 65536 batches
    WARNING:  0x55dbe36144b0 initializing 131072 batches
    WARNING:  0x7f3868a3a978 initializing 262144 batches

so we're ending with 256k batches, for this one join. I'm not sure how
exactly this maps to the 2M files from pg_stat_database, but it means
~0.5M tuplestores and ~10GB virtual memory (at lest per top).

I don't know what triggers the batch increase, but I still suspect it's
similar to the explosion we fixed (or mitigated) in PG18, but only for
serial (non-parallel) joins.

regards

-- 
Tomas Vondra




pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
Next
From: Tomas Vondra
Date:
Subject: Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)