Re: Hash aggregate collisions cause excessive spilling - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Hash aggregate collisions cause excessive spilling
Msg-id: vhafwgkrzqlc5cm7vuz6df5ohmf4rjfgrfnd6stqag5i7edi46@wcdtbcib4nnd
In response to: Re: Hash aggregate collisions cause excessive spilling (Ants Aasma <ants.aasma@cybertec.at>)
List: pgsql-hackers
Hi,

On 2026-02-19 18:16:57 +0200, Ants Aasma wrote:
> > So it's a parallel aggregate? Partial + Finalize? I wonder if that might
> > be "correlating" the data in a way that makes it more likely to hit
> > SH_GROW_MAX_MOVE. But if that were the case, wouldn't we see this issue
> > more often?
> 
> Interestingly, the plan doesn't have Partial and Finalize on those hash agg nodes:
> 
>                      ->  HashAggregate  (cost=142400.87..142800.87 rows=40000 width=16) (actual time=7978.262..9591.682 rows=3698243 loops=1)
>                            Group Key: "*SELECT* 2_4".vehicle_id, "*SELECT* 2_4".day
>                            Batches: 21  Memory Usage: 65593kB  Disk Usage: 118256kB
>                            ->  Gather  (cost=133600.87..142000.87 rows=80000 width=16) (actual time=1898.473..4772.296 rows=3698243 loops=1)
>                                  Workers Planned: 2
>                                  Workers Launched: 2
>                                  ->  HashAggregate  (cost=132600.87..133000.87 rows=40000 width=16) (actual time=1586.697..2040.368 rows=1232748 loops=3)
>                                        Group Key: "*SELECT* 2_4".vehicle_id, "*SELECT* 2_4".day
>                                        Batches: 1  Memory Usage: 5137kB
>                                        Worker 0:  Batches: 5  Memory Usage: 79921kB  Disk Usage: 40024kB
>                                        Worker 1:  Batches: 5  Memory Usage: 81969kB  Disk Usage: 36112kB
> 
> There are timescale tables involved in the plan, so I think timescale
> might be behind that.

Hm, so timescale creates a plan that we would not?


> There is this comment above the simplehash growing logic:

> * To avoid negative consequences from overly imbalanced
> * hashtables, grow the hashtable if collisions would require
> * us to move a lot of entries.  The most likely cause of such
> * imbalance is filling a (currently) small table, from a
> * currently big one, in hash-table order.
> 
> The problem disappears if I have a breakpoint on tuplehash_grow, so
> apparently triggering the problem requires that the lower hashtable
> scans interleave in a particular manner to trigger the excess growth
> of the upper node.
> 
> I'm wondering if some way to decorrelate the hashtables would help.
> For example a hashtable specific (pseudo)random salt.

We do try to add a hash-IV that's different for each worker:

    /*
     * If parallelism is in use, even if the leader backend is performing the
     * scan itself, we don't want to create the hashtable exactly the same way
     * in all workers. As hashtables are iterated over in keyspace-order,
     * doing so in all processes in the same way is likely to lead to
     * "unbalanced" hashtables when the table size initially is
     * underestimated.
     */
    if (use_variable_hash_iv)
        hash_iv = murmurhash32(ParallelWorkerNumber);


I don't remember enough about how the parallel aggregate machinery works. Perhaps the
issue is that the leader is also building a hashtable, and its output is being inserted
into the post-Gather hashtable using the same IV?

In which case parallel_leader_participation=off should make a difference.
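
Something along these lines should show it (sketch only; substitute the query from the original report for the placeholder):

```sql
-- If the leader's identically-seeded hashtable is the culprit, taking the
-- leader out of the scan should reduce Batches / Disk Usage on the upper
-- HashAggregate.
SET parallel_leader_participation = off;
EXPLAIN (ANALYZE, SETTINGS) ...;  -- re-run the reported query and compare
```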

Greetings,

Andres Freund


