Hi,
On 2026-02-19 19:06:04 +0200, Ants Aasma wrote:
> >
> > /*
> > * If parallelism is in use, even if the leader backend is performing the
> > * scan itself, we don't want to create the hashtable exactly the same way
> > * in all workers. As hashtables are iterated over in keyspace-order,
> > * doing so in all processes in the same way is likely to lead to
> > * "unbalanced" hashtables when the table size initially is
> > * underestimated.
> > */
> > if (use_variable_hash_iv)
> > hash_iv = murmurhash32(ParallelWorkerNumber);
> >
> >
> > I don't remember enough of how the parallel aggregate stuff works. Perhaps the
> > issue is that the leader is also building a hashtable and it's being inserted
> > into the post-gather hashtable, using the same IV?
> >
> > In which case parallel_leader_participation=off should make a difference.
>
> After turning leader participation off, the problem no longer
> reproduced even after 10 iterations; turning it back on, it reproduced
> on the 4th iteration. Is there any reason why the hash table couldn't
> have an unconditional IV that includes the plan node?
You mean just use the numerical value of the plan node's pointer? I think that'd
be pretty likely to be the same across parallel workers. And it's not great for
benchmarking / debugging if every run ends up with a different IV.
But we certainly should do something about the IV for the leader in these
cases.
Greetings,
Andres Freund