Re: Hash aggregate collisions cause excessive spilling - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Hash aggregate collisions cause excessive spilling
Date
Msg-id CANwKhkOMbFeFwmzB0zoFDXKYMV-xM8===0LD3c4DHJ1wht3=6w@mail.gmail.com
Whole thread Raw
In response to Re: Hash aggregate collisions cause excessive spilling  (Andres Freund <andres@anarazel.de>)
Responses Re: Hash aggregate collisions cause excessive spilling
List pgsql-hackers
On Thu, 19 Feb 2026 at 21:01, Andres Freund <andres@anarazel.de> wrote:
> On 2026-02-19 20:24:17 +0200, Ants Aasma wrote:
> > On Thu, 19 Feb 2026 at 20:07, Ants Aasma <ants.aasma@cybertec.at> wrote:
> > > I was thinking more along the lines of hashing together the pointer
> > > value and worker number. But something more deterministic would indeed
> > > be better. How about this?
> > >
> > > --- a/src/backend/executor/execGrouping.c
> > > +++ b/src/backend/executor/execGrouping.c
> > > @@ -201,3 +201,3 @@ BuildTupleHashTable(PlanState *parent,
> > >         MemoryContext oldcontext;
> > > -       uint32          hash_iv = 0;
> > > +       uint32          hash_iv = parent->plan->plan_node_id;
> >
> > I can confirm that this fixes the issue. A standalone reproducer is here:
>
> I think this needs should use something that smears the bits from the plan_id
> more widely. The hash functions of some types are ... peculiar (partially due
> to cross-type hashjoin support).
>
> I'd also combine it with use_variable_hash_iv, rather than just have
> use_variable_hash_iv overwrite it.

This problem should only affect cases where multiple hash tables worth
of tuples are trying to fit into one. I think right now that limits it
to hash aggregates with parallel query. ExecBuildHash32FromAttrs
comment says that we should be using non-zero init_value only when
needed for a marginal performance gain. So I limited the plan node id
inclusion to use_variable_hash_iv, but made it unconditional for
aggregates.

I used hash_combine to smear together the bits of the two. I think
that should be good enough. Alternatively a xor of murmurhash32 of the
two should be better.

There is also a call to ExecBuildHash32FromAttrs from ExecInitSubPlan
that specifies a hash_iv for a hashed sub plan execution. This must
match the value BuildTupleHashTable decides on. Worth adding a
comment?

Regards,
Ants Aasma

Attachment

pgsql-hackers by date:

Previous
From: Mihail Nikalayeu
Date:
Subject: Re: Adding REPACK [concurrently]
Next
From: Andrey Borodin
Date:
Subject: Re: [WiP] B-tree page merge during vacuum to reduce index bloat