Re: Incorrect CHUNKHDRSZ in nodeAgg.c - Mailing list pgsql-hackers

From David Rowley
Subject Re: Incorrect CHUNKHDRSZ in nodeAgg.c
Date
Msg-id CAApHDvozg3jDN4oVYvtpA91P1DA50gdRZ5071SnPg_24oL5PbA@mail.gmail.com
Whole thread Raw
In response to Re: Incorrect CHUNKHDRSZ in nodeAgg.c  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Incorrect CHUNKHDRSZ in nodeAgg.c
Re: Incorrect CHUNKHDRSZ in nodeAgg.c
List pgsql-hackers
On Thu, 2 Jan 2025 at 12:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I thought for a bit about whether we shouldn't try to account for
> palloc power-of-2-block-size overhead here.  That omission would
> typically be a far larger error than the one you are fixing.  However,
> given that the inputs to hash_agg_entry_size are only estimates,
> I'm not sure that we can hope to do better than the current behavior.

Likely the most correct way would be to use GetMemoryChunkSpace(), but
there might be some additional overhead to consider there. I looked at
[1] and didn't see any mention of using that function for this
purpose. There likely is a small overhead to doing so, which is
something to consider.

> Should tuple hash tables be using a different memory context type
> that doesn't impose that power-of-2 overhead?  It's only useful
> when we expect a fair amount of pfree-and-recycle behavior, but
> I think we don't have much retail entry removal in tuple hash
> tables.  Could we use a generation or even bump context?

Bump wouldn't work due to the SH_FREE() in SH_GROW() when resizing the
table. If sizeof(TupleHashEntryData) were a power-of-two, then there'd
be no wastage as the hash table always has a power-of-two bucket count
and two powers-of-two multiplied are always a power-of-two value.
Unfortunately, TupleHashEntryData is 24 bytes and I don't see any easy
way to shrink it to 16 bytes.

I think what would be more interesting is seeing if we can store the
TupleHashEntryData.firstTuple in a bump context. I'd need to check if
we ever pfree() those minimal tuples or if we just throw the entire
batch of tuples away with a context reset. Since these minimal tuples
only contain the GROUP BY columns, for most cases they should be very
narrow tuples indeed, so not having a MemoryChunk header would save
quite a bit of memory in many cases. I think that's basically
1083f94da plus 6ed83d5fa for nodeAgg.c.

David

[1]
https://www.postgresql.org/message-id/flat/20200325220936.il3ni2fj2j2b45y5%40alap3.anarazel.de#78592829b675371eadf592a99897bcb3



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: Tom Lane
Date:
Subject: Re: Incorrect CHUNKHDRSZ in nodeAgg.c