On Fri, 2024-11-08 at 10:48 -0800, Jeff Davis wrote:
> I can think of two approaches to solve it:
Another thought: the point of spilling is to avoid the additional
memory that adding a new group would consume.
If the bucket array is 89.8% full, adding a new group doesn't require
allocating new buckets, so we only have the firstTuple and pergroup
data to worry about. If that causes the memory limit to be exceeded,
it's the perfect time to switch to spill mode.
But if the bucket array is 89.9% full, then adding a new group will
cause the bucket array to double. If that causes the memory limit to be
exceeded, then we can switch to spill mode, but it's wasteful to do so
because (a) we won't be using most of those new buckets; (b) the new
buckets will crowd out space for subsequent batches and even fewer of
the buckets will be used; and (c) the memory accounting can be off by
quite a bit.
What if we add a check: if the metacxt is using more than 40% of the
memory, and adding a new group would reach the grow_threshold, then
enter spill mode immediately? To make this work, I think we either
need to use a tuplehash_lookup() followed by a tuplehash_insert() (two
lookups for each new group), or we would need a new API into simplehash
like tuplehash_insert_without_growing() that would return NULL instead
of growing. This approach might not be backportable, but it might be a
good approach for 18+.
Regards,
Jeff Davis