Hi,
I did look at this because of the thread about "nbatch overflow" [1].
And the patches I just posted in that thread resolve the issue for me,
in the sense that the reproducer [2] no longer fails for me.
But I think that's actually mostly an accident - the balancing reduces
nbatch, trading it for a larger in-memory hash table. In this case we
start with nbatch=2M, but it gets reduced to 64k, which is low enough
to fit into the 1GB allocation limit.
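To spell out the arithmetic (the per-batch estimate depends on the
number of participants, so these are just round numbers): the shared
batch array takes EstimateParallelHashJoinBatch(hashtable) * nbatch
bytes, and MaxAllocSize is just under 1GB (2^30 - 1). So:

   nbatch = 2M  = 2^21  ->  overflows once the per-batch estimate
                            exceeds ~512 bytes
   nbatch = 64k = 2^16  ->  fits with anything up to ~16kB per batch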
Which is nice, but I can't guarantee it will always work out like this.
It's unlikely we'd need 2M batches, but is it impossible?
So we may still need something like the max_batches protection. I
don't think we should apply it to non-parallel hash joins, though,
which is what the last patch would do, I think.
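To be clear, by "protection" I mean capping nbatch so the shared batch
array stays under the allocation limit. A rough sketch of the idea
(made-up code, not the actual patch from that thread):

   /* hypothetical cap, just to illustrate the idea */
   size_t      max_batches = MaxAllocSize /
       EstimateParallelHashJoinBatch(hashtable);

   /* nbatch is a power of 2, so halving keeps it that way */
   while (nbatch > max_batches)
       nbatch /= 2;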
However, why don't we simply allow huge allocations for this?
   /* Allocate space. */
   pstate->batches =
       dsa_allocate_extended(hashtable->area,
                             EstimateParallelHashJoinBatch(hashtable) * nbatch,
                             (DSA_ALLOC_ZERO | DSA_ALLOC_HUGE));
This fixes the issue for me, even with the balancing disabled. Or is
there a reason why this would be a bad idea?
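For context, IIRC the current code uses dsa_allocate0(), which is just
dsa_allocate_extended() without DSA_ALLOC_HUGE, so the request is
subject to the MaxAllocSize check:

   /* Allocate space. */
   pstate->batches =
       dsa_allocate0(hashtable->area,
                     EstimateParallelHashJoinBatch(hashtable) * nbatch);

With DSA_ALLOC_HUGE the size check in dsa_allocate_extended() switches
to MaxAllocHugeSize instead, which is why the change above makes the
reproducer pass even without the balancing.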
It seems a bit strange to force parallel hash joins to use fewer
batches, when (presumably) parallelism is more useful for larger data
sets.
regards
[1]
https://www.postgresql.org/message-id/244dc6c1-3b3d-4de2-b3de-b1511e6a6d10%40vondra.me
[2]
https://www.postgresql.org/message-id/52b94d5b-a135-489d-9833-2991a69ec623%40garret.ru
--
Tomas Vondra