Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Sun, Apr 21, 2019 at 10:36:43AM -0400, Tom Lane wrote:
>> Jeff Janes <jeff.janes@gmail.com> writes:
>>> If a single 32-bit hash value has enough tuples by itself that they don't
>>> fit in work_mem, then it will keep splitting until that value is in a
>>> batch by itself before shutting off
>> Right, that's the code's intention. If that's not good enough for this
>> case, we'll need to understand the details a bit better before we can
>> design a better(?) heuristic.
> I think we only disable growing when there are no other values in the
> batch, but that seems rather easy to defeat - all you need is a single
> tuple with a different hash value that falls into the same batch, and it's
> over. Maybe we should make this a bit less accurate - say, if less than 5%
> of the memory gets freed, don't add more batches.
Yeah, something like that, but it's hard to design it without seeing some
concrete misbehaving examples.
I think though that this is somewhat independent of the problem that we're
not including the I/O buffers in our reasoning about memory consumption.
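(To put a number on it: each batch beyond the first needs a temp file on both
the inner and outer side, and each of those keeps a BLCKSZ-sized buffer, so at
say 4096 batches the buffers alone are on the order of 2 * 4096 * 8192 bytes
= 64MB, regardless of work_mem.)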
> An alternative would be spilling the extra tuples into a special
> overflow file, as I explained earlier. That would actually enforce
> work_mem I think.
Well, no, it won't. If you have umpteen gigabytes of RHS tuples with the
exact same hash code, nothing we can do here is going to prevent you from
having to process those in a single table load. (If it's a plain inner
join, maybe you could break that into subsections anyway ... but that
won't work for left or full joins where you need per-tuple match status.)
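The root cause is that the batch number is derived directly from the hash
value, so no amount of splitting can separate tuples that hash identically.
Roughly (glossing over the details of ExecHashGetBucketAndBatch):

    /*
     * Simplified batch assignment: batchno is a pure function of hashvalue,
     * so every tuple with the same hash value lands in the same batch no
     * matter how large nbatch grows.
     */
    batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1);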
I think our ambition here should just be to not have the code go crazy
trying to keep its memory consumption under work_mem when it's ultimately
going to fail to do so anyhow.
regards, tom lane