Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2006-01-16 at 20:02 -0500, Tom Lane wrote:
>> But our idea of the number of batches needed can change during that
>> process, resulting in some inner tuples being initially assigned to the
>> wrong temp file. This would also be true for hashagg.

> So we correct that before we start reading the outer table.

Why? That would require a useless additional pass over the data. With
the current design, we can process and discard at least *some* of the
data in a temp file when we read it, but a reorganization pass would
mean that it *all* goes back out to disk a second time.
Also, you assume that we can accurately tell how many tuples will fit in
memory in advance of actually processing them --- a presumption clearly
false in the hashagg case, and not that easy to do even for hashjoin.
(You can tell the overall size of a temp file, sure, but how do you know
how it will split when the batch size changes? A perfectly even split
is unlikely.)
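To illustrate the point (this is a toy sketch, not PostgreSQL code): with modulus-based batch assignment, doubling the batch count from 4 to 8 splits each old batch b between new batches b and b + 4, and the split is rarely even, so you cannot predict from a temp file's total size how its contents will redistribute.

```python
import random

random.seed(1)
hashes = [random.getrandbits(32) for _ in range(1000)]

old_nbatch, new_nbatch = 4, 8

# Tuples that were written to batch 1 under the old modulus.
batch1 = [h for h in hashes if h % old_nbatch == 1]

# After the resize they land in batch 1 or batch 5 -- and there is
# no way to know the proportions without rehashing every tuple.
stays = sum(1 for h in batch1 if h % new_nbatch == 1)
moves = sum(1 for h in batch1 if h % new_nbatch == 5)
assert stays + moves == len(batch1)
print(stays, moves)  # generally unequal
```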

> OK, I see what you mean. Sounds like we should have a new definition for
> Aggregates, "Sort Insensitive", that allows them to work when the input
> ordering does not affect the result, since that case can be optimised
> much better when using HashAgg.

Please don't propose pushing this problem onto the user until it's
demonstrated that there's no other way. I don't want to become the
next Oracle, with forty zillion knobs that it takes a highly trained
DBA to deal with.

> But all of them sound ugly.

I was thinking along the lines of having multiple temp files per hash
bucket. If you have a tuple that needs to migrate from bucket M to
bucket N, you know that it arrived before every tuple that was assigned
to bucket N originally, so put such tuples into a separate temp file
and process them before the main bucket-N temp file. This might get a
little tricky to manage after multiple hash resizings, but in principle
it seems doable.

regards, tom lane