On 10/16/24 16:19, Thomas Munro wrote:
> On Mon, Oct 14, 2024 at 10:16 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> See the attachment for a sketch of the solution.
>
> Thanks Andrei, I mostly agree with your analysis, but I came up with a
> slightly different patch. I think we should check for extreme skew if
> old_batch->space_exhausted (the parent partition). Your sketch always
> does it for batch 0, which works for these examples but I don't think
> it's strictly correct: if batch 0 didn't run out of memory, it might
> falsely report extreme skew just because it had (say) 0 or 1 tuples.
Yeah, I misunderstood the meaning of the estimated_size variable. Your
solution is more general. I can also confirm that it passes my synthetic
test.
This also raises an immediate question: what if we have too many
duplicates? In user complaints I sometimes see cases where users,
checking the database's logical consistency, scan through millions of
duplicates to find an unexpected value. Do we need an upper limit on
memory consumption here? I recall a mailing-list thread proposing a
general approach to limiting backend memory consumption, but it ended
without a result.
The patch looks good, as do the comments.
--
regards, Andrei Lepikhov