Re: Fix overflow of nbatch - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Fix overflow of nbatch
Date
Msg-id CAAKRu_bZVZY8XWt5pSif2RV1sJSpJhbd9pON=LUvofqj1FqJvw@mail.gmail.com
Whole thread Raw
In response to Re: Fix overflow of nbatch  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Fix overflow of nbatch
List pgsql-hackers
On Thu, Oct 9, 2025 at 7:36 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> 1) A couple comments adjusted. It feels a bit too audacious to correct
> comments written by native speaker, but it seems cleaner to me like this.

I attached a patch with a few more suggested adjustments (0003). The
more substantive tweaks are:

I don't really like that this comment says it is about nbuckets
overflowing MaxAllocSize because overflow means something specific and
this sounds like we are saying the nbuckets variable will overflow
MaxAllocSize but what we mean is that nbuckets worth of HashJoinTuples
could overflow MaxAllocSize. You don't have to use my wording, but I'm
not sure about this wording either.

/* Check that nbuckets wont't overflow MaxAllocSize */
if (nbuckets > MaxAllocSize / sizeof(HashJoinTuple) / 2)
        break;

Also, upon re-reading the comment we wrote together:

        * With extremely low work_mem values, nbuckets may have been set
        * higher than hash_table_bytes / sizeof(HashJoinTuple). We don't try
        * to correct that here, we accept nbuckets to be oversized.

I'm not so sure it belongs above the nbuckets allocation check.
Perhaps we should move it to where we are doubling nbuckets. Or even
above where we clamp it to 1024.

I'm actually wondering if we want this comment at all. Does it
emphasize an edge case to a confusing point? I'm imagining myself
studying it in the future and having no idea what it means.

I've kept it in but moved it in 0003.

> 4) added the doubling of num_skew_mcvs, and also the overflow protection
> for that

Do we really need to check if num_skew_mcvs will overflow? Shouldn't
it always be smaller than nbuckets? Maybe it can be an assert.

> You suggested this in another message:
>
> > We do need to recalculate num_skew_mcvs once at the end before
> > returning.
>
> But I think the doubling has the same effect, right? I don't want to
> redo the whole "if (useskew) { ... }" block at the end.

Yea, it would have to be some kind of helper or something. I worried
just doubling num_skew_mcvs would drift significantly because of
integer truncation -- perhaps even a meaningful amount. But it was
just an intuition -- I didn't plug in any numbers and try.

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Changing shared_buffers without restart
Next
From: Tom Lane
Date:
Subject: Re: add function argument name to substring and substr