Thread: Document if width_bucket's low and high are inclusive/exclusive

Document if width_bucket's low and high are inclusive/exclusive

From
Ben Peachey Higdon
Date:
The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not mention if the range’s low and high are inclusive or exclusive.

Returns the number of the bucket in which operand falls in a histogram having count equal-width buckets spanning the range low to high. Returns 0 or count+1 for an input outside that range.

I had assumed that both the low and high were inclusive but actually the low is inclusive while the high is exclusive.

For example:
SELECT width_bucket(0, 0, 1, 4)

returns 1, the first of 4 bins

SELECT width_bucket(1, 0, 1, 4) 

returns 5, because the high was outside the exclusive bound of high = 1

Thank you!
On Fri, Feb 28, 2025 at 7:15 AM Ben Peachey Higdon
<bpeacheyhigdon@gmail.com> wrote:
> The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not
mentionif the range’s low and high are inclusive or exclusive. 
>
> Returns the number of the bucket in which operand falls in a histogram having count equal-width buckets spanning the
rangelow to high. Returns 0 or count+1 for an input outside that range. 
>
> I had assumed that both the low and high were inclusive but actually the low is inclusive while the high is
exclusive.
>

I'm not sure it's the most ground breaking thing, but would probably
save a bunch of future people from having to gin up an example to test
it, so I'd probably update it per the following patch.


Robert Treat
https://xzilla.net

Attachment
On Wed, Jun 18, 2025 at 4:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Treat <rob@xzilla.net> writes:
> > On Fri, Feb 28, 2025 at 7:15 AM Ben Peachey Higdon
> > <bpeacheyhigdon@gmail.com> wrote:
> >> The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not
mentionif the range’s low and high are inclusive or exclusive. 
>
> > I'm not sure it's the most ground breaking thing, but would probably
> > save a bunch of future people from having to gin up an example to test
> > it, so I'd probably update it per the following patch.
>
> Seems reasonable, but do we need to do anything with the other
> version of width_bucket (the one taking an array of lower bounds)?
> Perhaps this change provides enough context, but I'm unsure.
>

Since they are all lower bounds, they all operate the same way, so it
isn't quite as clear that it needs documenting. Are you thinking
something like this?

Returns the number of the bucket in which operand falls given an array
listing the lower bounds (inclusive) of the buckets


Robert Treat
https://xzilla.net



Robert Treat <rob@xzilla.net> writes:
> Since they are all lower bounds, they all operate the same way, so it
> isn't quite as clear that it needs documenting. Are you thinking
> something like this?

> Returns the number of the bucket in which operand falls given an array
> listing the lower bounds (inclusive) of the buckets

Yeah, though I might write "inclusive lower bounds" rather than use
parens.  What's bugging me though is the lack of any mention of the
bucket upper bounds: you have to deduce that the upper bounds must
be exclusive if the lower bounds are inclusive.  If that's obvious
here, why is it non-obvious for the other case?  Maybe instead of
the parenthetical form you suggested, add a sentence like

    Buckets have inclusive lower bounds, and therefore exclusive
    upper bounds.

and then we could either rely on the reader remembering that,
or else repeat it, for the second form of width_bucket.

Another thing I just remembered (think I knew it once) is the
behavior of the first form when low > high.  It's not an error!
I think we need to document that, perhaps along the lines of

    If low > high, the behavior is mirror-reversed, with bucket 1
    now being the one just below low, and the inclusive bounds
    now being on the upper side.

plus an example.

            regards, tom lane



On Fri, 20 Jun 2025 at 22:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> So concretely, how about the attached?
>

LGTM (though I'm not sure it really needs the word "therefore" in the
first hunk).

There are also a couple of code comments that need fixing --
width_bucket_float8() comes with the following comment:

 * 'bound1' and 'bound2' are the lower and upper bounds of the
 * histogram's range, respectively. 'count' is the number of buckets
 * in the histogram. width_bucket() returns an integer indicating the
 * bucket number that 'operand' belongs to in an equiwidth histogram
 * with the specified characteristics. An operand smaller than the
 * lower bound is assigned to bucket 0. An operand greater than the
 * upper bound is assigned to an additional bucket (with number
 * count+1). We don't allow "NaN" for any of the float8 inputs, and we
 * don't allow either of the histogram bounds to be +/- infinity.

so at the very least, that should be made to say "greater than or
equal to", instead of "greater than". Similarly for
width_bucket_numeric().

Also, since PG14, type numeric has supported infinity, so its comment
should probably include that last part about not allowing +/- infinity
in the histogram bounds.

Regards,
Dean



Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Fri, 20 Jun 2025 at 22:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So concretely, how about the attached?

> LGTM (though I'm not sure it really needs the word "therefore" in the
> first hunk).

OK, done that way.

> There are also a couple of code comments that need fixing --

Good points, also done.

While looking at those comments, I also noted that there is a
strange inconsistency between width_bucket_array and
width_bucket_float8/width_bucket_numeric.  Namely, the latter
two reject an "operand" that is NaN, while width_bucket_array
goes out of its way to accept it and treat it in our usual
fashion as sorting higher than all non-NaNs.

Clearly these functions must reject NaN histogram bounds, for
the same reason they reject infinite bounds.  But I don't see
any reason why they couldn't treat a NaN operand as valid.
Should we change them?  (I imagine this'd be a HEAD-only
change, and probably v19 material at this point.)

            regards, tom lane



On Sat, 21 Jun 2025 at 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> While looking at those comments, I also noted that there is a
> strange inconsistency between width_bucket_array and
> width_bucket_float8/width_bucket_numeric.  Namely, the latter
> two reject an "operand" that is NaN, while width_bucket_array
> goes out of its way to accept it and treat it in our usual
> fashion as sorting higher than all non-NaNs.
>
> Clearly these functions must reject NaN histogram bounds, for
> the same reason they reject infinite bounds.  But I don't see
> any reason why they couldn't treat a NaN operand as valid.
> Should we change them?  (I imagine this'd be a HEAD-only
> change, and probably v19 material at this point.)
>

Yes, I think that's a good idea (for v19 I would have thought).
Allowing the operand to be NaN definitely seems preferable to throwing
an error, since the operand might well come from data in a table
containing NaNs.

Regards,
Dean



Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Sat, 21 Jun 2025 at 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Clearly these functions must reject NaN histogram bounds, for
>> the same reason they reject infinite bounds.  But I don't see
>> any reason why they couldn't treat a NaN operand as valid.
>> Should we change them?  (I imagine this'd be a HEAD-only
>> change, and probably v19 material at this point.)

> Yes, I think that's a good idea (for v19 I would have thought).
> Allowing the operand to be NaN definitely seems preferable to throwing
> an error, since the operand might well come from data in a table
> containing NaNs.

I started a new thread for that, since it's no longer docs material:

https://www.postgresql.org/message-id/2822872.1750540911%40sss.pgh.pa.us

            regards, tom lane