Re: Wrong results from in_range() tests with infinite offset - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: Wrong results from in_range() tests with infinite offset |
Date | |
Msg-id | CAEZATCVQO1vUXo0PjHToxvyZ8piwH-u-mTMqMhrGku2dYMhWbw@mail.gmail.com Whole thread Raw |
In response to | Re: Wrong results from in_range() tests with infinite offset (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Wrong results from in_range() tests with infinite offset
|
List | pgsql-hackers |
On Fri, 17 Jul 2020 at 01:59, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Dean Rasheed <dean.a.rasheed@gmail.com> writes: > > On Thu, 16 Jul 2020, 22:50 Tom Lane, <tgl@sss.pgh.pa.us> wrote: > >> Actually, after staring at those results awhile longer, I decided > >> they were wrong. The results shown here seem actually sane --- > >> for instance, -Infinity shouldn't "infinitely precede" itself, > >> I think. (Maybe if you got solipsistic enough you could argue > >> that that is valid, but it seems pretty bogus.) > > > Hmm, that code looks a bit fishy to me, but I really need to think about it > > some more. I'll take another look tomorrow, and maybe it'll become clearer. > > It's certainly verbose, so I'd like to find a more concise way to > write the logic. But the v2 results seem right. > I'm finding it hard to come up with a principled argument to say exactly what the right results should be. As things stand (pre-patch), a window frame defined as "BETWEEN 'inf' PRECEDING AND 'inf' PRECEDING", produces the following: id | f_float4 | first_value | last_value ----+-----------+-------------+------------ 0 | -Infinity | | 1 | -3 | | 2 | -1 | | 3 | 0 | | 4 | 1.1 | | 5 | 1.12 | | 6 | 2 | | 7 | 100 | | 8 | Infinity | | 9 | NaN | 9 | 9 (10 rows) which is clearly wrong, because -Inf obviously infinitely precedes all the other (non-NaN) values. With the first version of the patch, that result became id | f_float4 | first_value | last_value ----+-----------+-------------+------------ 0 | -Infinity | 0 | 0 1 | -3 | 0 | 0 2 | -1 | 0 | 0 3 | 0 | 0 | 0 4 | 1.1 | 0 | 0 5 | 1.12 | 0 | 0 6 | 2 | 0 | 0 7 | 100 | 0 | 0 8 | Infinity | 0 | 0 9 | NaN | 9 | 9 (10 rows) which is definitely better, but the one obvious problem is last_value for id=8, because all the values in earlier rows infinitely precede +Inf, so they should be included in the window frame for that row. With the second version of the patch, the result is id | f_float4 | first_value | last_value ----+-----------+-------------+------------ 0 | -Infinity | | 1 | -3 | 0 | 0 2 | -1 | 0 | 0 3 | 0 | 0 | 0 4 | 1.1 | 0 | 0 5 | 1.12 | 0 | 0 6 | 2 | 0 | 0 7 | 100 | 0 | 0 8 | Infinity | 0 | 7 9 | NaN | 9 | 9 (10 rows) That fixes last_value for id=8, using the fact that all values less than +Inf infinitely precede it, and also assuming that +Inf does not infinitely precede itself, which seems reasonable. The other change is in the first row, because it now assumes that -Inf doesn't infinitely precede itself, which seems reasonable from a consistency point of view. However, that is also a bit odd because it goes against the documented contract of in_range(), which is supposed to do the tests val <= base +/- offset1 val >= base +/- offset2 which for "BETWEEN 'inf' PRECEDING AND 'inf' PRECEDING" become val = base - Inf which is -Inf, even if base = -Inf. So I'd say that the window infinitely preceding -Inf contains -Inf, since -Inf - Inf = -Inf. But if -Inf infinitely precedes -Inf, it probably also makes sense to say that +Inf infinitely precedes +Inf for consistency, even though that really isn't well-defined, since Inf - Inf = NaN. Doing that is certainly a lot easier to code, because it just needs to return true if base +/- offset would be NaN, i.e., /* * Deal with cases where both base and offset are infinite, and computing * base +/- offset would produce NaN. This corresponds to a window frame * whose boundary infinitely precedes +inf or infinitely follows -inf, * which is not well-defined. For consistency with other cases involving * infinities, such as the fact that +inf infinitely follows +inf, we * choose to assume that +inf infinitely precedes +inf and -inf infinitely * follows -inf, and therefore that all finite and infinite values are in * such a window frame. */ if (isinf(base) && isinf(offset)) { if ((base > 0 && sub) || (base < 0 && !sub)) PG_RETURN_BOOL(true); } and the result is id | f_float8 | first_value | last_value ----+-----------+-------------+------------ 0 | -Infinity | 0 | 0 1 | -3 | 0 | 0 2 | -1 | 0 | 0 3 | 0 | 0 | 0 4 | 1.1 | 0 | 0 5 | 1.12 | 0 | 0 6 | 2 | 0 | 0 7 | 100 | 0 | 0 8 | Infinity | 0 | 8 9 | NaN | 9 | 9 (10 rows) which looks about equally sensible. To me, the fact that the window infinitely preceding -Inf includes -Inf makes more sense, but the meaning of the window infinitely preceding +Inf is much less obvious, and not really well-defined. Regards, Dean
pgsql-hackers by date: