Re: Wrong results from in_range() tests with infinite offset - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: Wrong results from in_range() tests with infinite offset
Date
Msg-id CAEZATCVQO1vUXo0PjHToxvyZ8piwH-u-mTMqMhrGku2dYMhWbw@mail.gmail.com
Whole thread Raw
In response to Re: Wrong results from in_range() tests with infinite offset  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Wrong results from in_range() tests with infinite offset
List pgsql-hackers
On Fri, 17 Jul 2020 at 01:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> > On Thu, 16 Jul 2020, 22:50 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
> >> Actually, after staring at those results awhile longer, I decided
> >> they were wrong.  The results shown here seem actually sane ---
> >> for instance, -Infinity shouldn't "infinitely precede" itself,
> >> I think.  (Maybe if you got solipsistic enough you could argue
> >> that that is valid, but it seems pretty bogus.)
>
> > Hmm, that code looks a bit fishy to me, but I really need to think about it
> > some more. I'll take another look tomorrow, and maybe it'll become clearer.
>
> It's certainly verbose, so I'd like to find a more concise way to
> write the logic.  But the v2 results seem right.
>

I'm finding it hard to come up with a principled argument to say
exactly what the right results should be.

As things stand (pre-patch), a window frame defined as "BETWEEN 'inf'
PRECEDING AND 'inf' PRECEDING", produces the following:

 id | f_float4  | first_value | last_value
----+-----------+-------------+------------
  0 | -Infinity |             |
  1 |        -3 |             |
  2 |        -1 |             |
  3 |         0 |             |
  4 |       1.1 |             |
  5 |      1.12 |             |
  6 |         2 |             |
  7 |       100 |             |
  8 |  Infinity |             |
  9 |       NaN |           9 |          9
(10 rows)

which is clearly wrong, because -Inf obviously infinitely precedes all
the other (non-NaN) values.

With the first version of the patch, that result became

 id | f_float4  | first_value | last_value
----+-----------+-------------+------------
  0 | -Infinity |           0 |          0
  1 |        -3 |           0 |          0
  2 |        -1 |           0 |          0
  3 |         0 |           0 |          0
  4 |       1.1 |           0 |          0
  5 |      1.12 |           0 |          0
  6 |         2 |           0 |          0
  7 |       100 |           0 |          0
  8 |  Infinity |           0 |          0
  9 |       NaN |           9 |          9
(10 rows)

which is definitely better, but the one obvious problem is last_value
for id=8, because all the values in earlier rows infinitely precede
+Inf, so they should be included in the window frame for that row.

With the second version of the patch, the result is

 id | f_float4  | first_value | last_value
----+-----------+-------------+------------
  0 | -Infinity |             |
  1 |        -3 |           0 |          0
  2 |        -1 |           0 |          0
  3 |         0 |           0 |          0
  4 |       1.1 |           0 |          0
  5 |      1.12 |           0 |          0
  6 |         2 |           0 |          0
  7 |       100 |           0 |          0
  8 |  Infinity |           0 |          7
  9 |       NaN |           9 |          9
(10 rows)

That fixes last_value for id=8, using the fact that all values less
than +Inf infinitely precede it, and also assuming that +Inf does not
infinitely precede itself, which seems reasonable.

The other change is in the first row, because it now assumes that -Inf
doesn't infinitely precede itself, which seems reasonable from a
consistency point of view.

However, that is also a bit odd because it goes against the documented
contract of in_range(), which is supposed to do the tests

  val <= base +/- offset1
  val >= base +/- offset2

which for "BETWEEN 'inf' PRECEDING AND 'inf' PRECEDING" become

  val = base - Inf

which is -Inf, even if base = -Inf. So I'd say that the window
infinitely preceding -Inf contains -Inf, since -Inf - Inf = -Inf.

But if -Inf infinitely precedes -Inf, it probably also makes sense to
say that +Inf infinitely precedes +Inf for consistency, even though
that really isn't well-defined, since Inf - Inf = NaN. Doing that is
certainly a lot easier to code, because it just needs to return true
if base +/- offset would be NaN, i.e.,

    /*
     * Deal with cases where both base and offset are infinite, and computing
     * base +/- offset would produce NaN.  This corresponds to a window frame
     * whose boundary infinitely precedes +inf or infinitely follows -inf,
     * which is not well-defined.  For consistency with other cases involving
     * infinities, such as the fact that +inf infinitely follows +inf, we
     * choose to assume that +inf infinitely precedes +inf and -inf infinitely
     * follows -inf, and therefore that all finite and infinite values are in
     * such a window frame.
     */
    if (isinf(base) && isinf(offset))
    {
        if ((base > 0 && sub) || (base < 0 && !sub))
            PG_RETURN_BOOL(true);
    }

and the result is

 id | f_float8  | first_value | last_value
----+-----------+-------------+------------
  0 | -Infinity |           0 |          0
  1 |        -3 |           0 |          0
  2 |        -1 |           0 |          0
  3 |         0 |           0 |          0
  4 |       1.1 |           0 |          0
  5 |      1.12 |           0 |          0
  6 |         2 |           0 |          0
  7 |       100 |           0 |          0
  8 |  Infinity |           0 |          8
  9 |       NaN |           9 |          9
(10 rows)

which looks about equally sensible. To me, the fact that the window
infinitely preceding -Inf includes -Inf makes more sense, but the
meaning of the window infinitely preceding +Inf is much less obvious,
and not really well-defined.

Regards,
Dean



pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: factorial function/phase out postfix operators?
Next
From: Amit Kapila
Date:
Subject: Re: Patch for reorderbuffer.c documentation.