Re: Strange behavior with polygon and NaN - Mailing list pgsql-hackers

From gkokolatos@pm.me
Subject Re: Strange behavior with polygon and NaN
Date
Msg-id 64PSX1WPq0NPxk54tN4B017GlIq95rlqtDU8yOGJRE1Vhv02xXhIjzwxGPoK8guFdX9m4sSNb4A38JpgndAW-M4rp5qvm7tF9xMTyxHjV9Y=@pm.me
Whole thread Raw
In response to Re: Strange behavior with polygon and NaN  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Strange behavior with polygon and NaN  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 27 August 2020 14:24, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane tgl@sss.pgh.pa.us wrote in
>
> > Kyotaro Horiguchi horikyota.ntt@gmail.com writes:
> >
> > > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian bruce@momjian.us wrote in
> > >
> > > > I can confirm that this two-month old email report still produces
> > > > different results with indexes on/off in git master, which I don't think
> > > > is ever correct behavior.
> >
> > > I agree to that the behavior is broken.
> >
> > Yeah, but ... what is "non broken" in this case? I'm not convinced
> > that having point_inside() return zero for any case involving NaN
> > is going to lead to noticeably saner behavior than today.
>
> Yes, just doing that leaves many unfixed behavior come from NaNs, but
> at least it seems to me one of sane definition candidates that a point
> cannot be inside a polygon when NaN is involved. It's similar to
> Fpxx() returns false if NaN is involved. As mentioned, I had't fully
> checked and haven't considered this seriously, but I changed my mind
> to check all the callers. I started checking all the callers of
> point_inside, then finally I had to check all functions in geo_ops.c:(
>

For what is worth, I agree with this definition.


> The attached is the result as of now.
>
> === Resulting behavior
>
> With the attached:
>
> -   All boolean functions return false if NaN is involved.
> -   All float8 functions return NaN if NaN is involved.
> -   All geometric arithmetics return NaN as output if NaN is involved.

Agreed! As in both this behaviour conforms to the definition above and the patch provides this behaviour with the
exceptionsbelow. 

>
>     With some exceptions:
>
> -   line_eq: needs to consider that NaNs are equal each other.
> -   point_eq/ne (point_eq_pint): ditto
> -   lseg_eq/ne: ditto
>
>     The change makes some difference in the regression test.
>     For example,
>
>     <obj containing NaN> <-> <any obj> changed from 0 to NaN. (distance)
>
>
> <obj containing NaN> <@ <any obj> changed from true to false. (contained)
> <obj containing NaN> <-> <any obj> changed from 0 to NaN. (distance)
> <obj containing NaN> ?# <any obj> changed from true to false (overlaps)
>
> === pg_hypot mistake?
>
> I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but
> I think NaN is appropriate here since other operators behaves that
> way. This change causes a change of distance between point(1e+300,Inf)
> and line (1,-1,0) from infinity to NaN, which I think is correct
> because the arithmetic generates NaN as an intermediate value.
>
> === Infinity annoyances
>
> Infinity makes some not-great changes in regresssion results. For example:
>
> -   point '(1e+300,Infinity)' <-> path '((10,20))' returns
>     NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path
>     '[(1,2),(3,4)]' returns Infinity. The difference of the two
>     expressions is whether (0 * Inf = NaN) is performed or not. The
>     former performs it but that was concealed by not propagating NaN to
>     upper layer without the patch.

Although I understand the reasoning for this change. I am not certain I agree with the result. I feel that:
    point '(1e+300,Infinity)' <-> path '((10,20))'
should return Infinity. Even if I am wrong to think that, the two results should be expected to behave the same. Am I
wrongon that too? 


>
> -   Without the patch, point '(1e+300,Infinity)' ## box '(2,2),(0,0)'
>     generates '(0,2)', which is utterly wrong. It is because
>     box_closept_point evaluates float8_lt(Inf, NaN) as true(!) and sets
>     the wrong point for distance=NaN is set. With the patch, the NaN
>     makes the result NULL.

Agreed.

>
> -   This is not a difference caused by this patch, but for both patched
>     and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN,
>     which should be 1e+300. However, the behavior comes from arithmetic
>     reasons and wouldn't be a problem.
>
>     create_index.out is changed since point(NaN,NaN) <@ polygon changed
>     from true to false, which seems rather saner.
>
>     I haven't checked unchanged results but at least changed results seems
>     saner to me.

All in all a great patch!

It is clean, well reasoned and carefully crafted.

Do you think that the documentation needs to get updated to the 'new' behaviour?


//Georgios


>
>     regards.
>
>     --
>     Kyotaro Horiguchi
>     NTT Open Source Software Center
>


Attachment

pgsql-hackers by date:

Previous
From: Surafel Temesgen
Date:
Subject: Re: proposal: possibility to read dumped table's name from file
Next
From: Amit Kapila
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions