Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types - Mailing list pgsql-hackers

From Emre Hasegeli
Subject Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types
Date
Msg-id CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types
List pgsql-hackers
> - Floating point comparisons for gemetric types
>
>   Comparison related to geometric types is performed by FPeq
>   macro and similars defined in geo_decls.h. This intends to give
>   tolerance to the comparisons.
>
>                  A
>   FPeq:     |<=e-|-e=>|    (<= means inclusive, e = epsilon = tolerance)
>   FPne:   ->|  e | e  |<-  (<- means exclusive)
>   FPlt:          | e  |<-
>   FPle:     |<=e |
>   FPgt:   ->|  e |
>   FPge:          | e=>|
>
>   These seems reasonable ignoring the tolerance amount issue.

I tried to explain below why I don't find this method reasonable.

> At least for the point type, (bitmap) index scan is consistent
> with sequential scan.  I remember that the issue was
> "inconsistency between indexed and non-indexed scans over
> geometric types" but they seem consistent with each other.

You can see on the Git history that it was a lot of trouble to make
the indexes consistent with the operators, and this is limiting
improving indexing of the geometric types.

> You mentioned b-tree, which don't have predefined opclass for
> geometric types. So the "inconsistency" may be mentioning the
> difference between geometric values and combinations of plain
> (first-class) values. But the two are different things and
> apparently using different operators (~= and = for equality) so
> the issue is not fit for this. More specifically, "p ~= p0" and
> "x = x0 and y = y0" are completely different.

Yes, the default btree operator class doesn't have to implement
standard equality and basic comparison operator symbols.  We can
implement it with different symbols.

> Could you let me (or any other guys on this ml) have more precise
> description on the problem and/or what you want to do with this
> patch?

I will try to itemize the current problems with the epsilon method:

= Operator Inconsistencies

My current patches address all of them.

- Only some operators are using the epsilon.

I included the list of the ones which doesn't use the epsilon on
initial post of this thread.  This makes the operators not compatible
with each other.  For example, you cannot say "if box_a @> box_b and
box_b @> point then box_a @> box_b".

- The application of epsilon is implementation dependent.

Epsilon works between two numbers.  There is not always a single way
of implementing geometric operators.  This is surprising to the users.
For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
epsilon".

This is also creating a high barrier on developing those operators.
Fixing bugs and performance regressions are very likely to change the
results.

- Some operators are just wrong:

Line operators are not well maintained.  The following are giving
wrong results when neither A or B of lines are not exactly 1:

* line # line
* line <-> line
* point ## line
* point ## lseg

They are broken independently from the epsilon, though it is not easy
to fix them without changing the results of the cases that are
currently working.

- Some operators are violating commutative property.

For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".

- Some operators are violating transitive property.

For example, you cannot say "if point_a ~= point_b and point_b ~=
point_c then point_a ~= point_c".

- The operators with epsilon are not compatible with float operators.

This is also surprising for the users.  For example, you cannot say
"if point_a << point_b then point_a[0] < point_b[0]".

- The operators are not checking for underflow or overflow.
- NaN values are not treated same as the float datatype.

Those are independent problems, though checking them with epsilon
would create additional set of inconsistencies.  Epsilon creates
especially more problems around 0.

= Missing Bits on the Operator Classes

I want to address those in the future, if my patches are accepted.

- There is no GiST or SP-GiST support for cross datatype operators
other than containment.

We can develop nice cross-datatype operator families to support more
operators.  It would have been easier, if we could rely on some
operators to implement others.  Geometric types serve the purpose of
demonstrating our indexing capabilities.  We should make them good
examples, not full of special cases with hidden bugs.

- There is no BRIN support for types other than the box.

BRIN inclusion operator class is datatype agnostic.  It uses some
operators to implement others which doesn't work for anything more the
box vs box operators, because of the inconsistencies.

- GROUP BY and DISTINCT are not working.
- ORDER BY is not working.

There are regular user complaints about this on the mailing list.  We
can easily provide default hash and btree operator classes that would
fix the problem, if we haven't had the epsilon.



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] WARM and indirect indexes
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] Logical replication existing data copy