Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types - Mailing list pgsql-hackers
From | Emre Hasegeli |
---|---|
Subject | Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types |
Date | |
Msg-id | CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types
|
List | pgsql-hackers |
> - Floating point comparisons for gemetric types > > Comparison related to geometric types is performed by FPeq > macro and similars defined in geo_decls.h. This intends to give > tolerance to the comparisons. > > A > FPeq: |<=e-|-e=>| (<= means inclusive, e = epsilon = tolerance) > FPne: ->| e | e |<- (<- means exclusive) > FPlt: | e |<- > FPle: |<=e | > FPgt: ->| e | > FPge: | e=>| > > These seems reasonable ignoring the tolerance amount issue. I tried to explain below why I don't find this method reasonable. > At least for the point type, (bitmap) index scan is consistent > with sequential scan. I remember that the issue was > "inconsistency between indexed and non-indexed scans over > geometric types" but they seem consistent with each other. You can see on the Git history that it was a lot of trouble to make the indexes consistent with the operators, and this is limiting improving indexing of the geometric types. > You mentioned b-tree, which don't have predefined opclass for > geometric types. So the "inconsistency" may be mentioning the > difference between geometric values and combinations of plain > (first-class) values. But the two are different things and > apparently using different operators (~= and = for equality) so > the issue is not fit for this. More specifically, "p ~= p0" and > "x = x0 and y = y0" are completely different. Yes, the default btree operator class doesn't have to implement standard equality and basic comparison operator symbols. We can implement it with different symbols. > Could you let me (or any other guys on this ml) have more precise > description on the problem and/or what you want to do with this > patch? I will try to itemize the current problems with the epsilon method: = Operator Inconsistencies My current patches address all of them. - Only some operators are using the epsilon. I included the list of the ones which doesn't use the epsilon on initial post of this thread. This makes the operators not compatible with each other. For example, you cannot say "if box_a @> box_b and box_b @> point then box_a @> box_b". - The application of epsilon is implementation dependent. Epsilon works between two numbers. There is not always a single way of implementing geometric operators. This is surprising to the users. For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= epsilon". This is also creating a high barrier on developing those operators. Fixing bugs and performance regressions are very likely to change the results. - Some operators are just wrong: Line operators are not well maintained. The following are giving wrong results when neither A or B of lines are not exactly 1: * line # line * line <-> line * point ## line * point ## lseg They are broken independently from the epsilon, though it is not easy to fix them without changing the results of the cases that are currently working. - Some operators are violating commutative property. For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a". - Some operators are violating transitive property. For example, you cannot say "if point_a ~= point_b and point_b ~= point_c then point_a ~= point_c". - The operators with epsilon are not compatible with float operators. This is also surprising for the users. For example, you cannot say "if point_a << point_b then point_a[0] < point_b[0]". - The operators are not checking for underflow or overflow. - NaN values are not treated same as the float datatype. Those are independent problems, though checking them with epsilon would create additional set of inconsistencies. Epsilon creates especially more problems around 0. = Missing Bits on the Operator Classes I want to address those in the future, if my patches are accepted. - There is no GiST or SP-GiST support for cross datatype operators other than containment. We can develop nice cross-datatype operator families to support more operators. It would have been easier, if we could rely on some operators to implement others. Geometric types serve the purpose of demonstrating our indexing capabilities. We should make them good examples, not full of special cases with hidden bugs. - There is no BRIN support for types other than the box. BRIN inclusion operator class is datatype agnostic. It uses some operators to implement others which doesn't work for anything more the box vs box operators, because of the inconsistencies. - GROUP BY and DISTINCT are not working. - ORDER BY is not working. There are regular user complaints about this on the mailing list. We can easily provide default hash and btree operator classes that would fix the problem, if we haven't had the epsilon.
pgsql-hackers by date: