Re: [HACKERS] Floating point comparison inconsistencies of the geometric types - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: [HACKERS] Floating point comparison inconsistencies of the geometric types |
Date | |
Msg-id | 20170126.211122.140249805.horiguchi.kyotaro@lab.ntt.co.jp |
In response to | Re: [HACKERS] Floating point comparison inconsistencies of the geometric types (Emre Hasegeli <emre@hasegeli.com>) |
Responses | Re: [HACKERS] Floating point comparison inconsistencies of the geometric types |
List | pgsql-hackers |
Hello,

At Thu, 26 Jan 2017 11:53:28 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in <CAE2gYzwYLcx-3ffToVm7JEEniYt1fU31y5BAikXzEqvCbQyTMg@mail.gmail.com>
> > Even though I'm not sure but I don't see a "natural" (or
> > agreeable by many people) ordering of geometric types in
> > general. Anyway it's quite application (not application program
> > but the relationship with the real world) specific.
>
> We can just define it for point as "ORDER BY point.x, point.y".

That's nonsense; it is purely for convenience. Anyone can do that in
their own application, but PostgreSQL, as a platform, cannot adopt it.

> > What we should not forget is that PostGIS does the same thing and
> > it is widely used (I believe..). This means it is not broken at least
> > in a certain context. But it is a fact that it is also quite
> > inconvenient to get performance from, say, indexes.
>
> I understand from Paul Ramsey's email [1] on this thread that PostGIS
> doesn't currently have a tolerance.

Thank you for the pointer. (My memory is as small as an 8-bit CPU's.)

Looking into the source of PostGIS 2.3.0 (maybe the latest), EPSILON
is certainly used in some places. FPeq and similar macros are also
defined. liblwgeom_internal.h even defines another tolerance,
EPSILON_SQLMM.

Paul> The real answer for a GIS system is to have an explicit tolerance
Paul> parameter for calculations like distance/touching/containment, but
Paul> unfortunately we didn't do that so now we have our own
Paul> compatibility/boil the ocean problem if we ever wanted/were funded to
Paul> add one.

This doesn't seem to say that PostGIS has no fixed tolerance.
Although we don't necessarily need compatibility with PostGIS, this
makes the problem rather complex, since we lose an obvious starting
point. It would be good to give the geometric comparators explicit
tolerances, as Paul said, but that may be overkill. That is why I
proposed a variable implicit tolerance.
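For illustration only, the three kinds of tolerance mentioned above (a fixed EPSILON, an explicit tolerance parameter, and a variable implicit tolerance) might be sketched as follows. This is not code from PostgreSQL or PostGIS. The function names are hypothetical, the value 1.0e-06 is an assumed fixed tolerance, and scaling the tolerance by the magnitude of the operands is only one possible reading of "variable implicit tolerance", not the definition proposed in this thread.

```python
EPSILON = 1.0e-06  # assumed fixed tolerance, for illustration only


def fp_eq_fixed(a: float, b: float) -> bool:
    """Fixed implicit tolerance: the same absolute fuzz for every magnitude."""
    return abs(a - b) <= EPSILON


def fp_eq_explicit(a: float, b: float, tolerance: float) -> bool:
    """Explicit tolerance: the caller states how close is close enough."""
    return abs(a - b) <= tolerance


def fp_eq_scaled(a: float, b: float, rel: float = 1.0e-06) -> bool:
    """Hypothetical variable implicit tolerance: fuzz grows with the operands."""
    return abs(a - b) <= rel * max(abs(a), abs(b))


# Large coordinates about one millimetre apart.
big_a, big_b = 1.0e9, 1.0e9 + 1.0e-3
print(fp_eq_fixed(big_a, big_b))           # False: 1e-3 exceeds the fixed EPSILON
print(fp_eq_explicit(big_a, big_b, 0.01))  # True: caller accepted 0.01
print(fp_eq_scaled(big_a, big_b))          # True: tolerance scales to about 1000

# Tiny values where one is twice the other.
print(fp_eq_fixed(1.0e-9, 2.0e-9))         # True: both vanish under EPSILON
print(fp_eq_scaled(1.0e-9, 2.0e-9))        # False: relative difference is large
```

The fixed form behaves very differently at the two scales, which is the kind of inconsistency an explicit or magnitude-aware tolerance is meant to avoid.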
> > Yeah, the EPSILON is merely a fuzz factor so we cannot expect any
> > kind of stable behavior for differences under the level. But on
> > the analogy of comparisons of floating point numbers, I suppose
> > that inequality comparison could be done without the tolerance.
>
> What do you mean exactly?

Sorry for the poor wording. I'll try it a different way. I mean that
comparing numbers that contain a certain amount of error gives
unstable results when the difference is small relative to their
precision.

> >> >> - Some operators are violating commutative property.
> >> >>
> >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
> >> >
> >> > Whether EPSILON is introduced or not, commutativity cannot be
> >> > assured for it from calculation error, I suppose.
> >>
> >> It can easily be assured by treating both sides of the operator the
> >> same. It is actually assured on my patch.
> >
> > It surely holds for certain cases. However many applicable cases
> > we guess, finally we cannot prove that it works generally. Just
> > three times of 1/3 rotation breaks it.
>
> It is a different story. We cannot talk about commutative property of
> rotation function that we currently don't have, because it would be an
> asymmetrical operator.

That's wrong. Any shape represented by the geometric types is assumed
to be subject to geometric operations such as translation, rotation
and others. That is fundamental for geometric types.

> The parallel operator is currently marked as commutative. The planner
> is free to switch the sides of the operation. Therefore this is not
> only a surprise, but a bug.

Strictly speaking it is not commutative, but given FP error or
EPSILON, commuting the operands would be acceptable. With a larger
tolerance, the difference from the commuted expression becomes
relatively small for the type's domain. As far as I know, even the
sum of two floating-point numbers is not guaranteed to yield strictly
the same result when the operation is commuted. But the difference
doesn't matter in most cases, and programmers write their programs so
that such differences don't matter for their purpose. This is
basically the same as the case of the geometric types.

> > Hmm, I have nothing more to say if you don't agree that floating
> > point numbers involving any kind of arithmetic is hardly
> > deterministic especially not defining its usage.
>
> The floating point results are not random. There are certain
> guarantees. The transitive property of equality is one of them. Our
> aim should be making things easier for our users by providing more
> guarantees not breaking what is already there.

Sorry for repeating the same thing, but floating-point numbers that
have gone through arithmetic operations must be considered to carry
fuzziness or error (the same can happen even just after assignment).
They must not be handled as exact numbers. Please study how to handle
floating-point numbers. If such strictness is required and no
arithmetic is involved, it is proper to store them in, say, string
form. If such behavior is required but you want to use floating
point, it may be easier to create an extension conforming to such a
specification rather than changing the core behavior.

> > The world of the geometric types in PostgreSQL *is* built
> > so. There is nothing different with that Windows client can make
> > different results from PostgreSQL on a Linux server for a simple
> > floating point arithmetics, or even different binaries made by
> > different version of compilers on the same platform can. Relying
> > on such coherency by accident is a bad policy.
>
> Yes, the results are not portable. We should only rely on the results
> being stable on the same build. The epsilon doesn't cure this
> problem. It arguably makes it worse.

Yes, EPSILON doesn't improve such cross-platform consistency. That is
a totally different issue. Such exact consistency should not be
expected regardless of EPSILON.

> > PostGIS or libgeos seems to prove it. They are designed exactly
> > for this purpose and actually used.
>
> Yes, PostGIS is a GIS application. We are not. Geometric types name
> suggests to me them being useful for general purpose.
>
> > So, the union of the two requirements seems to be having such
> > parameter as a GUC.
>
> That sounds doable to me. We can use this opportunity to make all
> operators consistent. So the epsilon would apply to the ones that it
> currently is not. We can still add btree and hash opclasses, and make
> them give an error when this GUC is not 0. We can even make this or
> another GUC apply to floats making whole system more consistent.

Maybe.

> Though, I know the community is against behaviour changing GUCs. I
> will not spend more time on this, before I get positive feedback from
> others.

That's too bad. I'm sorry that I wasn't very helpful.

> [1] https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
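
As an illustration of two floating-point effects debated in this message, the following minimal sketch shows that regrouping a sum can change the rounded result, and that a tolerance-based equality is not transitive. The helper `fp_eq` and the tolerance 1.0e-06 are assumptions made for this example; they are not taken from the PostgreSQL or PostGIS sources.

```python
EPSILON = 1.0e-06  # assumed fixed tolerance, for illustration only


def fp_eq(a: float, b: float, eps: float = EPSILON) -> bool:
    """Tolerance-based equality: true when |a - b| <= eps."""
    return abs(a - b) <= eps


# 1. Regrouping a sum changes the rounded result in binary floating point.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)      # False
print(fp_eq(left, right)) # True: the tolerance hides the rounding difference

# 2. Tolerance-based equality is not transitive:
#    a "equals" b and b "equals" c, yet a does not "equal" c.
a, b, c = 0.0, 0.75e-06, 1.5e-06
print(fp_eq(a, b), fp_eq(b, c), fp_eq(a, c))  # True True False
```

The first case is the situation where a fuzz factor helps; the second is why operators built on such a comparison cannot back btree or hash operator classes, which need a transitive equality.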