Re: [HACKERS] Floating point comparison inconsistencies of the geometric types - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Date
Msg-id 20170126.211122.140249805.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [HACKERS] Floating point comparison inconsistencies of the geometric types  (Emre Hasegeli <emre@hasegeli.com>)
Responses Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
List pgsql-hackers
Hello,

At Thu, 26 Jan 2017 11:53:28 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzwYLcx-3ffToVm7JEEniYt1fU31y5BAikXzEqvCbQyTMg@mail.gmail.com>
> > I'm not sure, but I don't see a "natural" ordering of
> > geometric types (or one agreeable to many people) in
> > general. In any case it is quite application specific
> > (application meaning the relationship with the real world, not
> > an application program).
> 
> We can just define it for point as "ORDER BY point.x, point.y".

It's nonsense. That ordering exists purely for convenience. Anyone
can do so in their own application, but PostgreSQL as a platform
cannot adopt it.

> > What we should not forget is that PostGIS does the same thing and
> > it is widely used (I believe..). This means it is not broken, at
> > least in a certain context. But it is a fact that it is also quite
> > inconvenient for getting performance from, say, indexes.
> 
> I understand from Paul Ramsey's email [1] on this thread that PostGIS
> doesn't currently have a tolerance.

Thank you for the pointer. (My memory is as small as an 8-bit
CPU's.) Looking into the source of PostGIS 2.3.0 (maybe the
latest), EPSILON is certainly used in some places. FPeq and
similar macros are also defined. liblwgeom_internal.h even
defines another tolerance, EPSILON_SQLMM.

Paul> The real answer for a GIS system is to have an explicit tolerance
Paul> parameter for calculations like distance/touching/containment, but
Paul> unfortunately we didn't do that so now we have our own
Paul> compatibility/boil the ocean problem if we ever wanted/were funded to
Paul> add one.

This doesn't seem to say that PostGIS has no fixed-amount
tolerance. Although we don't necessarily need compatibility with
PostGIS, that makes the problem rather complex, since we lose an
obvious starting point. It would be good to make the geometric
comparators take explicit tolerances as Paul said, but that may
be overkill. So I proposed the variable implicit tolerance.
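
To make the difference between a fixed implicit tolerance and an explicit one concrete, here is a minimal sketch in Python. The names fp_eq and point_eq are hypothetical and are not taken from PostgreSQL or PostGIS; the value 1.0E-06 is, to my knowledge, what PostgreSQL's EPSILON is set to.

```python
import math

EPSILON = 1.0e-06  # assumed fixed fuzz factor, used when no tolerance is given


def fp_eq(a, b, tolerance=EPSILON):
    """Fuzzy equality: true when a and b differ by at most `tolerance`."""
    return math.fabs(a - b) <= tolerance


def point_eq(p, q, tolerance=EPSILON):
    """Hypothetical point comparator; the caller may pass its own tolerance."""
    return fp_eq(p[0], q[0], tolerance) and fp_eq(p[1], q[1], tolerance)


p = (1.0, 2.0)
q = (1.0000005, 2.0)

print(point_eq(p, q))                 # implicit, fixed tolerance
print(point_eq(p, q, tolerance=0.0))  # explicit tolerance: exact comparison
```

With an explicit parameter the application decides what "same" means; with the implicit one every caller inherits the same fuzz factor.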

> > Yeah, the EPSILON is merely a fuzz factor, so we cannot expect any
> > kind of stable behavior for differences below that level. But by
> > analogy with comparisons of floating point numbers, I suppose
> > that inequality comparison could be done without the tolerance.
> 
> What do you mean exactly?

Sorry for the poor wording. I'll try it in a different way. I
mean that a comparison of numbers that contain a certain amount
of error gives unstable results when the difference is small
relative to their precision.
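
A small Python sketch of that point, assuming IEEE 754 doubles: two expressions for the same mathematical value end up ordered by rounding error alone, while a fuzzy comparison (EPSILON here is an assumed value) treats them as equal.

```python
import math

EPSILON = 1.0e-06  # assumed fuzz factor for this illustration

a = math.sqrt(2.0) * math.sqrt(2.0)  # mathematically exactly 2
b = 2.0

# Strict comparisons are decided by the rounding error in `a`.
print(a == b, a > b)

# A comparison with tolerance does not see a difference this small.
print(abs(a - b) <= EPSILON)
```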

> >> >> - Some operators are violating commutative property.
> >> >>
> >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
> >> >
> >> > Whether EPSILON is introduced or not, commutativity cannot be
> >> > assured for it from calculation error, I suppose.
> >>
> >> It can easily be assured by treating both sides of the operator the
> >> same.  It is actually assured on my patch.
> >
> > It surely holds for certain cases. However many applicable cases
> > we come up with, in the end we cannot prove that it works in
> > general. Just three successive rotations by 1/3 of a turn break it.
> 
> It is a different story.  We cannot talk about commutative property of
> rotation function that we currently don't have, because it would be an
> asymmetrical operator.

That's wrong. Any shape represented by the geometric types is
assumed to undergo geometric operations such as translation,
rotation and others. That is fundamental for geometric types.
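
The rotation case mentioned in the quoted text can be tried out with a short Python sketch. The rotate helper is hypothetical (PostgreSQL has no such function, as noted above). After three rotations by a third of a turn the point is mathematically back at its start, but the computed coordinates usually carry a small residual error.

```python
import math


def rotate(p, angle):
    """Rotate point p = (x, y) around the origin by `angle` radians."""
    x, y = p
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


start = (1.0, 0.0)
p = start
for _ in range(3):
    p = rotate(p, 2.0 * math.pi / 3.0)  # one third of a full turn

drift = math.hypot(p[0] - start[0], p[1] - start[1])

print(p, p == start)   # exact equality typically fails here
print(drift <= 1e-06)  # equality within a tolerance holds
```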

> The parallel operator is currently marked as commutative.  The planner
> is free to switch the sides of the operation.  Therefore this is not
> only a surprise, but a bug.

Strictly speaking it is not commutative, but given FP error or
EPSILON, treating it as commutative would be acceptable. With a
larger tolerance, the difference from the commuted expression
becomes relatively small for the type's domain. As far as I know,
even the sum of two floating point numbers is not guaranteed to
yield strictly the same result when the operation is commuted.
But the difference doesn't matter in most cases, and programmers
write programs so that such differences don't affect their
purpose. The case of the geometric types is basically the same.
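
For reference, a minimal Python check of how operand order affects a sum, assuming IEEE 754 doubles as used by CPython. On such a platform a single addition gives the same result in either operand order; what does change the result is regrouping a longer sum, and that difference is far below any EPSILON-sized tolerance.

```python
a, b, c = 0.1, 0.2, 0.3

# Swapping the operands of one addition does not change the result here.
swapped_equal = (a + b) == (b + a)

# Regrouping three terms does change the last bits of the result.
left = (a + b) + c
right = a + (b + c)

print(swapped_equal)
print(left == right, abs(left - right))
```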

> > Hmm, I have nothing more to say if you don't agree that floating
> > point numbers involved in any kind of arithmetic are hardly
> > deterministic, especially when their usage is not defined.
> 
> The floating point results are not random.  There are certain
> guarantees.  The transitive property of equality is one of them.  Our
> aim should be making things easier for our users by providing more
> guarantees not breaking what is already there.

Sorry for repeating the same thing, but floating point numbers
that have gone through arithmetic operations must be considered
to carry fuzziness or error (the same can occur even just after
assignment). They must not be handled as exact numbers. Please
study how floating point numbers should be handled. If such
strictness is required and no arithmetic is involved, it is
proper to store them in, say, string form.
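
The transitivity guarantee raised in the quoted text can be illustrated with a short Python sketch. fp_eq below is a hypothetical stand-in for an FPeq-style comparison with an assumed EPSILON of 1.0E-06; equality within a fixed tolerance is not transitive, whereas exact equality is.

```python
EPSILON = 1.0e-06  # assumed fixed tolerance


def fp_eq(a, b):
    """FPeq-style fuzzy equality with a fixed tolerance."""
    return abs(a - b) <= EPSILON


a, b, c = 0.0, 0.8e-06, 1.6e-06

print(fp_eq(a, b))  # a and b are "equal"
print(fp_eq(b, c))  # b and c are "equal"
print(fp_eq(a, c))  # yet a and c are not
```

This is the trade-off between the two positions in this thread: the tolerance absorbs calculation error, at the cost of an equality that cannot be chained.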

If such behavior is required but you still want to use floating
point numbers, it may be easier to create an extension conforming
to such a specification than to change the core behavior.

> > The world of the geometric types in PostgreSQL *is* built
> > so. It is no different from the fact that a Windows client can
> > produce different results from PostgreSQL on a Linux server for
> > simple floating point arithmetic, or that even binaries built by
> > different compiler versions on the same platform can. Relying
> > on such accidental coherency is a bad policy.
> 
> Yes, the results are not portable.  We should only rely on the results
> being stable on the same build.  The epsilon doesn't cure this
> problem.  It arguably makes it worse.

Yes, EPSILON doesn't improve such cross-platform consistency. It
is totally a different issue. Such exact consistency should not
be expected regardless of EPSILON.

> > PostGIS or libgeos seems to prove it. They are designed exactly
> > for this purpose and actually used.
> 
> Yes, PostGIS is a GIS application.  We are not.  Geometric types name
> suggests to me them being useful for general purpose.
> 
> > So, the union of the two requirements seems to be having such
> > parameter as a GUC.
> 
> That sounds doable to me.  We can use this opportunity to make all
> operators consistent.  So the epsilon would apply to the ones that it
> currently does not.  We can still add btree and hash opclasses, and make
> them give an error when this GUC is not 0.  We can even make this or
> another GUC apply to floats making whole system more consistent.

Maybe.

> Though, I know the community is against behaviour changing GUCs.  I
> will not spend more time on this, before I get positive feedback from
> others.

That's too bad. I'm sorry that I wasn't very helpful.


> [1] https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




