Re: Floating point comparison inconsistencies of the geometric types - Mailing list pgsql-hackers

From Emre Hasegeli
Subject Re: Floating point comparison inconsistencies of the geometric types
Date
Msg-id CAE2gYzymeQXGGmhU1Vc35DpugwfRd-QRK3BM-6TGg0rwHcDN_w@mail.gmail.com
Whole thread Raw
In response to Re: Floating point comparison inconsistencies of the geometric types  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Floating point comparison inconsistencies of the geometric types
List pgsql-hackers
> We can remove the fuzz factor altogether but I think we also
> should provide a means usable to do similar things. At least "is
> a point on a line" might be useless for most cases without any
> fuzzing feature. (Nevertheless, it is a problem only when it is
> being used to do that:) If we don't find reasonable policy on
> fuzzing operations, it would be the proof that we shouldn't
> change the behavior.

It was my initial idea to keep the fuzzy comparison behaviour on some
places, but the more I get into I realised that it is almost
impossible to get this right.  Instead, I re-implemented some
operators to keep precision as much as possible.  The previous "is a
point on a line" operator would *never* give the true result without
the fuzzy comparison.  The new implementation would return true, when
precision is not lost.  I think this is a behaviour people, who are
working with floating points, are prepared to deal with.  By the way,
"is a point on a line" operator is quite wrong with the fuzzy
comparison at the moment [1].

> The 0001 patch adds many FP comparison functions individually
> considering NaN. As the result the sort order logic involving NaN
> is scattered around into the functions, then, you implement
> generic comparison function using them. It seems inside-out to
> me. Defining ordering at one place, then comparison using it
> seems to be reasonable.

I agree that it would be simpler to use the comparison function for
implementing other operators.  I have done it other way around to make
them more optimised.  They are called very often.  I don't think
checking exit code of the comparison function would be optimised the
same way.  I could leave the comparison functions as they are, but
re-implemented them using the others to keep documentation of NaN
comparison in the single place.

> If the center somehow goes extremely near to the origin, it could
> result in a false error.
>
>> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
>> ERROR:  value out of range: underflow
>
> I don't think this underflow is an error, and actually it is a
> change of the current behavior without a reasonable reason. More
> significant (and maybe unacceptable) side-effect is that it
> changes the behavior of ordinary operators. I don't think this is
> acceptable. More consideration is needed.
>
>> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
>> ERROR:  value out of range: underflow

This is the current behaviour of float datatype.  My patch doesn't
change that.  This problem would probably also apply to multiplying
very small values.  I agree that this is not the ideal behaviour.
Though I am not sure, if we should go to a different direction than
the float datatypes.

I think there is value in making geometric types compatible with the
float.  Users are going to mix them, anyway.  For example, users can
calculate the center of a box manually, and confuse when the built-in
operator behaves differently.

> In regard to fuzzy operations, libgeos seems to have several
> types of this kind of feature. (I haven't looked closer into
> them). Other than reducing precision seems overkill or
> unappliable for PostgreSQL bulitins. As Jim said, can we replace
> the fixed scale fuzz factor by precision reduction? Maybe, with a
> GUC variable (I hear someone's roaring..) to specify the amount
> defaults to fit the current assumption.

I am disinclined to try to implement something complicated for the
geometric types.  I think they are mostly useful for 2 purposes: uses
simple enough to not worth looking for better solutions, and
demonstrating our indexing capabilities.  The inconsistencies harm
both of those.

[1]
https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Rahila Syed
Date:
Subject: Re: Improvements in psql hooks for variables
Next
From: 余森彬
Date:
Subject: Re: Why PostgreSQL doesn't implement a semi sync replication?