Re: Floating point comparison inconsistencies of the geometric types - Mailing list pgsql-hackers
| From | Emre Hasegeli |
|---|---|
| Subject | Re: Floating point comparison inconsistencies of the geometric types |
| Date | |
| Msg-id | CAE2gYzymeQXGGmhU1Vc35DpugwfRd-QRK3BM-6TGg0rwHcDN_w@mail.gmail.com |
| In response to | Re: Floating point comparison inconsistencies of the geometric types (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
| Responses | Re: Floating point comparison inconsistencies of the geometric types |
| List | pgsql-hackers |
> We can remove the fuzz factor altogether, but I think we should
> also provide a usable means to do similar things. At least "is
> a point on a line" might be useless for most cases without any
> fuzzing feature. (Nevertheless, it is a problem only when it is
> being used to do that:) If we don't find a reasonable policy on
> fuzzing operations, it would be the proof that we shouldn't
> change the behavior.
My initial idea was to keep the fuzzy comparison behaviour in some
places, but the more I got into it, the more I realised that it is
almost impossible to get this right. Instead, I re-implemented some
operators to preserve as much precision as possible. The previous "is a
point on a line" operator would *never* return true without
the fuzzy comparison. The new implementation returns true when
no precision is lost. I think this is behaviour that people who
work with floating point numbers are prepared to deal with. By the way,
the "is a point on a line" operator is quite wrong with the fuzzy
comparison at the moment [1].
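
To illustrate the difference between the two kinds of test, here is a minimal sketch. It is not the code from the patch: `SketchLine`, `on_line_fuzzy` and `on_line_exact` are hypothetical names, and the `EPSILON` value is assumed to match the fixed fuzz factor defined in geo_decls.h.

```c
#include <math.h>
#include <stdbool.h>

/* Fixed fuzz factor; assumed to match EPSILON in geo_decls.h. */
#define EPSILON 1.0E-06

/* Hypothetical line type: the set of points with A*x + B*y + C = 0. */
typedef struct
{
    double A, B, C;
} SketchLine;

/* Fuzzy test: anything within EPSILON of zero counts as "on the line". */
static bool
on_line_fuzzy(SketchLine l, double x, double y)
{
    return fabs(l.A * x + l.B * y + l.C) <= EPSILON;
}

/* Exact test: true only when the expression evaluates to exactly zero. */
static bool
on_line_exact(SketchLine l, double x, double y)
{
    return l.A * x + l.B * y + l.C == 0.0;
}
```

With the line y = x (A = 1, B = -1, C = 0), the point (0.5, 0.5) passes both tests. The point (1e-7, 5e-7) passes only the fuzzy one, although its y is five times its x, because both coordinates are smaller than the fuzz factor.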
> The 0001 patch adds many FP comparison functions individually
> considering NaN. As a result, the sort order logic involving NaN
> is scattered around the functions; then, you implement a
> generic comparison function using them. It seems inside-out to
> me. Defining ordering at one place, then comparison using it
> seems to be reasonable.
I agree that it would be simpler to use the comparison function to
implement the other operators. I did it the other way around to make
them better optimised, as they are called very often. I don't think
checking the return value of the comparison function would be optimised
the same way. I could leave the comparison functions as they are, but
re-implement them in terms of the others, to keep the documentation of
the NaN comparison in a single place.
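
The layering under discussion can be sketched roughly as follows. This is not the code from the 0001 patch: the names `f8_lt`, `f8_eq` and `f8_cmp` are hypothetical, and the NaN ordering is assumed to be the one the float datatypes document (NaN equal to NaN and greater than every non-NaN value).

```c
#include <math.h>
#include <stdbool.h>

/* Less-than with NaN sorted last: a non-NaN value is below any NaN. */
static inline bool
f8_lt(double a, double b)
{
    return !isnan(a) && (isnan(b) || a < b);
}

/* Equality with NaN treated as equal to NaN. */
static inline bool
f8_eq(double a, double b)
{
    return isnan(a) ? isnan(b) : a == b;
}

/*
 * Generic three-way comparison built on top of the individual operators,
 * returning -1, 0 or 1.  The alternative design would define the ordering
 * here and derive f8_lt and f8_eq from the return value.
 */
static inline int
f8_cmp(double a, double b)
{
    if (f8_lt(a, b))
        return -1;
    if (f8_eq(a, b))
        return 0;
    return 1;
}
```

In this arrangement a hot path that only needs "less than" calls `f8_lt` directly and never pays for the three-way result, which is the optimisation argument; the cost is that the NaN rules appear in more than one function.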
> If the center somehow goes extremely near to the origin, it could
> result in a false error.
>
>> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
>> ERROR: value out of range: underflow
>
> I don't think this underflow is an error, and actually it is a
> change of the current behavior without a reasonable reason. More
> significant (and maybe unacceptable) side-effect is that it
> changes the behavior of ordinary operators. I don't think this is
> acceptable. More consideration is needed.
>
>> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
>> ERROR: value out of range: underflow
This is the current behaviour of the float datatypes. My patch doesn't
change that. The same problem would probably also apply to multiplying
very small values. I agree that this is not the ideal behaviour,
though I am not sure that we should go in a different direction than
the float datatypes.

I think there is value in making the geometric types compatible with
the float datatypes. Users are going to mix them anyway. For example,
users can calculate the center of a box manually, and would be confused
if the built-in operator behaved differently.
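
The arithmetic behind the quoted error can be sketched as below. This is an illustration only: `mean_underflows` is a hypothetical helper, and the underflow rule it applies (a zero quotient from a nonzero dividend) is my assumption about how the float8 division check works, not a copy of it.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: average two coordinates and report underflow.
 * Assumed rule: a zero result from a nonzero dividend counts as underflow.
 */
static bool
mean_underflows(double a, double b, double *mean)
{
    double sum = a + b;

    *mean = sum / 2.0;
    return *mean == 0.0 && sum != 0.0;
}
```

For the inputs from the example, -8e-324 is stored as two units of the smallest subnormal and 4.9e-324 as one unit, so the sum is minus one unit. Halving that is not representable and rounds to zero, which is what the check flags. Ordinary inputs such as 1.0 and 3.0 are unaffected.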
> In regard to fuzzy operations, libgeos seems to have several
> types of this kind of feature. (I haven't looked closer into
> them). Anything other than reducing precision seems overkill or
> inapplicable for PostgreSQL builtins. As Jim said, can we replace
> the fixed scale fuzz factor by precision reduction? Maybe, with a
> GUC variable (I hear someone's roaring..) to specify the amount,
> with a default that fits the current assumption.
I am disinclined to try to implement something complicated for the
geometric types. I think they are mostly useful for two purposes: uses
simple enough that it is not worth looking for better solutions, and
demonstrating our indexing capabilities. The inconsistencies harm
both of those.
[1]
https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com