Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT" - Mailing list pgsql-bugs

From Casey Witt
Subject Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"
Date
Msg-id 003501d2898a$0aa73c70$1ff5b550$@gmail.com
Whole thread Raw
In response to Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Tom,

Thanks for the explanation.

It looks like there are three separate issues here for the point type:
1) how to know if two points are equal (really equal, not fuzzy)
2) how to know if two points are "close enough" (fuzzy equal, which I assume
must be very useful for GIS stuff but haven't used myself yet)
3) how to order the points in a btree (ie. which is bigger (0,1) or (1,0))

Does the following make sense:
1) implement an equality operator for a point (x1=x2 AND y1=y2). This would
allow "IS [NOT] DISTINCT" to work out of the box, and most importantly (from
my perspective at least) would allow "UPDATE table SET (p1 = @p1) WHERE p1
IS NOT DISTINCT FROM @p1_original;" to just work.
2) points still have the "~=" operator, so no impact there
3) couldn't a general ordering operator be defined as:
    If x1 < x2 then point1 < point 2
    If x1 > x2 then point 1 > point 2
    If x1 = x2 then
        If y1 < y2 then point1 < point 2
        If y1 > y2 then point1 > point 2
        If y1 = y2 then point1 = point 2

For my use case, the reason it is so important to get the "IS [NOT]
DISTINCT" working for points is because my update queries are generated
automatically based on a list of "dirty fields" in my application front-end.
The application front-end creates a SQL update statement saying "update all
these values which have changed AS LONG AS they haven't been updated by
somebody else in the meantime". The point field is just one of many fields
(and data types) subject to change. So without having "=" defined for point
I only have two [very ugly] options:
1) store the point as two separate numeric columns (but I want to ensure
that I either have a complete point or no point, so then I would have to
also add check constraints), and when there are lots of points in the table
this just "feels icky"
2) update my SQL generator to make a special exception for points (which is
what I am doing now), but again, this just "feels icky".

Regarding the ordering operator, somebody could say that for their case the
y value should be compared before the x value, but does it really matter for
the purposes of implementing a general comparison btree operator? Isn't the
only thing that is truly required is that it is deterministic and guaranteed
to produce the same ordering every time, with minimizing computation being a
distant second place consideration? I say that minimizing computation should
be a "distant" second place to imply that it is better to have an
inefficient implementation of something than no implementation at all.

I know that this discussion of "what is equals" comes up a lot, and in a lot
of different programming languages, and the answer is often "we leave it
undefined because we don't know the exact use case". I have never embraced
this approach, and much prefer the "implement an equality operator that is
rational for the simplest general case".

As an exaggerated example, one could say, we can't implement an equality
operator for "numeric" because somebody may be storing a "length" in that
field and without knowing whether each value is in feet or meters it is not
possible to compare two of these values. But in practice, nobody gets too
excited about this because for most use cases the numeric type works as
expected and when there is a special use case such as storing length we are
accustomed to the design pattern of storing the length value in one column,
the units of measure in another column, write a function to convert the
length to a common base, and then use that function to create an index.

Extending this general idea beyond points then, two "boxes" could be
considered equal if all four points are equal, and could be ordered by the
length of the perimeter (I proposed perimeter because I assume it would be
the cheapest to calculate, but it could be area or anything else as long as
it is the cheapest to calculate and deterministic). Anybody needing more
than that (ie. geometric transforms) would still be able to write functions
to do their transforms which could be used in the index. The key point here
is that if the extra information placing doubt on the equality is not
actually embedded in the data type, then it should not be considered when
determining "built-in" data type equality and ordering operations.

If there is currently no "=" defined for points, then no existing code base
uses the "=" operator for points, so how could defining this operator break
backwards compatibility?

I know that this is a bit of a simplistic view of the situation, and I
appreciate your time to help me understand the real complexities of this
issue.

Thanks,
Casey

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us] 
Sent: Thursday, February 16, 2017 02:29
To: kcwitt@gmail.com
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS
DISTINCT"

kcwitt@gmail.com writes:
> Whatever issue there is with comparing points should be dealt with in 
> postgres and not pushed to client applications.

The problem with this is that it's not entirely clear what definition of
"equality" IS [NOT] DISTINCT FROM should use.

The current definition is "use the operator named =", which is rather ad
hoc, but you can point to chapter and verse in the SQL standard where X IS
[NOT] DISTINCT FROM Y is defined in terms of "X = Y", so it's not completely
nuts either.  Personally I'd prefer it to be defined as "use the equality
operator of the data type's default btree opclass, or hash opclass if no
btree opclass"; but it's likely that that would break some cross-type cases
that work today, so it wouldn't be a panacea.

But in either case, "point" loses because it has neither an operator named
"=" nor a btree or hash opclass.  This is not just an oversight.
There is a "point ~= point" operator but it implements fuzzy equality,
making it unsuitable as a basis for btree or hash behavior, even assuming
that you could invent the linear ordering of points that would also be
needed for a btree opclass.  It would surely not be terribly hard to invent
an exact-equality "=" operator for points, but it's not clear how useful
that would be.  Almost all the existing operators for points are fuzzy.

The same problems exist for other geometric types, usually worse, because
many of the others do have "=" operators but they compare areas :-(.
Nobody's really wanted to break backwards compatibility enough to bring some
sanity to that mess.

Short answer is I don't think this is likely to change in the near future.

            regards, tom lane



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: [BUGS] BUG #14552: tsquery converts AND operator into OR when nested inside OR operations
Next
From: "Casey Witt"
Date:
Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"