Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT" - Mailing list pgsql-bugs
From | Casey Witt |
---|---|
Subject | Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT" |
Date | |
Msg-id | 003501d2898a$0aa73c70$1ff5b550$@gmail.com Whole thread Raw |
In response to | Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT" (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
Tom, Thanks for the explanation. It looks like there are three separate issues here for the point type: 1) how to know if two points are equal (really equal, not fuzzy) 2) how to know if two points are "close enough" (fuzzy equal, which I assume must be very useful for GIS stuff but haven't used myself yet) 3) how to order the points in a btree (ie. which is bigger (0,1) or (1,0)) Does the following make sense: 1) implement an equality operator for a point (x1=x2 AND y1=y2). This would allow "IS [NOT] DISTINCT" to work out of the box, and most importantly (from my perspective at least) would allow "UPDATE table SET (p1 = @p1) WHERE p1 IS NOT DISTINCT FROM @p1_original;" to just work. 2) points still have the "~=" operator, so no impact there 3) couldn't a general ordering operator be defined as: If x1 < x2 then point1 < point 2 If x1 > x2 then point 1 > point 2 If x1 = x2 then If y1 < y2 then point1 < point 2 If y1 > y2 then point1 > point 2 If y1 = y2 then point1 = point 2 For my use case, the reason it is so important to get the "IS [NOT] DISTINCT" working for points is because my update queries are generated automatically based on a list of "dirty fields" in my application front-end. The application front-end creates a SQL update statement saying "update all these values which have changed AS LONG AS they haven't been updated by somebody else in the meantime". The point field is just one of many fields (and data types) subject to change. So without having "=" defined for point I only have two [very ugly] options: 1) store the point as two separate numeric columns (but I want to ensure that I either have a complete point or no point, so then I would have to also add check constraints), and when there are lots of points in the table this just "feels icky" 2) update my SQL generator to make a special exception for points (which is what I am doing now), but again, this just "feels icky". Regarding the ordering operator, somebody could say that for their case the y value should be compared before the x value, but does it really matter for the purposes of implementing a general comparison btree operator? Isn't the only thing that is truly required is that it is deterministic and guaranteed to produce the same ordering every time, with minimizing computation being a distant second place consideration? I say that minimizing computation should be a "distant" second place to imply that it is better to have an inefficient implementation of something than no implementation at all. I know that this discussion of "what is equals" comes up a lot, and in a lot of different programming languages, and the answer is often "we leave it undefined because we don't know the exact use case". I have never embraced this approach, and much prefer the "implement an equality operator that is rational for the simplest general case". As an exaggerated example, one could say, we can't implement an equality operator for "numeric" because somebody may be storing a "length" in that field and without knowing whether each value is in feet or meters it is not possible to compare two of these values. But in practice, nobody gets too excited about this because for most use cases the numeric type works as expected and when there is a special use case such as storing length we are accustomed to the design pattern of storing the length value in one column, the units of measure in another column, write a function to convert the length to a common base, and then use that function to create an index. Extending this general idea beyond points then, two "boxes" could be considered equal if all four points are equal, and could be ordered by the length of the perimeter (I proposed perimeter because I assume it would be the cheapest to calculate, but it could be area or anything else as long as it is the cheapest to calculate and deterministic). Anybody needing more than that (ie. geometric transforms) would still be able to write functions to do their transforms which could be used in the index. The key point here is that if the extra information placing doubt on the equality is not actually embedded in the data type, then it should not be considered when determining "built-in" data type equality and ordering operations. If there is currently no "=" defined for points, then no existing code base uses the "=" operator for points, so how could defining this operator break backwards compatibility? I know that this is a bit of a simplistic view of the situation, and I appreciate your time to help me understand the real complexities of this issue. Thanks, Casey -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Thursday, February 16, 2017 02:29 To: kcwitt@gmail.com Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT" kcwitt@gmail.com writes: > Whatever issue there is with comparing points should be dealt with in > postgres and not pushed to client applications. The problem with this is that it's not entirely clear what definition of "equality" IS [NOT] DISTINCT FROM should use. The current definition is "use the operator named =", which is rather ad hoc, but you can point to chapter and verse in the SQL standard where X IS [NOT] DISTINCT FROM Y is defined in terms of "X = Y", so it's not completely nuts either. Personally I'd prefer it to be defined as "use the equality operator of the data type's default btree opclass, or hash opclass if no btree opclass"; but it's likely that that would break some cross-type cases that work today, so it wouldn't be a panacea. But in either case, "point" loses because it has neither an operator named "=" nor a btree or hash opclass. This is not just an oversight. There is a "point ~= point" operator but it implements fuzzy equality, making it unsuitable as a basis for btree or hash behavior, even assuming that you could invent the linear ordering of points that would also be needed for a btree opclass. It would surely not be terribly hard to invent an exact-equality "=" operator for points, but it's not clear how useful that would be. Almost all the existing operators for points are fuzzy. The same problems exist for other geometric types, usually worse, because many of the others do have "=" operators but they compare areas :-(. Nobody's really wanted to break backwards compatibility enough to bring some sanity to that mess. Short answer is I don't think this is likely to change in the near future. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
pgsql-bugs by date: