Thread: point types in "DISTINCT" queries

point types in "DISTINCT" queries

From
"Jonathan S. Katz"
Date:
Hi,

I am running PostgreSQL 9.0.4 and I am getting an error with a SELECT DISTINCT query that contains a point type in the SELECT clause.  To be more specific, a query such as:

-- explicit declaration that it's a point type
SELECT DISTINCT a.geocode::point 
FROM a
WHERE a.region = 'x';

Will return the error:

ERROR: could not identify an equality operator for type point

I read the notes about how point types do not have "=" defined for them, but "~=" aka the "same as" operator (http://www.postgresql.org/docs/9.0/static/functions-geometry.html).  For points, I would treat ~= as equality.  I tried creating my own equality operator based on that:

CREATE OR REPLACE FUNCTION point_equality(point, point) RETURNS bool
   AS 'SELECT $1 ~= $2;'
   LANGUAGE SQL;

CREATE OPERATOR = (
   LEFTARG = point,
   RIGHTARG = point,
   PROCEDURE = point_equality,
   COMMUTATOR = =
);

And when I ran the query again:

ERROR: could not identify an equality operator for type point

I looked into the mailing list archives and found a potential answer on this thread: http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php However I wanted to see if it was still necessary that I would need the complete btree operator class to run such a query.  Are there plans to have a defined "=" operator on the point type?  I can understand how the other geometric types, "=" would represent area, but AFAIK I think "=" could be safely applied on a point type (and i realize I could submit a patch for that :-) maybe depending on the resolution to this / refreshing my C...).

Is there possibly a relatively quick solution to this issue?

Thanks!

Jonathan

Re: point types in "DISTINCT" queries

From
Jeff Davis
Date:
On Tue, 2011-06-28 at 18:56 -0400, Jonathan S. Katz wrote:

> I looked into the mailing list archives and found a potential answer
> on this thread:
> http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php
> However I wanted to see if it was still necessary that I would need
> the complete btree operator class to run such a query.

Yes, the default btree operator class is used to find the equality
operator. Even though you have defined the operator "=", postgresql
doesn't rely on that meaning "equals" -- the btree operator class is
what imparts that meaning.

> Are there plans to have a defined "=" operator on the point type?  I
> can understand how the other geometric types, "=" would represent
> area, but AFAIK I think "=" could be safely applied on a point type
> (and i realize I could submit a patch for that :-) maybe depending on
> the resolution to this / refreshing my C...).

The built-in geometric types haven't received a lot of attention lately.
Most people who use geometric data use the PostGIS extension, which is a
sophisticated extension that can deal with that kind of data. You might
want to check that out and see if it meets your needs.

Perhaps someone is interested in bringing the built-in geometric types
up to speed; but I think most of the interest is moving things like this
out to extensions where they can be more easily be maintained by
interested parties.

If you'd like to submit a patch, I suggest first asking on -hackers
whether improvements to the built-in spatial types would be accepted.

Regards,
    Jeff Davis



Re: point types in "DISTINCT" queries

From
Magnus Hagander
Date:
On Wed, Jun 29, 2011 at 06:53, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2011-06-28 at 18:56 -0400, Jonathan S. Katz wrote:
>
>> I looked into the mailing list archives and found a potential answer
>> on this thread:
>> http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php
>> However I wanted to see if it was still necessary that I would need
>> the complete btree operator class to run such a query.
>
> Yes, the default btree operator class is used to find the equality
> operator. Even though you have defined the operator "=", postgresql
> doesn't rely on that meaning "equals" -- the btree operator class is
> what imparts that meaning.
>
>> Are there plans to have a defined "=" operator on the point type?  I
>> can understand how the other geometric types, "=" would represent
>> area, but AFAIK I think "=" could be safely applied on a point type
>> (and i realize I could submit a patch for that :-) maybe depending on
>> the resolution to this / refreshing my C...).
>
> The built-in geometric types haven't received a lot of attention lately.
> Most people who use geometric data use the PostGIS extension, which is a
> sophisticated extension that can deal with that kind of data. You might
> want to check that out and see if it meets your needs.
>
> Perhaps someone is interested in bringing the built-in geometric types
> up to speed; but I think most of the interest is moving things like this
> out to extensions where they can be more easily be maintained by
> interested parties.

Given that they are the only ones supporting knn-gist, I would expect
them to actually become *more* popular with 9.1 - at least until such
time as postgis adds support for it...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: point types in "DISTINCT" queries

From
Magnus Hagander
Date:
On Wed, Jun 29, 2011 at 16:38, Jonathan S. Katz
<jonathan.katz@excoventures.com> wrote:
> On Jun 29, 2011, at 10:25 AM, Magnus Hagander <magnus@hagander.net> wrote:
>
>> On Wed, Jun 29, 2011 at 06:53, Jeff Davis <pgsql@j-davis.com> wrote:
>>>
>>> On Tue, 2011-06-28 at 18:56 -0400, Jonathan S. Katz wrote:
>>>
>>>> I looked into the mailing list archives and found a potential answer
>>>> on this thread:
>>>> http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php
>>>> However I wanted to see if it was still necessary that I would need
>>>> the complete btree operator class to run such a query.
>>>
>>> Yes, the default btree operator class is used to find the equality
>>> operator. Even though you have defined the operator "=", postgresql
>>> doesn't rely on that meaning "equals" -- the btree operator class is
>>> what imparts that meaning.
>>>
>>>> Are there plans to have a defined "=" operator on the point type?  I
>>>> can understand how the other geometric types, "=" would represent
>>>> area, but AFAIK I think "=" could be safely applied on a point type
>>>> (and i realize I could submit a patch for that :-) maybe depending on
>>>> the resolution to this / refreshing my C...).
>>>
>>> The built-in geometric types haven't received a lot of attention lately.
>>> Most people who use geometric data use the PostGIS extension, which is a
>>> sophisticated extension that can deal with that kind of data. You might
>>> want to check that out and see if it meets your needs.
>>>
>>> Perhaps someone is interested in bringing the built-in geometric types
>>> up to speed; but I think most of the interest is moving things like this
>>> out to extensions where they can be more easily be maintained by
>>> interested parties.
>>
>> Given that they are the only ones supporting knn-gist, I would expect
>> them to actually become *more* popular with 9.1 - at least until such
>> time as postgis adds support for it...
>
> In fact that is my use-case - I will be performing nearest-neighbor lookups
> (and will be running 9.1b2 on this data set shortly).  However, because most
> of the geospatial work is relatively straightforward, I didn't want to use
> PostGIS for this application.  But that might change in the near future
> depending on the requirements.
>
> But for now tasks like ensuing uniqueness amongst points are slightly more
> difficult.   My current solution is breaking out the (x,y) coords into
> different columns

Have you tried using an exclusion constraint? Not entirely sure, but I
think that might work.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: point types in "DISTINCT" queries

From
"Jonathan S. Katz"
Date:
On Jun 29, 2011, at 10:25 AM, Magnus Hagander <magnus@hagander.net>
wrote:

> On Wed, Jun 29, 2011 at 06:53, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Tue, 2011-06-28 at 18:56 -0400, Jonathan S. Katz wrote:
>>
>>> I looked into the mailing list archives and found a potential answer
>>> on this thread:
>>> http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php
>>> However I wanted to see if it was still necessary that I would need
>>> the complete btree operator class to run such a query.
>>
>> Yes, the default btree operator class is used to find the equality
>> operator. Even though you have defined the operator "=", postgresql
>> doesn't rely on that meaning "equals" -- the btree operator class is
>> what imparts that meaning.
>>
>>> Are there plans to have a defined "=" operator on the point type?  I
>>> can understand how the other geometric types, "=" would represent
>>> area, but AFAIK I think "=" could be safely applied on a point type
>>> (and i realize I could submit a patch for that :-) maybe depending
>>> on
>>> the resolution to this / refreshing my C...).
>>
>> The built-in geometric types haven't received a lot of attention
>> lately.
>> Most people who use geometric data use the PostGIS extension, which
>> is a
>> sophisticated extension that can deal with that kind of data. You
>> might
>> want to check that out and see if it meets your needs.
>>
>> Perhaps someone is interested in bringing the built-in geometric
>> types
>> up to speed; but I think most of the interest is moving things like
>> this
>> out to extensions where they can be more easily be maintained by
>> interested parties.
>
> Given that they are the only ones supporting knn-gist, I would expect
> them to actually become *more* popular with 9.1 - at least until such
> time as postgis adds support for it...

In fact that is my use-case - I will be performing nearest-neighbor
lookups (and will be running 9.1b2 on this data set shortly).
However, because most of the geospatial work is relatively
straightforward, I didn't want to use PostGIS for this application.
But that might change in the near future depending on the requirements.

But for now tasks like ensuing uniqueness amongst points are slightly
more difficult.   My current solution is breaking out the (x,y) coords
into different columns

Jonathan

Re: point types in "DISTINCT" queries

From
"Jonathan S. Katz"
Date:
On Jun 29, 2011, at 10:42 AM, Magnus Hagander wrote:

> On Wed, Jun 29, 2011 at 16:38, Jonathan S. Katz
> <jonathan.katz@excoventures.com> wrote:
>> In fact that is my use-case - I will be performing nearest-neighbor lookups
>> (and will be running 9.1b2 on this data set shortly).  However, because most
>> of the geospatial work is relatively straightforward, I didn't want to use
>> PostGIS for this application.  But that might change in the near future
>> depending on the requirements.
>>
>> But for now tasks like ensuing uniqueness amongst points are slightly more
>> difficult.   My current solution is breaking out the (x,y) coords into
>> different columns
>
> Have you tried using an exclusion constraint? Not entirely sure, but I
> think that might work.

Did a quick experiment:

Using =~

    ALTER TABLE a ADD EXCLUDE USING gist (geocode WITH ~=);

Results:

    ERROR:  could not create exclusion constraint "a_geocode_excl"
    DETAIL:  Key (geocode)=((33.8367126,-117.9164627)) conflicts with key (geocode)=((33.8367128,-117.9164627)).

Which means it *should* work, but first I would need to clean up the data and find the duplicates.  I was hoping this
mightwork: 

    SELECT geocode, count(*)
    FROM a
    GROUP BY a.geocode
    HAVING count(*) > 1;

But:

    ERROR:  could not identify an equality operator for type point

So I would have to just find the points one-by-one until the exclusion constraint passes.

Now, using the custom = operator:

    ALTER TABLE app_address ADD EXCLUDE USING gist (geocode WITH =);

Results:

    ERROR:  operator =(point,point) is not a member of operator family "point_ops"
    DETAIL:  The exclusion operator must be related to the index operator class for the constraint.

Jonathan


Re: point types in "DISTINCT" queries

From
Jeff Davis
Date:
On Wed, 2011-06-29 at 11:37 -0400, Jonathan S. Katz wrote:
> Which means it *should* work, but first I would need to clean up the data and find the duplicates.  I was hoping this
mightwork: 
>
>     SELECT geocode, count(*)
>     FROM a
>     GROUP BY a.geocode
>     HAVING count(*) > 1;

Maybe you could use a self-join as a workaround for now, just to clean
up the data?

SELECT geocode, other_columns from a a1, a a2 where a1.other_columns <>
a2.other_columns and a1.geocode ~= a2.geocode;

Regards,
    Jeff Davis


Re: point types in "DISTINCT" queries

From
"Jonathan S. Katz"
Date:
> On Wed, 2011-06-29 at 11:37 -0400, Jonathan S. Katz wrote:
>> Which means it *should* work, but first I would need to clean up the data and find the duplicates.  I was hoping
thismight work: 
>>
>>     SELECT geocode, count(*)
>>     FROM a
>>     GROUP BY a.geocode
>>     HAVING count(*) > 1;
>
> Maybe you could use a self-join as a workaround for now, just to clean
> up the data?
>
> SELECT geocode, other_columns from a a1, a a2 where a1.other_columns <>
> a2.other_columns and a1.geocode ~= a2.geocode;

That worked perfectly - turned out it was just two rows.  And subsequently executing the exclusion constraint on "=~"
alsoworked perfectly as expected. 

The larger issue I face with now is slightly out of my control without further hacking.  I'm developing an app with
Djangoand I wrote an extension that allows me to use the point type natively in Python.  I ran into the original issue
whilean automatically generated query was executed in the admin section.  I know this could be viewed as something
pertainingto Django, but the goal I had in mind was making PostgreSQL functionality more accessible in a different
softwarelayer. 

I will find a workaround for the above, as I am sure I can do some application-level hacking.

Thanks for your help!

Jonathan