Thread: Re: [pgsql-advocacy] Function which gives back the nearest neighbours

Re: [pgsql-advocacy] Function which gives back the nearest neighbours

From
Christopher Kings-Lynne
Date:
> I'm looking for an existing function which allows me to search the nearest
> neighbours of the requested value.

Well you could try something like:

SELECT * FROM table ORDER BY ABS(val - 2) LIMIT 1;

That doesn't get you all the way there, but it's an idea...

Chris

Re: [pgsql-advocacy] Function which gives back the nearest neighbours

From
Bruno Wolff III
Date:
On Sun, Mar 27, 2005 at 13:24:34 +0800,
  Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:
> >I'm looking for an existing function which allows me to search the nearest
> >neighbours of the requested value.
>
> Well you could try something like:
>
> SELECT * FROM table ORDER BY ABS(val - 2) LIMIT 1;
>
> That doesn't get you all the way there, but it's an idea...

For multidimensional objects you can do the same thing with a distance
metric function. It will be relatively slow since this won't be indexable
and will require a sort of all of the values. If you have some bound on
how far apart points can be, then you might be able to limit the set
of candidate points using an indexable search.

Re: [pgsql-advocacy] Function which gives back the nearest neighbours

From
Tom Lane
Date:
Bruno Wolff III <bruno@wolff.to> writes:
> On Sun, Mar 27, 2005 at 13:24:34 +0800,
>   Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:
>>> I'm looking for an existing function which allows me to search the nearest
>>> neighbours of the requested value.
>>
>> Well you could try something like:
>>
>> SELECT * FROM table ORDER BY ABS(val - 2) LIMIT 1;
>>
>> That doesn't get you all the way there, but it's an idea...

> For multidimensional objects you can do the same thing with a distance
> metric function. It will be relatively slow since this won't be indexable
> and will require a sort of all of the values. If you have some bound on
> how far apart points can be, then you might be able to limit the set
> of candidate points using an indexable search.

I'd probably go with looking for the nearest "above" neighbor and
nearest "below" neighbor separately, eg

    select * from tab where val > 'target' order by val limit 1;
    select * from tab where val < 'target' order by val desc limit 1;

If there's an index on val, this should work really well.  Of course, if
"nearest" is being defined in multidimensional terms as Bruno is
imagining, it doesn't work at all...

BTW, why is this thread cross-posted to so many lists?  It seems
off-topic for at least two of 'em.

            regards, tom lane

Re: [pgsql-advocacy] Function which gives back the nearest neighbours

From
Bruno Wolff III
Date:
On Mon, Mar 28, 2005 at 17:52:21 +0200,
  Virgile Beddok <virgile.beddok@igd.fraunhofer.de> wrote:
> Thanks for the help.
> I'll try this for the one-dimensional search.
> For the muti-dimensional one, which tools of postgresql could I use for
> this metric function, or this indexable search, which Bruno mentioned.
> Do they already exist?
> What about using a tree for that? Is there one which could fit to such a
> "nearest neighbour search", or do I have to implement it myself...

You could look at boxes or cubes. You haven't said enough about your
multidimensional problem to give specific answers.

Re: [pgsql-advocacy] Function which gives back the nearest neighbours

From
Bruno Wolff III
Date:
On Mon, Mar 28, 2005 at 18:32:09 +0200,
  Virgile Beddok <virgile.beddok@igd.fraunhofer.de> wrote:
>
> I just want to know if there is an existing and implemented function, or
> tree, in Postgres which allows me to directly perform a "nearest neighbour
> search" on multidimensional vectors.

How are you measuring distance? (i.e. Euclidean distance isn't the only
metric that you could be using.)

Earthdistance has functions for doing this using geodesics on the surface
of a sphere.

For 2D, I think there is a function on "point"s that uses Euclidean
distance.

I don't think this is the hard part of your problem. It is easy to create
a metric function and get the point corresponding to the minimal value
using a sort. Being able to limit the sets of points looked at is the
hard part.

Re: [pgsql-advocacy] Function which gives back the nearest

From
Joe Conway
Date:
Bruno Wolff III wrote:
> On Mon, Mar 28, 2005 at 18:32:09 +0200,
>   Virgile Beddok <virgile.beddok@igd.fraunhofer.de> wrote:
>
>>I just want to know if there is an existing and implemented function, or
>>tree, in Postgres which allows me to directly perform a "nearest neighbour
>>search" on multidimensional vectors.
>
>
> How are you measuring distance? (i.e. Euclidean distance isn't the only
> metric that you could be using.)
>
> Earthdistance has functions for doing this using geodesics on the surface
> of a sphere.
>
> For 2D, I think there is a function on "point"s that uses Euclidean
> distance.
>
> I don't think this is the hard part of your problem. It is easy to create
> a metric function and get the point corresponding to the minimal value
> using a sort. Being able to limit the sets of points looked at is the
> hard part.

You might want to look into using R:
http://www.bioconductor.org/CRAN/
(see "Packages")

perhaps through PL/R:
http://www.joeconway.com/plr/

or RdbiPgSQL:
http://www.bioconductor.org/repository/release1.5/package/html/RdbiPgSQL.html

Joe

Re: [pgsql-advocacy] Function which gives back the

From
"Virgile Beddok"
Date:
> Bruno Wolff III <bruno@wolff.to> writes:
>> On Sun, Mar 27, 2005 at 13:24:34 +0800,
>>   Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:
>>>> I'm looking for an existing function which allows me to search the
>>>> nearest
>>>> neighbours of the requested value.
>>>
>>> Well you could try something like:
>>>
>>> SELECT * FROM table ORDER BY ABS(val - 2) LIMIT 1;
>>>
>>> That doesn't get you all the way there, but it's an idea...
>
>> For multidimensional objects you can do the same thing with a distance
>> metric function. It will be relatively slow since this won't be
>> indexable
>> and will require a sort of all of the values. If you have some bound on
>> how far apart points can be, then you might be able to limit the set
>> of candidate points using an indexable search.
>
> I'd probably go with looking for the nearest "above" neighbor and
> nearest "below" neighbor separately, eg
>
>     select * from tab where val > 'target' order by val limit 1;
>     select * from tab where val < 'target' order by val desc limit 1;
>
> If there's an index on val, this should work really well.  Of course, if
> "nearest" is being defined in multidimensional terms as Bruno is
> imagining, it doesn't work at all...
>
>             regards, tom lane

Thanks for the help.
I'll try this for the one-dimensional search.
For the muti-dimensional one, which tools of postgresql could I use for
this metric function, or this indexable search, which Bruno mentioned.
Do they already exist?
What about using a tree for that? Is there one which could fit to such a
"nearest neighbour search", or do I have to implement it myself...

Re: [pgsql-advocacy] Function which gives back the

From
"Virgile Beddok"
Date:
> On Mon, Mar 28, 2005 at 17:52:21 +0200,
>   Virgile Beddok <virgile.beddok@igd.fraunhofer.de> wrote:
>> Thanks for the help.
>> I'll try this for the one-dimensional search.
>> For the muti-dimensional one, which tools of postgresql could I use for
>> this metric function, or this indexable search, which Bruno mentioned.
>> Do they already exist?
>> What about using a tree for that? Is there one which could fit to such a
>> "nearest neighbour search", or do I have to implement it myself...
>
> You could look at boxes or cubes. You haven't said enough about your
> multidimensional problem to give specific answers.
>

I just want to know if there is an existing and implemented function, or
tree, in Postgres which allows me to directly perform a "nearest neighbour
search" on multidimensional vectors.

Re: Function which gives back the

From
Bruno Wolff III
Date:
On Mon, Mar 28, 2005 at 18:32:09 +0200,
  Virgile Beddok <virgile.beddok@igd.fraunhofer.de> wrote:
>
> I just want to know if there is an existing and implemented function, or
> tree, in Postgres which allows me to directly perform a "nearest neighbour
> search" on multidimensional vectors.

What precisely do you mean by "nearest"? To answer that question you
need a metric function. A common metric function is euclidean distance.
Another common one is distance along the geodesic connecting two points
on the surface of the sphere.

The cube contrib package has indexing that might be useful for doing lossy
matches if you have some upperbound on how far a part a nearest neighbor
can be in your problem. It even will handle multiple dimensions. (You
haven't even bothered to tell us how many dimensions your points have.)

The earthdistance contrib package has a function for calculating distance
on the surface of sphere and uses the cube package to provide a lossy
way of doing indexed searches.

If you want specific anwsers it would help if you would provide us with
more details about the problem you are trying to solve.