Re: efficient way to do "fuzzy" join - Mailing list pgsql-general

From Andy Colson
Subject Re: efficient way to do "fuzzy" join
Date
Msg-id 534939D7.4060606@squeakycode.net
Whole thread Raw
In response to Re: efficient way to do "fuzzy" join  (Rémi Cura <remi.cura@gmail.com>)
Responses Re: efficient way to do "fuzzy" join  (Rémi Cura <remi.cura@gmail.com>)
List pgsql-general
On 04/12/2014 06:29 AM, Rémi Cura wrote:
> (please note that this random string function is NOT the good way to
> do it, i should random int then use it as index to an array
> containing all the letter)
>
> Thanks a lot for this new version! It seems to be slower than your
> first solution (no index use I guess, I gave up after 5 minutes vs 5
> sec for the previous). Morevover, I canno't make assumption about a
> fixed interval (2 sec in your example). But I think I see where you
> are going.
>
>
> After some test, the fastest is using BETWEEN and range. (it is way
> faster than using the <@, strangely)
>
> Here is the code :

Ah, sorry about that.  I got pulled away to work on work stuff.  I was trying to figure out how to use an index on the
rangequery, but not sure, without adding a new column if it would even work. 

I've never had the need for ranges yet, this is the first time I've gotten to play with them.

I would not have thought about between like that, good call.  I'd have never guess it would be so fast.


If you can't use the fixed interval, then ranges are out.

I was thinking this could be improved:

select t,
  (select t from a where a.t >= b.t order by a.t limit 1) as mint,
  (select t from a where a.t < b.t order by a.t desc limit 1) as maxt
from b

It does two selects into a to find the nearest.  Given this:

create table a(t float);

insert into a values (1), (5), (6);

could you write a single query to find the number nearest 3.5?  If so we might cut the work by 50%.

-Andy

PS: This list prefers you don't top post.


pgsql-general by date:

Previous
From: Rémi Cura
Date:
Subject: Re: efficient way to do "fuzzy" join
Next
From: Jan Wieck
Date:
Subject: Re: Cancelling of autovacuums considered harmful