Home > mailing lists

efficient way to do "fuzzy" join - Mailing list pgsql-general

From	Rémi Cura
Subject	efficient way to do "fuzzy" join
Date	April 11, 2014 12:50:39
Msg-id	CAJvUf_sUFAMdsPRPYRT2WNxrFqt0Bs=xYS-pvy=EAdOFyVg3fw@mail.gmail.com Whole thread
Responses	Re: efficient way to do "fuzzy" join Re: efficient way to do "fuzzy" join
List	pgsql-general

Tree view

Hey dear List,

I'm looking for some advice about the best way to perform a "fuzzy" join, that is joining two table based on approximate matching.

It is about temporal matching

given a table A with rows containing data and a control_time (for instance 1 ; 5; 6; .. sec, not necessarly rounded of evenly-spaced)

given another table B with lines on no precise timing (eg control_time = 2.3 ; 5.8 ; 6.2 for example)

How to join every row of B to A based on min(@(A.control_time-B.control_time))
(that is, for every row of B, get the row of A that is temporaly the closest),

in an efficient way?

(to be explicit, 2.3 would match to 1, 5.8 to 6, 6.2 to 6)

Optionnaly, how to get interpolation efficiently (meaning one has to get the previous time and next time for 1 st order interpolation, 2 before and 2 after for 2nd order interpolation, and so on)?

(to be explicit 5.8 would match to 5 and 6, the weight being 0.2 and 0.8 respectively)

Currently my data is spatial so I use Postgis function to interpolate a point on a line, but is is far from efficient or general, and I don't have control on interpolation (only the spatial values are interpolated).

Cheers,
Rémi-C

pgsql-general by date:

From: Alban Hertroys
Date: 11 April 2014, 11:58:30
Subject: Re: Linux vs FreeBSD

From: Steve Litt
Date: 11 April 2014, 13:16:12
Subject: Re: Linux vs FreeBSD

efficient way to do "fuzzy" join - Mailing list pgsql-general

Previous

Next