Re: Postgresql GROUP BY "SIMILAR" but not equal values - Mailing list pgsql-general

From Tom Lane
Subject Re: Postgresql GROUP BY "SIMILAR" but not equal values
Date
Msg-id 6008.1391700722@sss.pgh.pa.us
Whole thread Raw
In response to Postgresql GROUP BY "SIMILAR" but not equal values  (alexandros_e <alexandros.ef@gmail.com>)
Responses Re: Postgresql GROUP BY "SIMILAR" but not equal values  ("Gauthier, Dave" <dave.gauthier@intel.com>)
List pgsql-general
alexandros_e <alexandros.ef@gmail.com> writes:
> Is there a way in SQL or PostgreSQL in general to group by values than are
> not exactly the same but are quite similar (like 'ABC' and 'ABCD') based on
> some distance function (levenshtein for example) if the distance is within
> some threshold (i.e., 1)

Well, you can GROUP BY the result of a function.

You are going to have to think harder than the above in any case.
For example, it's not hard to imagine a "similarity" operator that
says that A is similar to B, and B is similar to C, but if you ask
it to compare A to C it says they're not similar (enough).  Now what?
Are A,B,C all part of the same group?  If you take the transitive
closure of such an operator you probably end up with everything in
one group; but if you don't, it's hard to see a principled result
at all.

If you can cast your problem as transformation of the values into
some canonical or representative form, then you can do that and then
group on simple equality of the canonical values.  For instance
case-insensitive grouping is customarily done with

      GROUP BY lower(x)

            regards, tom lane


pgsql-general by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Help with connection issue - started today
Next
From: Adrian Klaver
Date:
Subject: Re: Help with connection issue - started today