Re: Grouping By Similarity (using pg_trgm)? - Mailing list pgsql-general

From	David G. Johnston
Subject	Re: Grouping By Similarity (using pg_trgm)?
Date	May 14, 2015 19:08:29
Msg-id	CAKFQuwaoWL3B6sLBAgrKxBrYB1UJLZzWUruSEQUj_QaYApu-nA@mail.gmail.com
In response to	Grouping By Similarity (using pg_trgm)? (Cory Tucker <cory.tucker@gmail.com>)
Responses	Re: Grouping By Similarity (using pg_trgm)?
List	pgsql-general

On Thu, May 14, 2015 at 11:58 AM, Cory Tucker <cory.tucker@gmail.com> wrote:

[pg version 9.3 or 9.4]

Suppose I have a simple table:

CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- needed for gin_trgm_ops
CREATE TABLE data (
    my_value TEXT NOT NULL
);
CREATE INDEX idx_my_value ON data USING gin (my_value gin_trgm_ops);

Now I would like to essentially do a GROUP BY to get a count of all the values that are sufficiently similar. I can do it with something like a CROSS JOIN of the table against itself, but then I still get every row back, with duplicate counts.

Is there a way to write a GROUP BY query that returns a single "my_value" column and a count of how many other values are similar to it, without also returning those similar values as rows of their own in the output?

Concept below - not bothering to look up the functions/operators for pg_trgm:

SELECT my_value_src, count(*)
FROM (SELECT my_value AS my_value_src FROM data) src
JOIN (SELECT my_value AS my_value_compareto FROM data) comparedto
ON ( func(my_value_src, my_value_compareto) < # )
GROUP BY my_value_src
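
As a hedged sketch of how the concept above might look with real pg_trgm calls: the % operator and set_limit() are pg_trgm's similarity operator and its threshold setter on 9.3/9.4. The 0.5 threshold, the similar_count alias and the ORDER BY are illustrative assumptions, not values from this thread.

```sql
-- Sketch only; assumes the pg_trgm extension is installed
-- (CREATE EXTENSION IF NOT EXISTS pg_trgm;).

-- Set the similarity threshold used by the % operator for this session.
-- 0.5 is an arbitrary illustrative value (the default is 0.3).
SELECT set_limit(0.5);

-- Self-join: pair every value with every value that % considers similar,
-- then count the pairs per source value.
SELECT src.my_value AS my_value_src,
       count(*)     AS similar_count
FROM data AS src
JOIN data AS comparedto
  ON src.my_value % comparedto.my_value
GROUP BY src.my_value
ORDER BY similar_count DESC;
```

Each row normally matches itself, so similar_count includes the value itself. Adding AND src.my_value <> comparedto.my_value to the join condition would count only other values, at the cost of dropping values that have no other match. Like the concept query, this still returns one row per distinct value rather than one row per cluster of similar values.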

David J.
