Thread: How to group by similarity?
Hi, I'm trying to get an idea about pg_trgrm. I created a GIST index on a text column in a table. Now I can filter the table with similarity(). How would I group the table so that it shows groups that have similarity () > x ? Lets say the table looks like this: id, txt 1, aa1 2, bb1 3, cc1 4, bb2 5, bb3 6, aa2 ... How would a select look like that shows: id, txt, group_id 1, aa1, 1, 6, aa2, 1, 2, bb1, 2, 4, bb2, 2, 5, bb3, 2, 3, cc1, 3 An extension of this problem would be to find similar records in 2 tables. As a result should apear a list of every record from table2 that is similar to a record of table1. Something like: table1.id, table2.id 1, 3 1, 5 1, 7 2, 2 2, 11
Hi-
With respect similarity, its a costly operation on CPU based on the cycles of checking the keyword.
Two words are said to be similar when atleast 2 or more consecutive characters are at the same. The more the consecutive characters are the same the more level of similarity.
It would look simple with the data aa1 and aa2, but with this data aqswderfcdfer11 and aqswderfcdfer12ssws both the keywords look similar but take more cycles of iteration to confirm the similarity.
Can it be this way, Posting the requirement you have on the data selection based on the design. So that, so other appropriate solution can be suggested.
--
Trinath Somanchi.
On Wed, Apr 25, 2012 at 12:09 AM, Andreas <maps.on@gmx.net> wrote:
Hi,
I'm trying to get an idea about pg_trgrm.
I created a GIST index on a text column in a table.
Now I can filter the table with similarity().
How would I group the table so that it shows groups that have similarity () > x ?
Lets say the table looks like this:
id, txt
1, aa1
2, bb1
3, cc1
4, bb2
5, bb3
6, aa2
...
How would a select look like that shows:
id, txt, group_id
1, aa1, 1,
6, aa2, 1,
2, bb1, 2,
4, bb2, 2,
5, bb3, 2,
3, cc1, 3
An extension of this problem would be to find similar records in 2 tables.
As a result should apear a list of every record from table2 that is similar to a record of table1.
Something like:
table1.id, table2.id
1, 3
1, 5
1, 7
2, 2
2, 11
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Regards,
----------------------------------------------
Trinath Somanchi,
+91 9866 235 130