Re: Similarity search for sentences - Mailing list pgsql-general

From Kevin Grittner
Subject Re: Similarity search for sentences
Date
Msg-id 1386345502.6533.YahooMailNeo@web162905.mail.bf1.yahoo.com
Whole thread Raw
In response to Re: Similarity search for sentences  ("Janek Sendrowski" <janek12@web.de>)
List pgsql-general
Janek Sendrowski <janek12@web.de> wrote:

> I didn't know that the pg_trgm Module provides KNN search

It does, although my own experience shows that it tends to be more
appropriate for name searches or similar smaller columns than for
big text columns.  Using the war_and_peace table from another
thread:

test=# CREATE INDEX war_and_peace_linetext_trgm2 ON war_and_peace
using gist (linetext gist_trgm_ops);
CREATE INDEX
test=# VACUUM ANALYZE war_and_peace;
VACUUM
test=# -- Use a KNN search for the words.
EXPLAIN ANALYZE
SELECT * FROM war_and_peace
  ORDER BY 'ladies gentlemen provinces distance' <-> linetext
  LIMIT 10;
                                                                          QUERY
PLAN                                                                          

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..2.74 rows=10 width=116) (actual time=42.157..42.246 rows=10 loops=1)
   ->  Index Scan using war_and_peace_linetext_trgm2 on war_and_peace  (cost=0.28..16016.42 rows=65007 width=116)
(actualtime=42.155..42.243 rows=10 loops=1) 
         Order By: (linetext <-> 'ladies gentlemen provinces distance'::text)
 Total runtime: 42.716 ms
(4 rows)

As you can see, it pulls the requested number of rows from the
index in the order of "distance" (inverse similarity).  It is,
however, not as fast as a tsearch2 search (43 ms instead of a
fraction of a ms), and it doesn't give you options to use AND/OR as
you might like.  It is still a lot faster than scanning the whole
table and applying the test to each row.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-general by date:

Previous
From: Florian Weimer
Date:
Subject: Testing an extension without installing it
Next
From: Tom Lane
Date:
Subject: Re: Testing an extension without installing it