Re: n-gram search function - Mailing list pgsql-hackers

From Guillaume Smet
Subject Re: n-gram search function
Date
Msg-id 1d4e0c10702181441o106f7998p1bd7e5d59a86754@mail.gmail.com
Whole thread Raw
In response to Re: n-gram search function  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: n-gram search function  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Hi Oleg,

On 2/17/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
> 3-gram is implemented as a contrib/pg_trgm. It currently uses GiST index,
> but may be enhanced with the GiN.

As I'm facing the same problem, I've taken a look to pg_trgm. At the
moment, my opinion is quite mixed but perhaps I did something wrong.

I have a table (100k rows) with a location name in it generally
composed of several words but not that long. I created the index
directly on this column (ie I don't create a table with each word of
the location name). Then I tried a few queries.

Here is an example:

prod=# explain analyze select nomlieu from lieu where nomlieu ilike '%gaumont%';
   QUERY PLAN
 
-----------------------------------------------------------------------------------------------------Seq Scan on lieu
(cost=0.00..7230.20rows=7 width=21) (actual
 
time=7.768..556.930 rows=39 loops=1)  Filter: ((nomlieu)::text ~~* '%gaumont%'::text)Total runtime: 557.066 ms
(3 rows)

_prod=# explain analyze select nomlieu from lieu where nomlieu % 'gaumont';
              QUERY
 
PLAN

-------------------------------------------------------------------------------------------------------------------------------------Bitmap
HeapScan on lieu  (cost=3.37..200.80 rows=106 width=21)
 
(actual time=689.799..690.035 rows=36 loops=1)  Recheck Cond: ((nomlieu)::text % 'gaumont'::text)  ->  Bitmap Index
Scanon idx_lieu_nomlieu_trgm  (cost=0.00..3.37
 
rows=106 width=0) (actual time=689.749..689.749 rows=36 loops=1)        Index Cond: ((nomlieu)::text %
'gaumont'::text)Totalruntime: 690.195 ms
 
(5 rows)

The trigram version is slower and doesn't return 3 results I should
have. The 3 results it doesn't return have the word gaumont in them at
the start of the string exactly like the others.

Is there anything I can do to improve the performances and investigate
why I don't have these 3 results?

Thanks.

--
Guillaume


pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: New feature request: FlashBack Query
Next
From: "Merlin Moncure"
Date:
Subject: Re: Plan invalidation design