Re: pg_trgm performance - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: pg_trgm performance |
Date | |
Msg-id | Pine.LNX.4.64.0702261552120.400@sn.sai.msu.ru Whole thread Raw |
In response to | Re: pg_trgm performance ("Guillaume Smet" <guillaume.smet@gmail.com>) |
Responses |
Re: pg_trgm performance
("Guillaume Smet" <guillaume.smet@gmail.com>)
|
List | pgsql-performance |
On Mon, 26 Feb 2007, Guillaume Smet wrote: > On 2/24/07, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote: > > Thanks for your time. > >> GiN version, short: >> -> Bitmap Heap Scan on tags (cost=8.64..151.79 rows=41 width=0) >> (actual time=5.555..30.157 rows=7 loops=1) >> Filter: (title % 'foo'::text) >> -> Bitmap Index Scan on trgm_idx (cost=0.00..8.63 rows=41 >> width=0) (actual time=2.857..2.857 rows=5555 loops=1) >> Index Cond: (title % 'foo'::text) > > This is currently the worst case in the gist - gin comparison because > in the index scan, gin version doesn't have the length of the indexed > string. So it returns a lot of rows which have every trigram of your > search string but has in fact a low similarity due to the length of > the indexed string (5555 rows -> 7 rows). > It cannot be fixed at the moment due to the way GIN indexes work. > >> So, the GiN version seems to be a bit faster for long queries, but it's >> still >> too slow -- in fact, _unindexed_ versions give 141ms, 342ms, 725ms for >> these >> three queries, so for the longer queries, the gain is only about a factor >> two. (By the way, I would like to stress that this is not my personal music >> collection! :-P) > > The fact is that pg_trgm is designed to index words and not to index > long sentences. I'm not that surprised it's slow in your case. > > It's also my case but following the instructions in README.pg_trgm I > created a dictionary of words using tsearch2 (stat function) and I use > pg_trgm on this dictionary to find similar words in my dictionary. > > For example, I rewrite the search: > auberge cevenes > as: > (auberge | auberges | aubberge | auberg) & (ceven | cene | cevenol | > cevennes) > using pg_trgm and my query can find Auberge des C?vennes (currently > it's limited to the 4th most similar words but I can change it > easily). Did you rewrite query manually or use rewrite feature of tsearch2 ? > > -- > Guillaume > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-performance by date: