Re: Fast tsearch2, trigram matching on short phrases - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: Fast tsearch2, trigram matching on short phrases
Msg-id Pine.LNX.4.64.0708222248020.2727@sn.sai.msu.ru
In response to Fast tsearch2, trigram matching on short phrases  ("Carlo Stonebanks" <stonec.register@sympatico.ca>)
List pgsql-performance
On Wed, 22 Aug 2007, Carlo Stonebanks wrote:

> I have read that trigram matching (similarity()) performance degrades when
> the matching is on longer strings such as phrases. I need to quickly match
> strings and rate them by similarity. The strings are typically one to seven
> words in length - and will often include unconventional abbreviations and
> misspellings.
>
> I have a stored function which does more thorough testing of the phrases,
> including spelling correction, abbreviation translation, etc... and scores
> the results - I pick the winning score that passes a pass/fail constant.
> However, the function is slow. My solution was to reduce the number of rows
> that are passed to the function by pruning obvious mismatches using
> similarity(). However, trigram matching on phrases is slow as well.

You didn't show us the EXPLAIN ANALYZE output of your SELECT.
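
For readers following the thread, the prune-then-score idea quoted above can be sketched outside the database. This is only an illustration, not the poster's function and not pg_trgm's actual code. It assumes the documented pg_trgm behaviour: text is lowercased, each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams in either string. The names trigrams, similarity and prune and the 0.3 cutoff are hypothetical choices for the example, and word splitting is simplified to ASCII letters and digits.

```python
import re


def trigrams(text: str) -> set[str]:
    """Return the set of trigrams for text, padding each word as pg_trgm documents."""
    grams: set[str] = set()
    # Simplification: only ASCII letters and digits count as word characters.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams; always between 0 and 1."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def prune(query: str, candidates: list[str], threshold: float = 0.3) -> list[str]:
    """Keep only candidates similar enough to be worth the slow, thorough scoring."""
    return [c for c in candidates if similarity(query, c) >= threshold]
```

Because the result is a ratio bounded by 0 and 1, it can be compared against a fixed pass/fail constant, which is the property the original post relies on. The trigram set grows with the length of the phrase, so each comparison costs more on seven-word strings than on single words.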

>
> I have experimented with tsearch2 but I have two problems:
>
> 1) I need a "score" so I can decide if a match passed or failed. trigram
> similarity() has a fixed result that you can test, but I don't know if rank()
> returns results that can be compared to a fixed value
>
> 2) I need an efficient methodology to create vectors based on trigrams, and a
> way to create an index to support it. My tsearch2 experiment with normal
> vectors used gist(text tsvector) and an on insert/update trigger to populate
> the vector field.
>
> Any suggestions on where to go with this project to improve performance would
> be greatly appreciated.
>
> Carlo

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
