Fast tsearch2, trigram matching on short phrases - Mailing list pgsql-performance

From	Carlo Stonebanks
Subject	Fast tsearch2, trigram matching on short phrases
Date	August 22, 2007 13:02:18
Msg-id	fahmm4$ffo$1@news.hub.org
Responses	Re: Fast tsearch2, trigram matching on short phrases Re: Fast tsearch2, trigram matching on short phrases
List	pgsql-performance

I have read that trigram matching (similarity()) performance degrades when
the matching is on longer strings such as phrases. I need to quickly match
strings and rate them by similarity. The strings are typically one to seven
words in length, and will often include unconventional abbreviations and
misspellings.

I have a stored function which does more thorough testing of the phrases,
including spelling correction, abbreviation translation, etc., and scores
the results. I pick the highest score that passes a fixed pass/fail threshold.
However, the function is slow. My solution was to reduce the number of rows
that are passed to the function by pruning obvious mismatches using
similarity(). However, trigram matching on phrases is slow as well.
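
To make the two-stage approach described above concrete, here is a minimal Python sketch. It is not the stored function from this post: the trigram extraction only approximates what pg_trgm does (lowercase, pad each word with two leading spaces and one trailing space), and the names prefilter, best_match, expensive_score and pass_mark, as well as the 0.3 cutoff, are assumptions chosen for illustration.

```python
import re


def trigrams(text):
    """Approximate pg_trgm-style trigram set for a phrase."""
    grams = set()
    # Lowercase and treat every run of letters/digits as one word.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def prefilter(query, candidates, threshold=0.3):
    """Cheap first pass: drop candidates that are obvious mismatches."""
    return [c for c in candidates if similarity(query, c) >= threshold]


def best_match(query, candidates, expensive_score, pass_mark, threshold=0.3):
    """Run the (hypothetical) expensive scorer only on the survivors
    and return the best candidate that reaches pass_mark, else None."""
    scored = [(expensive_score(query, c), c)
              for c in prefilter(query, candidates, threshold)]
    passing = [(s, c) for s, c in scored if s >= pass_mark]
    return max(passing, key=lambda sc: sc[0])[1] if passing else None
```

The sketch only shows the logic of pruning before scoring. It scans every candidate, so it says nothing about the index support that would be needed to make the pruning step itself fast in the database.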

I have experimented with tsearch2 but I have two problems:

1) I need a "score" so I can decide whether a match passed or failed. Trigram
similarity() returns a value in a fixed range that you can test against a
threshold, but I don't know if rank() returns results that can be compared
to a fixed value.

2) I need an efficient methodology to create vectors based on trigrams, and
a way to create an index to support it. My tsearch2 experiment with normal
vectors used gist(text tsvector) and an on insert/update trigger to populate
the vector field.

Any suggestions on where to go with this project to improve performance
would be greatly appreciated.

Carlo
