Re: [PERFORM] Similarity search with the tsearch2 extension - Mailing list pgsql-general

From	Janek Sendrowski
Subject	Re: [PERFORM] Similarity search with the tsearch2 extension
Date	December 6, 2013 16:21:17
Msg-id	trinity-cf3ecc79-6b1d-4bbe-a706-5553ce2e50ca-1386346873185@3capp-webde-bs06
List	pgsql-general

Sorry, I used AND statements instead of OR statements in the example.
I noticed that GIN is much faster than GiST, but I don't know why.

The query gets slow because there are many non-stop words that appear very often in my sentences, for example in 3% of all
sentences.
Do you think it could be worth filtering out the words that appear that often and declaring them as stop words?
How would you split a sentence with, let's say, 10 non-stop words to get a performant similarity search?
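
To make the stop-word idea concrete, here is a minimal sketch of how the candidate words could be found outside the database. It is plain Python, not part of tsearch2; the function name `frequent_words`, the simple regex tokenizer, and the sample sentences are assumptions made for illustration only, and it ignores stemming.

```python
import re
from collections import Counter


def frequent_words(sentences, threshold=0.03):
    """Return {word: share of sentences containing it} for words above threshold."""
    total = len(sentences)
    if total == 0:
        return {}
    doc_freq = Counter()
    for sentence in sentences:
        # Count each word once per sentence (document frequency, not term frequency).
        doc_freq.update(set(re.findall(r"\w+", sentence.lower())))
    return {w: n / total for w, n in doc_freq.items() if n / total > threshold}


# Hypothetical sample data; the 0.5 threshold only suits this tiny set.
sentences = [
    "the pump is broken",
    "the valve is open",
    "replace the pump",
    "close valve",
]
candidates = frequent_words(sentences, threshold=0.5)
print(candidates)
```

With the real data the threshold would be the 3% mentioned above, and the resulting words could then be reviewed by hand before being added to a stop-word list.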
 
There's still the problem with very short sentences. A partial index on them with the trigram search might be the
solution.
The pg_trgm module is far too slow for longer sentences, as you showed.
 
I thought I would build a few partial indexes on the string length to improve the performance.
Do you know of any further improvements?
 
Janek Sendrowski
