Re: Very bad FTS performance with the Polish config - Mailing list pgsql-hackers

From Wojciech Knapik
Subject Re: Very bad FTS performance with the Polish config
Msg-id 4B044746.3090604@wolniartysci.pl
In response to Re: Very bad FTS performance with the Polish config  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Very bad FTS performance with the Polish config  (Sushant Sinha <sushant354@gmail.com>)
List pgsql-hackers
Oleg Bartunov wrote:

>> Yes, for 4-word texts the results are similar.
>> Try that with a longer text and the difference becomes more and more 
>> significant. For the lorem ipsum text, 'polish' is about 4 times 
>> slower, than 'english'. For 5 repetitions of the text, it's 6 times, 
>> for 10 repetitions - 7.5 times...
> 
> Again, I see nothing unclear here, since dictionaries (as specified
> in configuration) apply to ALL words in document. The more words in 
> document, the more overhead.

You're missing the point. I'm not surprised that the function takes more 
time for larger input texts; that's obvious. The thing is, the 
computation time rises more steeply with the Polish config than with 
the English one, steeply enough that the difference between the two 
becomes enormous in practical cases.
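
To make the distinction concrete, here is a minimal sketch that uses only the approximate slowdown factors quoted above (about 4x, 6x and 7.5x). Treating the single lorem ipsum text as "1 repetition" is an assumption, and the helper names and the 10% tolerance are made up for illustration. If the dictionaries only added a fixed cost per word, the Polish/English ratio should stay roughly flat as the text grows; the quoted figures do not behave that way.

```python
# Approximate slowdown factors quoted in this thread: how many times slower
# 'polish' was than 'english' for N repetitions of the test text.
# Assumption: the plain lorem ipsum text counts as 1 repetition.
reported = {1: 4.0, 5: 6.0, 10: 7.5}


def is_constant_factor(ratios, tolerance=0.1):
    """True if every ratio is within `tolerance` (relative) of the first one.

    This is what a fixed per-word dictionary overhead would predict.
    """
    values = [ratios[n] for n in sorted(ratios)]
    first = values[0]
    return all(abs(v - first) <= tolerance * first for v in values)


def is_strictly_growing(ratios):
    """True if the ratio increases with every step up in input size."""
    values = [ratios[n] for n in sorted(ratios)]
    return all(a < b for a, b in zip(values, values[1:]))


print(is_constant_factor(reported))   # False
print(is_strictly_growing(reported))  # True
```

A ratio that keeps growing with input size means the Polish timings scale worse than the English ones, not merely that each word costs more.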

Now this may be expected behaviour, but since I don't know if it is, I 
posted to the mailing lists to find out. If you're saying this is ok and 
there's nothing to fix here, then there's nothing more to discuss and we 
may consider the thread closed.
If not, ts_headline deserves a closer look.

cheers,
Wojciech Knapik

