Re: Very bad FTS performance with the Polish config - Mailing list pgsql-hackers

From Wojciech Knapik
Subject Re: Very bad FTS performance with the Polish config
Msg-id 4B04125A.50906@wolniartysci.pl
In response to Re: Very bad FTS performance with the Polish config  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Very bad FTS performance with the Polish config  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Oleg Bartunov wrote:

>>> your polish_english, polish configurations uses ispell language
>>> and slow, while english configuration doesn't contains ispell.
>>> So, what's your complains ? Try add ispell dictionary to english
>>> configuration and see timings.
>> 
>> Oh, so this is not anomalous ? These are the expected speeds for an
>> ispell dictionary ? I didn't realize that. Sorry for the bother
>> then. It just seemed way too slow to be practical.
> 
> You can see real timings using ts_lexize() function for different 
> dictionaries (try several time to avoid cold-start problem) instead
> of ts_headline(), which involves other factors.
> 
> On my test machine I see no real difference between very simple
> dictionary and french ispell, snowball dictionaries:

ts_lexize seems to be just as fast for simple, polish_ispell and 
english_stem with the 'voila' argument.

polish_ispell is in fact *faster* for the lorem ipsum text repeated a 
couple of times (10?), which suggests that the issue is with ts_headline 
itself.
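
For anyone who wants to repeat the dictionary-level comparison, a minimal psql sketch follows. It assumes a dictionary named polish_ispell has already been created in the database (simple and english_stem ship with PostgreSQL). Each statement should be run several times, as suggested above, so that the one-time dictionary load is not counted.

```sql
-- Print the elapsed time after every statement.
\timing on

-- Same token, three dictionaries. Run each line a few times and
-- ignore the first result, which includes loading the dictionary.
SELECT ts_lexize('simple', 'voila');
SELECT ts_lexize('english_stem', 'voila');
SELECT ts_lexize('polish_ispell', 'voila');  -- assumes this dictionary exists
```

If all three report similar times, dictionary lookup alone does not account for the slowdown.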

> I see no big difference in ts_headline as well:
> 
> dev-oleg=# select ts_headline('english','I can do voila', 
> 'voila'::tsquery);
>       ts_headline
> -----------------------
>  I can do <b>voila</b>
> (1 row)
> 
> Time: 0.265 ms

Yes, for 4-word texts the results are similar.
Try that with a longer text and the difference becomes more and more 
significant. For the lorem ipsum text, 'polish' is about 4 times slower 
than 'english'. For 5 repetitions of the text, it's 6 times slower; for 
10 repetitions, 7.5 times...
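
A rough way to see how the gap grows with document length is sketched below. The sample sentence and the query term 'ipsum' are stand-ins, not the text used for the numbers above, and the 'polish' configuration is assumed to exist in the database. The repetition count passed to repeat() can be changed to 1, 5 or 10 to match the cases described.

```sql
\timing on

-- Built-in configuration, no ispell dictionary involved.
SELECT length(ts_headline('english', doc, 'ipsum'::tsquery))
FROM (SELECT repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 10) AS doc) AS t;

-- Same document and query with the ispell-based configuration
-- (assumes a text search configuration named polish has been set up).
SELECT length(ts_headline('polish', doc, 'ipsum'::tsquery))
FROM (SELECT repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 10) AS doc) AS t;
```

Wrapping the call in length() keeps psql from printing the whole headline, so the reported time reflects the server-side work rather than output formatting.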

> This is 8.4.1 version of PostgreSQL.

And that was 8.3.8/OSX.

cheers,
Wojciech Knapik

