Thread: FTS performance issue probably due to wrong planner estimate of detoasting

FTS performance issue probably due to wrong planner estimate of detoasting

From
Stefan Keller
Date:
Hi,

I have problems with the performance of FTS in a query like this:

  SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
plainto_tsquery('english', 'good');

It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
The planner obviously always chooses table scan: http://explain.depesz.com/s/EEE
I have to check again, if I'm doing something wrong but I'm pretty
sure it has to do with de-toasting and (wrong?) cost estimations.

I've seen some comments here saying that estimating detoasting costs
(especially with operator "@@" and GIN index) is an open issue (since
years?).
And I found a nice blog here [1] which uses 9.2/9.1 and proposes to
disable sequential table scan (SET enable_seqscan off;). But this is
no option for me since other queries still need seqscan.
Can anyone tell me if is on some agenda here (e.g. as an open item for >9.2)?

Yours, Stefan

[1] http://palominodb.com/blog/2012/03/06/considerations-about-text-searchs-big-fields-and-planner-costs


Re: FTS performance issue probably due to wrong planner estimate of detoasting

From
Pavel Stehule
Date:
Hello

you can try to wrap searching to immutable function and use following trick

http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Using_IMMUTABLE_functions_as_hints_for_the_optimizer

Regards

Pavel Stehule

2013/2/8 Stefan Keller <sfkeller@gmail.com>:
> Hi,
>
> I have problems with the performance of FTS in a query like this:
>
>   SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
> plainto_tsquery('english', 'good');
>
> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
> The planner obviously always chooses table scan: http://explain.depesz.com/s/EEE
> I have to check again, if I'm doing something wrong but I'm pretty
> sure it has to do with de-toasting and (wrong?) cost estimations.
>
> I've seen some comments here saying that estimating detoasting costs
> (especially with operator "@@" and GIN index) is an open issue (since
> years?).
> And I found a nice blog here [1] which uses 9.2/9.1 and proposes to
> disable sequential table scan (SET enable_seqscan off;). But this is
> no option for me since other queries still need seqscan.
> Can anyone tell me if is on some agenda here (e.g. as an open item for >9.2)?
>
> Yours, Stefan
>
> [1] http://palominodb.com/blog/2012/03/06/considerations-about-text-searchs-big-fields-and-planner-costs
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: FTS performance issue probably due to wrong planner estimate of detoasting

From
Jesper Krogh
Date:
On 08/02/13 01:52, Stefan Keller wrote:
> Hi,
>
> I have problems with the performance of FTS in a query like this:
>
>    SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
> plainto_tsquery('english', 'good');
>
> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
> The planner obviously always chooses table scan: http://explain.depesz.com/s/EEE
> I have to check again, if I'm doing something wrong but I'm pretty
> sure it has to do with de-toasting and (wrong?) cost estimations.
If you havent done it .. bump up statistics target on the column and
re-analyze, see what that gives.

I have also been playing with the cost-numbers in order to get it to favour
an index-scan more often. That is lowering random_page_cost to be close to
seq_page_cost, dependent on your system, the amount of memory, etc, then
this can have negative side-effects on non-gin-queries.

--
Jesper


Re: FTS performance issue probably due to wrong planner estimate of detoasting

From
Stefan Keller
Date:
Hi Jesper and Pavel

Thx for your hints.
I'm rather reluctant in tuning with unwanted side effects, We'll see.
I have to setup my system and db again before I can try out your tricks.

Yours, Stefan

2013/2/8 Jesper Krogh <jesper@krogh.cc>:
> On 08/02/13 01:52, Stefan Keller wrote:
>>
>> Hi,
>>
>> I have problems with the performance of FTS in a query like this:
>>
>>    SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
>> plainto_tsquery('english', 'good');
>>
>> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB
>> zipped).
>> The planner obviously always chooses table scan:
>> http://explain.depesz.com/s/EEE
>> I have to check again, if I'm doing something wrong but I'm pretty
>> sure it has to do with de-toasting and (wrong?) cost estimations.
>
> If you havent done it .. bump up statistics target on the column and
> re-analyze, see what that gives.
>
> I have also been playing with the cost-numbers in order to get it to favour
> an index-scan more often. That is lowering random_page_cost to be close to
> seq_page_cost, dependent on your system, the amount of memory, etc, then
> this can have negative side-effects on non-gin-queries.
>
> --
> Jesper