FTS performance issue - planner problem identified (but only partially resolved) - Mailing list pgsql-performance

From Stefan Keller
Subject FTS performance issue - planner problem identified (but only partially resolved)
Date
Msg-id CAFcOn2_Ei2nf+-odnWost0niAsvfcrsJe_LuV4X9P7KG0FJcQA@mail.gmail.com
Whole thread Raw
Responses Re: FTS performance issue - planner problem identified (but only partially resolved)  (Stefan Keller <sfkeller@gmail.com>)
Re: FTS performance issue - planner problem identified (but only partially resolved)  (Marc Mamin <M.Mamin@intershop.de>)
Re: FTS performance issue - planner problem identified (but only partially resolved)  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-performance
Hi

At 2013/2/8 I wrote:
> I have problems with the performance of FTS in a query like this:
>
> SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
> plainto_tsquery('english', 'good');
>
> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
> The planner obviously always chooses table scan

Now, I've identified (but only partially resolved) the issue: Here are
my comments:

Thats the query in question (see commented log below):

select id,title,left(content,100)
from fulltextsearch
where plainto_tsquery('pg_catalog.english','good') @@
to_tsvector('pg_catalog.english',content);

After having created the GIN index, the FTS query unexpectedly is fast
because planner chooses "Bitmap Index Scan". After the index
statistics have been updated, the same query becomes slow. Only when
using the "trick" with the function in the WHERE clause. I think GIST
does'nt change anything.

select id,title,left(content,100)
from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query
where query @@ to_tsvector('pg_catalog.english',content);

=> This hint should mentioned in the docs!

Then, setting enable_seqscan to off makes original query fast again.
But that's a setting I want to avoid in a multi-user database.
Finally, setting random_page_cost to 1 helps also - but I don't like
this setting neither.

=> To me the planner should be updated to recognize immutable
plainto_tsquery() function in the WHERE clause and choose "Bitmap
Index Scan" at the first place.

What do you think?

Yours, Stefan


----
Lets look at table fulltextsearch:

movies=# \d fulltextsearch
                          Table "public.fulltextsearch"
 Column  |  Type   |                          Modifiers
---------+---------+-------------------------------------------------------------
 id      | integer | not null default nextval('fulltextsearch_id_seq'::regclass)
 docid   | integer | default 0
 title   | text    |
 content | text    | not null

movies=# CREATE INDEX fulltextsearch_gincontent ON fulltextsearch
USING gin(to_tsvector('pg_catalog.english',content));

movies=# SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
  oid   |           name            | kind |   tuples    | pages |
allvisible | toastrelid | hasindex
--------+---------------------------+------+-------------+-------+------------+------------+----------
 476289 | fulltextsearch            | r    |       27886 |   555 |
     0 |     476293 | t
 503080 | fulltextsearch_gincontent | i    | 8.97135e+06 | 11133 |
     0 |          0 | f
 476296 | fulltextsearch_id_seq     | S    |           1 |     1 |
     0 |          0 | f
 503075 | fulltextsearch_pkey       | i    |       27886 |    79 |
     0 |          0 | f
(4 rows)

=> fulltextsearch_gincontent has an arbitrary large number of tuples
(statistics is wrong and not yet updated)

movies=#
explain (analyze,costs,timing,buffers)
select id,title,left(content,100)
from fulltextsearch
where plainto_tsquery('pg_catalog.english','good') @@
to_tsvector('pg_catalog.english',content);
=> Unexpectedly, the query is fast!
See query plan http://explain.depesz.com/s/ewn

Let's update the statistics:

movies=# VACUUM ANALYZE VERBOSE fulltextsearch ;

SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
  oid   |           name            | kind | tuples | pages |
allvisible | toastrelid | hasindex
--------+---------------------------+------+--------+-------+------------+------------+----------
 476289 | fulltextsearch            | r    |  27886 |   555 |
555 |     476293 | t
 503080 | fulltextsearch_gincontent | i    |  27886 | 11133 |
0 |          0 | f
 476296 | fulltextsearch_id_seq     | S    |      1 |     1 |
0 |          0 | f
 503075 | fulltextsearch_pkey       | i    |  27886 |    79 |
0 |          0 | f
(4 rows)

=> Now after having update statistics (see especially tuples of
fulltextsearch_gincontent ) the original query is slow!
See query plan http://explain.depesz.com/s/MQ60

Now, let's reformulate the original query and move the function call
to plainto_tsquery to the FROM clause:

movies=# explain (analyze,costs,timing,buffers)
select id,title,left(content,100)
from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query
where query @@ to_tsvector('pg_catalog.english',content);
=> This special query is fast again!  See query plan
http://explain.depesz.com/s/FVT

Setting enable_seqscan to off makes query fast again: See query plan
http://explain.depesz.com/s/eOr

Finally, setting random_page_cost to 1 helps also (default is 4):

movies=# set enable_seqscan to default;
movies=# set random_page_cost to 1.0;
=> Query is fast. See query plan http://explain.depesz.com/s/M5Ke

----


pgsql-performance by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: PostgreSQL settings for running on an SSD drive
Next
From: Stefan Keller
Date:
Subject: Re: How to properly index hstore tags column to faster search for keys