Home > mailing lists

FTS performance issue - planner problem identified (but only partially resolved) - Mailing list pgsql-performance

From	Stefan Keller
Subject	FTS performance issue - planner problem identified (but only partially resolved)
Date	July 18, 2013 23:39:25
Msg-id	CAFcOn2_Ei2nf+-odnWost0niAsvfcrsJe_LuV4X9P7KG0FJcQA@mail.gmail.com Whole thread
Responses	Re: FTS performance issue - planner problem identified (but only partially resolved) Re: FTS performance issue - planner problem identified (but only partially resolved) Re: FTS performance issue - planner problem identified (but only partially resolved)
List	pgsql-performance

Tree view

Hi

At 2013/2/8 I wrote:
> I have problems with the performance of FTS in a query like this:
>
> SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
> plainto_tsquery('english', 'good');
>
> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
> The planner obviously always chooses table scan

Now, I've identified (but only partially resolved) the issue: Here are
my comments:

Thats the query in question (see commented log below):

select id,title,left(content,100)
from fulltextsearch
where plainto_tsquery('pg_catalog.english','good') @@
to_tsvector('pg_catalog.english',content);

After having created the GIN index, the FTS query unexpectedly is fast
because planner chooses "Bitmap Index Scan". After the index
statistics have been updated, the same query becomes slow. Only when
using the "trick" with the function in the WHERE clause. I think GIST
does'nt change anything.

select id,title,left(content,100)
from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query
where query @@ to_tsvector('pg_catalog.english',content);

=> This hint should mentioned in the docs!

Then, setting enable_seqscan to off makes original query fast again.
But that's a setting I want to avoid in a multi-user database.
Finally, setting random_page_cost to 1 helps also - but I don't like
this setting neither.

=> To me the planner should be updated to recognize immutable
plainto_tsquery() function in the WHERE clause and choose "Bitmap
Index Scan" at the first place.

What do you think?

Yours, Stefan

----
Lets look at table fulltextsearch:

movies=# \d fulltextsearch
                          Table "public.fulltextsearch"
 Column  |  Type   |                          Modifiers
---------+---------+-------------------------------------------------------------
 id      | integer | not null default nextval('fulltextsearch_id_seq'::regclass)
 docid   | integer | default 0
 title   | text    |
 content | text    | not null

movies=# CREATE INDEX fulltextsearch_gincontent ON fulltextsearch
USING gin(to_tsvector('pg_catalog.english',content));

movies=# SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
  oid   |           name            | kind |   tuples    | pages |
allvisible | toastrelid | hasindex
--------+---------------------------+------+-------------+-------+------------+------------+----------
 476289 | fulltextsearch            | r    |       27886 |   555 |
     0 |     476293 | t
 503080 | fulltextsearch_gincontent | i    | 8.97135e+06 | 11133 |
     0 |          0 | f
 476296 | fulltextsearch_id_seq     | S    |           1 |     1 |
     0 |          0 | f
 503075 | fulltextsearch_pkey       | i    |       27886 |    79 |
     0 |          0 | f
(4 rows)

=> fulltextsearch_gincontent has an arbitrary large number of tuples
(statistics is wrong and not yet updated)

movies=#
explain (analyze,costs,timing,buffers)
select id,title,left(content,100)
from fulltextsearch
where plainto_tsquery('pg_catalog.english','good') @@
to_tsvector('pg_catalog.english',content);
=> Unexpectedly, the query is fast!
See query plan http://explain.depesz.com/s/ewn

Let's update the statistics:

movies=# VACUUM ANALYZE VERBOSE fulltextsearch ;

SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
  oid   |           name            | kind | tuples | pages |
allvisible | toastrelid | hasindex
--------+---------------------------+------+--------+-------+------------+------------+----------
 476289 | fulltextsearch            | r    |  27886 |   555 |
555 |     476293 | t
 503080 | fulltextsearch_gincontent | i    |  27886 | 11133 |
0 |          0 | f
 476296 | fulltextsearch_id_seq     | S    |      1 |     1 |
0 |          0 | f
 503075 | fulltextsearch_pkey       | i    |  27886 |    79 |
0 |          0 | f
(4 rows)

=> Now after having update statistics (see especially tuples of
fulltextsearch_gincontent ) the original query is slow!
See query plan http://explain.depesz.com/s/MQ60

Now, let's reformulate the original query and move the function call
to plainto_tsquery to the FROM clause:

movies=# explain (analyze,costs,timing,buffers)
select id,title,left(content,100)
from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query
where query @@ to_tsvector('pg_catalog.english',content);
=> This special query is fast again!  See query plan
http://explain.depesz.com/s/FVT

Setting enable_seqscan to off makes query fast again: See query plan
http://explain.depesz.com/s/eOr

Finally, setting random_page_cost to 1 helps also (default is 4):

movies=# set enable_seqscan to default;
movies=# set random_page_cost to 1.0;
=> Query is fast. See query plan http://explain.depesz.com/s/M5Ke

----

pgsql-performance by date:

From: Shaun Thomas
Date: 18 July 2013, 20:42:21
Subject: Re: PostgreSQL settings for running on an SSD drive

From: Stefan Keller
Date: 19 July 2013, 10:03:24
Subject: Re: How to properly index hstore tags column to faster search for keys

FTS performance issue - planner problem identified (but only partially resolved) - Mailing list pgsql-performance

Previous

Next