Queryplan within FTS/GIN index -search. - Mailing list pgsql-performance

Hi

My indexing base is now up to 7.5m documents, I have raise statistics
target to 1000 for the tsvector column in order to make the
query-planner choose more correctly. That works excellent.

Table structure is still:
ftstest=# \d ftsbody
                               Table "public.ftsbody"
      Column      |   Type   |                      Modifiers

------------------+----------+------------------------------------------------------
 id               | integer  | not null default
nextval('ftsbody_id_seq'::regclass)
 body             | text     | not null default ''::text
 ftsbody_body_fts | tsvector |
Indexes:
    "ftsbody_body_md5" UNIQUE, btree (md5(body))
    "ftsbody_id_pri_idx" UNIQUE, btree (id)
    "ftsbody_tfs_idx" gin (ftsbody_body_fts)
Triggers:
    tsvectorupdate BEFORE INSERT OR UPDATE ON uniprot FOR EACH ROW
EXECUTE PROCEDURE tsvector_update_trigger('ftsbody_body_fts',
'pg_catalog.english', 'body')


I'm searching the gin-index for 1-5 terms, where all of them matches the
same document. TERM1 is unique by itself, TERM2 is a bit more common (52
rows), TERM3 more common, TERM4 close to all and TERM5 all records.

Just quering for a unique value and add in several values that match
everything makes the run-time go significantly up.

I somehow would expect the index-search to take advantage of the MCV's
informations in the statistics that sort of translate it into a search
and post-filtering (as PG's queryplanner usually does at the SQL-level).

                                                              QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=102.45..102.45 rows=1 width=751) (actual time=3.726..3.729
rows=1 loops=1)
   ->  Sort  (cost=102.45..102.45 rows=1 width=751) (actual
time=3.722..3.723 rows=1 loops=1)
         Sort Key: id
         Sort Method:  quicksort  Memory: 27kB
         ->  Bitmap Heap Scan on ftsbody  (cost=100.42..102.44 rows=1
width=751) (actual time=3.700..3.702 rows=1 loops=1)
               Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2'::text))
               ->  Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=3.683..3.683 rows=1 loops=1)
                     Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2'::text))
 Total runtime: 3.790 ms
(9 rows)

                                                                QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=102.45..102.45 rows=1 width=751) (actual
time=850.017..850.020 rows=1 loops=1)
   ->  Sort  (cost=102.45..102.45 rows=1 width=751) (actual
time=850.013..850.015 rows=1 loops=1)
         Sort Key: id
         Sort Method:  quicksort  Memory: 27kB
         ->  Bitmap Heap Scan on ftsbody  (cost=100.42..102.44 rows=1
width=751) (actual time=849.991..849.993 rows=1 loops=1)
               Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3'::text))
               ->  Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=849.970..849.970 rows=1
loops=1)
                     Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3'::text))
 Total runtime: 850.084 ms
(9 rows)

                                                                 QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=102.45..102.45 rows=1 width=751) (actual
time=1152.065..1152.068 rows=1 loops=1)
   ->  Sort  (cost=102.45..102.45 rows=1 width=751) (actual
time=1152.061..1152.062 rows=1 loops=1)
         Sort Key: id
         Sort Method:  quicksort  Memory: 27kB
         ->  Bitmap Heap Scan on ftsbody  (cost=100.42..102.44 rows=1
width=751) (actual time=1152.039..1152.041 rows=1 loops=1)
               Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3 & TERM4'::text))
               ->  Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=1152.020..1152.020
rows=1 loops=1)
                     Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3 & TERM4'::text))
 Total runtime: 1152.129 ms
(9 rows)

                                                                 QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=102.45..102.45 rows=1 width=751) (actual
time=1509.043..1509.046 rows=1 loops=1)
   ->  Sort  (cost=102.45..102.45 rows=1 width=751) (actual
time=1509.040..1509.040 rows=1 loops=1)
         Sort Key: id
         Sort Method:  quicksort  Memory: 27kB
         ->  Bitmap Heap Scan on ftsbody  (cost=100.42..102.44 rows=1
width=751) (actual time=1509.018..1509.020 rows=1 loops=1)
               Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3 & TERM4 & TERM5'::text))
               ->  Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=1508.998..1508.998
rows=1 loops=1)
                     Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3 & TERM4 & TERM5'::text))
 Total runtime: 1509.109 ms
(9 rows)

Can (perhaps more readable) be found at http://krogh.cc/~jesper/test.out

Can this be optimized? (I cannot really prevent users from typing stuff
in that are common).

--
Jesper

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: optimizing query with multiple aggregates
Next
From: "Kevin Grittner"
Date:
Subject: Re: Queryplan within FTS/GIN index -search.