combined indexes with Gist - planner issues? - Mailing list pgsql-hackers

From Hans-Juergen Schoenig -- PostgreSQL
Subject combined indexes with Gist - planner issues?
Date
Msg-id 4A9BBD39.2010500@cybertec.at
Whole thread Raw
Responses Re: combined indexes with Gist - planner issues?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
hello everybody,

we are seriously fighting with some planner issue which seems to be 
slightly obscure to us.
we have a table which is nicely indexed (several GB in size).
i am using btree_gist operator classes to use a combined index including 
an FTI expression along with a number:

db=# \d product.t_product                                      Table "product.t_product"       Column         |
Type     |                           
 
Modifiers                           
-----------------------+---------------+----------------------------------------------------------------id
     | bigint        | not null default 
 
nextval('product.t_product_id_seq'::regclass)shop_id               | integer       |art_number            | text
 |title                 | text          |description           | text          |display_price         | numeric(10,4)
|

Indexes:   "t_product_pkey" PRIMARY KEY, btree (id)   "idx_test" gist (display_price, to_tsvector('german'::regconfig,

(title || ' '::text) || description))
*    "idx_test2" gist (to_tsvector('german'::regconfig, (title || ' 
'::text) || description), display_price)*


what we basically expected here is that Postgres will scan the table 
using the index to give us the cheapest products containing the words we 
are looking for.
i am totally surprised to see that we have to fetch all products given 
the words, sort and then do the limit.
this totally kills performance because some words simply show up 
millions of times. this totally kills everything.

the plans look like this:

db=#  explain analyze SELECT art_number, title   FROM product.t_product   WHERE to_tsvector('german'::regconfig, (title
||' '::text) || 
 
description) @@ plainto_tsquery('harddisk')   ORDER BY display_price   LIMIT 10;
                         QUERY 
 
PLAN                                                                  

------------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=108340.08..108340.10 rows=10 width=54) (actual 
 
time=1328.900..1328.909 rows=10 loops=1)  ->  Sort  (cost=108340.08..108422.48 rows=32961 width=54) (actual 
time=1328.899..1328.905 rows=10 loops=1)        Sort Key: display_price        Sort Method:  top-N heapsort  Memory:
18kB       ->  Bitmap Heap Scan on t_product  (cost=2716.62..107627.80 
 
rows=32961 width=54) (actual time=1052.706..1328.772 rows=55 loops=1)              Recheck Cond:
(to_tsvector('german'::regconfig,((title 
 
|| ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))              ->  Bitmap Index Scan on idx_test2
(cost=0.00..2708.38
 
rows=32961 width=0) (actual time=1052.576..1052.576 rows=55 loops=1)                    Index Cond:
(to_tsvector('german'::regconfig,
 
((title || ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))Total runtime: 1328.942 ms
(9 rows)


runtime increases badly if words start to be more likely ...


db=#  explain analyze SELECT art_number, title       FROM product.t_product       WHERE
to_tsvector('german'::regconfig,(title || ' '::text) || 
 
description) @@ plainto_tsquery('spiel')       ORDER BY display_price       LIMIT 10;
                             QUERY 
 
PLAN                                                                 

----------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=108340.08..108340.10 rows=10 width=54) (actual 
 
time=33489.675..33489.682 rows=10 loops=1)  ->  Sort  (cost=108340.08..108422.48 rows=32961 width=54) (actual 
time=33489.675..33489.675 rows=10 loops=1)        Sort Key: display_price        Sort Method:  top-N heapsort  Memory:
18kB       ->  Bitmap Heap Scan on t_product  (cost=2716.62..107627.80 
 
rows=32961 width=54) (actual time=774.923..33408.522 rows=56047 loops=1)              Recheck Cond:
(to_tsvector('german'::regconfig,((title 
 
|| ' '::text) || description)) @@ plainto_tsquery('spiel'::text))              ->  Bitmap Index Scan on idx_test2
(cost=0.00..2708.38
 
rows=32961 width=0) (actual time=759.078..759.078 rows=56047 loops=1)                    Index Cond:
(to_tsvector('german'::regconfig,
 
((title || ' '::text) || description)) @@ plainto_tsquery('spiel'::text))Total runtime: 33489.906 ms
(9 rows)

i am wondering why postgres is not able to use a combined index here?
is this some obscure thing related to gist, a logical problem or a 
planner deficiency?

ideas are welcome.
   many thanks,
      hans



-- 
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Hot Standby, conflict cache
Next
From: Greg Sabino Mullane
Date:
Subject: Re: Add YAML option to explain