Thread: GiST index slower than seqscan

GiST index slower than seqscan

From
CG
Date:
I'm still trying to wrap my brain around this one. Please forgive me if this is
the proverbial "dead horse" that I'm beating.

In case you're unfamiliar with this particular horse, I'm using ltree to create
a full text index on some <= 50 char long fields for a lookup table. The idea
was to be able to tear through tons of data quickly finding case insensitive
substring matches.

Here's my index...

CREATE INDEX letter_search_vector_idx
  ON letter_search
  USING gist
  (search_vector);

I know that the index is bigger than the table, but shouldn't it be able to
quickly scan the few branches that matter?

I've tried to do a varchar-based substring lookup table, and the size for that
table+index is enormous compared to the ltree table + index (which is huge
anyway)

I'm thinking that I've created something or am using something incorrectly.

I need to be set straight. Please advise!

[snip]

Welcome to psql 8.1.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit
data=# analyze letter_search;
ANALYZE
data=# explain select * from letter_search where search_vector ~
charslquery('669344');
                                        QUERY PLAN

--------------------------------------------------------------------------------
-----------
 Bitmap Heap Scan on letter_search  (cost=59.14..4978.98 rows=1467 width=162)
   Recheck Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
   ->  Bitmap Index Scan on letter_search_vector_idx  (cost=0.00..59.14
rows=146
7 width=0)
         Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
(4 rows)

data=# explain analyze select * from letter_search where search_vector ~
charslquery('669344');
                                                                 QUERY PLAN

--------------------------------------------------------------------------------
-------------------------------------------------------------
 Bitmap Heap Scan on letter_search  (cost=59.14..4978.98 rows=1467 width=162)
(a
ctual time=63061.402..63072.362 rows=2 loops=1)
   Recheck Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
   ->  Bitmap Index Scan on letter_search_vector_idx  (cost=0.00..59.14
rows=146
7 width=0) (actual time=63058.094..63058.094 rows=2 loops=1)
         Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
 Total runtime: 63072.411 ms
(5 rows)

data=# set enable_bitmapscan=off;
SET
data=# explain select * from letter_search where search_vector ~ charslqu
ery('669344');
                                              QUERY PLAN

--------------------------------------------------------------------------------
----------------------
 Index Scan using letter_search_vector_idx on letter_search
(cost=0.00..5837.70
 rows=1467 width=162)
   Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
(2 rows)

data=# explain analyze select * from letter_search where search_vector ~
charslquery('669344');
                                                                       QUERY
PLA
N
--------------------------------------------------------------------------------
-------------------------------------------------------------------------
 Index Scan using letter_search_vector_idx on letter_search
(cost=0.00..5837.70
 rows=1467 width=162) (actual time=14582.619..162637.887 rows=2 loops=1)
   Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
 Total runtime: 162637.977 ms
(3 rows)

data=# set enable_indexscan=off;
SET
data=# explain select * from letter_search where search_vector ~ charslqu
ery('669344');
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on letter_search  (cost=0.00..55232.18 rows=1467 width=162)
   Filter: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
(2 rows)

data=# explain analyze select * from letter_search where search_vector ~
charslquery('669344');
                                                      QUERY PLAN

--------------------------------------------------------------------------------
--------------------------------------
 Seq Scan on letter_search  (cost=0.00..55232.18 rows=1467 width=162) (actual
ti
me=4725.525..9428.087 rows=2 loops=1)
   Filter: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
 Total runtime: 9428.118 ms
(3 rows)

[/snip]


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: GiST index slower than seqscan

From
Teodor Sigaev
Date:
> In case you're unfamiliar with this particular horse, I'm using ltree to create
> a full text index on some <= 50 char long fields for a lookup table. The idea
> was to be able to tear through tons of data quickly finding case insensitive
> substring matches.
>

Why it is a ltree, not a tsearch?


>          Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)

That's the problem. Queries which begin with '*' will be slow enough...



Try to reduce SIGLENINT in tsearch2/gistidx.h up to 8 (do not forget reindex !!)
and try it....

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: GiST index slower than seqscan

From
CG
Date:

--- Teodor Sigaev <teodor@sigaev.ru> wrote:

> > In case you're unfamiliar with this particular horse, I'm using ltree to
> create
> > a full text index on some <= 50 char long fields for a lookup table. The
> idea
> > was to be able to tear through tons of data quickly finding case
> insensitive
> > substring matches.
> >
>
> Why it is a ltree, not a tsearch?

When I said full text, I meant substring. Please correct me if I am wrong, but
tsearch would be useful for finding words in a paragraph, not characters in a
word (or small group of words) ... If I had fields 'Hello World!', 'Low Tide',
and 'Following Day' they would all be hits for a search on 'low' ...

>
>
> >          Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
>
> That's the problem. Queries which begin with '*' will be slow enough...
>

Indeed. Substring searches are quite costly... I was hoping that the
hiearchical nature of ltree would allow me to be able to sift quickly through
the list since every alpha or numeric character would be a branch on the tree.

> Try to reduce SIGLENINT in tsearch2/gistidx.h up to 8 (do not forget reindex
> !!)
> and try it....

I bet you meant ltree/ltree.h ... I'll give that a try and see what happens!
Thank you!

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: GiST index slower than seqscan

From
Teodor Sigaev
Date:

> When I said full text, I meant substring. Please correct me if I am wrong, but
> tsearch would be useful for finding words in a paragraph, not characters in a
> word (or small group of words) ... If I had fields 'Hello World!', 'Low Tide',
> and 'Following Day' they would all be hits for a search on 'low' ...
>

Ok, I see

> I bet you meant ltree/ltree.h ... I'll give that a try and see what happens!
> Thank you!

Ltree index structure is mixed by BTree and signature tree. You don't need Btree
part as I understand...

I don't know efficient index structure to support queries you want... May be
Oleg knows...


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/