Thread: GiST index slower than seqscan
I'm still trying to wrap my brain around this one. Please forgive me if this is the proverbial "dead horse" that I'm beating. In case you're unfamiliar with this particular horse, I'm using ltree to create a full text index on some <= 50 char long fields for a lookup table. The idea was to be able to tear through tons of data quickly finding case insensitive substring matches. Here's my index... CREATE INDEX letter_search_vector_idx ON letter_search USING gist (search_vector); I know that the index is bigger than the table, but shouldn't it be able to quickly scan the few branches that matter? I've tried to do a varchar-based substring lookup table, and the size for that table+index is enormous compared to the ltree table + index (which is huge anyway) I'm thinking that I've created something or am using something incorrectly. I need to be set straight. Please advise! [snip] Welcome to psql 8.1.3, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit data=# analyze letter_search; ANALYZE data=# explain select * from letter_search where search_vector ~ charslquery('669344'); QUERY PLAN -------------------------------------------------------------------------------- ----------- Bitmap Heap Scan on letter_search (cost=59.14..4978.98 rows=1467 width=162) Recheck Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) -> Bitmap Index Scan on letter_search_vector_idx (cost=0.00..59.14 rows=146 7 width=0) Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) (4 rows) data=# explain analyze select * from letter_search where search_vector ~ charslquery('669344'); QUERY PLAN -------------------------------------------------------------------------------- ------------------------------------------------------------- Bitmap Heap Scan on letter_search (cost=59.14..4978.98 rows=1467 width=162) (a ctual time=63061.402..63072.362 rows=2 loops=1) Recheck Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) -> Bitmap Index Scan on letter_search_vector_idx (cost=0.00..59.14 rows=146 7 width=0) (actual time=63058.094..63058.094 rows=2 loops=1) Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) Total runtime: 63072.411 ms (5 rows) data=# set enable_bitmapscan=off; SET data=# explain select * from letter_search where search_vector ~ charslqu ery('669344'); QUERY PLAN -------------------------------------------------------------------------------- ---------------------- Index Scan using letter_search_vector_idx on letter_search (cost=0.00..5837.70 rows=1467 width=162) Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) (2 rows) data=# explain analyze select * from letter_search where search_vector ~ charslquery('669344'); QUERY PLA N -------------------------------------------------------------------------------- ------------------------------------------------------------------------- Index Scan using letter_search_vector_idx on letter_search (cost=0.00..5837.70 rows=1467 width=162) (actual time=14582.619..162637.887 rows=2 loops=1) Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) Total runtime: 162637.977 ms (3 rows) data=# set enable_indexscan=off; SET data=# explain select * from letter_search where search_vector ~ charslqu ery('669344'); QUERY PLAN ---------------------------------------------------------------------- Seq Scan on letter_search (cost=0.00..55232.18 rows=1467 width=162) Filter: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) (2 rows) data=# explain analyze select * from letter_search where search_vector ~ charslquery('669344'); QUERY PLAN -------------------------------------------------------------------------------- -------------------------------------- Seq Scan on letter_search (cost=0.00..55232.18 rows=1467 width=162) (actual ti me=4725.525..9428.087 rows=2 loops=1) Filter: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) Total runtime: 9428.118 ms (3 rows) [/snip] __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
> In case you're unfamiliar with this particular horse, I'm using ltree to create > a full text index on some <= 50 char long fields for a lookup table. The idea > was to be able to tear through tons of data quickly finding case insensitive > substring matches. > Why it is a ltree, not a tsearch? > Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) That's the problem. Queries which begin with '*' will be slow enough... Try to reduce SIGLENINT in tsearch2/gistidx.h up to 8 (do not forget reindex !!) and try it.... -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
--- Teodor Sigaev <teodor@sigaev.ru> wrote: > > In case you're unfamiliar with this particular horse, I'm using ltree to > create > > a full text index on some <= 50 char long fields for a lookup table. The > idea > > was to be able to tear through tons of data quickly finding case > insensitive > > substring matches. > > > > Why it is a ltree, not a tsearch? When I said full text, I meant substring. Please correct me if I am wrong, but tsearch would be useful for finding words in a paragraph, not characters in a word (or small group of words) ... If I had fields 'Hello World!', 'Low Tide', and 'Following Day' they would all be hits for a search on 'low' ... > > > > Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery) > > That's the problem. Queries which begin with '*' will be slow enough... > Indeed. Substring searches are quite costly... I was hoping that the hiearchical nature of ltree would allow me to be able to sift quickly through the list since every alpha or numeric character would be a branch on the tree. > Try to reduce SIGLENINT in tsearch2/gistidx.h up to 8 (do not forget reindex > !!) > and try it.... I bet you meant ltree/ltree.h ... I'll give that a try and see what happens! Thank you! __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
> When I said full text, I meant substring. Please correct me if I am wrong, but > tsearch would be useful for finding words in a paragraph, not characters in a > word (or small group of words) ... If I had fields 'Hello World!', 'Low Tide', > and 'Following Day' they would all be hits for a search on 'low' ... > Ok, I see > I bet you meant ltree/ltree.h ... I'll give that a try and see what happens! > Thank you! Ltree index structure is mixed by BTree and signature tree. You don't need Btree part as I understand... I don't know efficient index structure to support queries you want... May be Oleg knows... -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/