planner row-estimates for tsvector seems horribly wrong - Mailing list pgsql-hackers

From Sushant Sinha
Subject planner row-estimates for tsvector seems horribly wrong
Date
Msg-id 1287924261.4758.12.camel@yoffice
Whole thread Raw
Responses Re: planner row-estimates for tsvector seems horribly wrong
Re: planner row-estimates for tsvector seems horribly wrong
List pgsql-hackers
I am using gin index on a tsvector and doing basic search. I see the
row-estimate of the planner to be horribly wrong. It is returning
row-estimate as 4843 for all queries whether it matches zero rows, a
medium number of rows (88,000) or a large number of rows (726,000).

The table has roughly a million docs.

I see a similar problem reported here but thought it was fixed in 9.0
which I am running. 

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01389.php

Here is the version info and detailed planner output for all the three
queries:


select version();

version                                                     
PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(Gentoo 4.3.4 p1.1, pie-10.1.5) 4.3.4, 64-bit


Case I: FOR A NON-MATCHING WORD
===============================

explain analyze select count(*) from  docmeta,
plainto_tsquery('english', 'dyfdfdf') as qdoc where  docvector @@ qdoc;
          QUERY
 
PLAN                                                             
Aggregate  (cost=20322.17..20322.18 rows=1 width=0) (actual
time=0.058..0.058 rows=1 loops=1)  ->  Nested Loop  (cost=5300.28..20310.06 rows=4843 width=0) (actual
time=0.055..0.055 rows=0 loops=1)        ->  Function Scan on qdoc  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.005..0.005 rows=1 loops=1)        ->  Bitmap Heap Scan on docmeta  (cost=5300.28..20249.51
rows=4843 width=270) (actual time=0.046..0.046 rows=0 loops=1)              Recheck Cond: (docmeta.docvector @@
qdoc.qdoc)             ->  Bitmap Index Scan on doc_index  (cost=0.00..5299.07
 
rows=4843 width=0) (actual time=0.044..0.044 rows=0 loops=1)                    Index Cond: (docmeta.docvector @@
qdoc.qdoc)Totalruntime: 0.092 ms
 

CASE II: FOR A MEDIUM-MATCHING WORD
===================================explain analyze select count(*) from  docmeta,
plainto_tsquery('english', 'quit') as qdoc where  docvector @@ qdoc;
           QUERY
 
PLAN                                                                 
Aggregate  (cost=20322.17..20322.18 rows=1 width=0) (actual
time=1222.856..1222.857 rows=1 loops=1)  ->  Nested Loop  (cost=5300.28..20310.06 rows=4843 width=0) (actual
time=639.275..1212.460 rows=88545 loops=1)        ->  Function Scan on qdoc  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.006..0.007 rows=1 loops=1)        ->  Bitmap Heap Scan on docmeta  (cost=5300.28..20249.51
rows=4843 width=270) (actual time=639.264..1196.542 rows=88545 loops=1)              Recheck Cond: (docmeta.docvector
@@qdoc.qdoc)              ->  Bitmap Index Scan on doc_index  (cost=0.00..5299.07
 
rows=4843 width=0) (actual time=621.877..621.877 rows=88545 loops=1)                    Index Cond: (docmeta.docvector
@@qdoc.qdoc)Total runtime: 1222.907 ms
 


Case II: FOR A HIGH-MATCHING WORD
=================================

explain analyze select count(*) from  docmeta,
plainto_tsquery('english', 'j') as qdoc where  docvector @@ qdoc;
        QUERY
 
PLAN                                                                  
Aggregate  (cost=20322.17..20322.18 rows=1 width=0) (actual
time=742.857..742.858 rows=1 loops=1)  ->  Nested Loop  (cost=5300.28..20310.06 rows=4843 width=0) (actual
time=126.804..660.895 rows=726985 loops=1)        ->  Function Scan on qdoc  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.004..0.006 rows=1 loops=1)        ->  Bitmap Heap Scan on docmeta  (cost=5300.28..20249.51
rows=4843 width=270) (actual time=126.795..530.422 rows=726985 loops=1)              Recheck Cond: (docmeta.docvector
@@qdoc.qdoc)              ->  Bitmap Index Scan on doc_index  (cost=0.00..5299.07
 
rows=4843 width=0) (actual time=113.742..113.742 rows=726985 loops=1)                    Index Cond: (docmeta.docvector
@@qdoc.qdoc)Total runtime: 742.906 ms
 

Thanks,
Sushant.



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: knngist - 0.8
Next
From: Jan Urbański
Date:
Subject: Re: planner row-estimates for tsvector seems horribly wrong