Thread: Completely wrong row estimates


From: Björn Lindqvist
Subject: Completely wrong row estimates

Hello everybody,

Here is the EXPLAIN ANALYZE output for a simple query in my database
running on postgres 8.3.9:

EXPLAIN ANALYZE
SELECT *  FROM word w JOIN video_words vw ON w.id = vw.word_id
WHERE w.word = 'tagtext';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=18.89..2711.16 rows=95 width=24) (actual
time=19.266..131.255 rows=43374 loops=1)
   ->  Index Scan using word_word_key on word w  (cost=0.00..8.28
rows=1 width=12) (actual time=0.029..0.034 rows=1 loops=1)
         Index Cond: ((word)::text = 'tagtext'::text)
   ->  Bitmap Heap Scan on video_words vw  (cost=18.89..2693.31
rows=766 width=12) (actual time=19.227..77.809 rows=43374 loops=1)
         Recheck Cond: (vw.word_id = w.id)
         ->  Bitmap Index Scan on video_words_word_id_key
(cost=0.00..18.70 rows=766 width=0) (actual time=12.662..12.662
rows=43374 loops=1)
               Index Cond: (vw.word_id = w.id)
 Total runtime: 154.215 ms

Note how the planner estimates that there are 766 rows in the table
that match the word 'tagtext'; in reality 43374 do. I've tried to
get postgres to refresh the statistics by running with
enable_statistics_target=100, running VACUUM, VACUUM FULL, VACUUM FULL
ANALYZE etc., but nothing works. Postgres seems stuck with its bad
statistics and unwilling to change them. There are many other strings
that also match tens of thousands of rows in the table but which
postgres thinks match only 766.
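
For reference, here is what the planner currently believes about the join
column (a sketch; `pg_stats` is the standard statistics view, and the
766-row guess is derived from these numbers):

```sql
-- Inspect the gathered statistics for video_words.word_id.
-- The planner's per-value row estimate comes from n_distinct and
-- the most-common-values list shown here.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'video_words' AND attname = 'word_id';
```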

Is this a bug in postgres?


--
mvh Björn

Re: Completely wrong row estimates

From: Rob Wultsch
2010/4/4 Björn Lindqvist <bjourne@gmail.com>:
> [...]
>
> Note how the planner estimates that there are 766 rows in the table
> that match the word 'tagtext'; in reality 43374 do. [...]
>
> Is this a bug in postgres?

You probably want to run ANALYZE or VACUUM ANALYZE to update the
statistics. Do you have autovacuum set up?


--
Rob Wultsch
wultsch@gmail.com

Re: Completely wrong row estimates

From: Nilesh Govindarajan
On 04/05/10 01:44, Björn Lindqvist wrote:
> [...]
>
> Note how the planner estimates that there are 766 rows in the table
> that match the word 'tagtext'; in reality 43374 do. [...]
>
> Is this a bug in postgres?

It seems like regular maintenance tasks (VACUUM, ANALYZE, etc.) haven't
been carried out.

--
Nilesh Govindarajan
Site & Server Administrator
www.itech7.com
मेरा भारत महान !
मम भारत: महत्तम भवतु !

Re: Completely wrong row estimates

From: Magnus Hagander
2010/4/4 Björn Lindqvist <bjourne@gmail.com>:
> [...]
> Note how the planner estimates that there are 766 rows in the table
> that matches the word 'tagtext'. In reality 43374 does. I've tried to
> get postgres to refresh the statistics by running with
> enable_statistics_target=100, running VACUUM, VACUUM FULL, VACUUM FULL
> ANALYZE etc but nothing works. Postgres seem stuck with its bad
> statistics and unwilling to change them. There are many other strings
> that also matches tens of thousands of rows in the table which
> postgres only thinks matches 766.

I assume you mean default_statistics_target, not enable_statistics_target.

You should try setting it higher - but obviously just for these
columns. Use something like:

ALTER TABLE word ALTER COLUMN word SET STATISTICS 1000;

Then you need to run ANALYZE on it. Not VACUUM, certainly not VACUUM
FULL; just ANALYZE.
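
Note also that the inner-scan estimate (rows=766) comes from the
statistics on video_words.word_id rather than on word.word, so it may be
worth raising the target on both columns (a sketch using the table names
from your query; the value 1000 is illustrative):

```sql
-- Raise the per-column sample size, then re-gather statistics.
ALTER TABLE word ALTER COLUMN word SET STATISTICS 1000;
ALTER TABLE video_words ALTER COLUMN word_id SET STATISTICS 1000;
ANALYZE word;
ANALYZE video_words;
```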


Oh, and if what you're doing is actually full text search, which is
what it looks like, you should really look at using the native full
text indexing support rather than just stuffing your words in a table.
You'll get better and much faster results.
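
A minimal sketch of what that could look like (the "video" table and
"tags" column here are made up for illustration; the 'simple'
configuration skips stemming):

```sql
-- Hypothetical schema: a video table with a free-text tags column.
ALTER TABLE video ADD COLUMN tags_tsv tsvector;
UPDATE video SET tags_tsv = to_tsvector('simple', tags);
CREATE INDEX video_tags_tsv_idx ON video USING gin (tags_tsv);

-- A search then becomes:
SELECT * FROM video WHERE tags_tsv @@ to_tsquery('simple', 'tagtext');
```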


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Completely wrong row estimates

From: Björn Lindqvist
Den 5 april 2010 11.57 skrev Magnus Hagander <magnus@hagander.net>:
>> [...] I've tried to
>> get postgres to refresh the statistics by running with
>> enable_statistics_target=100, running VACUUM, VACUUM FULL, VACUUM FULL
>> ANALYZE etc., but nothing works. [...]
>
> I assume you mean default_statistics_target, not enable_statistics_target.

Yes, sorry.

> You should try setting it higher - but obviously just for these
> columns. Use something like:
>
> ALTER TABLE word ALTER COLUMN word SET STATISTICS 1000;
>
> Then you need to run ANALYZE on it. Not VACUUM, certainly not VACUUM
> FULL; just ANALYZE.

Done that, and it doesn't help. The estimates are always off for
queries of the type I specified.

> Oh, and if what you're doing is actually full text search, which is
> what it looks like, you should really look at using the native full
> text indexing support rather than just stuffing your words in a table.
> You'll get better and much faster results.

It is more "full tag search" because I'm not using any word stemming,
phrase matching, or OR-ing of query terms. When I measured it, it was
significantly faster than using the native text search feature.



--
mvh Björn

Re: Completely wrong row estimates

From: Scott Mead

2010/4/6 Björn Lindqvist <bjourne@gmail.com>
Den 5 april 2010 11.57 skrev Magnus Hagander <magnus@hagander.net>:
>> [...] Note how the planner estimates that there are 766 rows in the
>> table that match the word 'tagtext'; in reality 43374 do. [...]

Have you tried running 'EXPLAIN ANALYZE <your query>'?

This will show you the estimates and the actuals (for each operation)
side by side.

--Scott
 

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Completely wrong row estimates

From: Björn Lindqvist
Den 6 april 2010 14.22 skrev Scott Mead <scott.lists@enterprisedb.com>:
>
> 2010/4/6 Björn Lindqvist <bjourne@gmail.com>
>>
>> Den 5 april 2010 11.57 skrev Magnus Hagander <magnus@hagander.net>:
>> >> [...] Note how the planner estimates that there are 766 rows in the
>> >> table that match the word 'tagtext'; in reality 43374 do. [...]
>
> Have you tried running :
>  'EXPLAIN ANALYZE <your query>'
> ?
>   This will show you the estimates and the actuals (for each operation) side
> by side.

Yes, see my first message, where I posted the EXPLAIN ANALYZE output for the query.


--
mvh Björn