Thread: index scan cost

index scan cost

From
Jeff Frost
Date:
I have two postgresql servers.  One runs 8.3.1, the other 8.3.3.  On the 8.3.1
machine, the index scans are being planned extremely low cost:

explain ANALYZE select * from email_entity where email_thread = 375629157;
                                                                QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..4.59
rows=1 width=1031) (actual time=0.095..0.120 rows=4 loops=1)
    Index Cond: (email_thread = 375629157)
  Total runtime: 0.207 ms
(3 rows)


But on the 8.3.3 machine, the index scans are being planned much higher cost:

  explain ANALYZE select * from email_entity where email_thread = 375629157;
                                                                  QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..2218.61
rows=1151 width=931) (actual time=0.094..0.111 rows=4 loops=1)
    Index Cond: (email_thread = 375629157)
  Total runtime: 0.253 ms
(3 rows)



diffing the 'show all;' output reveals the following (left side is the low
cost plan, right side is the high cost plan server):

57c57
<  effective_cache_size            | 31800MB                               |
Sets the planner's assumption about the size of the disk cache.
---
>  effective_cache_size            | 15300MB                               |
Sets the planner's assumption about the size of the disk cache.
72c72
<  fsync                           | on                                    |
Forces synchronization of updates to disk.
---
>  fsync                           | off                                   |
Forces synchronization of updates to disk.
110c110
<  log_line_prefix                 |                                       |
Controls information prefixed to each log line.
---
>  log_line_prefix                 | user=%u,db=%d                         |
Controls information prefixed to each log line.
128,129c128,129
<  max_fsm_pages                   | 2000000                               |
Sets the maximum number of disk pages for which free space is tracked.
<  max_fsm_relations               | 1000                                  |
Sets the maximum number of tables and indexes for which free space is tracked.
---
>  max_fsm_pages                   | 4000000                               |
Sets the maximum number of disk pages for which free space is tracked.
>  max_fsm_relations               | 5000                                  |
Sets the maximum number of tables and indexes for which free space is tracked.
145,146c145,146
<  server_version                  | 8.3.1                                 |
Shows the server version.
<  server_version_num              | 80301                                 |
Shows the server version as an integer.
---
>  server_version                  | 8.3.3                                 |
Shows the server version.
>  server_version_num              | 80303                                 |
Shows the server version as an integer.
149c149
<  shared_preload_libraries        |                                       |
Lists shared libraries to preload into server.
---
>  shared_preload_libraries        | $libdir/plugins/plugin_debugger.so    |
Lists shared libraries to preload into server.

Disabling the debugger had no effect on the slow server.

I then thought perhaps this was a difference between 8.3.1 and 8.3.3, so I
loaded the DB on a separate test machine and tried the query with both 8.3.1
and 8.3.3 on the same server:

engage=# show server_version;
  server_version
----------------
  8.3.1
(1 row)

  explain ANALYZE select * from email_entity where email_thread = 375629157;
                                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..1319.44
rows=1183 width=1046) (actual time=0.017..0.022 rows=4 loops=1)
    Index Cond: (email_thread = 375629157)
  Total runtime: 0.054 ms
(3 rows)


engage=# show server_version;
  server_version
----------------
  8.3.3
(1 row)

  explain ANALYZE select * from email_entity where email_thread = 375629157;
                                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..1319.44
rows=1183 width=1046) (actual time=0.018..0.022 rows=4 loops=1)
    Index Cond: (email_thread = 375629157)
  Total runtime: 0.055 ms
(3 rows)

As you might guess, the reason I started looking at this is that the high cost
changes the plan of a more complex query for the worse.

Any idea what might be influencing the plan on the other server?  I tried
increasing the statistics target on the email_thread column and that helped to
a certain extent.  Setting the statistics target to 1000 gets me a good enough
plan to help the complex query in question:

                                                                QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..26.36
rows=12 width=913) (actual time=0.028..0.040 rows=4 loops=1)
    Index Cond: (email_thread = 375629157)
  Total runtime: 0.092 ms
(3 rows)

But 26.36 is still not 4.59 like the other server estimates AND the statistics
target on that column is just the default 10 on the server with the 4.59 cost
estimate.

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: index scan cost

From
Jeff Frost
Date:
On Fri, 18 Jul 2008, Dennis Brakhane wrote:

> The "fast" server makes a much more accurate estimation of the number
> of rows to expect (4 rows are returning, 1 was estimated). The "slow"
> server estimates 1151 rows. Try running ANALYZE on the slow one

You're quite right.  I probably didn't mention that the slow one has been
analyzed several times.  In fact, every time adjusted the statistics target
for that column I analyzed, thus the eventually better, but still inaccurate
estimates toward the bottom of the post.

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: index scan cost

From
"Dennis Brakhane"
Date:
The "fast" server makes a much more accurate estimation of the number
of rows to expect (4 rows are returning, 1 was estimated). The "slow"
server estimates 1151 rows. Try running ANALYZE on the slow one

Re: index scan cost

From
Tom Lane
Date:
Jeff Frost <jeff@frostconsultingllc.com> writes:
> I have two postgresql servers.  One runs 8.3.1, the other 8.3.3.  On the 8.3.1
> machine, the index scans are being planned extremely low cost:

>   Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..4.59
> rows=1 width=1031) (actual time=0.095..0.120 rows=4 loops=1)
>     Index Cond: (email_thread = 375629157)

>   Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..2218.61
> rows=1151 width=931) (actual time=0.094..0.111 rows=4 loops=1)
>     Index Cond: (email_thread = 375629157)

This isn't a "cost" problem, this is a "stats" problem.  Why does the
second server think 1151 rows will be returned?  Try comparing the
pg_stats entries for the email_thread column on both servers ... seems
like they must be significantly different.

            regards, tom lane

Mailing list hacked by spammer?

From
Craig James
Date:
I've never gotten a single spam from the Postgres mailing list ... until today.  A Chinese company selling consumer
productsis using this list.  I have my filters set to automatically trust this list because it has been so reliable
untilnow.  It would be really, really unfortunate if this list fell to the spammers. 

Craig

Re: Mailing list hacked by spammer?

From
Steve Atkins
Date:
On Jul 18, 2008, at 4:02 PM, Craig James wrote:

> I've never gotten a single spam from the Postgres mailing list ...
> until today.  A Chinese company selling consumer products is using
> this list.  I have my filters set to automatically trust this list
> because it has been so reliable until now.  It would be really,
> really unfortunate if this list fell to the spammers.

It's not been "hacked by spammers".

It's a valid From address, probably coincidentally. Nothing worth
discussing. *Definitely* not something worth discussing on the list.

Cheers,
   Steve


Re: Mailing list hacked by spammer?

From
Alvaro Herrera
Date:
Steve Atkins wrote:
>
> On Jul 18, 2008, at 4:02 PM, Craig James wrote:
>
>> I've never gotten a single spam from the Postgres mailing list ...
>> until today.  A Chinese company selling consumer products is using
>> this list.  I have my filters set to automatically trust this list
>> because it has been so reliable until now.  It would be really, really
>> unfortunate if this list fell to the spammers.
>
> It's not been "hacked by spammers".
>
> It's a valid From address, probably coincidentally. Nothing worth
> discussing. *Definitely* not something worth discussing on the list.

Keep in mind that messages from unsubscribed addresses are held up for
moderation.  A human moderator must then reject it or approve it, and
humans make mistakes sometimes.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: index scan cost

From
Jeff Frost
Date:
Tom Lane wrote:
Jeff Frost <jeff@frostconsultingllc.com> writes: 
I have two postgresql servers.  One runs 8.3.1, the other 8.3.3.  On the 8.3.1 
machine, the index scans are being planned extremely low cost:   
 
  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..4.59 
rows=1 width=1031) (actual time=0.095..0.120 rows=4 loops=1)   Index Cond: (email_thread = 375629157)   
 
  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..2218.61 
rows=1151 width=931) (actual time=0.094..0.111 rows=4 loops=1)   Index Cond: (email_thread = 375629157)   
This isn't a "cost" problem, this is a "stats" problem.  Why does the
second server think 1151 rows will be returned?  Try comparing the
pg_stats entries for the email_thread column on both servers ... seems
like they must be significantly different. 
Sorry it took me a while to close the loop on this.  So, the server that had the less desirable plan had actually been analyzed more recently by autovacuum.  When I went back to compare the stats on the faster server, autovacuum had analyzed it and the plan was now more similar.  Adjusting the stats target up for that column helped on both servers though it never did get back as close as before.

-- 
Jeff Frost, Owner 	<jeff@frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 916-647-6411	FAX: 916-405-4032