Thread: PostgreSQL and explain
Howdy:
Running Postgres 7.1.3 on RedHat 7.2 kernel 2.4.7 rel. 10.
I have a question regarding explain and the cost definition.
I have a table I have just loaded:
[snip]
bcn=> select count (*) from db2_class01 ;
count
--------
217490
(1 row)
[/snip]
Then, I run something like this just for kicks - I'm
trying to figure out performance measures ...
[script]
explain
select
c_claim_num,
c_diag_1,
c_mbr_num,
c_mbr_sex
from
db2_class01
where
c_claim_num like '%307A0%'
and
c_contract_num like '%67853%'
and
d_mbr_birth_dt between '1950-01-01' and '1965-06-30'
;
[/script]
The result is this:
[results]
psql:./explain1.sql:15: NOTICE: QUERY PLAN:
Seq Scan on db2_class01 (cost=0.00..30.00 rows=1 width=48)
EXPLAIN
[/results]
NOW - I run vacuum on the same table:
[vacuum]
bcn=> vacuum verbose db2_class01;
NOTICE: --Relation db2_class01--
NOTICE: Pages 3021: Changed 0, reaped 0, Empty 0, New 0; Tup 217490: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 108, MaxLen 108; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.16s/0.04u sec.
VACUUM
[/vacuum]
When I re-run the example script show above, the output is different (which it should be)
but I don't understand what exactly what IS cost and how do I figure out if the
cost of doing the query is / should be greater or more efficient.
[after vacuum result]
psql:./explain1.sql:15: NOTICE: QUERY PLAN:
Seq Scan on db2_class01 (cost=0.00..7370.80 rows=1 width=48)
EXPLAIN
[/after vacuum result]
I mean: the difference for cost = 0.00..30.00 and cost = 0.00..7370.80 ...
(forgive my newbie-ness) but this looks like an increase ... why is that?
Thanks!
-X
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes: > This message is in MIME format. Since your mail reader does not understand > this format, some or all of this message may not be legible. Would it be possible to turn off the MIME-ification of your email? > I mean: the difference for cost = 0.00..30.00 and cost = 0.00..7370.80 ... > (forgive my newbie-ness) but this looks like an increase ... why is that? The cost=30 number is completely bogus since it is based on no statistical data whatever; until you've done a VACUUM or ANALYZE, the planner is flying blind and has to work with some default assumptions about table size. (Which are 10 pages and 1000 tuples, IIRC.) After VACUUM the numbers are at least somewhat meaningful, but you have to remember that (a) they're on an arbitrary scale of 1 disk page fetch = 1 cost unit, and (b) the planner's opinion is frequently very far from reality. I'd never trust EXPLAIN's output as a guide to what's really cheap or not; use real timings. (EXPLAIN ANALYZE can be helpful, since it accumulates real timings.) There's more info about EXPLAIN in the "performance tips" chapter of the User's Guide. regards, tom lane