Thread: PostgreSQL and explain

PostgreSQL and explain

From
"Johnson, Shaunn"
Date:

Howdy:

Running Postgres 7.1.3 on RedHat 7.2 kernel 2.4.7 rel. 10.

I have a question regarding explain and the cost definition.

I have a table I have just loaded:

[snip]

bcn=> select count (*) from db2_class01 ;
 count
--------
 217490
(1 row)

[/snip]

Then, I run something like this just for kicks - I'm
trying to figure out performance measures ...

[script]

explain
select
c_claim_num,
c_diag_1,
c_mbr_num,
c_mbr_sex
from
db2_class01
where
c_claim_num like '%307A0%'
and
c_contract_num like '%67853%'
and
d_mbr_birth_dt between '1950-01-01' and '1965-06-30'
;
[/script]

The result is this:

[results]

psql:./explain1.sql:15: NOTICE:  QUERY PLAN:

Seq Scan on db2_class01  (cost=0.00..30.00 rows=1 width=48)

EXPLAIN

[/results]

NOW - I run vacuum on the same table:

[vacuum]

bcn=> vacuum verbose db2_class01;
NOTICE:  --Relation db2_class01--
NOTICE:  Pages 3021: Changed 0, reaped 0, Empty 0, New 0; Tup 217490: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 108, MaxLen 108; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.16s/0.04u sec.

VACUUM

[/vacuum]

When I re-run the example script show above, the output is different (which it should be)
but I don't understand what exactly what IS cost and how do I figure out if the
cost of doing the query is / should be greater or more efficient.

[after vacuum result]

psql:./explain1.sql:15: NOTICE:  QUERY PLAN:

Seq Scan on db2_class01  (cost=0.00..7370.80 rows=1 width=48)

EXPLAIN

[/after vacuum result]

I mean: the difference for cost = 0.00..30.00 and cost = 0.00..7370.80 ...
(forgive my newbie-ness) but this looks like an increase ... why is that?

Thanks!

-X

Re: PostgreSQL and explain

From
Tom Lane
Date:
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> This message is in MIME format. Since your mail reader does not understand
> this format, some or all of this message may not be legible.

Would it be possible to turn off the MIME-ification of your email?

> I mean: the difference for cost = 0.00..30.00 and cost = 0.00..7370.80 ...
> (forgive my newbie-ness) but this looks like an increase ... why is that?

The cost=30 number is completely bogus since it is based on no
statistical data whatever; until you've done a VACUUM or ANALYZE,
the planner is flying blind and has to work with some default
assumptions about table size.  (Which are 10 pages and 1000 tuples,
IIRC.)

After VACUUM the numbers are at least somewhat meaningful, but you
have to remember that (a) they're on an arbitrary scale of 1 disk
page fetch = 1 cost unit, and (b) the planner's opinion is frequently
very far from reality.  I'd never trust EXPLAIN's output as a guide
to what's really cheap or not; use real timings.  (EXPLAIN ANALYZE
can be helpful, since it accumulates real timings.)

There's more info about EXPLAIN in the "performance tips" chapter of the
User's Guide.

            regards, tom lane