Thread: Understanding EXPLAIN

Understanding EXPLAIN

From
Robert Lichtenberger
Date:
I am trying to fully understand, how costs for queries are computed.
Taking the following example:

CREATE TABLE test (name varchar(250) primary key) ;
INSERT INTO test (name) VALUES(generate_series(1, 1000)::text) ;
ANALYZE test ;
EXPLAIN SELECT * FROM test WHERE name = '4' ;

I am getting the output:
Index Scan using test_pkey on test  (cost=0.00..8.27 rows=1 width=3)
   Index Cond: ((name)::text = '4'::text)

The server has default cost parameters

The value I want to understand is 8.27. From reading the book
"PostgreSQL 9.0 High Performance" I know, that we have one index page
read (random page read, cost=4.0) and one database row read (random page
read, cost=4.0) which comes up to a total of 8.0. But where are the
missing 0.27 from?

If I modify the example to insert 10,000 rows, the cost stays the same.
Only if I go for 100,000 rows will the computed cost increase to 8.29.

Can anybody enlighten me, please ;-).




Re: Understanding EXPLAIN

From
Tom Lane
Date:
Robert Lichtenberger <r.lichtenberger@synedra.com> writes:
> I am trying to fully understand, how costs for queries are computed.
> Taking the following example: ...
> Index Scan using test_pkey on test  (cost=0.00..8.27 rows=1 width=3)
>    Index Cond: ((name)::text = '4'::text)

> The value I want to understand is 8.27. From reading the book
> "PostgreSQL 9.0 High Performance" I know, that we have one index page
> read (random page read, cost=4.0) and one database row read (random page
> read, cost=4.0) which comes up to a total of 8.0. But where are the
> missing 0.27 from?

I think you're neglecting CPU costs.  We're going to charge at least one
cpu_operator_cost, one cpu_index_tuple_cost, and one cpu_tuple_cost on
top of the I/O costs.  Now that still only adds 0.0175 at the default
settings, but there are various other second-order contributions.  For
the whole story, there's no substitute for taking a look at the source
code; see cost_index here:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/optimizer/path/costsize.c;h=885d8558c319fd283df351c2c8e062a449b72d3c;hb=HEAD#l208
which largely depends on btcostestimate and genericcostestimate here:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/selfuncs.c;h=6d78068476e520f7dd2da6c0c8d48d93e0649768;hb=HEAD#l6003

In a quick look through that, I think the largest second-order component
is this charge in genericcostestimate:

6214      * We also add a CPU-cost component to represent the general costs of
6215      * starting an indexscan, such as analysis of btree index keys and initial
6216      * tree descent.  This is estimated at 100x cpu_operator_cost, which is a
6217      * bit arbitrary but seems the right order of magnitude. (As noted above,
6218      * we don't charge any I/O for touching upper tree levels, but charging
6219      * nothing at all has been found too optimistic.)
...
6226     *indexTotalCost += num_sa_scans * 100.0 * cpu_operator_cost;

which accounts for 0.25 cost units at the default cpu_operator_cost setting.

            regards, tom lane

Re: Understanding EXPLAIN

From
Andreas Kretschmer
Date:
Robert Lichtenberger <r.lichtenberger@synedra.com> wrote:

> I am trying to fully understand, how costs for queries are computed.
> Taking the following example:
>
> CREATE TABLE test (name varchar(250) primary key) ;
> INSERT INTO test (name) VALUES(generate_series(1, 1000)::text) ;
> ANALYZE test ;
> EXPLAIN SELECT * FROM test WHERE name = '4' ;
>
> I am getting the output:
> Index Scan using test_pkey on test  (cost=0.00..8.27 rows=1 width=3)
>   Index Cond: ((name)::text = '4'::text)
>
> The server has default cost parameters
>
> The value I want to understand is 8.27. From reading the book
> "PostgreSQL 9.0 High Performance" I know, that we have one index page
> read (random page read, cost=4.0) and one database row read (random page
> read, cost=4.0) which comes up to a total of 8.0. But where are the
> missing 0.27 from?
>
> If I modify the example to insert 10,000 rows, the cost stays the same.
> Only if I go for 100,000 rows will the computed cost increase to 8.29.
>
> Can anybody enlighten me, please ;-).

There are some other costs, in your case cpu_tuple_cost and
cpu_index_tuple_cost.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°