Thread: Cost calculation for EXPLAIN output
Hello,
I am trying to understand how the "cost" figure in EXPLAIN output is calculated.
postgres=# explain select * from table_event where sequence_id=8520960;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using te_pk on table_event (cost=0.00..13.88 rows=1 width=62)
Index Cond: (sequence_id = 8520960)
The cost is "13.88" to fetch 1 row by scanning an Primary Key indexed column.
Isn't the cost for fetching 1 row is too high ?
On the same table, the cost calculation for scanning the full table is looking justified --
postgres=# explain select * from table_event;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on table_event (cost=0.00..853043.44 rows=38679544 width=62)
(1 row)
(disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost)
  = (466248 * 1) + (38679544 * 0.01)
  = 853043.44
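(For reference: the 466248 pages and 38679544 rows in that formula are the planner's statistics from pg_class, and the two multipliers are configuration settings; both can be checked directly:

postgres=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'table_event';
postgres=# SHOW seq_page_cost;
postgres=# SHOW cpu_tuple_cost;

The defaults are 1 for seq_page_cost and 0.01 for cpu_tuple_cost.)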
By the way, here are the details:
Version: PostgreSQL 9.0
Table size: 3643 MB
Table + indexes size: 8898 MB
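(Those sizes can be reproduced with the built-in size functions; pg_total_relation_size includes indexes and TOAST:

postgres=# SELECT pg_size_pretty(pg_relation_size('table_event')),
postgres-#        pg_size_pretty(pg_total_relation_size('table_event'));)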
I am looking for a way to reduce the cost as much as possible, because the query executes 100,000+ times a day.
Any thoughts?
Thanks,
VB
Venkat Balaji <venkat.balaji@verse.in> wrote:

> The cost is "13.88" to fetch 1 row by scanning a primary-key-indexed
> column.
>
> Isn't the cost of fetching 1 row too high?

I don't know; how many index pages will need to be randomly accessed in addition to the random heap access? How many dead versions of the row will need to be visited besides the row which is actually visible? How many of these pages are in shared_buffers? How many of these pages are in the OS cache?

> I am looking for a way to reduce the cost as much as possible,
> because the query executes 100,000+ times a day.

Well, you can reduce the cost all you want by dividing all of the costing factors in postgresql.conf by the same value, but that won't affect query run time. That depends on which query plan is chosen. The cost is just an abstract number used for comparing the apparent resources needed to run a query through each of the available plans. What matters is that the cost factors accurately reflect the resources actually used; if they don't, you should adjust them. If you calculate the ratio between run time and estimated cost, you should find that it remains relatively constant (within an order of magnitude or so) for various queries. Since you didn't show actual run times, we can't tell whether anything needs adjustment.

-Kevin
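(The actual run times Kevin asks about come from EXPLAIN ANALYZE, which executes the query and prints the measured "actual time" next to the estimated cost for each plan node:

postgres=# EXPLAIN ANALYZE SELECT * FROM table_event WHERE sequence_id = 8520960;

The run-time/cost ratio he describes can be read straight off that output.)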
On 02/23/2012 06:21 AM, Venkat Balaji wrote:

> The cost is "13.88" to fetch 1 row by scanning a primary-key-indexed
> column.
>
> Isn't the cost of fetching 1 row too high?

Not really. The "cost" is really just an estimate used to rank alternate query plans so the database picks the least expensive plan. The number '13.88' is basically meaningless; it doesn't translate to any real-world equivalent. What you actually care about is the execution time. If it takes 0.25 ms or so per row, that's what really matters.

For what it's worth, it looks like you have the right query plan there. Scan the primary key for one row. What's wrong with that? Our systems have tables far larger than yours, handling 300M queries per day that are far more expensive than a simple primary-key index scan. You'll be fine. :)

I suggest you set log_min_duration_statement to something like 1000, to send any query that takes longer than 1 second to the PG logs. Concentrate on those queries, because ones like this are already working right.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@peak6.com
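(The setting Shaun suggests lives in postgresql.conf, with the value in milliseconds; a reload picks it up without a restart:

# postgresql.conf: log any statement that runs longer than 1000 ms
log_min_duration_statement = 1000

postgres=# SELECT pg_reload_conf();)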
> > The cost is "13.88" to fetch 1 row by scanning a primary-key-indexed
> > column.
> >
> > Isn't the cost of fetching 1 row too high?
>
> I don't know; how many index pages will need to be randomly accessed
> in addition to the random heap access? How many dead versions of the
> row will need to be visited besides the row which is actually
> visible? How many of these pages are in shared_buffers? How many of
> these pages are in the OS cache?
Total index pages: 140310. Yes, I suspect that most of the time the required page is found in either the OS cache or on disk (shared_buffers is 0.9 GB), as we have a 200+ GB highly active database and the index is on a 10 GB table.
Thanks for your valuable inputs!
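(The index page count quoted above is the planner's statistic from pg_class, using the index name shown in the plan:

postgres=# SELECT relpages FROM pg_class WHERE relname = 'te_pk';)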
> > The cost is "13.88" to fetch 1 row by scanning a primary-key-indexed
> > column.
> >
> > Isn't the cost of fetching 1 row too high?
>
> Not really. The "cost" is really just an estimate used to rank
> alternate query plans so the database picks the least expensive plan.
> The number '13.88' is basically meaningless; it doesn't translate to
> any real-world equivalent. What you actually care about is the
> execution time. If it takes 0.25 ms or so per row, that's what really
> matters.
>
> For what it's worth, it looks like you have the right query plan
> there. Scan the primary key for one row. What's wrong with that? Our
> systems have tables far larger than yours, handling 300M queries per
> day that are far more expensive than a simple primary-key index scan.
> You'll be fine. :)
Execution time is 0.025 ms per row. I am quite happy with the execution time, and the plan is going the correct way. The total execution time came down from 2.5 hrs to 5.5 seconds. I am looking for room for further improvement -- probably quite ambitious :-)
But when I reduced random_page_cost to 2, the estimated cost dropped to 7.03 and the execution time also dropped by almost 50%. Is this a real gain?
Please comment!
Thanks,
VB
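(For anyone reproducing Venkat's random_page_cost experiment: the setting can be changed for a single session first, so nothing else on the server is affected while testing:

postgres=# SET random_page_cost = 2;
postgres=# EXPLAIN ANALYZE SELECT * FROM table_event WHERE sequence_id = 8520960;
postgres=# RESET random_page_cost;)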