Thread: Cost calculation for EXPLAIN output
Hello,
I am trying to understand how the "cost" figure in EXPLAIN output is calculated.
postgres=# explain select * from table_event where sequence_id=8520960;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using te_pk on table_event (cost=0.00..13.88 rows=1 width=62)
Index Cond: (sequence_id = 8520960)
The cost is "13.88" to fetch 1 row by scanning an Primary Key indexed column.
Isn't the cost for fetching 1 row is too high ?
On the same table, the cost calculation for scanning the full table is looking justified --
postgres=# explain select * from table_event;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on table_event (cost=0.00..853043.44 rows=38679544 width=62)
(1 row)
(disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost)
  = (466248 * 1) + (38679544 * 0.01)
  = 853043.44
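(For reference: the 466248 pages and 38679544 rows in that formula are the planner's statistics from pg_class, and the two multipliers are configuration settings; both can be checked directly:

postgres=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'table_event';
postgres=# SHOW seq_page_cost;
postgres=# SHOW cpu_tuple_cost;

The defaults are 1 for seq_page_cost and 0.01 for cpu_tuple_cost.)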
By the way, here are the details:
Version: PostgreSQL 9.0
Table size: 3643 MB
Table + indexes size: 8898 MB
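(Those sizes can be reproduced with the built-in size functions; pg_total_relation_size includes indexes and TOAST:

postgres=# SELECT pg_size_pretty(pg_relation_size('table_event')),
postgres-#        pg_size_pretty(pg_total_relation_size('table_event'));)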
I am looking for a way to reduce the cost as much as possible, because the query executes 100,000+ times a day.
Any thoughts?
Thanks,
VB
Venkat Balaji <venkat.balaji@verse.in> wrote:

> The cost is "13.88" to fetch 1 row by scanning a primary-key-indexed
> column.
>
> Isn't the cost of fetching 1 row too high?

I don't know; how many index pages will need to be randomly accessed in addition to the random heap access? How many dead versions of the row will need to be visited besides the row which is actually visible? How many of these pages are in shared_buffers? How many of these pages are in the OS cache?

> I am looking for a way to reduce the cost as much as possible,
> because the query executes 100,000+ times a day.

Well, you can reduce the cost all you want by dividing all of the costing factors in postgresql.conf by the same value, but that won't affect query run time. That depends on which query plan is chosen. The cost is just an abstract number used for comparing the apparent resources needed to run a query through each of the available plans. What matters is that the cost factors accurately reflect the resources actually used; if they don't, you should adjust them. If you calculate the ratio between run time and estimated cost, you should find that it remains relatively constant (within an order of magnitude or so) for various queries. Since you didn't show actual run times, we can't tell whether anything needs adjustment.

-Kevin
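(The actual run times Kevin asks about come from EXPLAIN ANALYZE, which executes the query and prints the measured "actual time" next to the estimated cost for each plan node:

postgres=# EXPLAIN ANALYZE SELECT * FROM table_event WHERE sequence_id = 8520960;

The run-time/cost ratio he describes can be read straight off that output.)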
On 02/23/2012 06:21 AM, Venkat Balaji wrote:

> The cost is "13.88" to fetch 1 row by scanning a primary-key-indexed
> column.
>
> Isn't the cost of fetching 1 row too high?

Not really. The "cost" is really just an estimate used to rank alternate query plans so the database picks the least expensive plan. The number '13.88' is basically meaningless; it doesn't translate to any real-world equivalent. What you actually care about is the execution time. If it takes 0.25 ms or so per row, that's what really matters.

For what it's worth, it looks like you have the right query plan there. Scan the primary key for one row. What's wrong with that? Our systems have tables far larger than yours, handling 300M queries per day that are far more expensive than a simple primary-key index scan. You'll be fine. :)

I suggest you set log_min_duration_statement to something like 1000, to send any query that takes longer than 1 second to the PG logs. Concentrate on those queries, because ones like this are already working right.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@peak6.com
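(The setting Shaun suggests lives in postgresql.conf, with the value in milliseconds; a reload picks it up without a restart:

# postgresql.conf: log any statement that runs longer than 1000 ms
log_min_duration_statement = 1000

postgres=# SELECT pg_reload_conf();)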
> > The cost is "13.88" to fetch 1 row by scanning a primary-key-indexed
> > column.
> >
> > Isn't the cost of fetching 1 row too high?
>
> I don't know; how many index pages will need to be randomly accessed
> in addition to the random heap access? How many dead versions of the
> row will need to be visited besides the row which is actually
> visible? How many of these pages are in shared_buffers? How many of
> these pages are in the OS cache?
Total index pages: 140310. Yes, I suspect that most of the time the required page is found in either the OS cache or on disk (shared_buffers is 0.9 GB), as we have a 200+ GB highly active database and the index is on a 10 GB table.
Thanks for your valuable inputs!
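(The index page count quoted above is the planner's statistic from pg_class, using the index name shown in the plan:

postgres=# SELECT relpages FROM pg_class WHERE relname = 'te_pk';)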
> > The cost is "13.88" to fetch 1 row by scanning a primary-key-indexed
> > column.
> >
> > Isn't the cost of fetching 1 row too high?
>
> Not really. The "cost" is really just an estimate used to rank
> alternate query plans so the database picks the least expensive plan.
> The number '13.88' is basically meaningless; it doesn't translate to
> any real-world equivalent. What you actually care about is the
> execution time. If it takes 0.25 ms or so per row, that's what really
> matters.
>
> For what it's worth, it looks like you have the right query plan
> there. Scan the primary key for one row. What's wrong with that? Our
> systems have tables far larger than yours, handling 300M queries per
> day that are far more expensive than a simple primary-key index scan.
> You'll be fine. :)
Execution time is 0.025 ms per row. I am quite happy with the execution time, and the plan is going the correct way. The total execution time came down from 2.5 hrs to 5.5 seconds. I am looking for room for further improvement -- probably quite ambitious :-)
But when I reduced random_page_cost to 2, the estimated cost dropped to 7.03 and the execution time also dropped by almost 50%. Is this a real gain?
Please comment!
Thanks,
VB
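(For anyone reproducing Venkat's random_page_cost experiment: the setting can be changed for a single session first, so nothing else on the server is affected while testing:

postgres=# SET random_page_cost = 2;
postgres=# EXPLAIN ANALYZE SELECT * FROM table_event WHERE sequence_id = 8520960;
postgres=# RESET random_page_cost;)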