Re: how to identify expensive steps in an explain analyze output - Mailing list pgsql-performance
From: Frits Hoogland
Subject: Re: how to identify expensive steps in an explain analyze output
Date:
Msg-id: fbb8fbcd0802270936p4e0b6cabn9738fa6e89e69e82@mail.gmail.com
In response to: Re: how to identify expensive steps in an explain analyze output (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
Thanks for your answer!
Okay, "cost" was the wrong word here; I am using the "actual time" figures in the execplan.
What I am trying to do is optimise a database both by investigating execplans and by thinking about what concurrency would do to my database. (I have a database which reportedly almost stops functioning under load.)
I've read the sections you pointed out; they are quite understandable.
What I am missing is the connection between the logical steps (i.e. the execplan) and their physical implications.
For example, the time it took a seqscan to complete can be computed by subtracting the two numbers after "actual time" (because it is a leaf node in the execplan; is that assumption (scantime = totaltime - startuptime) right?). I can then compute approximately how long each row took by dividing that time by the number of rows (and loops), as in the sketch below.
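(An invented plan fragment, just to illustrate the arithmetic; the table name and all numbers are made up:)

    Seq Scan on orders  (cost=0.00..4580.00 rows=100000 width=42)
                        (actual time=0.012..85.310 rows=100000 loops=1)

    node time    : 85.310 - 0.012 = 85.298 ms
    per-row time : 85.298 / (100000 rows * 1 loop) ~ 0.00085 ms/row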
But I do not know how many physical I/Os it has done, and/or how many logical I/Os.
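(The roughest approximation I can think of, reusing the invented "orders" table from above and assuming a quiet system so no other session pollutes the counters, is to diff the statistics views around the query:)

    SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
      FROM pg_statio_user_tables
     WHERE relname = 'orders';
    -- run the query under test, then repeat this SELECT:
    -- the delta in *_read is blocks requested from the kernel
    -- (still not necessarily physical disk reads, because of the
    -- OS page cache); the delta in *_hit is buffer-cache hits.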
Same for merge joins and sorts: the physical implication (writing and reading when the amount of data exceeds work_mem) is not in the execplan. That is the reason I mentioned "cost".
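(For sorts there is at least the Sort Method annotation in 8.3's EXPLAIN ANALYZE; an invented example of a sort that spilled past work_mem:)

    Sort  (cost=81.35..83.85 rows=1000 width=42)
          (actual time=1220.512..1540.880 rows=1000000 loops=1)
      Sort Key: customer_id
      Sort Method:  external merge  Disk: 39248kB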
I now understand that it is impossible to judge the "cost" of a merge join on its own, because its time is a composite of both the scans and the merge operation itself, right?
Is there any way to identify nodes in the execplan which "cost" a lot (CPU time, I/O, etc.)?
regards
frits
On 2/27/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Frits Hoogland" <frits.hoogland@gmail.com> writes:
> The manual states: "Actually two numbers are shown: the start-up time before
> the first row can be returned, and the total time to return all the rows.".
> Does this mean that the difference between the first and second is the cost
> or the time the step in the explain has taken?
No, or at least only for a very strange definition of "cost". An
example of the way these are used is that for a hash join, the startup
time would include the time needed to scan the inner relation and build
the hash table from it. The run time (ie, difference between startup
and total) represents the part of the process where we're scanning the
outer relation and probing into the hash table for matches. Rows are
returned as matches are found during this part of the process. I can't
think of any useful definition under which the startup time would be
ignored.
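(An invented, abbreviated plan fragment, not from Tom's mail, to illustrate this: the Hash Join's startup time covers scanning the inner relation and building the hash table:)

    Hash Join  (actual time=35.220..120.450 rows=10000 loops=1)
      Hash Cond: (o.customer_id = c.id)
      ->  Seq Scan on orders o  (actual time=0.010..40.100 rows=10000 loops=1)
      ->  Hash  (actual time=35.100..35.100 rows=5000 loops=1)
            ->  Seq Scan on customers c  (actual time=0.008..20.300 rows=5000 loops=1)

    The join's startup time (35.220 ms) is almost entirely the inner
    Seq Scan plus hash build; the remaining ~85 ms is the probe phase
    over the outer relation.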
The reason the planner divides the total cost like this is that in the
presence of LIMIT or a few other SQL features, it may not be necessary
to run the plan to completion, but only to fetch the first few rows.
In this case a plan with low startup cost may be preferred, even though
the estimated total cost to run it to completion might be higher than
some other plan has. We're not *going* to run it to completion, and
so the really interesting figure is startup cost plus some appropriate
fraction of run cost. You can see this at work if you look at the
EXPLAIN numbers for a query involving a LIMIT.
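(Again an invented example, to show the scaling: with a LIMIT, the upper node's total cost is the child's cost taken only up to the fraction of rows actually needed:)

    Limit  (cost=0.00..9.17 rows=10 width=4)
      ->  Index Scan using orders_pkey on orders
            (cost=0.00..9170.00 rows=10000 width=4)

    9170.00 * (10 / 10000) = 9.17, so a plan with a low startup cost
    wins here even if its total cost to completion would be higher.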
The whole thing might make a bit more sense if you read
http://www.postgresql.org/docs/8.3/static/overview.html
particularly the last two subsections.
regards, tom lane