Thread: New EXPLAIN ANALYZE statement
Well, I don't know how many people will have seen this one already, so I'm
posting it here. This patch probably needs more feedback before being applied.

- What it does:

Adds a new EXPLAIN ANALYZE statement which displays the plan tree, just like
EXPLAIN, but also executes the query and collects statistics on what actually
happened. The current statistics collected are:

Time till first tuple is returned
Time till all tuples are returned
Total number of tuples returned
Number of times Rescan is called

I wrote it to help analyze why certain queries were taking much longer than
the planner expected. I think it is useful for debugging because, by asking
people to post the output of EXPLAIN ANALYZE, you can immediately see the
actual statistics and the planner's estimates next to each other.

- How it does it:

It adds a few calls in a few strategic places in the executor. Each node in
the query tree is accounted for separately. Only time spent processing the
node is counted, not the time before and after. It also adds two new files.

- Remaining issues:

Currently only SELECT statements are allowed. This is because I can't work
out how to suppress the actual effect of an UPDATE or DELETE query. I'm also
assuming that SELECT queries can't be rewritten into something that actually
changes something. Ouch! I just realised that functions can be called which
could do anything, so I truly need something to suppress the results.

Documentation, but I'll only do that if people are interested.

Do people like the name? The other thing I was thinking of was just
ANALYZE [query], but I thought that might be too much overloading of that
name. Or anything else you people come up with.
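[Editor's note: to make the four statistics above concrete, here is a rough,
self-contained sketch of the kind of per-node bookkeeping being described.
The struct and function names echo the patch, but the bodies are illustrative
guesses, not the actual code.]

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Rough per-node counters: one of these would hang off each plan node. */
typedef struct Instrumentation
{
    struct timeval starttime;   /* set each time we enter the node */
    struct timeval firsttuple;  /* elapsed time until the first tuple */
    struct timeval counter;     /* total elapsed time inside the node */
    double  tuplecount;         /* tuples returned so far this loop */
    double  ntuples;            /* tuples over all loops */
    double  nloops;             /* how many times Rescan restarted us */
} Instrumentation;

/* Record one tuple leaving the node. */
static void
InstrCountTuple(Instrumentation *instr)
{
    if (!instr)                 /* node not instrumented: do nothing */
        return;
    instr->tuplecount += 1;
}

/* A NULL tuple ends the current loop: fold this loop's count in. */
static void
InstrEndLoop(Instrumentation *instr)
{
    if (!instr)
        return;
    instr->ntuples += instr->tuplecount;
    instr->tuplecount = 0;
    instr->nloops += 1;
}
```

With a layout like this, a node that is rescanned reports both a total tuple
count and a loop count, which is what makes the `rows=... loops=...` pairs in
the example output below meaningful.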
- Example:

# explain analyse select sum(cost) from dailycalls where band = 1 and billid is null;
NOTICE:  QUERY PLAN:

Aggregate  (cost=53723.88..53723.88 rows=1 width=8) (actual time=24956.02..24956.02 rows=1 loops=1)
  ->  Seq Scan on dailycalls  (cost=0.00..53708.34 rows=6216 width=8) (actual time=463.16..24941.63 rows=5418 loops=1)

EXPLAIN

- Where to get it:

It's attached, but also available at:
http://svana.org/kleptog/pgsql/timing-explain.patch

Please CC any replies, as I'm subscribed nopost.
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.
Martijn van Oosterhout <kleptog@svana.org> writes:
> It adds a few calls in a few strategic places in the executor. Each node in
> the query tree is accounted for separately. Only time processing the node is
> counted, not the time before and after. It also adds two new files.

I'm still unconvinced that this approach will yield values that can
meaningfully be compared to the planner's cost estimates. However, the thing
I *really* object to about this patch is that the statistics-collection
overhead is paid by every query whether it's being EXPLAINed or not. Kindly
set it up so that that's not true. (One way to do that is to have EXPLAIN be
responsible for traversing the tree and attaching instrumentation structs to
the nodes; then at runtime, the work is done iff the struct links are not
NULL.)

> Currently only select statements are allowed. This is because I can't work
> out how to suppress the actual effect of an UPDATE or DELETE query. I'm also
> assuming that SELECT queries can't be rewritten into something that actually
> changes something. Ouch! I just realised that functions can be called which
> could do anything, so I truly need something to suppress the results.

I think this is wrongheaded; one of the things that might be interesting is
to know how much of the runtime is actually spent in the executor toplevel,
doing the update/delete/trigger firing/whatever. And if your query is calling
expensive functions, why would you not want to know that? The correct
approach is to indeed do the query, full up, as-is. (And you should probably
include the end-to-end time, user, system and wall-clock, in the printout.)
What's needed is to design the user interface so that it does not surprise
people that the query is executed. This suggests to me that it should not be
called EXPLAIN, but something else. (No, not ANALYZE, that's taken.)
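[Editor's note: the attach-on-demand scheme Tom suggests might look roughly
like this. The node layout here is a pared-down hypothetical, not the actual
Postgres Plan struct; the point is only that normal queries pay a single NULL
test per call.]

```c
#include <stdlib.h>

typedef struct Instrumentation { double ntuples; } Instrumentation;

/* Hypothetical, pared-down plan node; real Plan nodes carry much more. */
typedef struct PlanNode
{
    struct PlanNode *lefttree;
    struct PlanNode *righttree;
    Instrumentation *instrument;    /* NULL unless EXPLAIN attached one */
} PlanNode;

/* EXPLAIN-side pass: hang counters on every node of the tree. */
static void
AttachInstrumentation(PlanNode *node)
{
    if (!node)
        return;
    node->instrument = calloc(1, sizeof(Instrumentation));
    AttachInstrumentation(node->lefttree);
    AttachInstrumentation(node->righttree);
}

/* Executor-side hot path: for un-EXPLAINed queries the only cost
 * is the NULL test. */
static void
CountTuple(PlanNode *node)
{
    if (node->instrument)
        node->instrument->ntuples += 1;
}
```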
Random other comments:

> +    InstrStopNode( node->instrument );
> +
> +    if( TupIsNull(result) )
> +        InstrEndLoop( node->instrument );

I think this isn't gonna work right for Group nodes.

> +    int tuplecount;        /* Tuples so far this loop */

Call me silly, but I think the tuple and iteration counts should be doubles,
not ints, to avoid overflow issues.

> +    bzero( i, sizeof(Instrumentation) );

Don't use bzero; use the ANSI-standard routines (memset, etc).

> +InstrStartNode( Instrumentation *i )   /* When we enter the node */
> +{
> +    if( !i )
> +        return;

This is just a personal thing, perhaps, but I *never* use a name like "i"
for anything except loop counters. A struct argument ought to have a less
anonymous name.

> +    while( i->counter.tv_usec < 0 )
> +    {
> +        i->counter.tv_usec += 1000000;
> +        i->counter.tv_sec--;
> +    }

Don't you need to also reduce tv_usec to less than 1000000? Else you risk
overflow of the usec field over successive executions.

> +    if( !timerisset( &i->counter ) )   /* Skip if nothing has happened */

Are these "timerisset" and "timerclear" thingies portable? I can't find
anything about them in the man pages. Since they're not used anywhere in
Postgres at the moment, I'd just as soon not see you add a new platform
dependency. Just write out the clearing/testing of tv_sec and tv_usec
instead --- it's not like your code will work if those fields don't exist
with those names...

A general rule when dealing with system-defined stuff is to look around and
see how similar stuff is coded elsewhere in the Postgres backend, and do it
the same way. That way you can take advantage of literally years' worth of
portability knowledge that we have accumulated, rather than repeating old
mistakes that we have fixed.

			regards, tom lane
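[Editor's note: Tom's tv_usec point, written out. A subtract-and-accumulate
helper has to normalise in both directions, because repeated additions can
push tv_usec past a million even when each individual delta is in range. This
is an assumed helper for illustration, not code from the patch.]

```c
#include <sys/time.h>

/* Add (stop - start) into *acc, keeping 0 <= tv_usec < 1000000. */
static void
AccumTimevalDiff(struct timeval *acc,
                 const struct timeval *stop,
                 const struct timeval *start)
{
    acc->tv_sec += stop->tv_sec - start->tv_sec;
    acc->tv_usec += stop->tv_usec - start->tv_usec;
    while (acc->tv_usec < 0)            /* borrow from the seconds field */
    {
        acc->tv_usec += 1000000;
        acc->tv_sec--;
    }
    while (acc->tv_usec >= 1000000)     /* the direction Tom flagged */
    {
        acc->tv_usec -= 1000000;
        acc->tv_sec++;
    }
}
```

The upward normalisation matters for an accumulator even if each delta is
already in range: adding a 0.6-second delta to an accumulator holding 600000
usec leaves tv_usec at 1200000 without it.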
On Fri, Jul 20, 2001 at 01:39:34PM -0400, Tom Lane wrote:
> I'm still unconvinced that this approach will yield values that can
> meaningfully be compared to the planner's cost estimates. However,
> the thing I *really* object to about this patch is that the statistics-
> collection overhead is paid by every query whether it's being EXPLAINed
> or not. Kindly set it up so that that's not true. (One way to do that
> is to have EXPLAIN be responsible for traversing the tree and attaching
> instrumentation structs to the nodes; then at runtime, the work is done
> iff the struct links are not NULL.)

Why do you think the results cannot be compared to the estimates? The
estimates are, after all, an estimate of how long it would take.

The code already handles having NULLs in the structure, in which case it
doesn't do anything. I was thinking of just having a flag in the top-level
query node and using that to decide whether to allocate the structure in
ExecInitNode.

> I think this is wrongheaded; one of the things that might be interesting
> is to know how much of the runtime is actually spent in the executor
> toplevel, doing the update/delete/trigger firing/whatever. And if your
> query is calling expensive functions, why would you not want to know
> that? The correct approach is to indeed do the query, full up, as-is.
> (And you should probably include the end-to-end time, user, system and
> wall-clock, in the printout.) What's needed is to design the user
> interface so that it does not surprise people that the query is
> executed. This suggests to me that it should not be called EXPLAIN,
> but something else. (No, not ANALYZE, that's taken.)

Unfortunately (as far as I can tell) calls to functions don't have nodes in
the tree, so I don't see anywhere to store the actual data. Maybe there is
some other non-shared structure I can use. I'll look.

What you are suggesting would make the explain output very much more
verbose, if that's what you're interested in.
I'm not sure adding all those other time measurements would be particularly
helpful. I agree with the renaming, but no-one seems to have any suggestions.
Maybe TIME [query], like at a command prompt.

> Random other comments:
>
> > +    InstrStopNode( node->instrument );
> > +
> > +    if( TupIsNull(result) )
> > +        InstrEndLoop( node->instrument );
>
> I think this isn't gonna work right for Group nodes.

I noticed that Group nodes act strangely: the tuplecount becomes the whole
number of tuples involved and the loop count is the number of groups. I was
going to look into it but am not convinced it is wrong as is.

> > +    int tuplecount;        /* Tuples so far this loop */
>
> Call me silly, but I think the tuple and iteration counts should be
> doubles, not ints, to avoid overflow issues.

I'd prefer a 64-bit int, actually. Floating point is accurate but not
precise.

> > +    bzero( i, sizeof(Instrumentation) );
>
> Don't use bzero; use the ANSI-standard routines (memset, etc).

OK.

> > +InstrStartNode( Instrumentation *i )   /* When we enter the node */
> > +{
> > +    if( !i )
> > +        return;
>
> This is just a personal thing, perhaps, but I *never* use a name like
> "i" for anything except loop counters. A struct argument ought to have
> a less anonymous name.

s/i/instr/ then.

> > +    while( i->counter.tv_usec < 0 )
> > +    {
> > +        i->counter.tv_usec += 1000000;
> > +        i->counter.tv_sec--;
> > +    }
>
> Don't you need to also reduce tv_usec to less than 1000000? Else you
> risk overflow of the usec field over successive executions.

I think the way the calculation is written, tv_usec can't be greater than
1000000.

> > +    if( !timerisset( &i->counter ) )   /* Skip if nothing has happened */
>
> Are these "timerisset" and "timerclear" thingies portable? I can't find
> anything about them in the man pages. Since they're not used anywhere
> in Postgres at the moment, I'd just as soon not see you add a new
> platform dependency.
> Just write out the clearing/testing of tv_sec and
> tv_usec instead --- it's not like your code will work if those fields
> don't exist with those names...

I'm not sure. On my system (Debian Linux) they are described in the manpage
for gettimeofday. They are just one-line #defines, not a major drama. I can
copy them in if you like.

> A general rule when dealing with system-defined stuff is to look
> around and see how similar stuff is coded elsewhere in the Postgres
> backend, and do it the same way. That way you can take advantage of
> literally years' worth of portability knowledge that we have
> accumulated, rather than repeating old mistakes that we have fixed.

Hmm, ok. gettimeofday is used elsewhere. There are no other system calls
involved.
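[Editor's note: the BSD macros under discussion are typically defined along
the following lines in <sys/time.h>. Writing equivalents out by hand, as Tom
suggests, avoids depending on the platform providing them; the names below
are made up to avoid clashing with any system definitions.]

```c
#include <sys/time.h>

/* Hand-rolled equivalents of the (non-standard) BSD timer macros. */
#define TIMEVAL_ISSET(tvp)  ((tvp)->tv_sec != 0 || (tvp)->tv_usec != 0)
#define TIMEVAL_CLEAR(tvp)  ((tvp)->tv_sec = (tvp)->tv_usec = 0)
```

Since both are one-liners over fields the code already assumes exist, there
is no portability cost to spelling them out.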
On Sat, Jul 21, 2001 at 02:18:45AM +1000, Justin Clift wrote:
> Hi Martijn,
>
> Why would you want to suppress the actual effect of non-select
> statements?
>
> If people want to time them, I think they should. If they don't, they
> shouldn't run the query. :)

Well, it depends whether most of the time is taken working out what to enter
or actually inserting the data (for an INSERT statement). This patch is aimed
at helping optimise the query, and just as the planner doesn't take into
account the time taken to send the data to the frontend, this doesn't care
how long it takes to insert the data, since that is not something you can
change.

If you want overall resource usage and the time taken to execute the whole
query, that already exists. If you want to find bottlenecks in the
processing, then you need a profiler. This patch is aimed at the planner
because that's the *only* area that can be tuned and adjusted by people
trying to improve the performance of their queries.

> Is it harder to create or manage?

Changing the way this code is called is easy: just fiddle the grammar. Once
people agree on a name, it'll be done.
At 00:49 21/07/01 +1000, Martijn van Oosterhout wrote:
>- What it does:
>Adds a new EXPLAIN ANALYZE statement which displays the plan tree, just like
>EXPLAIN, but also executes the query and collects statistics on what
>actually happened. The current statistics collected are:
>
>Time till first tuple is returned
>Time till all tuples are returned
>Total number of tuples returned
>Number of times Rescan is called

This is a great idea; it's a good first step in getting some stats on how
accurate the optimizer settings are, as well as how good the assumptions are.
But you probably need to collect information about IOs & buffer hits as well.
Perhaps you should look at Jan's stats collection patches and see if you can
piggyback on them somehow?

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
On Sat, Jul 21, 2001 at 12:04:47PM +1000, Philip Warner wrote:
> At 00:49 21/07/01 +1000, Martijn van Oosterhout wrote:
> >- What it does:
> >Adds a new EXPLAIN ANALYZE statement which displays the plan tree, just like
> >EXPLAIN, but also executes the query and collects statistics on what
> >actually happened. The current statistics collected are:
> >
> >Time till first tuple is returned
> >Time till all tuples are returned
> >Total number of tuples returned
> >Number of times Rescan is called
>
> This is a great idea; it's a good first step in getting some stats on how
> accurate the optimizer settings are, as well as how good the assumptions
> are. But you probably need to collect information about IOs & buffer hits
> as well. Perhaps you should look at Jan's stats collection patches and see
> if you can piggyback on them somehow?

Well, the above statistics are done because they can be calculated on each
node. The buffer hits and IOs are harder because they only apply to certain
nodes.

As for Jan's stats collection patches, where are they? Google finds nothing.
At 12:29 21/07/01 +1000, Martijn van Oosterhout wrote:
>As for Jan's stats collection patches, where are they? Google finds nothing.

I *think* they are part of current CVS.