Thread: [HACKERS] Reporting planning time with EXPLAIN
Hi, We report planning and execution time when EXPLAIN ANALYZE is issued. We do not have facility to report planning time as part EXPLAIN output. In order to get the planning time, one has to issue EXPLAIN ANALYZE which involves executing the plan, which is unnecessary. We report planning and execution times when es->summary is true. It is set to true when es->analyze is true. 211 /* currently, summary option is not exposed to users; just set it */ 212 es->summary = es->analyze; The comment was introduced by commit commit 90063a7612e2730f7757c2a80ba384bbe7e35c4b Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Wed Oct 15 18:50:13 2014 -0400 Print planning time only in EXPLAIN ANALYZE, not plain EXPLAIN. We've gotten enough push-back on that change to make it clear that it wasn't an especially good idea to do it like that. Revert plain EXPLAIN to its previous behavior, but keep the extra output in EXPLAIN ANALYZE. Per discussion. Internally, I set this up as a separate flag ExplainState.summary that controls printing of planning time and execution time. For now it's just copied from the ANALYZE option, but we could consider exposing it to users. The discussion referred to seems to be [1]. Here's patch to expose the "summary" option as mentioned in the last paragraph of above commit message. Right now I have named it as "summary", but I am fine if we want to change it to something meaningful. "timing" already has got some other usage, so can't use it here. One can use this option as postgres=# explain (summary on) select * from pg_class c, pg_type t where c.reltype = t.oid; QUERY PLAN -------------------------------------------------------------------------- Hash Join (cost=17.12..35.70 rows=319 width=511) Hash Cond: (c.reltype = t.oid) -> Seq Scan on pg_class c (cost=0.00..14.19 rows=319 width=259) -> Hash (cost=12.61..12.61 rows=361 width=256) -> Seq Scan on pg_type t (cost=0.00..12.61 rows=361 width=256) Planning time: 48.823 ms (6 rows) When analyze is specified, summary is also set to ON. By default this flag is OFF. Suggestions welcome. [1] https://www.postgresql.org/message-id/flat/1351f76f-69a4-4257-91c2-9382e2a6dc22%40email.android.com#1351f76f-69a4-4257-91c2-9382e2a6dc22@email.android.com [2] https://www.postgresql.org/message-id/19766.1413129321%40sss.pgh.pa.us -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Ashutosh, * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > We report planning and execution time when EXPLAIN ANALYZE is issued. > We do not have facility to report planning time as part EXPLAIN > output. In order to get the planning time, one has to issue EXPLAIN > ANALYZE which involves executing the plan, which is unnecessary. +1, that seems like a useful thing to have. > The discussion referred to seems to be [1]. Here's patch to expose the > "summary" option as mentioned in the last paragraph of above commit > message. Right now I have named it as "summary", but I am fine if we > want to change it to something meaningful. "timing" already has got > some other usage, so can't use it here. After reading that, rather long, thread, I agree that just having it be 'summary' is fine. We don't really want to make it based off of 'timing' or 'costs' or 'verbose', those are different things. I've only briefly looked at the patch so far, but it seemed pretty straight-forward. If there aren't objections, I'll see about getting this committed later this week. I will point out that it'd still be nice to have something like 'explain (I WANT IT ALL)', but that's a different feature and has its own challenges, so let's not argue about it here. Thanks! Stephen
On Tue, Dec 27, 2016 at 1:27 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > Hi, > We report planning and execution time when EXPLAIN ANALYZE is issued. > We do not have facility to report planning time as part EXPLAIN > output. In order to get the planning time, one has to issue EXPLAIN > ANALYZE which involves executing the plan, which is unnecessary. > +1. I think getting to know planning time is useful for cases when we are making changes in planner to know if the changes has introduced any regression. > > One can use this option as > postgres=# explain (summary on) select * from pg_class c, pg_type t > where c.reltype = t.oid; > QUERY PLAN > -------------------------------------------------------------------------- > Hash Join (cost=17.12..35.70 rows=319 width=511) > Hash Cond: (c.reltype = t.oid) > -> Seq Scan on pg_class c (cost=0.00..14.19 rows=319 width=259) > -> Hash (cost=12.61..12.61 rows=361 width=256) > -> Seq Scan on pg_type t (cost=0.00..12.61 rows=361 width=256) > Planning time: 48.823 ms > (6 rows) > > When analyze is specified, summary is also set to ON. By default this > flag is OFF. > I am not sure whether using *summary* to print just planning time is a good idea. Another option could be SUMMARY_PLAN_TIME. + /* Execution time matters only when analyze is requested */ + if (es->summary && es->analyze) Do you really need es->summary in above check? We should update documentation of Explain command, but maybe that can wait till we finalize the specs. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 27, 2016 at 09:26:21AM -0500, Stephen Frost wrote: > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > > We report planning and execution time when EXPLAIN ANALYZE is issued. > > We do not have facility to report planning time as part EXPLAIN > > output. In order to get the planning time, one has to issue EXPLAIN > > ANALYZE which involves executing the plan, which is unnecessary. > > After reading that, rather long, thread, I agree that just having it be > 'summary' is fine. We don't really want to make it based off of > 'timing' or 'costs' or 'verbose', those are different things. I'm wondering, why is 'timing' (in the EXPLAIN scope) a different thing to planning time? It might be that people don't expect it at this point and external tools might break (dunno), but in oder to print the planning time, 'timing' sounds clearer than 'summary' to me. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
On Wed, Dec 28, 2016 at 02:08:55PM +0100, Michael Banck wrote: > On Tue, Dec 27, 2016 at 09:26:21AM -0500, Stephen Frost wrote: > > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > > > We report planning and execution time when EXPLAIN ANALYZE is issued. > > > We do not have facility to report planning time as part EXPLAIN > > > output. In order to get the planning time, one has to issue EXPLAIN > > > ANALYZE which involves executing the plan, which is unnecessary. > > > > After reading that, rather long, thread, I agree that just having it be > > 'summary' is fine. We don't really want to make it based off of > > 'timing' or 'costs' or 'verbose', those are different things. > > I'm wondering, why is 'timing' (in the EXPLAIN scope) a different thing > to planning time? It might be that people don't expect it at this point > and external tools might break (dunno), but in oder to print the > planning time, 'timing' sounds clearer than 'summary' to me. OK, after also rereading the thread, this indeed seems to be very complicated, and I don't want to reopen this can of worms, sorry. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
>> >> One can use this option as >> postgres=# explain (summary on) select * from pg_class c, pg_type t >> where c.reltype = t.oid; >> QUERY PLAN >> -------------------------------------------------------------------------- >> Hash Join (cost=17.12..35.70 rows=319 width=511) >> Hash Cond: (c.reltype = t.oid) >> -> Seq Scan on pg_class c (cost=0.00..14.19 rows=319 width=259) >> -> Hash (cost=12.61..12.61 rows=361 width=256) >> -> Seq Scan on pg_type t (cost=0.00..12.61 rows=361 width=256) >> Planning time: 48.823 ms >> (6 rows) >> >> When analyze is specified, summary is also set to ON. By default this >> flag is OFF. >> > > I am not sure whether using *summary* to print just planning time is a > good idea. Another option could be SUMMARY_PLAN_TIME. I have just used the same name as the boolean which controls the printing of planning time. Suggestions are welcome though. We haven't used words with "_" for EXPLAIN options, so I am not sure about SUMMARY_PLAN_TIME. > > + /* Execution time matters only when analyze is requested */ > + if (es->summary && es->analyze) > > Do you really need es->summary in above check? I think es->summary controls printing overall timing, planning as well as execution (hence probably the name "summary"). Earlier it was solely controlled by es->analyze, but now that it's exposed, we do want to check if analyze was also true as without analyze there can not be execution time. So, I changed the earlier condition if (es->summary) to include es->analyze. Can we use only analyze? Probably yes. The question there is why we didn't do it to start with? OR why did we have summary controlling execution time report. Probably the author thought that at some point in future we might separate those two. So, I have left summary there. I don't have problem removing it. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
* Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > >> One can use this option as > >> postgres=# explain (summary on) select * from pg_class c, pg_type t > >> where c.reltype = t.oid; > >> QUERY PLAN > >> -------------------------------------------------------------------------- > >> Hash Join (cost=17.12..35.70 rows=319 width=511) > >> Hash Cond: (c.reltype = t.oid) > >> -> Seq Scan on pg_class c (cost=0.00..14.19 rows=319 width=259) > >> -> Hash (cost=12.61..12.61 rows=361 width=256) > >> -> Seq Scan on pg_type t (cost=0.00..12.61 rows=361 width=256) > >> Planning time: 48.823 ms > >> (6 rows) > >> > >> When analyze is specified, summary is also set to ON. By default this > >> flag is OFF. > >> > > > > I am not sure whether using *summary* to print just planning time is a > > good idea. Another option could be SUMMARY_PLAN_TIME. > > I have just used the same name as the boolean which controls the > printing of planning time. Suggestions are welcome though. We haven't > used words with "_" for EXPLAIN options, so I am not sure about > SUMMARY_PLAN_TIME. Using 'summary' seems entirely reasonable to me, I don't think we need to complicate it by saying 'summary_plan_time'- I know that I'd had to have to write that out. > > + /* Execution time matters only when analyze is requested */ > > + if (es->summary && es->analyze) > > > > Do you really need es->summary in above check? I'm pretty sure we do. EXPLAIN (ANALYZE, SUMMARY OFF) Should not print the summary (which means it shouldn't print either the planning or execution time). > I think es->summary controls printing overall timing, planning as well > as execution (hence probably the name "summary"). I am imagining it controlling if we print the summary or not, where the summary today is the last two lines: Planning time: 14.020 msExecution time: 79.555 ms Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: >>> I am not sure whether using *summary* to print just planning time is a >>> good idea. Another option could be SUMMARY_PLAN_TIME. > Using 'summary' seems entirely reasonable to me, I think it's an awful choice of name; it has nothing to do with either the functionality or the printed name of the field. And I could imagine future uses of "summary" to mean something much different, like say an actual summary. (The dictionary meaning of "summary" is "a brief restatement of the main points of something"; adding new information is not even approximately within the meaning.) How about just saying that the existing TIMING option turns this on, if it's specified without ANALYZE? Right now that combination draws an error: regression=# explain (timing on) select 1;ERROR: EXPLAIN option TIMING requires ANALYZE so there's no existing usage that this would break. regards, tom lane
Tom Lane wrote: > How about just saying that the existing TIMING option turns this on, > if it's specified without ANALYZE? Right now that combination draws > an error: > > regression=# explain (timing on) select 1; > ERROR: EXPLAIN option TIMING requires ANALYZE > > so there's no existing usage that this would break. Sounds much more reasonable to me than SUMMARY. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > >>> I am not sure whether using *summary* to print just planning time is a > >>> good idea. Another option could be SUMMARY_PLAN_TIME. > > > Using 'summary' seems entirely reasonable to me, > > I think it's an awful choice of name; it has nothing to do with either > the functionality or the printed name of the field. And I could imagine > future uses of "summary" to mean something much different, like say an > actual summary. (The dictionary meaning of "summary" is "a brief > restatement of the main points of something"; adding new information > is not even approximately within the meaning.) I don't particularly agree with this. It's a summary of the timing information today (and at least one dictionary agrees that "summary" can also mean "Formed into a sum") and in the future it certainly could be expanded on, in which case I'd expect the 'summary' option to control whatever else is added there. As an example, we might some day wish to include a summary of buffer information at the bottom too when 'buffers' is used. The proposed 'summary' option would cover that nicely, but 'timing' wouldn't. That's actually why I was thinking summary might be a good option to have. > How about just saying that the existing TIMING option turns this on, > if it's specified without ANALYZE? Right now that combination draws > an error: > > regression=# explain (timing on) select 1; > ERROR: EXPLAIN option TIMING requires ANALYZE > > so there's no existing usage that this would break. No, but consider how the docs for the current 'timing' option would have to be rewritten. Further, I'd expect to include examples of the ability to show planning time in 14.1 as that's something that regular users will actually be using, unlike 'timing's current usage which is, in my experience anyway, not terribly common. We would also have to say something like "the default when not using 'analyze' is off, but with 'analyze' the default is on" which seems pretty grotty to me. Then again, from a *user's* perspective, it should just be included by default. The reason it isn't hasn't got anything to do with what our *users* want, in my opinion at least. Having the summary option exposed would also provide a way for Andres to do what he wanted to originally from the referred-to thread. There may be other pieces to address if the plan might involve platform-specific details about sorts, etc, but from what he was suggesting that wouldn't be an issue for his initial case, and as Robert mentioned on that thread, we could do something about those other cases too. I don't think having 'timing' or 'whatever controls showing planning and total execution times at the bottom' would make sense as an option to disable showing platform-specific sort or hashing info though. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> I think it's an awful choice of name; it has nothing to do with either >> the functionality or the printed name of the field. > As an example, we might some day wish to include a summary of buffer > information at the bottom too when 'buffers' is used. The proposed > 'summary' option would cover that nicely, but 'timing' wouldn't. That's > actually why I was thinking summary might be a good option to have. What, would this option then turn off the total-time displays by default? I do not see that being a reasonable thing to do. Basically, you're taking what seems like a very general-purpose option name and nailing it down to mean "print planning time". You aren't going to be able to change that later. > No, but consider how the docs for the current 'timing' option would have > to be rewritten. Well, sure, they'd have to be rewritten, but I think this definition would actually be more orthogonal. > We would also have to say something like "the default when not using > 'analyze' is off, but with 'analyze' the default is on" which seems > pretty grotty to me. But the default for TIMING already does depend on ANALYZE. > Then again, from a *user's* perspective, it should just be included by > default. Actually, the reason it hasn't gotten included is probably that the use-case for it is very small. If you just do psql \timing on an EXPLAIN, you get something close enough to the planning time. I don't mind adding this as an option, but claiming that it's so essential that it should be there by default is silly. People would have asked for it years ago if it were all that important. > Having the summary option exposed would also provide a way for Andres to > do what he wanted to originally from the referred-to thread. There may > be other pieces to address if the plan might involve platform-specific > details about sorts, etc, but from what he was suggesting that wouldn't > be an issue for his initial case, and as Robert mentioned on that > thread, we could do something about those other cases too. I don't > think having 'timing' or 'whatever controls showing planning and total > execution times at the bottom' would make sense as an option to disable > showing platform-specific sort or hashing info though. Again, you're proposing that you can add an option today and totally redefine what it means tomorrow. I do not think that's a plan. regards, tom lane
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> I think it's an awful choice of name; it has nothing to do with either > >> the functionality or the printed name of the field. > > > As an example, we might some day wish to include a summary of buffer > > information at the bottom too when 'buffers' is used. The proposed > > 'summary' option would cover that nicely, but 'timing' wouldn't. That's > > actually why I was thinking summary might be a good option to have. > > What, would this option then turn off the total-time displays by default? To retain our current mixed behavior with explain vs. explain-analyze, we'd have to say it defaults to off for explain and on with analyze. I don't particularly like that, and would rather we just default it to on, but that would mean adjusting the regression tests. > I do not see that being a reasonable thing to do. Basically, you're > taking what seems like a very general-purpose option name and nailing > it down to mean "print planning time". You aren't going to be able > to change that later. No, that's not what I was suggesting to do and I disagree that we couldn't ever change it later. If we want it to mean "print planning time" and only ever that then I agree that calling it "summary" isn't a good option. > > No, but consider how the docs for the current 'timing' option would have > > to be rewritten. > > Well, sure, they'd have to be rewritten, but I think this definition > would actually be more orthogonal. This definition would have two completely different meanings- one for when analyze is used, and one for when it isn't. > > We would also have to say something like "the default when not using > > 'analyze' is off, but with 'analyze' the default is on" which seems > > pretty grotty to me. > > But the default for TIMING already does depend on ANALYZE. I would argue that timing can only actually be used with analyze today, which makes sense when you consider that timing is about enabling or disabling per-node timing information. Redefining it to mean something else isn't particularly different from redefining 'summary' later to mean something else. > > Then again, from a *user's* perspective, it should just be included by > > default. > > Actually, the reason it hasn't gotten included is probably that the > use-case for it is very small. If you just do psql \timing on an > EXPLAIN, you get something close enough to the planning time. I don't > mind adding this as an option, but claiming that it's so essential > that it should be there by default is silly. People would have asked > for it years ago if it were all that important. I don't buy this argument. Planning time is (hopefully, anyway...) a rather small amount of time which means that the actual results from \timing (or, worse, the timing info from other tools like pgAdmin) is quite far off. On a local instance with a simple plan, you can get an order-of-magnitude difference between psql's \timing output and the actual planning time, throw in a few or even 10s of ms of network latency and you might as well forget about trying to figure out what the planning time actually is. > > Having the summary option exposed would also provide a way for Andres to > > do what he wanted to originally from the referred-to thread. There may > > be other pieces to address if the plan might involve platform-specific > > details about sorts, etc, but from what he was suggesting that wouldn't > > be an issue for his initial case, and as Robert mentioned on that > > thread, we could do something about those other cases too. I don't > > think having 'timing' or 'whatever controls showing planning and total > > execution times at the bottom' would make sense as an option to disable > > showing platform-specific sort or hashing info though. > > Again, you're proposing that you can add an option today and totally > redefine what it means tomorrow. I do not think that's a plan. The above paragraph was intended to suggest that we could add 'summary' now to control the last few lines which are displayed after the plan (which would be a consistent definition of 'summary', even if we added things to the summary contents later) and that we could then add an independent option to control the output of plan nodes like 'sort' to allow for platform-independent output. I was not suggesting that 'summary' would control what 'sort' produces. Thanks! Stephen
On Wed, Dec 28, 2016 at 8:41 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: >> >> One can use this option as >> >> postgres=# explain (summary on) select * from pg_class c, pg_type t >> >> where c.reltype = t.oid; >> >> QUERY PLAN >> >> -------------------------------------------------------------------------- >> >> Hash Join (cost=17.12..35.70 rows=319 width=511) >> >> Hash Cond: (c.reltype = t.oid) >> >> -> Seq Scan on pg_class c (cost=0.00..14.19 rows=319 width=259) >> >> -> Hash (cost=12.61..12.61 rows=361 width=256) >> >> -> Seq Scan on pg_type t (cost=0.00..12.61 rows=361 width=256) >> >> Planning time: 48.823 ms >> >> (6 rows) >> >> >> >> When analyze is specified, summary is also set to ON. By default this >> >> flag is OFF. >> >> >> > >> > I am not sure whether using *summary* to print just planning time is a >> > good idea. Another option could be SUMMARY_PLAN_TIME. >> >> I have just used the same name as the boolean which controls the >> printing of planning time. Suggestions are welcome though. We haven't >> used words with "_" for EXPLAIN options, so I am not sure about >> SUMMARY_PLAN_TIME. > > Using 'summary' seems entirely reasonable to me, I don't think we need > to complicate it by saying 'summary_plan_time'- I know that I'd had to > have to write that out. > >> > + /* Execution time matters only when analyze is requested */ >> > + if (es->summary && es->analyze) >> > >> > Do you really need es->summary in above check? > > I'm pretty sure we do. > > EXPLAIN (ANALYZE, SUMMARY OFF) > > Should not print the summary (which means it shouldn't print either the > planning or execution time). > Hmm, have you checked what output patch gives for above command, in above usage, it will print both planning and execution time. IIUC, then patch doesn't do what you have in mind. As far as I understand the proposed usage for the summary parameter was only for planning time. Now if you want to mean that it should be used to print the last two lines of Explain output (planning or execution time), then I think patch requires some change. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 28, 2016 at 10:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Stephen Frost <sfrost@snowman.net> writes: >> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>> I think it's an awful choice of name; it has nothing to do with either >>> the functionality or the printed name of the field. > >> As an example, we might some day wish to include a summary of buffer >> information at the bottom too when 'buffers' is used. The proposed >> 'summary' option would cover that nicely, but 'timing' wouldn't. That's >> actually why I was thinking summary might be a good option to have. > > What, would this option then turn off the total-time displays by default? > I do not see that being a reasonable thing to do. Basically, you're > taking what seems like a very general-purpose option name and nailing > it down to mean "print planning time". You aren't going to be able > to change that later. I don't intend to use "summary" to print only planning time. As Stephen has pointed out in his mail, it can be expanded later to include other things. But I guess, the documentation changes I included in the patch are the reason behind your objection. <varlistentry> + <term><literal>SUMMARY</literal></term> + <listitem> + <para> + Include planning time, except when used with <command>EXECUTE</command>. + Since <command>EXPLAIN EXECUTE</command> displays plan for a prepared + query, i.e. a query whose plan is already created, the planning time is + not available when <command>EXPLAIN EXECUTE</command> is executed. + It defaults to <literal>FALSE</literal>. + </para> + </listitem> + </varlistentry> + I think I did a bad job there. Sorry for that. I think we should reword the paragraph as "Include summary of planning and execution of query. When used without <literal>ANALYZE</literal> it prints planning time. When used with <literal>ANALYZE</literal>, this option is considered to be <literal>TRUE</literal> overriding user specified value and prints execution time. Since <command>EXPLAIN EXECUTE</command> displays plan for a prepared query, i.e. a query whose plan is already created, the planning time isnot available when <command>EXPLAIN EXECUTE</command> is executed. It defaults to <literal>FALSE</literal>." We can add more things to this later. I am not very happy with the sentence explaining ANALYZE, but that's how it is today. We can change that. With ANALYZE, SUMMARY is ON if user doesn't specify SUMMARY. But in case user specifies SUMMARY OFF with ANALYZE, we won't print execution and planning time. It's a conscious decision by user not to print those things. That will make the documentation straight forward. I am not so happy with EXPLAIN EXECUTE either, but it would be better to clarify the situation. Or we can print planning time as 0 for EXPLAIN EXECUTE. We can do better there as well. We can print planning time if the cached plan was invalidated and required planning, otherwise print 0. That would be a helpful diagnostic. I do think that there is some merit in reporting planning time as a whole just like execution time. Planning time is usually so small that users don't care about how it's split across various phases of planning. But with more and more complex queries and more and more planning techniques, it becomes essential to know module-wise (join planner, subquery planner, upper planner) timings. Developers certainly would like that, but advanced users who try to control optimizer might find it helpful. In that case, total planning time becomes a "summary". In this case "TIMING" would control reporting granular planning time and SUMMARY would control reporting overall printing time. I don't intend to add granular timings right now, and that wasn't something I was thinking of while writing this patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Wed, Dec 28, 2016 at 10:55 PM, Stephen Frost <sfrost@snowman.net> wrote: > Tom, > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Stephen Frost <sfrost@snowman.net> writes: >> > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> >> I think it's an awful choice of name; it has nothing to do with either >> >> the functionality or the printed name of the field. >> >> > As an example, we might some day wish to include a summary of buffer >> > information at the bottom too when 'buffers' is used. The proposed >> > 'summary' option would cover that nicely, but 'timing' wouldn't. That's >> > actually why I was thinking summary might be a good option to have. >> >> What, would this option then turn off the total-time displays by default? > > To retain our current mixed behavior with explain vs. explain-analyze, > we'd have to say it defaults to off for explain and on with analyze. I > don't particularly like that, and would rather we just default it to on, > but that would mean adjusting the regression tests. > >> I do not see that being a reasonable thing to do. Basically, you're >> taking what seems like a very general-purpose option name and nailing >> it down to mean "print planning time". You aren't going to be able >> to change that later. > > No, that's not what I was suggesting to do and I disagree that we > couldn't ever change it later. If we want it to mean "print planning > time" and only ever that then I agree that calling it "summary" isn't a > good option. No, that wasn't my intention either. I have clarified it in my mail. > >> > No, but consider how the docs for the current 'timing' option would have >> > to be rewritten. >> >> Well, sure, they'd have to be rewritten, but I think this definition >> would actually be more orthogonal. > > This definition would have two completely different meanings- one for > when analyze is used, and one for when it isn't. > >> > We would also have to say something like "the default when not using >> > 'analyze' is off, but with 'analyze' the default is on" which seems >> > pretty grotty to me. >> >> But the default for TIMING already does depend on ANALYZE. > > I would argue that timing can only actually be used with analyze today, > which makes sense when you consider that timing is about enabling or > disabling per-node timing information. Redefining it to mean something > else isn't particularly different from redefining 'summary' later to > mean something else. > >> > Then again, from a *user's* perspective, it should just be included by >> > default. >> >> Actually, the reason it hasn't gotten included is probably that the >> use-case for it is very small. If you just do psql \timing on an >> EXPLAIN, you get something close enough to the planning time. I don't >> mind adding this as an option, but claiming that it's so essential >> that it should be there by default is silly. People would have asked >> for it years ago if it were all that important. > > I don't buy this argument. Planning time is (hopefully, anyway...) a > rather small amount of time which means that the actual results from > \timing (or, worse, the timing info from other tools like pgAdmin) is > quite far off. On a local instance with a simple plan, you can get an > order-of-magnitude difference between psql's \timing output and the > actual planning time, throw in a few or even 10s of ms of network > latency and you might as well forget about trying to figure out what > the planning time actually is. +1. On my machine regression=# \timing Timing is on. regression=# explain select * from pg_class c, pg_type t where c.reltype = t.oid; [...] clipped plan Time: 1.202 ms regression=# \timing Timing is off. regression=# explain analyze select * from pg_class c, pg_type t where c.reltype = t.oid; [...] clipped plan Planning time: 0.332 msExecution time: 1.670 ms (8 rows) \timing output is way off than the actual planning time. Take another example regression=# explain analyze select * from pg_class c, pg_type t, pg_inherits i where c.reltype = t.oid and i.inhparent = c.oid; QUERY PLAN [ ... ] clipped plan Planning time: 0.592 msExecution time: 2.294 ms (13 rows) regression=# \timing Timing is on. regression=# explain select * from pg_class c, pg_type t, pg_inherits i where c.reltype = t.oid and i.inhparent = c.oid; [...] clipped plan Time: 1.831 ms The planning time has almost doubled, but what \timing reported has only grown by approximately 50%. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Here are patches for follwing 1. pg_explain_plan_time_v3 adds SUMMARY option which behaves as: SUMMARY when ON prints planning time. With ANALYZE ON, it also prints execution time. When user explicitly uses SUMMARY OFF, it does not print planning and execution time (even when ANALYZE is ON). By default SUMMARY is ON when ANALYZE is ON, otherwise SUMMARY defaults to OFF. Attached explain_summary_examples.out.txt shows examples. 2. explain_exec_timing adds support to print planning time in EXPLAIN EXECUTE output with SUMMARY option. In this case, planning time includes time required to fetch the plan from cache and plan the query if necessary (i.e. after invalidation or the first time it's executed.) E.g. prepare somestmt as select count(*) from t1 where a > 100; PREPARE postgres=# explain (summary on) execute somestmt; QUERY PLAN ----------------------------------------------------------- Aggregate (cost=40.13..40.14 rows=1 width=8) -> Seq Scan on t1 (cost=0.00..38.25 rows=753 width=0) Filter: (a > 100) Planning time: 0.245 ms (4 rows) postgres=# explain (summary on) execute somestmt; QUERY PLAN ----------------------------------------------------------- Aggregate (cost=40.13..40.14 rows=1 width=8) -> Seq Scan on t1 (cost=0.00..38.25 rows=753 width=0) Filter: (a > 100) Planning time: 0.012 ms (4 rows) Notice the difference in planning time. explain_summary_examples.out.txt has examples. On Thu, Dec 29, 2016 at 10:26 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Wed, Dec 28, 2016 at 10:55 PM, Stephen Frost <sfrost@snowman.net> wrote: >> Tom, >> >> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>> Stephen Frost <sfrost@snowman.net> writes: >>> > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>> >> I think it's an awful choice of name; it has nothing to do with either >>> >> the functionality or the printed name of the field. >>> >>> > As an example, we might some day wish to include a summary of buffer >>> > information at the bottom too when 'buffers' is used. The proposed >>> > 'summary' option would cover that nicely, but 'timing' wouldn't. That's >>> > actually why I was thinking summary might be a good option to have. >>> >>> What, would this option then turn off the total-time displays by default? >> >> To retain our current mixed behavior with explain vs. explain-analyze, >> we'd have to say it defaults to off for explain and on with analyze. I >> don't particularly like that, and would rather we just default it to on, >> but that would mean adjusting the regression tests. >> >>> I do not see that being a reasonable thing to do. Basically, you're >>> taking what seems like a very general-purpose option name and nailing >>> it down to mean "print planning time". You aren't going to be able >>> to change that later. >> >> No, that's not what I was suggesting to do and I disagree that we >> couldn't ever change it later. If we want it to mean "print planning >> time" and only ever that then I agree that calling it "summary" isn't a >> good option. > > No, that wasn't my intention either. I have clarified it in my mail. > >> >>> > No, but consider how the docs for the current 'timing' option would have >>> > to be rewritten. >>> >>> Well, sure, they'd have to be rewritten, but I think this definition >>> would actually be more orthogonal. >> >> This definition would have two completely different meanings- one for >> when analyze is used, and one for when it isn't. >> >>> > We would also have to say something like "the default when not using >>> > 'analyze' is off, but with 'analyze' the default is on" which seems >>> > pretty grotty to me. >>> >>> But the default for TIMING already does depend on ANALYZE. >> >> I would argue that timing can only actually be used with analyze today, >> which makes sense when you consider that timing is about enabling or >> disabling per-node timing information. Redefining it to mean something >> else isn't particularly different from redefining 'summary' later to >> mean something else. >> >>> > Then again, from a *user's* perspective, it should just be included by >>> > default. >>> >>> Actually, the reason it hasn't gotten included is probably that the >>> use-case for it is very small. If you just do psql \timing on an >>> EXPLAIN, you get something close enough to the planning time. I don't >>> mind adding this as an option, but claiming that it's so essential >>> that it should be there by default is silly. People would have asked >>> for it years ago if it were all that important. >> >> I don't buy this argument. Planning time is (hopefully, anyway...) a >> rather small amount of time which means that the actual results from >> \timing (or, worse, the timing info from other tools like pgAdmin) is >> quite far off. On a local instance with a simple plan, you can get an >> order-of-magnitude difference between psql's \timing output and the >> actual planning time, throw in a few or even 10s of ms of network >> latency and you might as well forget about trying to figure out what >> the planning time actually is. > > +1. On my machine > > regression=# \timing > Timing is on. > regression=# explain select * from pg_class c, pg_type t where > c.reltype = t.oid; > > [...] clipped plan > > Time: 1.202 ms > regression=# \timing > Timing is off. > regression=# explain analyze select * from pg_class c, pg_type t where > c.reltype = t.oid; > > [...] clipped plan > > Planning time: 0.332 ms > Execution time: 1.670 ms > (8 rows) > > \timing output is way off than the actual planning time. > > Take another example > > regression=# explain analyze select * from pg_class c, pg_type t, > pg_inherits i where c.reltype = t.oid and i.inhparent = c.oid; > QUERY PLAN > [ ... ] clipped plan > > Planning time: 0.592 ms > Execution time: 2.294 ms > (13 rows) > > regression=# \timing > Timing is on. > regression=# explain select * from pg_class c, pg_type t, pg_inherits > i where c.reltype = t.oid and i.inhparent = c.oid; > [...] clipped plan > > Time: 1.831 ms > > The planning time has almost doubled, but what \timing reported has > only grown by approximately 50%. > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Ashutosh, I realize you were replying to yourself, but you probably didn't need to include the entire thread below or to top-post. * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > 1. pg_explain_plan_time_v3 adds SUMMARY option which behaves as: > SUMMARY when ON prints planning time. With ANALYZE ON, it also prints > execution time. When user explicitly uses SUMMARY OFF, it does not > print planning and execution time (even when ANALYZE is ON). By > default SUMMARY is ON when ANALYZE is ON, otherwise SUMMARY defaults > to OFF. Attached explain_summary_examples.out.txt shows examples. Right, this is how I had been thinking 'summary' would behave. > 2. explain_exec_timing adds support to print planning time in EXPLAIN > EXECUTE output with SUMMARY option. In this case, planning time > includes time required to fetch the plan from cache and plan the query > if necessary (i.e. after invalidation or the first time it's > executed.) E.g. Ok. diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml [...] + Include planning time and execution time. Execution time is included + when <literal>ANALYZE</literal> is enabled. + It defaults to <literal>TRUE</literal> when <literal>ANALYZE</literal> + is enabled. Otherwise it defaults to <literal>FALSE</literal>. This seems to be part of the concern regarding the naming. I would reword this along these lines: Include summary information (eg: totalled timing information) after the query plan. Summary information is included by default when ANALYZE is used. Without ANALYZE, summary information is not included by default but can be enabled using this option. (that's not great, but hopefully it gets the point across) The code changes look alright on a cursory look, but we need to hammer down if we agree on this term or if we need to invent something else. Thanks! Stephen
On Wed, Jan 4, 2017 at 8:24 PM, Stephen Frost <sfrost@snowman.net> wrote: > Ashutosh, > > I realize you were replying to yourself, but you probably didn't need to > include the entire thread below or to top-post. Sorry, that was unintentional. > > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: >> 1. pg_explain_plan_time_v3 adds SUMMARY option which behaves as: >> SUMMARY when ON prints planning time. With ANALYZE ON, it also prints >> execution time. When user explicitly uses SUMMARY OFF, it does not >> print planning and execution time (even when ANALYZE is ON). By >> default SUMMARY is ON when ANALYZE is ON, otherwise SUMMARY defaults >> to OFF. Attached explain_summary_examples.out.txt shows examples. > > Right, this is how I had been thinking 'summary' would behave. > >> 2. explain_exec_timing adds support to print planning time in EXPLAIN >> EXECUTE output with SUMMARY option. In this case, planning time >> includes time required to fetch the plan from cache and plan the query >> if necessary (i.e. after invalidation or the first time it's >> executed.) E.g. > > Ok. > > diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml > [...] > + Include planning time and execution time. Execution time is included > + when <literal>ANALYZE</literal> is enabled. > + It defaults to <literal>TRUE</literal> when <literal>ANALYZE</literal> > + is enabled. Otherwise it defaults to <literal>FALSE</literal>. > > This seems to be part of the concern regarding the naming. I would > reword this along these lines: > > Include summary information (eg: totalled timing information) after the > query plan. Summary information is included by default when ANALYZE is > used. Without ANALYZE, summary information is not included by default > but can be enabled using this option. > > (that's not great, but hopefully it gets the point across) I think it's better than mine which was "overfitting", if we allow some machine learning terminology here :). > > The code changes look alright on a cursory look, but we need to hammer > down if we agree on this term or if we need to invent something else. Agreed. Will wait for consensus. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 2016-12-28 10:29:48 -0500, Tom Lane wrote: > How about just saying that the existing TIMING option turns this on, > if it's specified without ANALYZE? Right now that combination draws > an error: > > regression=# explain (timing on) select 1; > ERROR: EXPLAIN option TIMING requires ANALYZE > > so there's no existing usage that this would break. I don't like this much - I'd like (as previously stated in [1]) to be able to have an actual EXPLAIN ANALYZE (COSTS off, TIMING OFF) in tests because that shows the number of loops, rechecks, etc. Andres [1] http://archives.postgresql.org/message-id/20140603180548.GU24145%40awork2.anarazel.de
Andres Freund <andres@anarazel.de> writes: > On 2016-12-28 10:29:48 -0500, Tom Lane wrote: >> How about just saying that the existing TIMING option turns this on, > I don't like this much - I'd like (as previously stated in [1]) to be > able to have an actual EXPLAIN ANALYZE (COSTS off, TIMING OFF) in tests > because that shows the number of loops, rechecks, etc. Hmm ... regression=# EXPLAIN (analyze, COSTS off, TIMING OFF) select * from tenk1; QUERY PLAN -----------------------------------------------Seq Scan on tenk1 (actual rows=10000 loops=1)Planning time: 1.075 msExecutiontime: 2.723 ms (3 rows) I see your point. OK, that's a use case not within the scope of the original proposal, but it's a reasonable argument for having a SUMMARY OFF option. regards, tom lane
On Wed, Jan 4, 2017 at 7:59 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > Here are patches for following Those patches have received no code-level reviews, so moved to CF 2017-03. -- Michael
* Michael Paquier (michael.paquier@gmail.com) wrote: > On Wed, Jan 4, 2017 at 7:59 PM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: > > Here are patches for following > > Those patches have received no code-level reviews, so moved to CF 2017-03. Yeah, I've been rather busy with email and bug fixes, but this is still on my plate to work on very shortly. Thanks! Stephen
Ashutosh, * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > Here are patches for follwing > 1. pg_explain_plan_time_v3 adds SUMMARY option which behaves as: > SUMMARY when ON prints planning time. With ANALYZE ON, it also prints > execution time. When user explicitly uses SUMMARY OFF, it does not > print planning and execution time (even when ANALYZE is ON). By > default SUMMARY is ON when ANALYZE is ON, otherwise SUMMARY defaults > to OFF. Attached explain_summary_examples.out.txt shows examples. > > 2. explain_exec_timing adds support to print planning time in EXPLAIN > EXECUTE output with SUMMARY option. In this case, planning time > includes time required to fetch the plan from cache and plan the query > if necessary (i.e. after invalidation or the first time it's > executed.) E.g. I'm going through these with an eye towards committing them soon. I've already adjusted some of the documentation and comments per our earlier discussion but I'm now reviewing the changes to ExplainExecuteQuery() and trying to understand the reasoning for not including the EvaluateParams() call in the planning time. Not including that feels to me like we're ending up leaving something out of the overall timing picture, which doesn't seem right. If we do include that, then planning time+execution time will equal the overall query time and that feels like the right approach to use here. Otherwise the overall query time is "planning time+execution time+something else that we don't tell you about" which doesn't seem good to me. Thoughts? Thanks! Stephen
On Tue, Mar 7, 2017 at 9:23 PM, Stephen Frost <sfrost@snowman.net> wrote: > Ashutosh, > > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: >> Here are patches for follwing >> 1. pg_explain_plan_time_v3 adds SUMMARY option which behaves as: >> SUMMARY when ON prints planning time. With ANALYZE ON, it also prints >> execution time. When user explicitly uses SUMMARY OFF, it does not >> print planning and execution time (even when ANALYZE is ON). By >> default SUMMARY is ON when ANALYZE is ON, otherwise SUMMARY defaults >> to OFF. Attached explain_summary_examples.out.txt shows examples. >> >> 2. explain_exec_timing adds support to print planning time in EXPLAIN >> EXECUTE output with SUMMARY option. In this case, planning time >> includes time required to fetch the plan from cache and plan the query >> if necessary (i.e. after invalidation or the first time it's >> executed.) E.g. > > I'm going through these with an eye towards committing them soon. I've > already adjusted some of the documentation and comments per our earlier > discussion Thanks a lot. > but I'm now reviewing the changes to ExplainExecuteQuery() > and trying to understand the reasoning for not including the > EvaluateParams() call in the planning time. Not including that feels to > me like we're ending up leaving something out of the overall timing > picture, which doesn't seem right. > > If we do include that, then planning time+execution time will equal the > overall query time and that feels like the right approach to use here. > Otherwise the overall query time is "planning time+execution > time+something else that we don't tell you about" which doesn't seem > good to me. Thanks for pointing that out. I didn't include parameter evaluation time earlier, since it's not strictly planning time. But I think it's important to include the parameter evaluation since different set of parameters may cause planner to create a customized plan. So it looks like something we should include in the planning time. I have updated the patch to do so. I have also rebased the patches on top of current head, resolving a conflict. The new patches have slightly different names than previous ones, since I am now using git format-patch to create those. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Ashutosh, * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > On Tue, Mar 7, 2017 at 9:23 PM, Stephen Frost <sfrost@snowman.net> wrote: > > I'm going through these with an eye towards committing them soon. I've > > already adjusted some of the documentation and comments per our earlier > > discussion > > Thanks a lot. I've pushed this with the editorialization of the documentation which we discussed up-thread along with some improvements to the comments along with your latest variable name suggestions and changes as discussed. Thanks! Stephen
On Thu, Mar 9, 2017 at 2:58 AM, Stephen Frost <sfrost@snowman.net> wrote: > Ashutosh, > > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: >> On Tue, Mar 7, 2017 at 9:23 PM, Stephen Frost <sfrost@snowman.net> wrote: >> > I'm going through these with an eye towards committing them soon. I've >> > already adjusted some of the documentation and comments per our earlier >> > discussion >> >> Thanks a lot. > > I've pushed this with the editorialization of the documentation which > we discussed up-thread along with some improvements to the comments > along with your latest variable name suggestions and changes as > discussed. The documentation looks pretty good and comment changes look good too. The way those two have been phrased, it's easy to understand the defaults for SUMMARY, which otherwise could have been confusing. The test looks good too. Thanks for all the improvements. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company