Thread: Review of: explain / allow collecting row counts without timing info
Eric's review follows: Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from 2012-01-12 git checkout. Patch applied fine. 'make check' results in failures when this patch is put into place. ========================6 of 128 tests failed. ======================== Here are the relevant failures. parallel group (2 tests): create_view create_index create_index ... FAILED parallel group (9 tests): create_cast create_aggregate drop_if_exists typed_table vacuum constraints create_table_like triggers inherit inherit ... FAILED parallel group (20 tests): select_distinct_on btree_index update random select_distinct select_having namespace delete case hash_index union select_implicit select_into transactions portals subselect arrays aggregates join prepared_xacts join ... FAILED aggregates ... FAILED parallel group (15 tests): combocid portals_p2 advisory_lock xmlmap tsdicts guc functional_deps dependency select_views cluster tsearch window foreign_data foreign_key bitmapops foreign_data ... FAILED parallel group (19 tests): limit prepare copy2 conversion xml plancache returning temp sequence without_oid with rowtypes truncate polymorphism domain largeobject rangefuncs alter_table plpgsql alter_table ... FAILED -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 13.1.2012 18:07, Josh Berkus wrote: > Eric's review follows: > > Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from > 2012-01-12 git checkout. > > Patch applied fine. > > 'make check' results in failures when this patch is put into place. > > ======================== > 6 of 128 tests failed. > ======================== > > Here are the relevant failures. > > parallel group (2 tests): create_view create_index > create_index ... FAILED > parallel group (9 tests): create_cast create_aggregate drop_if_exists > typed_table vacuum constraints create_table_like triggers inherit > inherit ... FAILED > parallel group (20 tests): select_distinct_on btree_index update random > select_distinct select_having namespace delete case hash_index union > select_implicit select_into transactions portals subselect arrays > aggregates join prepared_xacts > join ... FAILED > aggregates ... FAILED > parallel group (15 tests): combocid portals_p2 advisory_lock xmlmap > tsdicts guc functional_deps dependency select_views cluster tsearch > window foreign_data foreign_key bitmapops > foreign_data ... FAILED > parallel group (19 tests): limit prepare copy2 conversion xml plancache > returning temp sequence without_oid with rowtypes truncate polymorphism > domain largeobject rangefuncs alter_table plpgsql > alter_table ... FAILED > Fixed. The default value of TIMING option did not work as intended, it was set to true even for plain EXPLAIN (without ANALYZE). In that case the EXPLAIN failed. Tomas
Attachment
On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > > Fixed. The default value of TIMING option did not work as intended, it > was set to true even for plain EXPLAIN (without ANALYZE). In that case > the EXPLAIN failed. > I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE output for index-only scans" patch. It applied and built and passes installcheck. I have not done a full review of this, but I want to say that I want this very much. Not only can I get the row counts, but also the Heap Fetches and the output of BUFFERS, without the overhead of timing. I haven't looked at contrib aspects of it at all. Thank you for making this. Cheers, Jeff
On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >> >> Fixed. The default value of TIMING option did not work as intended, it >> was set to true even for plain EXPLAIN (without ANALYZE). In that case >> the EXPLAIN failed. >> > > I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE > output for index-only scans" patch. It applied and built and passes > installcheck. > > I have not done a full review of this, but I want to say that I want > this very much. Not only can I get the row counts, but also the Heap > Fetches and the output of BUFFERS, without the overhead of timing. I > haven't looked at contrib aspects of it at all. > > Thank you for making this. +1. After looking at this a bit, I think we can make the interface a bit more convenient. I'd like to propose that we call the EXPLAIN option "ROWS" rather than "TIMING", and that we provide the row-count information if *either* ROWS or ANALYZE is specified. Then, if you want rows without timing, you can say this: EXPLAIN (ROWS) query... Rather than, as in the approach taken by the current patch: EXPLAIN (ANALYZE, TIMING off) query... ...which is a little more long-winded. This also has the nice advantage of making the name of the EXPLAIN option match the name of the INSTRUMENT flag. Revised patch along those lines attached. Barring objections, I'll commit this version. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 3 Únor 2012, 17:12, Robert Haas wrote: > On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >>> >>> Fixed. The default value of TIMING option did not work as intended, it >>> was set to true even for plain EXPLAIN (without ANALYZE). In that case >>> the EXPLAIN failed. >>> >> >> I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE >> output for index-only scans" patch. It applied and built and passes >> installcheck. >> >> I have not done a full review of this, but I want to say that I want >> this very much. Not only can I get the row counts, but also the Heap >> Fetches and the output of BUFFERS, without the overhead of timing. I >> haven't looked at contrib aspects of it at all. >> >> Thank you for making this. > > +1. > > After looking at this a bit, I think we can make the interface a bit > more convenient. I'd like to propose that we call the EXPLAIN option > "ROWS" rather than "TIMING", and that we provide the row-count > information if *either* ROWS or ANALYZE is specified. > > Then, if you want rows without timing, you can say this: > > EXPLAIN (ROWS) query... > > Rather than, as in the approach taken by the current patch: > > EXPLAIN (ANALYZE, TIMING off) query... > > ...which is a little more long-winded. This also has the nice > advantage of making the name of the EXPLAIN option match the name of > the INSTRUMENT flag. > > Revised patch along those lines attached. Barring objections, I'll > commit this version. I don't think changing the EXPLAIN syntax this way is a good idea. I think that one option should not silently enable/disable others, so ROWS should not enable ANALYZE automatically. It should throw an error just like the TIMING option does when invoked without ANALYZE (IIRC). Which leads to syntax like this EXPLAIN (ANALYZE, ROWS) and that's a bit strange because it mentions ROWS but actually manipulates TIMING behind the curtains. You can't actually do EXPLAIN (ANALYZE, ROWS off) because row counts are always collected. So I think the syntax I proposed makes more sense. kind regards Tomas
On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > I don't think changing the EXPLAIN syntax this way is a good idea. I think > that one option should not silently enable/disable others, so ROWS should > not enable ANALYZE automatically. I didn't propose that. The point is that the desired behavior (however we name it) is a SUBSET of what ANALYZE does. So we can either: 1. Have ANALYZE enable all the behavior, and have another option (TIMING) that can be used to turn some of it back on again. 2. Have ANALYZE enable all the behavior, and have another option (ROWS) that enables just the subset of it that we want. I prefer #2 to #1, because I think it's a bit confusing to have an option whose effect is to partially disable another option. YMMV, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3.2.2012 18:28, Robert Haas wrote: > On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >> I don't think changing the EXPLAIN syntax this way is a good idea. I think >> that one option should not silently enable/disable others, so ROWS should >> not enable ANALYZE automatically. > > I didn't propose that. The point is that the desired behavior > (however we name it) is a SUBSET of what ANALYZE does. > > So we can either: > > 1. Have ANALYZE enable all the behavior, and have another option > (TIMING) that can be used to turn some of it back on again. > > 2. Have ANALYZE enable all the behavior, and have another option > (ROWS) that enables just the subset of it that we want. > > I prefer #2 to #1, because I think it's a bit confusing to have an > option whose effect is to partially disable another option. YMMV, of > course. OK, thanks for the explanation. I don't like the idea of subsets as it IMHO makes it less obvious what options are enabled. For example this EXPLAIN (ROWS) query... does not immediately show it's actually going to do ANALYZE. I prefer to keep the current 'ANALYZE' definition, i.e. collecting both row counts and timing data (which is what 99% of people wants anyway), and an option to disable the timing. And the BUFFERS option currently works exactly like that, so defining ROWS the way you proposed would be inconsistent with the current behavior. Sure, we could redefine BUFFERS as a subset, so you could do EXPLAIN (ROWS) ... instead of ... EXPLAIN (ANALYZE, TIMING off) EXPLAIN (BUFFERS) ... instead of ... EXPLAIN (ANALYZE, BUFFERSon) but what if someone wants both at the same time? Maybe he could do EXPLAIN (ROWS, BUFFERS) and treat that as a union of those subsets. I don't think it's worth it. I surely can live with both solutions (mine or the one you proposed). kind regards Tomas
On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > OK, thanks for the explanation. I don't like the idea of subsets as it > IMHO makes it less obvious what options are enabled. For example this > > EXPLAIN (ROWS) query... > > does not immediately show it's actually going to do ANALYZE. Well, it isn't, if ANALYZE means rows + timing... > I prefer to keep the current 'ANALYZE' definition, i.e. collecting both > row counts and timing data (which is what 99% of people wants anyway), > and an option to disable the timing. > > And the BUFFERS option currently works exactly like that, so defining > ROWS the way you proposed would be inconsistent with the current behavior. > > Sure, we could redefine BUFFERS as a subset, so you could do > > EXPLAIN (ROWS) ... instead of ... EXPLAIN (ANALYZE, TIMING off) > EXPLAIN (BUFFERS) ... instead of ... EXPLAIN (ANALYZE, BUFFERS on) > > but what if someone wants both at the same time? Maybe he could do > > EXPLAIN (ROWS, BUFFERS) > > and treat that as a union of those subsets. I don't think it's worth it. Yeah, I forgot that we'd have to allow that, though I don't think it would be a big deal to fix that. > I surely can live with both solutions (mine or the one you proposed). Let's wait and see if anyone else has an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3.2.2012 22:51, Robert Haas wrote: > On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >> OK, thanks for the explanation. I don't like the idea of subsets as it >> IMHO makes it less obvious what options are enabled. For example this >> >> EXPLAIN (ROWS) query... >> >> does not immediately show it's actually going to do ANALYZE. > > Well, it isn't, if ANALYZE means rows + timing... Yeah, sure. It depends on how you define ANALYZE. I see that as 'stats collected at runtime' (in contrast to a query plan, which is just a ... well, plan). >> but what if someone wants both at the same time? Maybe he could do >> >> EXPLAIN (ROWS, BUFFERS) >> >> and treat that as a union of those subsets. I don't think it's worth it. > > Yeah, I forgot that we'd have to allow that, though I don't think it > would be a big deal to fix that. > >> I surely can live with both solutions (mine or the one you proposed). > > Let's wait and see if anyone else has an opinion. Sure. And thanks for your feedback! Tomas
On Fri, Feb 03, 2012 at 11:12:33AM -0500, Robert Haas wrote: > After looking at this a bit, I think we can make the interface a bit > more convenient. I'd like to propose that we call the EXPLAIN option > "ROWS" rather than "TIMING", and that we provide the row-count > information if *either* ROWS or ANALYZE is specified. > > Then, if you want rows without timing, you can say this: > > EXPLAIN (ROWS) query... > > Rather than, as in the approach taken by the current patch: > > EXPLAIN (ANALYZE, TIMING off) query... > > ...which is a little more long-winded. I favor the originally-submitted syntax. I like having a single option, ANALYZE, signal whether to actually run the statement. The submitted syntax also aligns with the fact that would motivate someone to use it: "Timing has high overhead on my system." or "Timing makes the output unstable." We have both estimated row counts and actual row counts. It may someday prove useful to have an invocation that plans the query and shows estimated row counts, omitting only cost estimates. Having a "ROWS" option that implies running the query could complicate that effort. Thanks, nm
On Fri, Feb 3, 2012 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >>> >>> Fixed. The default value of TIMING option did not work as intended, it >>> was set to true even for plain EXPLAIN (without ANALYZE). In that case >>> the EXPLAIN failed. >>> >> >> I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE >> output for index-only scans" patch. It applied and built and passes >> installcheck. >> >> I have not done a full review of this, but I want to say that I want >> this very much. Not only can I get the row counts, but also the Heap >> Fetches and the output of BUFFERS, without the overhead of timing. I >> haven't looked at contrib aspects of it at all. >> >> Thank you for making this. > > +1. > > After looking at this a bit, I think we can make the interface a bit > more convenient. I'd like to propose that we call the EXPLAIN option > "ROWS" rather than "TIMING", and that we provide the row-count > information if *either* ROWS or ANALYZE is specified. If we are allowed to make big breaks with the past, I would get rid of ANALYZE altogether. "analyze" basically is a synonym of "explain", or is contained within it. You can't explain anything until you have first analyzed it. Without ANALYZE, you are explaining the analysis done by the planner. With it, you are explaining an analysis done by the planner and the executor. So we could do away with ANALYZE and have TIMING, ROWS, and BUFFERS, any of which causes the query to actually be executed, not just planned. I suspect we will be unwilling to make such a break with the past. In that case, I think I prefer the originally proposed semantics, even though I agree they are somewhat less natural. ANALYZE is a big flag that means "This query will be executed, not just planned". If we are not going to make a major break, but only nibble around the edges, then I don't think we should remove the property that the query will be executed if and only if ANALYZE is specified. Cheers, Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > I suspect we will be unwilling to make such a break with the past. In > that case, I think I prefer the originally proposed semantics, even > though I agree they are somewhat less natural. ANALYZE is a big flag > that means "This query will be executed, not just planned". If we are > not going to make a major break, but only nibble around the edges, > then I don't think we should remove the property that the query will > be executed if and only if ANALYZE is specified. Yeah, I think we need to preserve that property. Unexpectedly executing a query (which may have side-effects) is a very dangerous thing. People are used to the idea that ANALYZE == execute, and adding random other flags that also cause execution is going to burn somebody. regards, tom lane