Thread: Review of: explain / allow collecting row counts without timing info

Review of: explain / allow collecting row counts without timing info

From
Josh Berkus
Date:
Eric's review follows:

Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from
2012-01-12 git checkout.

Patch applied fine.

'make check' results in failures when this patch is put into place.

========================6 of 128 tests failed.
========================

Here are the relevant failures.

parallel group (2 tests):  create_view create_index    create_index             ... FAILED
parallel group (9 tests):  create_cast create_aggregate drop_if_exists
typed_table vacuum constraints create_table_like triggers inherit    inherit                  ... FAILED
parallel group (20 tests):  select_distinct_on btree_index update random
select_distinct select_having namespace delete case hash_index union
select_implicit select_into transactions portals subselect arrays
aggregates join prepared_xacts    join                     ... FAILED    aggregates               ... FAILED
parallel group (15 tests):  combocid portals_p2 advisory_lock xmlmap
tsdicts guc functional_deps dependency select_views cluster tsearch
window foreign_data foreign_key bitmapops    foreign_data             ... FAILED
parallel group (19 tests):  limit prepare copy2 conversion xml plancache
returning temp sequence without_oid with rowtypes truncate polymorphism
domain largeobject rangefuncs alter_table plpgsql    alter_table              ... FAILED


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Review of: explain / allow collecting row counts without timing info

From
Tomas Vondra
Date:
On 13.1.2012 18:07, Josh Berkus wrote:
> Eric's review follows:
>
> Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from
> 2012-01-12 git checkout.
>
> Patch applied fine.
>
> 'make check' results in failures when this patch is put into place.
>
> ========================
>  6 of 128 tests failed.
> ========================
>
> Here are the relevant failures.
>
> parallel group (2 tests):  create_view create_index
>      create_index             ... FAILED
> parallel group (9 tests):  create_cast create_aggregate drop_if_exists
> typed_table vacuum constraints create_table_like triggers inherit
>      inherit                  ... FAILED
> parallel group (20 tests):  select_distinct_on btree_index update random
> select_distinct select_having namespace delete case hash_index union
> select_implicit select_into transactions portals subselect arrays
> aggregates join prepared_xacts
>      join                     ... FAILED
>      aggregates               ... FAILED
> parallel group (15 tests):  combocid portals_p2 advisory_lock xmlmap
> tsdicts guc functional_deps dependency select_views cluster tsearch
> window foreign_data foreign_key bitmapops
>      foreign_data             ... FAILED
> parallel group (19 tests):  limit prepare copy2 conversion xml plancache
> returning temp sequence without_oid with rowtypes truncate polymorphism
> domain largeobject rangefuncs alter_table plpgsql
>      alter_table              ... FAILED
>

Fixed. The default value of TIMING option did not work as intended, it
was set to true even for plain EXPLAIN (without ANALYZE). In that case
the EXPLAIN failed.

Tomas

Attachment

Re: Review of: explain / allow collecting row counts without timing info

From
Jeff Janes
Date:
On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
> Fixed. The default value of TIMING option did not work as intended, it
> was set to true even for plain EXPLAIN (without ANALYZE). In that case
> the EXPLAIN failed.
>

I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE
output for index-only scans" patch.  It applied and built and passes
installcheck.

I have not done a full review of this, but I want to say that I want
this very much.  Not only can I get the row counts, but also the Heap
Fetches and the output of BUFFERS, without the overhead of timing.  I
haven't looked at contrib aspects of it at all.

Thank you for making this.

Cheers,

Jeff


Re: Review of: explain / allow collecting row counts without timing info

From
Robert Haas
Date:
On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>
>> Fixed. The default value of TIMING option did not work as intended, it
>> was set to true even for plain EXPLAIN (without ANALYZE). In that case
>> the EXPLAIN failed.
>>
>
> I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE
> output for index-only scans" patch.  It applied and built and passes
> installcheck.
>
> I have not done a full review of this, but I want to say that I want
> this very much.  Not only can I get the row counts, but also the Heap
> Fetches and the output of BUFFERS, without the overhead of timing.  I
> haven't looked at contrib aspects of it at all.
>
> Thank you for making this.

+1.

After looking at this a bit, I think we can make the interface a bit
more convenient.  I'd like to propose that we call the EXPLAIN option
"ROWS" rather than "TIMING", and that we provide the row-count
information if *either* ROWS or ANALYZE is specified.

Then, if you want rows without timing, you can say this:

EXPLAIN (ROWS) query...

Rather than, as in the approach taken by the current patch:

EXPLAIN (ANALYZE, TIMING off) query...

...which is a little more long-winded.  This also has the nice
advantage of making the name of the EXPLAIN option match the name of
the INSTRUMENT flag.

Revised patch along those lines attached.  Barring objections, I'll
commit this version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Review of: explain / allow collecting row counts without timing info

From
"Tomas Vondra"
Date:
On 3 Únor 2012, 17:12, Robert Haas wrote:
> On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>>
>>> Fixed. The default value of TIMING option did not work as intended, it
>>> was set to true even for plain EXPLAIN (without ANALYZE). In that case
>>> the EXPLAIN failed.
>>>
>>
>> I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE
>> output for index-only scans" patch.  It applied and built and passes
>> installcheck.
>>
>> I have not done a full review of this, but I want to say that I want
>> this very much.  Not only can I get the row counts, but also the Heap
>> Fetches and the output of BUFFERS, without the overhead of timing.  I
>> haven't looked at contrib aspects of it at all.
>>
>> Thank you for making this.
>
> +1.
>
> After looking at this a bit, I think we can make the interface a bit
> more convenient.  I'd like to propose that we call the EXPLAIN option
> "ROWS" rather than "TIMING", and that we provide the row-count
> information if *either* ROWS or ANALYZE is specified.
>
> Then, if you want rows without timing, you can say this:
>
> EXPLAIN (ROWS) query...
>
> Rather than, as in the approach taken by the current patch:
>
> EXPLAIN (ANALYZE, TIMING off) query...
>
> ...which is a little more long-winded.  This also has the nice
> advantage of making the name of the EXPLAIN option match the name of
> the INSTRUMENT flag.
>
> Revised patch along those lines attached.  Barring objections, I'll
> commit this version.

I don't think changing the EXPLAIN syntax this way is a good idea. I think
that one option should not silently enable/disable others, so ROWS should
not enable ANALYZE automatically. It should throw an error just like the
TIMING option does when invoked without ANALYZE (IIRC).

Which leads to syntax like this
 EXPLAIN (ANALYZE, ROWS)

and that's a bit strange because it mentions ROWS but actually manipulates
TIMING behind the curtains. You can't actually do
 EXPLAIN (ANALYZE, ROWS off)

because row counts are always collected. So I think the syntax I proposed
makes more sense.

kind regards
Tomas



Re: Review of: explain / allow collecting row counts without timing info

From
Robert Haas
Date:
On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> I don't think changing the EXPLAIN syntax this way is a good idea. I think
> that one option should not silently enable/disable others, so ROWS should
> not enable ANALYZE automatically.

I didn't propose that.  The point is that the desired behavior
(however we name it) is a SUBSET of what ANALYZE does.

So we can either:

1. Have ANALYZE enable all the behavior, and have another option
(TIMING) that can be used to turn some of it back on again.

2. Have ANALYZE enable all the behavior, and have another option
(ROWS) that enables just the subset of it that we want.

I prefer #2 to #1, because I think it's a bit confusing to have an
option whose effect is to partially disable another option.  YMMV, of
course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Review of: explain / allow collecting row counts without timing info

From
Tomas Vondra
Date:
On 3.2.2012 18:28, Robert Haas wrote:
> On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> I don't think changing the EXPLAIN syntax this way is a good idea. I think
>> that one option should not silently enable/disable others, so ROWS should
>> not enable ANALYZE automatically.
> 
> I didn't propose that.  The point is that the desired behavior
> (however we name it) is a SUBSET of what ANALYZE does.
> 
> So we can either:
> 
> 1. Have ANALYZE enable all the behavior, and have another option
> (TIMING) that can be used to turn some of it back on again.
> 
> 2. Have ANALYZE enable all the behavior, and have another option
> (ROWS) that enables just the subset of it that we want.
> 
> I prefer #2 to #1, because I think it's a bit confusing to have an
> option whose effect is to partially disable another option.  YMMV, of
> course.

OK, thanks for the explanation. I don't like the idea of subsets as it
IMHO makes it less obvious what options are enabled. For example this
  EXPLAIN (ROWS) query...

does not immediately show it's actually going to do ANALYZE.

I prefer to keep the current 'ANALYZE' definition, i.e. collecting both
row counts and timing data (which is what 99% of people wants anyway),
and an option to disable the timing.

And the BUFFERS option currently works exactly like that, so defining
ROWS the way you proposed would be inconsistent with the current behavior.

Sure, we could redefine BUFFERS as a subset, so you could do
 EXPLAIN (ROWS) ... instead of ... EXPLAIN (ANALYZE, TIMING off) EXPLAIN (BUFFERS) ... instead of ... EXPLAIN (ANALYZE,
BUFFERSon)
 

but what if someone wants both at the same time? Maybe he could do
 EXPLAIN (ROWS, BUFFERS)

and treat that as a union of those subsets. I don't think it's worth it.

I surely can live with both solutions (mine or the one you proposed).


kind regards
Tomas


Re: Review of: explain / allow collecting row counts without timing info

From
Robert Haas
Date:
On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> OK, thanks for the explanation. I don't like the idea of subsets as it
> IMHO makes it less obvious what options are enabled. For example this
>
>   EXPLAIN (ROWS) query...
>
> does not immediately show it's actually going to do ANALYZE.

Well, it isn't, if ANALYZE means rows + timing...

> I prefer to keep the current 'ANALYZE' definition, i.e. collecting both
> row counts and timing data (which is what 99% of people wants anyway),
> and an option to disable the timing.
>
> And the BUFFERS option currently works exactly like that, so defining
> ROWS the way you proposed would be inconsistent with the current behavior.
>
> Sure, we could redefine BUFFERS as a subset, so you could do
>
>  EXPLAIN (ROWS) ... instead of ... EXPLAIN (ANALYZE, TIMING off)
>  EXPLAIN (BUFFERS) ... instead of ... EXPLAIN (ANALYZE, BUFFERS on)
>
> but what if someone wants both at the same time? Maybe he could do
>
>  EXPLAIN (ROWS, BUFFERS)
>
> and treat that as a union of those subsets. I don't think it's worth it.

Yeah, I forgot that we'd have to allow that, though I don't think it
would be a big deal to fix that.

> I surely can live with both solutions (mine or the one you proposed).

Let's wait and see if anyone else has an opinion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Review of: explain / allow collecting row counts without timing info

From
Tomas Vondra
Date:
On 3.2.2012 22:51, Robert Haas wrote:
> On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> OK, thanks for the explanation. I don't like the idea of subsets as it
>> IMHO makes it less obvious what options are enabled. For example this
>>
>>   EXPLAIN (ROWS) query...
>>
>> does not immediately show it's actually going to do ANALYZE.
> 
> Well, it isn't, if ANALYZE means rows + timing...

Yeah, sure. It depends on how you define ANALYZE. I see that as 'stats
collected at runtime' (in contrast to a query plan, which is just a ...
well, plan).

>> but what if someone wants both at the same time? Maybe he could do
>>
>>  EXPLAIN (ROWS, BUFFERS)
>>
>> and treat that as a union of those subsets. I don't think it's worth it.
> 
> Yeah, I forgot that we'd have to allow that, though I don't think it
> would be a big deal to fix that.
> 
>> I surely can live with both solutions (mine or the one you proposed).
> 
> Let's wait and see if anyone else has an opinion.

Sure. And thanks for your feedback!

Tomas


Re: Review of: explain / allow collecting row counts without timing info

From
Noah Misch
Date:
On Fri, Feb 03, 2012 at 11:12:33AM -0500, Robert Haas wrote:
> After looking at this a bit, I think we can make the interface a bit
> more convenient.  I'd like to propose that we call the EXPLAIN option
> "ROWS" rather than "TIMING", and that we provide the row-count
> information if *either* ROWS or ANALYZE is specified.
> 
> Then, if you want rows without timing, you can say this:
> 
> EXPLAIN (ROWS) query...
> 
> Rather than, as in the approach taken by the current patch:
> 
> EXPLAIN (ANALYZE, TIMING off) query...
> 
> ...which is a little more long-winded.

I favor the originally-submitted syntax.  I like having a single option,
ANALYZE, signal whether to actually run the statement.  The submitted syntax
also aligns with the fact that would motivate someone to use it: "Timing has
high overhead on my system." or "Timing makes the output unstable."

We have both estimated row counts and actual row counts.  It may someday prove
useful to have an invocation that plans the query and shows estimated row
counts, omitting only cost estimates.  Having a "ROWS" option that implies
running the query could complicate that effort.

Thanks,
nm


Re: Review of: explain / allow collecting row counts without timing info

From
Jeff Janes
Date:
On Fri, Feb 3, 2012 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>>
>>> Fixed. The default value of TIMING option did not work as intended, it
>>> was set to true even for plain EXPLAIN (without ANALYZE). In that case
>>> the EXPLAIN failed.
>>>
>>
>> I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE
>> output for index-only scans" patch.  It applied and built and passes
>> installcheck.
>>
>> I have not done a full review of this, but I want to say that I want
>> this very much.  Not only can I get the row counts, but also the Heap
>> Fetches and the output of BUFFERS, without the overhead of timing.  I
>> haven't looked at contrib aspects of it at all.
>>
>> Thank you for making this.
>
> +1.
>
> After looking at this a bit, I think we can make the interface a bit
> more convenient.  I'd like to propose that we call the EXPLAIN option
> "ROWS" rather than "TIMING", and that we provide the row-count
> information if *either* ROWS or ANALYZE is specified.

If we are allowed to make big breaks with the past, I would get rid of
ANALYZE altogether.

"analyze" basically is a synonym of "explain", or is contained within
it.  You can't explain anything until you have first analyzed it.

Without ANALYZE, you are explaining the analysis done by the planner.
With it, you are explaining an analysis done by the planner and the
executor.

So we could do away with ANALYZE and have TIMING, ROWS, and BUFFERS,
any of which causes the query to actually be executed, not just
planned.

I suspect we will be unwilling to make such a break with the past.  In
that case, I think I prefer the originally proposed semantics, even
though I agree they are somewhat less natural.  ANALYZE is a big flag
that means "This query will be executed, not just planned".  If we are
not going to make a major break, but only nibble around the edges,
then I don't think we should remove the property that the query will
be executed if and only if ANALYZE is specified.

Cheers,

Jeff


Re: Review of: explain / allow collecting row counts without timing info

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> I suspect we will be unwilling to make such a break with the past.  In
> that case, I think I prefer the originally proposed semantics, even
> though I agree they are somewhat less natural.  ANALYZE is a big flag
> that means "This query will be executed, not just planned".  If we are
> not going to make a major break, but only nibble around the edges,
> then I don't think we should remove the property that the query will
> be executed if and only if ANALYZE is specified.

Yeah, I think we need to preserve that property.  Unexpectedly executing
a query (which may have side-effects) is a very dangerous thing.  People
are used to the idea that ANALYZE == execute, and adding random other
flags that also cause execution is going to burn somebody.
        regards, tom lane