Re: EXPLAIN VERBOSE with parallel Aggregate - Mailing list pgsql-hackers

From David Rowley
Subject Re: EXPLAIN VERBOSE with parallel Aggregate
Date
Msg-id CAKJS1f-kp2qxtWGL3SDu2vRoZps8Mn--vyUOKkJLUzsS6P3Acg@mail.gmail.com
Whole thread Raw
In response to Re: EXPLAIN VERBOSE with parallel Aggregate  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: EXPLAIN VERBOSE with parallel Aggregate
List pgsql-hackers
On 16 April 2016 at 04:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I definitely agree that the current output is messed up, but I'm not
>> sure your proposed output is much better.  I wonder if it shouldn't
>> say something like:
>> Output: serialfn(transfn(args))
>> for the partial aggregate and
>> Output: finalfn(combinefn(deserialfn(args)))
>> for the finalize aggregate step.
>
>> Or maybe just insert the word PARTIAL before each partial aggregate
>> step, like PARTIAL sum(num) for the partial step and then just
>> sum(num) for the final step.
>
> +1 for the latter, if we can do it conveniently.  I think exposing
> the names of the aggregate implementation functions would be very
> user-unfriendly, as nobody but us hackers knows what those are.

It does not really seem all that convenient to do this. It also seems
a bit strange to me to have a parent node report a column which does
not exist in any nodes descending from it. Remember that the combine
Aggref does not have the same ->args as its corresponding partial
Aggref. It's not all that clear to me if there is any nice way to do
have this work the way you'd like. If we were to just call
get_rule_expr() on the first arg of the combine aggregate node, it
would re-print the PARTIAL keyword again.

In the attached I have it displaying as:

postgres=# explain verbose select count(*),max(a),avg(a) filter (where
a = 0) from t;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14739.56..14739.57 rows=1 width=44)
   Output: pg_catalog.count(*), max((max(a))), pg_catalog.avg((PARTIAL
avg(a) FILTER (WHERE (a = 0))))
   ->  Gather  (cost=14739.33..14739.54 rows=2 width=44)
         Output: (count(*)), (max(a)), (PARTIAL avg(a) FILTER (WHERE (a = 0)))
         Workers Planned: 2
         ->  Partial Aggregate  (cost=13739.33..13739.34 rows=1 width=44)
               Output: count(*), max(a), PARTIAL avg(a) FILTER (WHERE (a = 0))
               ->  Parallel Seq Scan on public.t  (cost=0.00..9572.67
rows=416667 width=4)
                     Output: a
(9 rows)

I like this much better, as there's no fudging of any function
arguments to print them as something they're not.

Note that max() does not get tagged with PARTIAL since it has no
finalfn. Obtaining this information does require a syscache lookup in
get_agg_expr(), but I've managed to make that only happen when
aggpartial is true.

>> I think ending up with sum(sum(num)) is
>> right out.  It doesn't look so bad for that case but avg(avg(num))
>> would certainly imply something that's not the actual behavior.
>
> Agreed.

Note that I've done nothing for the weird schema prefixing problem I
mentioned. I think I'd need input on the best way to solve this. If
it's actually a problem.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: parallel query vs extensions
Next
From: Craig Ringer
Date:
Subject: Timeline following is a bit tangled...