Home > mailing lists

Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.
Date	July 27, 2020 01:48:45
Msg-id	CAApHDvqTvD3bXBiC5cmuoATFYeP+RSkv5WTgfyuBb6JpwKmSoQ@mail.gmail.com Whole thread Raw
In response to	Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.
List	pgsql-hackers

Tree view

On Wed, 1 Jul 2020 at 18:46, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, Jun 30, 2020, 7:04 PM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> Does anyone have any objections to that being changed?
>
> That's OK with me. By the way, I'm on vacation and will catch up on these HashAgg threads next week.

(Adding Justin as I know he's expressed interest in the EXPLAIN output
of HashAgg before)

I've written a patch to bring the HashAgg EXPLAIN ANALYZE output to be
more aligned to the Hash Join output.

Couple of things I observed about Hash Join EXPLAIN ANALYZE:
1. The number of batches starts at 1.
2. We always display the number of batches.
3. We write "Batches" for text format and "Hash Batches" for non-text formats.
4. We write "Memory Usage" for text format and "Peak Memory Usage" for
non-text formats.
5. "Batches" comes before memory usage.

Before this patch, HashAgg EXPLAIN ANALYZE output would:
1. Start the number of batches at 0.
2. Only display "Hash Batches" when batches > 0.
3. Used the words "HashAgg Batches" for text and non-text formats.
4. Used the words "Peak Memory Usage" for text and non-text formats.
5. "Hash Batches" was written after memory usage.

In the attached patch I've changed HashAgg to be aligned to Hash Join
on each of the points above.

e.g.

Before:

postgres=# explain analyze select c.relname,count(*) from pg_class c
inner join pg_Attribute a on c.oid = a.attrelid group by c.relname;
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=138.37..142.23 rows=386 width=72) (actual
time=3.121..3.201 rows=427 loops=1)
   Group Key: c.relname
   Peak Memory Usage: 109kB
   ->  Hash Join  (cost=21.68..124.10 rows=2855 width=64) (actual
time=0.298..1.768 rows=3153 loops=1)
         Hash Cond: (a.attrelid = c.oid)
         ->  Seq Scan on pg_attribute a  (cost=0.00..93.95 rows=3195
width=4) (actual time=0.011..0.353 rows=3153 loops=1)
         ->  Hash  (cost=16.86..16.86 rows=386 width=68) (actual
time=0.279..0.279 rows=427 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 50kB
               ->  Seq Scan on pg_class c  (cost=0.00..16.86 rows=386
width=68) (actual time=0.007..0.112 rows=427 loops=1)
 Planning Time: 0.421 ms
 Execution Time: 3.294 ms
(11 rows)

After:

postgres=# explain analyze select c.relname,count(*) from pg_class c
inner join pg_Attribute a on c.oid = a.attrelid group by c.relname;
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=566.03..580.00 rows=1397 width=72) (actual
time=13.097..13.430 rows=1397 loops=1)
   Group Key: c.relname
   Batches: 1  Memory Usage: 321kB
   ->  Hash Join  (cost=64.43..496.10 rows=13985 width=64) (actual
time=0.838..7.546 rows=13985 loops=1)
         Hash Cond: (a.attrelid = c.oid)
         ->  Seq Scan on pg_attribute a  (cost=0.00..394.85 rows=13985
width=4) (actual time=0.010..1.462 rows=13985 loops=1)
         ->  Hash  (cost=46.97..46.97 rows=1397 width=68) (actual
time=0.820..0.821 rows=1397 loops=1)
               Buckets: 2048  Batches: 1  Memory Usage: 153kB
               ->  Seq Scan on pg_class c  (cost=0.00..46.97 rows=1397
width=68) (actual time=0.009..0.362 rows=1397 loops=1)
 Planning Time: 0.440 ms
 Execution Time: 13.634 ms
(11 rows)

(ignore the change in memory consumption. That was due to adding
records for testing)

Any objections to this change?

David

Attachment

yet_more_hashagg_explain_fixes.patch

pgsql-hackers by date:

From: Thomas Munro
Date: 27 July 2020, 01:17:36
Subject: Re: Parallel bitmap index scan

From: Jeff Davis
Date: 27 July 2020, 02:17:38
Subject: Re: hashagg slowdown due to spill changes

Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. - Mailing list pgsql-hackers

Attachment

Previous

Next