Re: hashagg slowdown due to spill changes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: hashagg slowdown due to spill changes
Date
Msg-id 20200612213715.op4ye4q7gktqvpuo@alap3.anarazel.de
Whole thread Raw
In response to Re: hashagg slowdown due to spill changes  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: hashagg slowdown due to spill changes
Re: hashagg slowdown due to spill changes
List pgsql-hackers
Hi,

On 2020-06-11 11:14:02 -0700, Jeff Davis wrote:
> On Thu, 2020-06-11 at 10:45 -0700, Andres Freund wrote:
> > Did you run any performance tests?
>
> Yes, I reproduced your ~12% regression from V12, and this patch nearly
> eliminated it for me.

I spent a fair bit of time looking at the difference. Jeff had let me
know on chat that he was still seeing some difference, but couldn't
quite figure out where that was.

Trying it out myself, I observed that the patch helped, but not that
much. After a bit I found one major reason for why:
LookupTupleHashEntryHash() assigned the hash to pointer provided by the
caller's before doing the insertion. That ended up causing a pipeline
stall (I assume it's store forwarding, but not sure). Moving the
assignment to the caller variable to after the insertion got rid of
that.

It got within 3-4% after that change. I did a number of small
microoptimizations that each helped, but didn't get quite get to the
level of 12.

Finally I figured out that that's due to an issue outside of nodeAgg.c
itself:

commit 4cad2534da6d17067d98cf04be2dfc1bda8f2cd0
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   2020-05-31 14:43:13 +0200

    Use CP_SMALL_TLIST for hash aggregate

Due to this change we end up with an additional projection in queries
like this:

postgres[212666][1]=# \d fewgroups_many_rows
         Table "public.fewgroups_many_rows"
┌────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │  Type   │ Collation │ Nullable │ Default │
├────────┼─────────┼───────────┼──────────┼─────────┤
│ cat    │ integer │           │ not null │         │
│ val    │ integer │           │ not null │         │
└────────┴─────────┴───────────┴──────────┴─────────┘

postgres[212666][1]=# explain SELECT cat, count(*) FROM fewgroups_many_rows GROUP BY 1;
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                                      QUERY PLAN                                       │
├───────────────────────────────────────────────────────────────────────────────────────┤
│ HashAggregate  (cost=1942478.48..1942478.53 rows=5 width=12)                          │
│   Group Key: cat                                                                      │
│   ->  Seq Scan on fewgroups_many_rows  (cost=0.00..1442478.32 rows=100000032 width=4) │
└───────────────────────────────────────────────────────────────────────────────────────┘
(3 rows)

as 'val' is "projected away"..


After neutering the tlist change, Jeff's patch and my changes to it
yield performance *above* v12.


I don't see why it's ok to force an additional projection in the very
common case of hashaggs over a few rows. So I think we need to rethink
4cad2534da6.

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: Internal key management system
Next
From: Ranier Vilela
Date:
Subject: Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits