Hi,
On 2020-06-11 11:14:02 -0700, Jeff Davis wrote:
> On Thu, 2020-06-11 at 10:45 -0700, Andres Freund wrote:
> > Did you run any performance tests?
>
> Yes, I reproduced your ~12% regression from V12, and this patch nearly
> eliminated it for me.
I spent a fair bit of time looking at the difference. Jeff had let me
know on chat that he was still seeing some difference, but couldn't
quite figure out where that was.
Trying it out myself, I observed that the patch helped, but not that
much. After a bit I found one major reason for why:
LookupTupleHashEntryHash() assigned the hash to pointer provided by the
caller's before doing the insertion. That ended up causing a pipeline
stall (I assume it's store forwarding, but not sure). Moving the
assignment to the caller variable to after the insertion got rid of
that.
It got within 3-4% after that change. I did a number of small
microoptimizations that each helped, but didn't get quite get to the
level of 12.
Finally I figured out that that's due to an issue outside of nodeAgg.c
itself:
commit 4cad2534da6d17067d98cf04be2dfc1bda8f2cd0
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: 2020-05-31 14:43:13 +0200
Use CP_SMALL_TLIST for hash aggregate
Due to this change we end up with an additional projection in queries
like this:
postgres[212666][1]=# \d fewgroups_many_rows
Table "public.fewgroups_many_rows"
┌────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │ Type │ Collation │ Nullable │ Default │
├────────┼─────────┼───────────┼──────────┼─────────┤
│ cat │ integer │ │ not null │ │
│ val │ integer │ │ not null │ │
└────────┴─────────┴───────────┴──────────┴─────────┘
postgres[212666][1]=# explain SELECT cat, count(*) FROM fewgroups_many_rows GROUP BY 1;
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────────────────────────┤
│ HashAggregate (cost=1942478.48..1942478.53 rows=5 width=12) │
│ Group Key: cat │
│ -> Seq Scan on fewgroups_many_rows (cost=0.00..1442478.32 rows=100000032 width=4) │
└───────────────────────────────────────────────────────────────────────────────────────┘
(3 rows)
as 'val' is "projected away"..
After neutering the tlist change, Jeff's patch and my changes to it
yield performance *above* v12.
I don't see why it's ok to force an additional projection in the very
common case of hashaggs over a few rows. So I think we need to rethink
4cad2534da6.
Greetings,
Andres Freund