Thread: hashagg slowdown due to spill changes

hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

While preparing my pgcon talk I noticed that our hash-agg performance
degraded noticeably. Looks to me like it's due to the spilling-hashagg
changes.

Sample benchmark:

config:
-c huge_pages=on -c shared_buffers=32GB -c jit=0 -c max_parallel_workers_per_gather=0
(largely just to reduce variance)

data prep:
CREATE TABLE fewgroups_many_rows AS SELECT (random() * 4)::int cat, (random()*10000)::int val FROM generate_series(1,
100000000);
VACUUM (FREEZE, ANALYZE) fewgroups_many_rows;

test prep:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;SELECT pg_prewarm('fewgroups_many_rows', 'buffer');

test:
SELECT cat, count(*) FROM fewgroups_many_rows GROUP BY 1;

Using best-of-three timing:

12                  12221.031 ms
master              13855.129 ms

While not the end of the world, that's a definitely noticable and
reproducible slowdown (~12%).

I don't think this is actually an inherent cost, but a question of how
the code ended up being organized. Here's a perf diff of profiles for
both versions:

# Baseline  Delta Abs  Shared Object     Symbol
# ........  .........  ................  .........................................
#
               +6.70%  postgres          [.] LookupTupleHashEntryHash
               +6.37%  postgres          [.] prepare_hash_slot
               +4.74%  postgres          [.] TupleHashTableHash_internal.isra.0
    20.36%     -2.89%  postgres          [.] ExecInterpExpr
     6.31%     -2.73%  postgres          [.] lookup_hash_entries
               +2.36%  postgres          [.] lookup_hash_entry
               +2.14%  postgres          [.] ExecJustAssignScanVar
     2.28%     +1.97%  postgres          [.] ExecScan
     2.54%     +1.93%  postgres          [.] MemoryContextReset
     3.84%     -1.86%  postgres          [.] SeqNext
    10.19%     -1.50%  postgres          [.] tts_buffer_heap_getsomeattrs
               +1.42%  postgres          [.] hash_bytes_uint32
               +1.39%  postgres          [.] TupleHashTableHash
               +1.10%  postgres          [.] tts_virtual_clear
     3.36%     -0.74%  postgres          [.] ExecAgg
               +0.45%  postgres          [.] CheckForSerializableConflictOutNeeded
     0.25%     +0.44%  postgres          [.] hashint4
     5.80%     -0.35%  postgres          [.] tts_minimal_getsomeattrs
     1.91%     -0.33%  postgres          [.] heap_getnextslot
     4.86%     -0.32%  postgres          [.] heapgettup_pagemode
     1.46%     -0.32%  postgres          [.] tts_minimal_clear

While some of this is likely is just noise, it's pretty clear that we
spend a substantial amount of additional time below
lookup_hash_entries().

And looking at the code, I'm not too surprised:

Before there was basically one call from nodeAgg.c to execGrouping.c for
each tuple and hash table. Now it's a lot more complicated:
1) nodeAgg.c: prepare_hash_slot()
2) execGrouping.c: TupleHashTableHash()
3) nodeAgg.c: lookup_hash_entry()
4) execGrouping.c: LookupTupleHashEntryHash()

For each of these data needs to be peeled out of one or more of AggState
/ AggStatePerHashData / TupleHashTable. There's no way the compiler can
know that nothing inside those changes, therefore it has to reload the
contents repeatedly.  By my look at the profiles, that's where most of
the time is going.

There's also the issue that the signalling whether to insert / not to
insert got unnecessarily complicated. There's several checks:
1) lookup_hash_entry() (p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;)
2) LookupTupleHashEntry_internal() (if (isnew))
3) lookup_hash_entry() (if (entry == NULL) and if (isnew))
4) lookup_hash_entries() if (!in_hash_table)

Not performance related: I am a bit confused why the new per-hash stuff
in lookup_hash_entries() isn't in lookup_hash_entry()? I assume that's
because of agg_refill_hash_table()?


Why isn't the flow more like this:
1) prepare_hash_slot()
2) if (aggstate->hash_spill_mode) goto 3; else goto 4
3) entry = LookupTupleHashEntry(&hash); if (!entry) hashagg_spill_tuple();
4) InsertTupleHashEntry(&hash, &isnew); if (isnew) initialize(entry)

That way there's again exactly one call to execGrouping.c, there's no
need for nodeAgg to separately compute the hash, there's far fewer
branches...

Doing things this way might perhaps make agg_refill_hash_table() a tiny
bit more complicated, but it'll also avoid the slowdown for the vast
majority of cases where we're not spilling.

Greetings,

Andres Freund



Re: hashagg slowdown due to spill changes

From
Alvaro Herrera
Date:
On 2020-Jun-05, Andres Freund wrote:

> While preparing my pgcon talk I noticed that our hash-agg performance
> degraded noticeably. Looks to me like it's due to the spilling-hashagg
> changes.

Jeff, what are your thoughts on this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Fri, 2020-06-05 at 21:11 -0700, Andres Freund wrote:
> Before there was basically one call from nodeAgg.c to execGrouping.c
> for
> each tuple and hash table. Now it's a lot more complicated:
> 1) nodeAgg.c: prepare_hash_slot()
> 2) execGrouping.c: TupleHashTableHash()
> 3) nodeAgg.c: lookup_hash_entry()
> 4) execGrouping.c: LookupTupleHashEntryHash()

The reason that I did it that way was to be able to store the hash
along with the saved tuple (similar to what HashJoin does), which
avoids recalculation.

That could be a nice savings for some cases, like when work_mem is
small but the data still fits in system memory, which I expect to be
fairly common. But based on your numbers, it might be a bad trade-off
overall.

> Why isn't the flow more like this:
> 1) prepare_hash_slot()
> 2) if (aggstate->hash_spill_mode) goto 3; else goto 4
> 3) entry = LookupTupleHashEntry(&hash); if (!entry)
> hashagg_spill_tuple();
> 4) InsertTupleHashEntry(&hash, &isnew); if (isnew) initialize(entry)

I'll work up a patch to refactor this. I'd still like to see if we can
preserve the calculate-hash-once behavior somehow.

Regards,
    Jeff Davis
 




Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Fri, 2020-06-05 at 21:11 -0700, Andres Freund wrote:
> Why isn't the flow more like this:
> 1) prepare_hash_slot()
> 2) if (aggstate->hash_spill_mode) goto 3; else goto 4
> 3) entry = LookupTupleHashEntry(&hash); if (!entry)
> hashagg_spill_tuple();
> 4) InsertTupleHashEntry(&hash, &isnew); if (isnew) initialize(entry)

I see, you are suggesting that I change around the execGrouping.c
signatures to return the hash, which will avoid the extra call. That
makes more sense.

Regards,
    Jeff Davis







Re: hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

On 2020-06-08 13:41:29 -0700, Jeff Davis wrote:
> On Fri, 2020-06-05 at 21:11 -0700, Andres Freund wrote:
> > Before there was basically one call from nodeAgg.c to execGrouping.c
> > for
> > each tuple and hash table. Now it's a lot more complicated:
> > 1) nodeAgg.c: prepare_hash_slot()
> > 2) execGrouping.c: TupleHashTableHash()
> > 3) nodeAgg.c: lookup_hash_entry()
> > 4) execGrouping.c: LookupTupleHashEntryHash()
> 
> The reason that I did it that way was to be able to store the hash
> along with the saved tuple (similar to what HashJoin does), which
> avoids recalculation.

That makes sense. But then you can just use a separate call into
execGrouping for that purpose.


> > Why isn't the flow more like this:
> > 1) prepare_hash_slot()
> > 2) if (aggstate->hash_spill_mode) goto 3; else goto 4
> > 3) entry = LookupTupleHashEntry(&hash); if (!entry)
> > hashagg_spill_tuple();
> > 4) InsertTupleHashEntry(&hash, &isnew); if (isnew) initialize(entry)
> 
> I'll work up a patch to refactor this. I'd still like to see if we can
> preserve the calculate-hash-once behavior somehow.

Cool!

Greetings,

Andres Freund



Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Fri, 2020-06-05 at 21:11 -0700, Andres Freund wrote:
> Hi,
> 
> While preparing my pgcon talk I noticed that our hash-agg performance
> degraded noticeably. Looks to me like it's due to the spilling-
> hashagg
> changes.

Attached a proposed fix. (Might require some minor cleanup).

The only awkward part is that LookupTupleHashEntry() needs a new out
parameter to pass the hash value back to the caller. Ordinarily, the
caller can get that from the returned entry, but if isnew==NULL, then
the function might return NULL (and the caller wouldn't have an entry
from which to read the hash value).

Regards,
    Jeff Davis


Attachment

Re: hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

On June 10, 2020 6:15:39 PM PDT, Jeff Davis <pgsql@j-davis.com> wrote:
>On Fri, 2020-06-05 at 21:11 -0700, Andres Freund wrote:
>> Hi,
>>
>> While preparing my pgcon talk I noticed that our hash-agg performance
>> degraded noticeably. Looks to me like it's due to the spilling-
>> hashagg
>> changes.
>
>Attached a proposed fix. (Might require some minor cleanup).
>
>The only awkward part is that LookupTupleHashEntry() needs a new out
>parameter to pass the hash value back to the caller. Ordinarily, the
>caller can get that from the returned entry, but if isnew==NULL, then
>the function might return NULL (and the caller wouldn't have an entry
>from which to read the hash value).

Great!

Did you run any performance tests?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Thu, 2020-06-11 at 10:45 -0700, Andres Freund wrote:
> Did you run any performance tests?

Yes, I reproduced your ~12% regression from V12, and this patch nearly
eliminated it for me.

Regards,
    Jeff Davis





Re: hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

On 2020-06-11 11:14:02 -0700, Jeff Davis wrote:
> On Thu, 2020-06-11 at 10:45 -0700, Andres Freund wrote:
> > Did you run any performance tests?
>
> Yes, I reproduced your ~12% regression from V12, and this patch nearly
> eliminated it for me.

I spent a fair bit of time looking at the difference. Jeff had let me
know on chat that he was still seeing some difference, but couldn't
quite figure out where that was.

Trying it out myself, I observed that the patch helped, but not that
much. After a bit I found one major reason for why:
LookupTupleHashEntryHash() assigned the hash to pointer provided by the
caller's before doing the insertion. That ended up causing a pipeline
stall (I assume it's store forwarding, but not sure). Moving the
assignment to the caller variable to after the insertion got rid of
that.

It got within 3-4% after that change. I did a number of small
microoptimizations that each helped, but didn't get quite get to the
level of 12.

Finally I figured out that that's due to an issue outside of nodeAgg.c
itself:

commit 4cad2534da6d17067d98cf04be2dfc1bda8f2cd0
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   2020-05-31 14:43:13 +0200

    Use CP_SMALL_TLIST for hash aggregate

Due to this change we end up with an additional projection in queries
like this:

postgres[212666][1]=# \d fewgroups_many_rows
         Table "public.fewgroups_many_rows"
┌────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │  Type   │ Collation │ Nullable │ Default │
├────────┼─────────┼───────────┼──────────┼─────────┤
│ cat    │ integer │           │ not null │         │
│ val    │ integer │           │ not null │         │
└────────┴─────────┴───────────┴──────────┴─────────┘

postgres[212666][1]=# explain SELECT cat, count(*) FROM fewgroups_many_rows GROUP BY 1;
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                                      QUERY PLAN                                       │
├───────────────────────────────────────────────────────────────────────────────────────┤
│ HashAggregate  (cost=1942478.48..1942478.53 rows=5 width=12)                          │
│   Group Key: cat                                                                      │
│   ->  Seq Scan on fewgroups_many_rows  (cost=0.00..1442478.32 rows=100000032 width=4) │
└───────────────────────────────────────────────────────────────────────────────────────┘
(3 rows)

as 'val' is "projected away"..


After neutering the tlist change, Jeff's patch and my changes to it
yield performance *above* v12.


I don't see why it's ok to force an additional projection in the very
common case of hashaggs over a few rows. So I think we need to rethink
4cad2534da6.

Greetings,

Andres Freund

Attachment

Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Fri, 2020-06-12 at 14:37 -0700, Andres Freund wrote:
> I don't see why it's ok to force an additional projection in the very
> common case of hashaggs over a few rows. So I think we need to
> rethink
> 4cad2534da6.

One possibility is to project only spilled tuples, more similar to
Melanie's patch from a while ago:


https://www.postgresql.org/message-id/CAAKRu_aefEsv+UkQWqu+ioEnoiL2LJu9Diuh9BR8MbyXuZ0j4A@mail.gmail.com

Which makes sense, but it's also more code.

Regards,
    Jeff Davis





Re: hashagg slowdown due to spill changes

From
Tomas Vondra
Date:
On Fri, Jun 12, 2020 at 03:29:08PM -0700, Jeff Davis wrote:
>On Fri, 2020-06-12 at 14:37 -0700, Andres Freund wrote:
>> I don't see why it's ok to force an additional projection in the very
>> common case of hashaggs over a few rows. So I think we need to
>> rethink
>> 4cad2534da6.
>
>One possibility is to project only spilled tuples, more similar to
>Melanie's patch from a while ago:
>
>
>https://www.postgresql.org/message-id/CAAKRu_aefEsv+UkQWqu+ioEnoiL2LJu9Diuh9BR8MbyXuZ0j4A@mail.gmail.com
>
>Which makes sense, but it's also more code.
>

I agree, we should revert 4cad2534da and only project tuples when we
actually need to spill them. Did any of the WIP patches actually
implement that, or do we need to write that patch from scratch?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

On 2020-06-13 01:06:25 +0200, Tomas Vondra wrote:
> I agree, we should revert 4cad2534da and only project tuples when we
> actually need to spill them.

There are cases where projecting helps for non-spilling aggregates too,
but only for the representative tuple. It doesn't help in the case at
hand, because there's just 5 hashtable entries but millions of rows. So
we're unnecessarily projecting all-5 rows. But when there are many
different groups, it'd be different, because then the size of the
representative tuple can matter substantially.

Do you think we should tackle this for 13? To me 4cad2534da seems like a
somewhat independent improvement to spillable hashaggs.

Greetings,

Andres Freund



Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Fri, 2020-06-12 at 17:12 -0700, Andres Freund wrote:
> Do you think we should tackle this for 13? To me 4cad2534da seems
> like a
> somewhat independent improvement to spillable hashaggs.

We've gone back and forth on this issue a few times, so let's try to
get some agreement before we revert 4cad2534da. I added Robert because
he also seemed to think it was a reasonable idea.

Regards,
    Jeff Davis





Re: hashagg slowdown due to spill changes

From
Tomas Vondra
Date:
On Sat, Jun 13, 2020 at 11:48:09AM -0700, Jeff Davis wrote:
>On Fri, 2020-06-12 at 17:12 -0700, Andres Freund wrote:
>> Do you think we should tackle this for 13? To me 4cad2534da seems
>> like a
>> somewhat independent improvement to spillable hashaggs.
>
>We've gone back and forth on this issue a few times, so let's try to
>get some agreement before we revert 4cad2534da. I added Robert because
>he also seemed to think it was a reasonable idea.
>

I can't speak for Robert, but I haven't expected the extra projection
would be this high. And I agree with Andres it's not very nice we have
to do this even for aggregates with just a handful of groups that don't
need to spill.

In any case, I think we need to address this somehow for v13 - either we
keep the 4cad2534da patch in, or we tweak the cost model to reflect the
extra I/O costs, or we project only when spilling.

I'm not in a position to whip up a patch soon, though :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

On 2020-06-12 15:29:08 -0700, Jeff Davis wrote:
> On Fri, 2020-06-12 at 14:37 -0700, Andres Freund wrote:
> > I don't see why it's ok to force an additional projection in the very
> > common case of hashaggs over a few rows. So I think we need to
> > rethink
> > 4cad2534da6.
> 
> One possibility is to project only spilled tuples, more similar to
> Melanie's patch from a while ago:
> 
> 
> https://www.postgresql.org/message-id/CAAKRu_aefEsv+UkQWqu+ioEnoiL2LJu9Diuh9BR8MbyXuZ0j4A@mail.gmail.com
> 
> Which makes sense, but it's also more code.

I'm somewhat inclined to think that we should revert 4cad2534da6 and
then look at how precisely to tackle this in 14.

It'd probably make sense to request small tlists when the number of
estimated groups is large, and not otherwise.

Greetings,

Andres Freund



Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Sun, 2020-06-14 at 11:14 -0700, Andres Freund wrote:
> I'm somewhat inclined to think that we should revert 4cad2534da6 and
> then look at how precisely to tackle this in 14.

I'm fine with that.

> It'd probably make sense to request small tlists when the number of
> estimated groups is large, and not otherwise.

That seems like a nice compromise that would be non-invasive, at least
for create_agg_plan().

Regards,
    Jeff Davis





Re: hashagg slowdown due to spill changes

From
Tomas Vondra
Date:
On Sun, Jun 14, 2020 at 11:09:55PM -0700, Jeff Davis wrote:
>On Sun, 2020-06-14 at 11:14 -0700, Andres Freund wrote:
>> I'm somewhat inclined to think that we should revert 4cad2534da6 and
>> then look at how precisely to tackle this in 14.
>
>I'm fine with that.
>

I don't see how we could just revert 4cad2534d and leave this for v14.

The hashagg spilling is IMHO almost guaranteed to be a pain point for
some users, as it will force some queries to serialize large amounts of
data. Yes, some of this is a cost for hashagg enforcing work_mem at
runtime, I'm fine with that. We'd get reports about that too, but we can
justify that cost ...

But just reverting 4cad2534d will make this much worse, I think, as
illustrated by the benchmarks I did in [1]. And no, this is not really
fixable by tweaking the cost parameters - even with the current code
(i.e. 4cad2534d in place) I had to increase random_page_cost to 60 on
the temp tablespace (on SATA RAID) to get good plans with parallelism
enabled. I haven't tried, but I presume without 4cad2534d I'd have to
push r_p_c even further ...

[1] https://www.postgresql.org/message-id/20200519151202.u2p2gpiawoaznsv2%40development

>> It'd probably make sense to request small tlists when the number of
>> estimated groups is large, and not otherwise.
>
>That seems like a nice compromise that would be non-invasive, at least
>for create_agg_plan().
>

Maybe. It'd certainly better than nothing. It's not clear to me what
would a good threshold be, though. And it's not going to handle cases of
under-estimates.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: hashagg slowdown due to spill changes

From
Robert Haas
Date:
On Mon, Jun 15, 2020 at 9:34 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> But just reverting 4cad2534d will make this much worse, I think, as
> illustrated by the benchmarks I did in [1].

I share this concern, although I do not know what we should do about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: hashagg slowdown due to spill changes

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 15, 2020 at 9:34 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> But just reverting 4cad2534d will make this much worse, I think, as
>> illustrated by the benchmarks I did in [1].

> I share this concern, although I do not know what we should do about it.

Well, it's only June.  Let's put it on the open issues list for v13
and continue to think about it.  I concur that the hashagg spill patch
has made this something that we should worry about for v13, so just
reverting without a better answer isn't very appetizing.

            regards, tom lane



Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Mon, 2020-06-15 at 11:19 -0400, Robert Haas wrote:
> On Mon, Jun 15, 2020 at 9:34 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > But just reverting 4cad2534d will make this much worse, I think, as
> > illustrated by the benchmarks I did in [1].
> 
> I share this concern, although I do not know what we should do about
> it.

I attached an updated version of Melanie's patch, combined with the
changes to copy only the necessary attributes to a new slot before
spilling. There are a couple changes:

* I didn't see a reason to descend into a GroupingFunc node, so I
removed that.

* I used a flag in the context rather than two separate callbacks to
the expression walker.

This patch gives the space benefits that we see on master, without the
regression for small numbers of tuples. I saw a little bit of noise in
my test results, but I'm pretty sure it's a win all around. It could
use some review/cleanup though.

Regards,
    Jeff Davis


Attachment

Re: hashagg slowdown due to spill changes

From
Tomas Vondra
Date:
On Mon, Jun 15, 2020 at 07:38:45PM -0700, Jeff Davis wrote:
>On Mon, 2020-06-15 at 11:19 -0400, Robert Haas wrote:
>> On Mon, Jun 15, 2020 at 9:34 AM Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>> > But just reverting 4cad2534d will make this much worse, I think, as
>> > illustrated by the benchmarks I did in [1].
>>
>> I share this concern, although I do not know what we should do about
>> it.
>
>I attached an updated version of Melanie's patch, combined with the
>changes to copy only the necessary attributes to a new slot before
>spilling. There are a couple changes:
>
>* I didn't see a reason to descend into a GroupingFunc node, so I
>removed that.
>
>* I used a flag in the context rather than two separate callbacks to
>the expression walker.
>
>This patch gives the space benefits that we see on master, without the
>regression for small numbers of tuples. I saw a little bit of noise in
>my test results, but I'm pretty sure it's a win all around. It could
>use some review/cleanup though.
>

Looks reasonable. I can't redo my tests at the moment, the machine is
busy with something else. I'll give it a try over the weekend.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

I think it'd be good to get the changes that aren't related to
projection merged. As far as I can tell there's performance regressions
both because of the things I'd listed upthread, and due to the
projection issue.  That's not obvious because because we first won
performance and then lost it again in several incremental steps.

COPY (SELECT (random() * 4)::int cat, (random()*10000)::int val FROM generate_series(1, 100000000)) TO '/tmp/data' WITH
BINARY;
BEGIN;
DROP TABLE IF EXISTS fewgroups_many_rows;
CREATE TABLE fewgroups_many_rows(cat int4 not null, val int4 not null);
COPY fewgroups_many_rows FROM '/tmp/data' WITH (FORMAT BINARY, FREEZE);
COMMIT;
VACUUM FREEZE fewgroups_many_rows;

Test prep:

Test query:
SET seed=0;SELECT cat, count(*) FROM fewgroups_many_rows GROUP BY 1;
(the seed seems to reduce noise due to hashtable iv being the same)

best of six:
9e1c9f959422192bbe1b842a2a1ffaf76b080196        12031.906 ms
d52eaa094847d395f942827a6f413904e516994c        12045.487 ms
ac88807f9b227ddcd92b8be9a053094837c1b99a        11950.006 ms
36d22dd95bc87ca68e742da91f47f8826f8758c9        11769.991 ms
5ac4e9a12c6543414891cd8972b2cd36a08e40cc    11551.932 ms
1fdb7f9789c4550204cd62d1746a7deed1dc4c29        11706.948 ms
4eaea3db150af56aa2e40efe91997fd25f3b6d73        11999.908 ms
11de6c903da99a4b2220acfa776fc26c7f384ccc        11999.054 ms
b7fabe80df9a65010bfe5e5d0a979bacebfec382        12165.463 ms
2742c45080077ed3b08b810bb96341499b86d530        12137.505 ms
1f39bce021540fde00990af55b4432c55ef4b3c7        12501.764 ms
9b60c4b979bce060495e2b05ba01d1cc6bffdd2d        12389.047 ms
4cad2534da6d17067d98cf04be2dfc1bda8f2cd0        13319.786 ms
1b2c29469a58cd9086bd86e20c708eb437564a80        13330.616 ms

There's certainly some noise in here, but I think the trends are valid.


>  /*
> - * find_unaggregated_cols
> - *      Construct a bitmapset of the column numbers of un-aggregated Vars
> - *      appearing in our targetlist and qual (HAVING clause)
> + * Walk tlist and qual to find referenced colnos, dividing them into
> + * aggregated and unaggregated sets.
>   */
> -static Bitmapset *
> -find_unaggregated_cols(AggState *aggstate)
> +static void
> +find_cols(AggState *aggstate, Bitmapset **aggregated, Bitmapset **unaggregated)
>  {

It's not this patch's fault, but none, really none, of this stuff should
be in the executor.

Greetings,

Andres Freund



Re: hashagg slowdown due to spill changes

From
Melanie Plageman
Date:

On Mon, Jun 22, 2020 at 9:02 PM Andres Freund <andres@anarazel.de> wrote:

>  /*
> - * find_unaggregated_cols
> - *     Construct a bitmapset of the column numbers of un-aggregated Vars
> - *     appearing in our targetlist and qual (HAVING clause)
> + * Walk tlist and qual to find referenced colnos, dividing them into
> + * aggregated and unaggregated sets.
>   */
> -static Bitmapset *
> -find_unaggregated_cols(AggState *aggstate)
> +static void
> +find_cols(AggState *aggstate, Bitmapset **aggregated, Bitmapset **unaggregated)
>  {

It's not this patch's fault, but none, really none, of this stuff should
be in the executor.


Were you thinking it could be done in grouping_planner() and then the
bitmaps could be saved in the PlannedStmt?
Or would you have to wait until query_planner()? Or are you imagining
somewhere else entirely?

--
Melanie Plageman

Re: hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

On 2020-06-23 09:23:57 -0700, Melanie Plageman wrote:
> On Mon, Jun 22, 2020 at 9:02 PM Andres Freund <andres@anarazel.de> wrote:
> > It's not this patch's fault, but none, really none, of this stuff should
> > be in the executor.
> >
> >
> Were you thinking it could be done in grouping_planner() and then the
> bitmaps could be saved in the PlannedStmt?
> Or would you have to wait until query_planner()? Or are you imagining
> somewhere else entirely?

I haven't thought about it in too much detail, but I would say
create_agg_plan() et al. I guess there's some argument to be made to do
it in setrefs.c, because we already do convert_combining_aggrefs() there
(but I don't like that much).

There's no reason to do it before we actually decided on one specific
path, so doing it earlier than create_plan() seems unnecessary. And
having it in agg specific code seems better than putting it into global
routines.

There's probably an argument for having a bit more shared code between
create_agg_plan(), create_group_plan() and
create_groupingsets_plan(). But even just adding a new extract_*_cols()
call to each of those would probably be ok.

Greetings,

Andres Freund



Re: hashagg slowdown due to spill changes

From
Melanie Plageman
Date:


On Tue, Jun 23, 2020 at 10:06 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2020-06-23 09:23:57 -0700, Melanie Plageman wrote:
> On Mon, Jun 22, 2020 at 9:02 PM Andres Freund <andres@anarazel.de> wrote:
> > It's not this patch's fault, but none, really none, of this stuff should
> > be in the executor.
> >
> >
> Were you thinking it could be done in grouping_planner() and then the
> bitmaps could be saved in the PlannedStmt?
> Or would you have to wait until query_planner()? Or are you imagining
> somewhere else entirely?

I haven't thought about it in too much detail, but I would say
create_agg_plan() et al. I guess there's some argument to be made to do
it in setrefs.c, because we already do convert_combining_aggrefs() there
(but I don't like that much).

There's no reason to do it before we actually decided on one specific
path, so doing it earlier than create_plan() seems unnecessary. And
having it in agg specific code seems better than putting it into global
routines.

There's probably an argument for having a bit more shared code between
create_agg_plan(), create_group_plan() and
create_groupingsets_plan(). But even just adding a new extract_*_cols()
call to each of those would probably be ok.


So, my summary of this point in the context of the other discussion
upthread is:

Planner should extract the columns that hashagg will need later during
planning. Planner should not have HashAgg/MixedAgg nodes request smaller
targetlists from their children with CP_SMALL_TLIST to avoid unneeded
projection overhead.
Also, even this extraction should only be done when the number of groups
is large enough to suspect a spill.

So, I wrote a patch that extracts the columns the same way as in
ExecInitAgg but in create_agg_plan() and it doesn't work because we
haven't called set_plan_references().

Then, I wrote a patch that does this in set_upper_references(), and it
seems to work. I've attached that one.
It is basically Jeff's patch (based somewhat on my patch) which extracts
the columns in ExecInitAgg but I moved the functions over to setrefs.c
and gave them a different name.

It's not very elegant.
I shoved it in at the end of set_upper_references(), but I think there
should be a nice way to do it while setting the references for each var
instead of walking over the nodes again.
Also, I think that the bitmapsets for the colnos should maybe be put
somewhere less prominent (than in the main Agg plan node?), since they
are only used in one small place.
I tried putting both bitmaps in an array of two bitmaps in the Agg node
(since there will always be two) to make it look a bit neater, but it
was pretty confusing and error prone to remember which one was
aggregated and which one was unaggregated.

Note that I didn't do anything with costing like only extracting the
columns if there are a lot of groups.

Also, I didn't revert the CP_SMALL_TLIST change in create_agg_plan() or
create_groupingsets_plan().

Not to stir the pot, but I did notice that hashjoin uses CP_SMALL_TLIST
in create_hashjoin_plan() for the inner side sub-tree and the outer side
one if there are multiple batches. I wondered what was different about
that vs hashagg (i.e. why it is okay to do that there).

--
Melanie Plageman
Attachment

Re: hashagg slowdown due to spill changes

From
Tomas Vondra
Date:
On Wed, Jun 24, 2020 at 05:26:07PM -0700, Melanie Plageman wrote:
>On Tue, Jun 23, 2020 at 10:06 AM Andres Freund <andres@anarazel.de> wrote:
>
>> Hi,
>>
>> On 2020-06-23 09:23:57 -0700, Melanie Plageman wrote:
>> > On Mon, Jun 22, 2020 at 9:02 PM Andres Freund <andres@anarazel.de>
>> wrote:
>> > > It's not this patch's fault, but none, really none, of this stuff
>> should
>> > > be in the executor.
>> > >
>> > >
>> > Were you thinking it could be done in grouping_planner() and then the
>> > bitmaps could be saved in the PlannedStmt?
>> > Or would you have to wait until query_planner()? Or are you imagining
>> > somewhere else entirely?
>>
>> I haven't thought about it in too much detail, but I would say
>> create_agg_plan() et al. I guess there's some argument to be made to do
>> it in setrefs.c, because we already do convert_combining_aggrefs() there
>> (but I don't like that much).
>>
>> There's no reason to do it before we actually decided on one specific
>> path, so doing it earlier than create_plan() seems unnecessary. And
>> having it in agg specific code seems better than putting it into global
>> routines.
>>
>> There's probably an argument for having a bit more shared code between
>> create_agg_plan(), create_group_plan() and
>> create_groupingsets_plan(). But even just adding a new extract_*_cols()
>> call to each of those would probably be ok.
>>
>>
>So, my summary of this point in the context of the other discussion
>upthread is:
>
>Planner should extract the columns that hashagg will need later during
>planning. Planner should not have HashAgg/MixedAgg nodes request smaller
>targetlists from their children with CP_SMALL_TLIST to avoid unneeded
>projection overhead.
>Also, even this extraction should only be done when the number of groups
>is large enough to suspect a spill.
>

IMO we should extract the columns irrespectedly of the estimates,
otherwise we won't be able to handle underestimates efficiently.

>
>Not to stir the pot, but I did notice that hashjoin uses CP_SMALL_TLIST
>in create_hashjoin_plan() for the inner side sub-tree and the outer side
>one if there are multiple batches. I wondered what was different about
>that vs hashagg (i.e. why it is okay to do that there).
>

Yeah. That means that if we have to start batching during execution, we
may need to spill much more datai. I'd say that's a hashjoin issue that
we should fix too (in v14).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: hashagg slowdown due to spill changes

From
Andres Freund
Date:
Hi,

On 2020-06-12 14:37:15 -0700, Andres Freund wrote:
> On 2020-06-11 11:14:02 -0700, Jeff Davis wrote:
> > On Thu, 2020-06-11 at 10:45 -0700, Andres Freund wrote:
> > > Did you run any performance tests?
> >
> > Yes, I reproduced your ~12% regression from V12, and this patch nearly
> > eliminated it for me.
> 
> I spent a fair bit of time looking at the difference. Jeff had let me
> know on chat that he was still seeing some difference, but couldn't
> quite figure out where that was.
> 
> Trying it out myself, I observed that the patch helped, but not that
> much. After a bit I found one major reason for why:
> LookupTupleHashEntryHash() assigned the hash to pointer provided by the
> caller's before doing the insertion. That ended up causing a pipeline
> stall (I assume it's store forwarding, but not sure). Moving the
> assignment to the caller variable to after the insertion got rid of
> that.

This is still not resolved. We're right now slower than 12. It's
effectively not possible to do performance comparisons right now. This
was nearly two months ago.

Greetings,

Andres Freund



Re: hashagg slowdown due to spill changes

From
Peter Geoghegan
Date:
On Fri, Jul 24, 2020 at 4:51 PM Andres Freund <andres@anarazel.de> wrote:
> This is still not resolved. We're right now slower than 12. It's
> effectively not possible to do performance comparisons right now. This
> was nearly two months ago.

I have added a new open item for this separate
LookupTupleHashEntryHash()/lookup_hash_entry() pipeline-stall issue.

(For the record I mistakenly believed that commit 23023022 resolved
all of the concerns raised on this thread, which is why I closed out
the open item associated with this thread. Evidently work remains to
fix a remaining regression that affects simple in-memory hash
aggregation, though.)

-- 
Peter Geoghegan



Re: hashagg slowdown due to spill changes

From
Peter Geoghegan
Date:
On Sat, Jul 25, 2020 at 12:41 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I have added a new open item for this separate
> LookupTupleHashEntryHash()/lookup_hash_entry() pipeline-stall issue.

Attached is a rebased version of Andres' now-bitrot 2020-06-12 patch
("aggspeed.diff").

I find that Andres original "SELECT cat, count(*) FROM
fewgroups_many_rows GROUP BY 1;" test case is noticeably improved by
the patch. Without the patch, v13 takes ~11.46 seconds. With the
patch, it takes only ~10.64 seconds.

Didn't test it against v12 yet, but I have no reason to doubt Andres'
explanation. I gather that if we can get this patch committed, we can
close the relevant LookupTupleHashEntryHash() open item.

Can you take this off my hands, Jeff?

Thanks
-- 
Peter Geoghegan

Attachment

Re: hashagg slowdown due to spill changes

From
Jeff Davis
Date:
On Sat, 2020-07-25 at 15:08 -0700, Peter Geoghegan wrote:
> I find that Andres original "SELECT cat, count(*) FROM
> fewgroups_many_rows GROUP BY 1;" test case is noticeably improved by
> the patch. Without the patch, v13 takes ~11.46 seconds. With the
> patch, it takes only ~10.64 seconds.

I saw less of an improvement than you or Andres (perhaps just more
noise). But given that both you and Andres are reporting a measurable
improvement, then I went ahead and committed it. Thank you.

Regards,
    Jeff Davis





Re: hashagg slowdown due to spill changes

From
Peter Geoghegan
Date:
On Sun, Jul 26, 2020 at 4:17 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I saw less of an improvement than you or Andres (perhaps just more
> noise). But given that both you and Andres are reporting a measurable
> improvement, then I went ahead and committed it. Thank you.

Thanks!

-- 
Peter Geoghegan