Re: Memory-Bounded Hash Aggregation - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Memory-Bounded Hash Aggregation
Date
Msg-id 9be86fc1adc315f69b8af4b379087ab451008d8a.camel@j-davis.com
Whole thread Raw
In response to Re: Memory-Bounded Hash Aggregation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Memory-Bounded Hash Aggregation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Memory-Bounded Hash Aggregation  (Adam Lee <ali@pivotal.io>)
List pgsql-hackers
On Thu, 2019-07-11 at 17:55 +0200, Tomas Vondra wrote:
> Makes sense. I haven't thought about how the hybrid approach would be
> implemented very much, so I can't quite judge how complicated would
> it be
> to extend "approach 1" later. But if you think it's a sensible first
> step,
> I trust you. And I certainly agree we need something to compare the
> other
> approaches against.

Is this a duplicate of your previous email?

I'm slightly confused but I will use the opportunity to put out another
WIP patch. The patch could use a few rounds of cleanup and quality
work, but the funcionality is there and the performance seems
reasonable.

I rebased on master and fixed a few bugs, and most importantly, added
tests.

It seems to be working with grouping sets fine. It will take a little
longer to get good performance numbers, but even for group size of one,
I'm seeing HashAgg get close to Sort+Group in some cases.

You are right that the missed lookups appear to be costly, at least
when the data all fits in system memory. I think it's the cache misses,
because sometimes reducing work_mem improves performance. I'll try
tuning the number of buckets for the hash table and see if that helps.
If not, then the performance still seems pretty good to me.

Of course, HashAgg can beat sort for larger group sizes, but I'll try
to gather some more data on the cross-over point.

Regards,
    Jeff Davis


Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Next
From: Michael Paquier
Date:
Subject: Re: Add parallelism and glibc dependent only options to reindexdb