On Sat, Dec 14, 2019 at 06:32:25PM +0100, Tomas Vondra wrote:
> I've done a bit more testing on this, after resolving a couple of minor
> conflicts due to recent commits (rebased version attached).
>
> In particular, I've made a comparison with different dataset sizes,
> group sizes, GUC settings etc. The script and results from two different
> machines are available here:
>
> The script essentially runs a simple grouping query with different
> numbers of rows, groups, work_mem and parallelism settings. There's
> nothing particularly magical about it.
Nice!
> I ran it on both master and patched code, allowing us to compare
> results and assess the impact of the patch. Overall, the changes are
> as expected and either neutral or beneficial, i.e. the timings are
> the same or faster.
>
> The number of cases that regressed is fairly small, but the
> regressions are sometimes annoyingly large - up to 2x. Consider for
> example this trivial case with 100M rows:
I suppose this is because the patch has no costing changes yet. I
hacked a little to give hash agg a spilling penalty - just some value
based on (groups_in_hashtable * num_of_input_tuples) /
num_groups_from_planner - and with that it would not choose hash
aggregate in this case.
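
For reference, the hack was roughly of this shape (a simplified
standalone sketch, not the actual diff; in the hack the logic sits
inside cost_agg(), and names like hash_entry_size are illustrative):

    /*
     * Sketch only: estimate a spill penalty for hash aggregation once
     * the planner's group estimate no longer fits in work_mem.
     */
    static double
    hashagg_spill_penalty(double num_groups, double input_tuples,
                          long work_mem_kb, double hash_entry_size)
    {
        /* groups that fit in the hash table within work_mem */
        double  fit_groups = (work_mem_kb * 1024L) / hash_entry_size;

        if (num_groups <= fit_groups)
            return 0.0;         /* everything fits, no spill expected */

        /*
         * "just some value": the share of input tuples covered by the
         * groups that do fit in the hash table.
         */
        return (fit_groups * input_tuples) / num_groups;
    }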
However, that penalty is wrong: compared to the external sort
algorithm, hash aggregate can re-spill, which involves even more I/O,
especially with a very large number of groups but very few tuples per
group, as in the test you did. Costing that correctly will be a
challenge.
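
To put a rough number on the re-spilling (a back-of-envelope sketch
under simplified assumptions: each pass partitions the overflow into
nbatch batches, and any batch whose groups still don't fit gets
spilled again; nbatch and fit_groups are illustrative, not planner
values):

    /* Sketch: how many extra write+read passes over the spilled input. */
    static int
    est_respill_passes(double num_groups, double fit_groups, int nbatch)
    {
        int     passes = 0;

        /* each pass rewrites (roughly) the not-yet-aggregated input */
        while (num_groups > fit_groups && nbatch > 1)
        {
            num_groups /= nbatch;   /* groups left per batch */
            passes++;
        }
        return passes;
    }

E.g. est_respill_passes(1e8, 1e6, 32) gives 2, i.e. most of the input
gets written out and read back two extra times on top of the initial
spill.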
BTW, Jeff, Greenplum has a test for hash agg spilling. I modified it
a little to check how many batches a query uses; it's attached, not
sure if it would help.
--
Adam Lee