Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Default setting for enable_hashagg_disk
Msg-id 6292f5a766198d5745eb8cc17ac736366056671d.camel@j-davis.com
In response to Re: Default setting for enable_hashagg_disk  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Default setting for enable_hashagg_disk  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On Thu, 2020-04-09 at 15:26 -0400, Robert Haas wrote:
> I think it's actually pretty different. All of the other enable_* GUCs
> disable an entire type of plan node, except for cases where that would
> otherwise result in planning failure. This just disables a portion of
> the planning logic for a certain kind of node, without actually
> disabling the whole node type. I'm not sure that's a bad idea, but it
> definitely seems to be inconsistent with what we've done in the past.

The patch adds two GUCs. Both are slightly weird, to be honest, but let
me explain the reasoning. I am open to other suggestions.

1. enable_hashagg_disk (default true):

This exists mostly to get some of the old behavior back, giving people
an escape hatch if they see bad plans while we are tweaking the
costing. The old behavior was weird, so this GUC is also weird.
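
For example, someone hitting a regression could do something like the
following in a session (table and column names are made up, just to
sketch the idea):

    -- Escape hatch while the new costing is being tuned: steer the
    -- planner away from the new disk-based hash aggregation paths.
    SET enable_hashagg_disk = off;

    EXPLAIN (ANALYZE)
    SELECT customer_id, count(*)
    FROM orders
    GROUP BY customer_id;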

Perhaps we can make this a compatibility GUC that we eventually drop? I
don't necessarily think this GUC would make sense, say, 5 versions from
now. I'm just trying to be conservative because I know that, even if
the plans are faster for 90% of people, the other 10% will be unhappy
and want a way to work around it.

2. enable_groupingsets_hash_disk (default false):

This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.

Even before this patch, quite a few paths could be generated. The
planner tries to estimate the size of each grouping set's hash table,
and then sees how many it can fit in work_mem (a knapsack problem),
while also taking advantage of any path keys, etc.
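
As a rough illustration (made-up schema), for a query like the one
below the planner estimates a hash table size for each grouping set,
hashes whichever subset fits in work_mem, and sorts the rest in a
mixed-mode path:

    SET work_mem = '4MB';

    -- Each grouping set gets its own hash table size estimate; the
    -- planner hashes as many sets as fit in work_mem (the knapsack
    -- choice) and sorts the remainder.
    SELECT brand, size, region, count(*)
    FROM sales
    GROUP BY GROUPING SETS ((brand), (size), (region), ());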

With Disk-based Hash Aggregation, in principle we could generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and would grow into a huge number of
paths if we have more than a handful of grouping sets). So I think the
existing planner logic for grouping sets is fine for now. We might come
up with a better approach later.

But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?
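
For reference, the kind of test this makes easy looks roughly like the
following (made-up table; work_mem set to its minimum to force
spilling):

    -- Hash grouping sets even when they are not expected to fit.
    SET enable_groupingsets_hash_disk = on;
    -- Minimum work_mem, so the hash tables must spill to disk.
    SET work_mem = '64kB';

    EXPLAIN (ANALYZE)
    SELECT a, b, count(*)
    FROM spill_test
    GROUP BY GROUPING SETS ((a), (b));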

Regards,
    Jeff Davis