Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From: David G. Johnston
Subject: Re: Default setting for enable_hashagg_disk
Msg-id: CAKFQuwafWCNrUKQXZUOf4TpeLC99XU6YNoEyfFqTMTYxnbBp-A@mail.gmail.com
In response to: Re: Default setting for enable_hashagg_disk (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Sun, Jul 19, 2020 at 4:38 AM Stephen Frost <sfrost@snowman.net> wrote:
> > (The only reason I'm in favor of hash_mem[_multiplier] is that it
> > seems like it might be possible to use it to get *better* plans
> > than before.  I do not see it as a backwards-compatibility knob.)

> I still don't think a hash_mem-type thing is really the right direction
> to go in, even if making a distinction between memory used for sorting
> and memory used for hashing is, and I'm of the general opinion that we'd
> be thinking about doing something better and more appropriate- except
> for the fact that we're talking about adding this in during beta.
>
> In other words, if we'd stop trying to shoehorn something in, which
> we're doing because we're in beta, we'd very likely be talking about all
> of this in a very different way and probably be contemplating something
> like a query_mem that provides for an overall memory limit and which
> favors memory for hashing over memory for sorting, etc.
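
To illustrate the quoted idea, a sketch of what such a knob might look like (query_mem is hypothetical and does not exist; the name and the apportioning rule are invented for illustration):

    SET query_mem = '256MB';  -- hypothetical per-query memory budget
    -- A planner honoring such a budget would apportion it across the
    -- query's nodes, favoring hash nodes over sort nodes, rather than
    -- granting each node its own work_mem.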

At minimum we'd need a patch we would be happy with dropping in should there be user complaints.  And once this conversation ends with that in hand, I have my doubts whether there will be interest, or even project desirability, in working toward a "better" solution should this one prove itself "good enough".  Since it seems unlikely that this patch would foreclose on other promising solutions, and given the non-trivial behavioral change we've already made, I'd say we might as well just deploy whatever short-term solution we come up with now.

As for hashagg_avoid_disk_plan...

The physical processes we are modelling here:
1. Processing D records takes M amount of memory.
2. Processing D records in-memory takes T time per record, while doing the same on-disk takes V time per record.
3. Processing D records via some other plan has an effective cost U.
4. V >> T (V is strictly greater than T).
5. Having chosen a value for M intended to ensure cost T, it is still possible for V to end up being used.
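
To make the model concrete (the numbers here are invented for illustration): suppose D = 1 million records, T = 1 microsecond per record in-memory, and V = 10 microseconds per record on-disk.  The in-memory hash aggregate then costs about 1 second and the spilled version about 10 seconds, so an alternative plan with effective cost U between those two should win only when the hash aggregate is expected to spill.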

Thus:

If we get D wrong, the user can still tweak the system by changing hash_mem_multiplier (this is strictly better than v12, which only offered work_mem).
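
For example (a sketch against v13 as of this beta; defaults may still change, and the table and column names are placeholders):

    SET work_mem = '64MB';
    SET hash_mem_multiplier = 2.0;  -- hash-based nodes may now use up to 128MB
    EXPLAIN ANALYZE SELECT x, count(*) FROM t GROUP BY x;

In v12 the same hash table was capped by work_mem alone.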

Setting hashagg_avoid_disk_plan = off provides a means to move V infinitely far away from T (it is on by default; off reverts to the v12 behavior).
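
A usage sketch, taking the parenthetical above at face value (the GUC's name and default are still under discussion in this thread):

    SET hashagg_avoid_disk_plan = off;  -- V becomes effectively infinite: no spilling hash aggregate plans, as in v12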

There is no way for the user to move V's relative position toward T (n/a in v12).

The only way to move T is to make it infinitely large by setting enable_hashagg = off (same as in v12).
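
For instance (enable_hashagg is a long-standing planner GUC; the query is a placeholder):

    SET enable_hashagg = off;  -- T becomes effectively infinite
    EXPLAIN SELECT x, count(*) FROM t GROUP BY x;  -- now shows GroupAggregate rather than HashAggregate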

Is a hashagg_disk_cost_multiplier in the range [0.0, 1000000000.0], i.e., (T * hashagg_disk_cost_multiplier == V), doable?

It has a nice symmetry with hash_mem_multiplier and can move V both toward and away from T.  To the extent T is tunable (or not) in v12, it can remain the same in v13.
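
A sketch of how it might behave (hashagg_disk_cost_multiplier is the proposal above, not an existing setting):

    SET hashagg_disk_cost_multiplier = 10.0;  -- hypothetical: V = 10 * T, spilling costed at 10x
    SET hashagg_disk_cost_multiplier = 1.0;   -- hypothetical: V = T, spilling costed the same as in-memory
    -- very large values push V toward infinity, approximating the v12 "never spill" planning behavior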

David J.
