Home > mailing lists

Re: Treating work_mem as a shared resource (Was: Parallel Hash take II) - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
Date	November 18, 2017 00:22:22
Msg-id	CAH2-Wz=bXm-mc_vJ9q42Yns2PcG2dSBPG4H+30eELuDLV93Yfw@mail.gmail.com Whole thread Raw
In response to	Re: Treating work_mem as a shared resource (Was: Parallel Hash take II) (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Treating work_mem as a shared resource (Was: Parallel Hash take II) (Tom Lane <tgl@sss.pgh.pa.us>) Re: Treating work_mem as a shared resource (Was: Parallel Hash take II) (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

Tree view

On Fri, Nov 17, 2017 at 7:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 16, 2017 at 11:50 AM, Serge Rielau <serge@rielau.com> wrote:
>>
>> Just as you have, we have also considered holistic provisioning of work_mem across all consumers, but we find that
tobe too complex. 
>> Having an “emergency fund” in shared memory is also an option, but I find it too limiting.
>
>
> I agree.

Yeah. I suspect that that idea is not ambitious enough to do a lot of
what we want, and yet is too ambitious to justify working on given its
limited shelf life.

> I think this is basically a planning problem.  For example, say we wanted to have work_mem_per_query instead of
work_mem_per_node. There is an obvious design: consider memory use as an independent dimension of merit during path
generationand comparison (less is better).  Discard candidate paths whose memory use exceeds the work_mem_per_query
budgetunless there are no other alternatives.  At the end of planning, pick the cheapest path that survived the
memory-budgetfilter.  Now, this has the problem that it would make planning more expensive (because we'd hang on to
morepaths for longer) but it solves a lot of other problems.  If there's no memory pressure, we can use memory like mad
evenwhen it doesn't save much, but when we have to pick between using more memory for one part of the plan and using
morememory for another part of the plan, the choice that does the best job reducing overall execution time will win.
Awesome.

I'd like to hear some opinions on the feasibility of this approach.
Does David have anything to say about it, for example?

> We could also do more localized variants of this that don't provide hard guarantees but do tend to avoid squandering
resources.

That sounds like independent work, though it could be very useful.

> Yet another thing we could do is to try to get nodes to voluntarily use less than work_mem when possible.  This is
particularlyan issue for sorts.  A 2-batch hash join is so much more expensive than a single-batch hash join that it's
almostnever going to make sense unless we have no realistic alternative, although I suppose a 64-batch hash join might
benot that different from a 32-batch hash join.  But for sorts, given all Peter's work in this area, I bet there are a
lotof sorts that could budget a quarter or less of work_mem and really not be hurt very much.  It depends somewhat on
howfast and how contended your I/O is, though, which we don't have an especially good way to model.  I'm starting to
wonderif that sort_mem GUC might be a good idea... use that for sorts, and keep work_mem for everything else. 

Right. The ability for sorts to do well with less memory is really
striking these days. And though I didn't mean to seriously suggest it,
a hash_mem GUC does seem like it solves some significant problems
without much risk. I think it should be hash_mem, not sort_mem,
because hashing seems more like the special case among operations that
consume work_mem, and because sort_mem is already the old name for
work_mem that is still accepted as a work_mem alias, and because
hash_mem avoids any confusion about whether or not CREATE INDEX uses
the new GUC (it clearly does not).

Since I am primarily concerned about the difference in sensitivity to
the availability of memory that exists when comparing sorting and
hashing, and since a new GUC seems like it would noticeably improve
matters, I am beginning to take the idea of writing a hash_mem patch
for v11 seriously.

--
Peter Geoghegan

pgsql-hackers by date:

From: Fujii Masao
Date: 17 November 2017, 23:57:23
Subject: Re: Speed up the removal of WAL files

From: Peter Eisentraut
Date: 18 November 2017, 01:31:06
Subject: Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

Re: Treating work_mem as a shared resource (Was: Parallel Hash take II) - Mailing list pgsql-hackers

Previous

Next