Re: Treating work_mem as a shared resource (Was: Parallel Hash take II) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
Date
Msg-id CA+TgmoaXBEEsn7pqFGGNqXh0RQV1GDZuXL00cqfxnJBoBAMYUQ@mail.gmail.com
In response to Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)  (Serge Rielau <serge@rielau.com>)
Responses Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)  (Vladimir Rusinov <vrusinov@google.com>)
Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Thu, Nov 16, 2017 at 11:50 AM, Serge Rielau <serge@rielau.com> wrote:
> Just as you have, we have also considered holistic provisioning of work_mem across all consumers, but we find that to be too complex.
> Having an “emergency fund” in shared memory is also an option, but I find it too limiting.

I agree.

I think this is basically a planning problem.  For example, say we wanted to have work_mem_per_query instead of work_mem_per_node.  There is an obvious design: consider memory use as an independent dimension of merit during path generation and comparison (less is better).  Discard candidate paths whose memory use exceeds the work_mem_per_query budget unless there are no other alternatives.  At the end of planning, pick the cheapest path that survived the memory-budget filter.  Now, this has the problem that it would make planning more expensive (because we'd hang on to more paths for longer) but it solves a lot of other problems.  If there's no memory pressure, we can use memory like mad even when it doesn't save much, but when we have to pick between using more memory for one part of the plan and using more memory for another part of the plan, the choice that does the best job reducing overall execution time will win.  Awesome.
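
To make that concrete, here is roughly the shape of the final selection step, as a purely illustrative sketch -- the CandidatePath struct, the mem_estimate field, and choose_final_path() are all invented for this email and don't exist anywhere in the tree:

#include <stddef.h>

/* Invented for illustration: a surviving candidate with its cost and an
 * estimate of its peak memory use across all of its nodes. */
typedef struct CandidatePath
{
    double      total_cost;     /* estimated execution cost */
    double      mem_estimate;   /* estimated peak memory, in bytes */
} CandidatePath;

/*
 * At the end of planning, pick the cheapest candidate that fits under the
 * per-query memory budget; if nothing fits, fall back to the least
 * memory-hungry candidate rather than failing outright.
 */
static CandidatePath *
choose_final_path(CandidatePath **cands, int ncands,
                  double work_mem_per_query)
{
    CandidatePath *best_in_budget = NULL;
    CandidatePath *least_memory = NULL;
    int         i;

    for (i = 0; i < ncands; i++)
    {
        CandidatePath *p = cands[i];

        if (least_memory == NULL ||
            p->mem_estimate < least_memory->mem_estimate)
            least_memory = p;

        if (p->mem_estimate <= work_mem_per_query &&
            (best_in_budget == NULL ||
             p->total_cost < best_in_budget->total_cost))
            best_in_budget = p;
    }

    return (best_in_budget != NULL) ? best_in_budget : least_memory;
}

The expensive part, of course, isn't this final comparison; it's keeping enough candidates alive during join planning for it to have anything to choose from.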

We could also do more localized variants of this that don't provide hard guarantees but do tend to avoid squandering resources.  I don't think that we can directly incorporate memory use into cost because that will distort the costs of higher-level nodes in the plan tree; cost needs to mean execution time.  However, what we could do is refuse to replace a more expensive path in a relation's path list with a cheaper one when the savings are small and the cheaper path uses a lot more memory.  That way, you wouldn't replace a nested loop that costs a million units with a hash join that costs 999,999 units but uses a GB of RAM; you'd save the hash join for cases where we think it will help significantly.
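
A sketch of what that acceptance test might look like, again with invented names and made-up thresholds (the real add_path() knows nothing about memory today):

#include <stdbool.h>

/* Made-up thresholds, just to show the shape of the rule. */
#define BIG_MEMORY_RATIO    4.0     /* "a lot more memory" */
#define MIN_COST_SAVINGS    0.10    /* must save >= 10% to justify it */

/*
 * Should a cheaper-but-hungrier path displace the one we already have?
 * Invented helper; in reality this decision would have to live in
 * add_path() or somewhere near it.
 */
static bool
worth_replacing(double old_cost, double old_mem,
                double new_cost, double new_mem)
{
    double      savings = (old_cost - new_cost) / old_cost;
    double      mem_ratio = (old_mem > 0.0) ? new_mem / old_mem : 1.0;

    if (new_cost >= old_cost)
        return false;           /* not cheaper at all */

    if (mem_ratio <= BIG_MEMORY_RATIO)
        return true;            /* cheaper and not much hungrier */

    /* Much hungrier: demand that the savings be worth it. */
    return savings >= MIN_COST_SAVINGS;
}

With numbers like those, the 999,999-vs-a-million hash join gets rejected, but a hash join that's 30% cheaper still wins even if it wants the memory.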

Yet another thing we could do is to try to get nodes to voluntarily use less than work_mem when possible.  This seems most practical for sorts.  A 2-batch hash join is so much more expensive than a single-batch hash join that it's almost never going to make sense unless we have no realistic alternative, although I suppose a 64-batch hash join might not be that different from a 32-batch hash join.  But for sorts, given all Peter's work in this area, I bet there are a lot of sorts that could budget a quarter or less of work_mem and really not be hurt very much.  It depends somewhat on how fast and how contended your I/O is, though, which we don't have an especially good way to model.  I'm starting to wonder if that sort_mem GUC might be a good idea... use that for sorts, and keep work_mem for everything else.
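
The asymmetry comes from the batch arithmetic: going from one batch to two means a good chunk of both inputs suddenly has to be written to temp files and re-read, whereas going from 32 to 64 batches barely changes the total volume of temp I/O.  Very roughly (the real logic lives in ExecChooseHashTableSize() and is considerably more careful than this):

#include <stdint.h>

/*
 * Rough sketch of the batch computation: the batch count is the smallest
 * power of two such that one batch's share of the inner relation fits in
 * the memory budget.
 */
static int
rough_nbatch(int64_t inner_bytes, int64_t hash_mem_bytes)
{
    int         nbatch = 1;

    while ((int64_t) nbatch * hash_mem_bytes < inner_bytes)
        nbatch *= 2;

    return nbatch;
}

So a hash join that gives back memory risks falling off the one-batch cliff, while a sort that gives back memory just merges a few more runs.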

If we really want to be able to dynamically react to changing memory conditions, what we need is a forest of plans for a given query rather than just one.  Pick plan A if memory is limited, otherwise pick B.  Or use admission control.
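
Hand-waving even more, the runtime end of that "forest of plans" idea could be as dumb as this (all invented; assume the variants are sorted from least to most memory-hungry, with the hungrier ones presumably cheaper):

/* Invented types and function; just to show the shape of the idea. */
typedef struct PlannedVariant
{
    void       *plantree;       /* one complete plan for the query */
    double      mem_needed;     /* peak memory it assumes it can use */
} PlannedVariant;

static void *
pick_variant(const PlannedVariant *variants, int nvariants,
             double mem_available)
{
    void       *chosen = variants[0].plantree;  /* thriftiest plan as fallback */
    int         i;

    /* Keep the most memory-hungry (hence presumably fastest) one that fits. */
    for (i = 1; i < nvariants; i++)
    {
        if (variants[i].mem_needed <= mem_available)
            chosen = variants[i].plantree;
    }

    return chosen;
}

The hard part is everything upstream of that: deciding how many variants are worth keeping, and knowing mem_available with any accuracy at all.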

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
