> On Wed, May 18, 2022 at 04:49:24PM -0400, Joe Conway wrote:
> On 5/18/22 16:20, Alvaro Herrera wrote:
> > On 2022-May-18, Joe Conway wrote:
> >
> > > On 5/18/22 11:11, Alvaro Herrera wrote:
> >
> > > > Apparently, if the cgroup goes over the "high" limit, the processes are
> > > > *throttled*. Then if the group goes over the "max" limit, OOM-killer is
> > > > invoked.
> >
> > > You may be misinterpreting "throttle" in this context. From [1]:
> > >
> > > The memory.high boundary on the other hand can be set
> > > much more conservatively. When hit, it throttles
> > > allocations by forcing them into direct reclaim to
> > > work off the excess, but it never invokes the OOM
> > > killer.
> >
> > Well, that means the backend processes don't do their expected task
> > (process some query) but instead they have to do "direct reclaim". I
> > don't know what that is, but it sounds like we'd need to add
> > Linux-specific code in order for this to fix anything.
>
> Postgres does not need to do anything. The kernel just does its thing (e.g.
> clearing page cache or swapping out anon memory) more aggressively than
> normal to clear up some space for the impending allocation.
>
> > And what would we do in such a situation anyway? Seems like our
> > best hope would be to> get malloc() to return NULL and have the
> > resulting transaction abort free enough memory that things in other
> > backends can continue to run.
>
> With the right hooks an extension could detect the memory pressure in an OS
> specific way and return null.
>
> > *If* there is a way to have cgroups make Postgres do that, then that
> > would be useful enough.
>
> Memory accounting under cgroups (particularly v2) can provide the signal
> needed for a Linux specific extension to do that.
To elaborate a bit on this, Linux PSI feature (in the context of
containers, cgroups v2 only) [1] would allow a userspace application to
register a trigger on memory pressure exceeding some threshold. The
pressure here is not exactly how much memory is allocated, but rather
memory stall, and the whole machinery would involve polling -- but still
sounds interesting in the context of this thread.
[1]: https://www.kernel.org/doc/Documentation/accounting/psi.rst