Thread: [GENERAL] Running out of memory the hard way ...
If you run a transaction with lots of server-side functions that use a lot of memory, this can trigger the OOM killer on Linux, causing the PostgreSQL backend to receive a SIGKILL and all the associated bad stuff.

Tuning the OOM killer is not sufficient. No setting I've found for the OOM killer will guarantee that it won't SIGKILL a process that's essentially untenable anyway (because it's going to use more memory than actually exists on the system at some point). Additionally, "add more RAM" doesn't "solve" the problem; it only delays it until datasets scale up to even larger transactions that use even more memory.

This is particularly prevalent with PostGIS, because some PostGIS functions are very memory intensive, but I'd be willing to bet real money that I could trigger it with just about any stored procedure that allocates memory in such a way that it doesn't get reclaimed until the transaction completes. See as an example: https://trac.osgeo.org/postgis/ticket/3445

If anyone wants to investigate this but is having trouble reproducing it, I can construct specific failure scenarios fairly easily.

Another workaround is to run the offending statements in smaller transactional batches. This is the best solution I've found so far, but it's not quite ideal. In particular, it requires the client program to reimplement transaction guarantees on the client side. Sometimes this isn't necessary, but other times it is.

What I feel is the best way to mitigate the situation is to have some setting that limits the maximum RAM any backend can consume. Attempting to exceed this limit would cause an error and rollback for that particular backend without affecting other backends. This would provide information to the client process that can be rationally interpreted by client code to result in either an error that a developer can understand, or possibly adaptive code that changes behavior to accommodate limits on the server side.
My first question: does this setting exist somewhere and I'm simply not finding it for some reason?

Assuming this doesn't exist (I haven't found it), my next question is whether there's a philosophical or technical reason that such a feature doesn't exist? Should I take this discussion to -hackers?

-- 
Bill Moran <wmoran@potentialtech.com>
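The "smaller transactional batches" workaround can be sketched roughly as below. This is a sketch, not code from the thread: the table and column names are made up, and the connection object is assumed to follow the DB-API shape of a driver such as psycopg2. It also illustrates the caveat about client-side guarantees: a failure mid-run leaves earlier batches committed, and the client has to decide what that means for its own consistency.

```python
# Sketch: run a large, memory-hungry update as many small transactions,
# so no single backend transaction accumulates unbounded memory.
# Table/column names and the ST_Buffer call are hypothetical examples.

def chunked(ids, size):
    """Split a list of row ids into consecutive batches of at most `size`."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def process_in_batches(conn, ids, size=1000):
    """One transaction per batch; a failure aborts only the current batch."""
    done = []
    for batch in chunked(ids, size):
        try:
            cur = conn.cursor()
            cur.execute(
                "UPDATE big_table SET geom = ST_Buffer(geom, 10) "
                "WHERE id = ANY(%s)", (batch,))
            conn.commit()      # backend memory is reclaimed per batch
            done.extend(batch)
        except Exception:
            conn.rollback()    # only this batch is lost; earlier batches
            raise              # are already committed -- the client must
    return done                # cope with that partial state itself
```

The chunking itself is the only part that is not driver-specific; everything touching the database is an assumption about the client stack.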
Bill Moran wrote:
> If you run a transaction with lots of server-side functions that use a
> lot of memory, this can trigger the OOM killer on Linux, causing the
> PostgreSQL backend to receive a SIGKILL and all the associated bad
> stuff.
>
> Tuning the OOM killer is not sufficient. No setting I've found for the
> OOM killer will guarantee that it won't SIGKILL a process that's
> essentially untenable anyway (because it's going to use more memory
> than actually exists on the system at some point). Additionally,
> "add more RAM" doesn't "solve" the problem; it only delays it until
> datasets scale up to even larger transactions that use even more memory.

I routinely set vm.overcommit_memory = 2 and configure vm.overcommit_ratio so that the kernel does not try to commit more memory than there is in the machine. That should get rid of the problem, of course at the risk of leaving some memory unused.

> This is particularly prevalent with PostGIS, because some PostGIS
> functions are very memory intensive, but I'd be willing to bet real
> money that I could trigger it with just about any stored procedure
> that allocates memory in such a way that it doesn't get reclaimed
> until the transaction completes.
[...]
> What I feel is the best way to mitigate the situation is to have some
> setting that limits the maximum RAM any backend can consume. Attempting
> to exceed this limit would cause an error and rollback for that
> particular backend without affecting other backends.
[...]
> My first question: does this setting exist somewhere and I'm simply not
> finding it for some reason?
>
> Assuming this doesn't exist (I haven't found it), my next question is
> whether there's a philosophical or technical reason that such a feature
> doesn't exist? Should I take this discussion to -hackers?

I don't think that there is such a setting. work_mem sets a limit per operation, but that is a soft limit that PostgreSQL server code can choose to ignore if it pleases.
Moreover, it does not limit the *total* memory a backend can use.

I'd delegate that problem to the operating system which, after all, should know best of all how much memory a process uses. And I don't see a big advantage in a PostgreSQL-generated error message over an "out of memory" error that is propagated from the operating system.

Of course, if there is no way to limit the amount of memory per process (excluding shared memory!), you have a point.

Yours,
Laurenz Albe
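With vm.overcommit_memory = 2, Linux caps committed memory at roughly swap + RAM * overcommit_ratio / 100 (the kernel also subtracts huge pages, ignored here), so the ratio has to be derived from the machine's actual RAM and swap sizes. A small sketch of that arithmetic; the 64 GiB / 8 GiB figures are made-up examples:

```python
# With vm.overcommit_memory = 2, the kernel's commit limit is roughly:
#     CommitLimit = swap + ram * overcommit_ratio / 100
# To keep the kernel from committing more than physical RAM, pick the
# largest integer ratio such that swap + ram * ratio / 100 <= ram.

def max_overcommit_ratio(ram_gib, swap_gib):
    """Largest integer vm.overcommit_ratio keeping CommitLimit <= RAM."""
    return int((ram_gib - swap_gib) * 100 // ram_gib)

def commit_limit_gib(ram_gib, swap_gib, ratio):
    """Approximate CommitLimit for the given sizes and ratio."""
    return swap_gib + ram_gib * ratio / 100

# Example (made-up sizes): 64 GiB RAM, 8 GiB swap.
ratio = max_overcommit_ratio(64, 8)   # 87 -> CommitLimit ~63.7 GiB
```

The resulting value would then go into vm.overcommit_ratio via sysctl; as Laurenz notes, anything between CommitLimit and physical RAM is deliberately left unused.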
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
> Bill Moran wrote:
>> What I feel is the best way to mitigate the situation is to have some
>> setting that limits the maximum RAM any backend can consume.

> I'd delegate that problem to the operating system which, after all,
> should know best of all how much memory a process uses.

I've had some success using ulimit in the past, although it does have the disadvantage that you have to impose the same limit on every PG process. (You set it before starting the postmaster and it inherits to every child process.) If memory serves, limiting with the -v switch works better than -d or -m on Linux; but I might be misremembering. Conceivably we could add code to let the ulimit be set per-process, if the use-case were strong enough.

To implement a limit inside PG, we'd have to add expensive bookkeeping to the palloc/pfree mechanism, and even that would be no panacea because it would fail to account for memory allocated directly from malloc. Hence, you could be pretty certain that it would be wildly inaccurate for sessions using third-party code such as PostGIS or Python. An OS-enforced limit definitely sounds better from here.

regards, tom lane
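The `ulimit -v` suggestion corresponds to the RLIMIT_AS resource limit, which child processes inherit just as Tom describes for the postmaster. The sketch below uses Python's stdlib `resource` module to stand in for the shell builtin and is POSIX/Linux-specific; nothing in it is PostgreSQL code, it only demonstrates the failure mode the limit buys you: a clean allocation error instead of a SIGKILL from the OOM killer.

```python
import os
import resource

def run_with_vmem_limit(limit_kib, fn):
    """Fork a child, apply the equivalent of `ulimit -v limit_kib` in it,
    and run fn.  Returns the child's exit status: 0 if fn succeeded,
    1 if the allocation failed with MemoryError (the graceful error we
    want instead of the OOM killer's SIGKILL)."""
    pid = os.fork()
    if pid == 0:
        # Child: cap total address space, like `ulimit -v` (KiB -> bytes).
        limit = limit_kib * 1024
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
        try:
            fn()
            os._exit(0)
        except MemoryError:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

# A 2 GiB allocation under a 1 GiB address-space cap fails cleanly with
# MemoryError rather than triggering the OOM killer:
status = run_with_vmem_limit(1024 * 1024, lambda: bytearray(2 * 1024**3))
```

For PostgreSQL itself the equivalent would be running `ulimit -v` in the shell that starts the postmaster, with the caveat Tom raises: every backend then inherits the same cap.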
On Wed, Feb 8, 2017 at 7:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at> writes:
>> Bill Moran wrote:
>>> What I feel is the best way to mitigate the situation is to have some
>>> setting that limits the maximum RAM any backend can consume.
>
>> I'd delegate that problem to the operating system which, after all,
>> should know best of all how much memory a process uses.
>
> I've had some success using ulimit in the past, although it does have
> the disadvantage that you have to impose the same limit on every PG
> process. (You set it before starting the postmaster and it inherits
> to every child process.) If memory serves, limiting with the -v switch
> works better than -d or -m on Linux; but I might be misremembering.
> Conceivably we could add code to let the ulimit be set per-process,
> if the use-case were strong enough.
>
> To implement a limit inside PG, we'd have to add expensive bookkeeping
> to the palloc/pfree mechanism, and even that would be no panacea because
> it would fail to account for memory allocated directly from malloc.
> Hence, you could be pretty certain that it would be wildly inaccurate
> for sessions using third-party code such as PostGIS or Python. An
> OS-enforced limit definitely sounds better from here.
Confirming what Tom said, with respect to the specific example in this thread, a large proportion of the allocations in memory hungry bits of PostGIS are in fact using bare malloc via the GEOS library.
P
On Wed, 08 Feb 2017 10:44:24 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Albe Laurenz <laurenz.albe@wien.gv.at> writes:
>> Bill Moran wrote:
>>> What I feel is the best way to mitigate the situation is to have some
>>> setting that limits the maximum RAM any backend can consume.
>
>> I'd delegate that problem to the operating system which, after all,
>> should know best of all how much memory a process uses.
>
> I've had some success using ulimit in the past, although it does have
> the disadvantage that you have to impose the same limit on every PG
> process. (You set it before starting the postmaster and it inherits
> to every child process.) If memory serves, limiting with the -v switch
> works better than -d or -m on Linux; but I might be misremembering.
> Conceivably we could add code to let the ulimit be set per-process,
> if the use-case were strong enough.

Thanks, Tom. I'm not sure why I didn't think to use this. Although part of the problem may be that most of the links that come up from a Google search on this topic don't seem to include this suggestion. Hopefully having it in the list archives will make the search easier for the next person who has this issue.

Does anyone know if there are any suggestions to this effect in the official documentation? If not, I'll try to make some time to submit a patch.

> To implement a limit inside PG, we'd have to add expensive bookkeeping
> to the palloc/pfree mechanism, and even that would be no panacea because
> it would fail to account for memory allocated directly from malloc.
> Hence, you could be pretty certain that it would be wildly inaccurate
> for sessions using third-party code such as PostGIS or Python. An
> OS-enforced limit definitely sounds better from here.

Unfortunate but understandable.

-- 
Bill Moran <wmoran@potentialtech.com>
Bill Moran <wmoran@potentialtech.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Bill Moran wrote:
>>> What I feel is the best way to mitigate the situation is to have some
>>> setting that limits the maximum RAM any backend can consume.
>
>> I've had some success using ulimit in the past,
>
> Hopefully having this in the list archives will make the search easier
> for the next person who has this issue. Does anyone know if there are
> any suggestions to this effect in the official documentation? If not,
> I'll try to make some time to submit a patch.

[ grepgrepgrep... ] There is not. There are several mentions of ulimit in the docs, but all of them point towards removing unreasonably tight default limits, not towards intentionally adding a limit. I think something in or near 18.4.2 "Resource Limits" might be appropriate.

regards, tom lane