Thread: [GENERAL] Running out of memory the hard way ...
If you run a transaction with lots of server-side functions that use a lot of memory, this can trigger the OOM killer on Linux, causing the PostgreSQL backend to receive a SIGKILL and all the associated bad stuff.

Tuning the OOM killer is not sufficient. No setting I've found for the OOM killer will guarantee that it won't SIGKILL a process that's essentially untenable anyway (because it's going to use more memory than actually exists on the system at some point). Additionally, "add more RAM" doesn't "solve" the problem; it only delays it until datasets scale up to even larger transactions that use even more memory.

This is particularly prevalent with PostGIS, because some PostGIS functions are very memory intensive, but I'd be willing to bet real money that I could trigger it with just about any stored procedure that allocates memory in such a way that it doesn't get reclaimed until the transaction completes. See as an example: https://trac.osgeo.org/postgis/ticket/3445

If anyone wants to investigate this but is having trouble reproducing it, I can construct specific failure scenarios fairly easily.

Another workaround is to run the offending statements in smaller transactional batches. This is the best solution I've found so far, but it's not quite ideal. In particular, it requires the client program to reimplement transaction guarantees on the client side. Sometimes this isn't necessary, but other times it is.

What I feel is the best way to mitigate the situation is to have some setting that limits the maximum RAM any backend can consume. Attempting to exceed this limit would cause an error and rollback for that particular backend without affecting other backends. This would provide information to the client process that can be rationally interpreted by client code to result in either an error that a developer can understand, or possibly adaptive code that changes behavior to accommodate limits on the server side.
My first question: does this setting exist somewhere and I'm simply not finding it for some reason?

Assuming this doesn't exist (I haven't found it), my next question is whether there's a philosophical or technical reason that such a feature doesn't exist? Should I take this discussion to -hackers?

-- 
Bill Moran <wmoran@potentialtech.com>
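The "smaller transactional batches" workaround can be sketched roughly as below. This is a sketch, not code from the thread: the table and column names are made up, and the connection object is assumed to follow the DB-API shape of a driver such as psycopg2. It also illustrates the caveat about client-side guarantees: a failure mid-run leaves earlier batches committed, and the client has to decide what that means for its own consistency.

```python
# Sketch: run a large, memory-hungry update as many small transactions,
# so no single backend transaction accumulates unbounded memory.
# Table/column names and the ST_Buffer call are hypothetical examples.

def chunked(ids, size):
    """Split a list of row ids into consecutive batches of at most `size`."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def process_in_batches(conn, ids, size=1000):
    """One transaction per batch; a failure aborts only the current batch."""
    done = []
    for batch in chunked(ids, size):
        try:
            cur = conn.cursor()
            cur.execute(
                "UPDATE big_table SET geom = ST_Buffer(geom, 10) "
                "WHERE id = ANY(%s)", (batch,))
            conn.commit()      # backend memory is reclaimed per batch
            done.extend(batch)
        except Exception:
            conn.rollback()    # only this batch is lost; earlier batches
            raise              # are already committed -- the client must
    return done                # cope with that partial state itself
```

The chunking itself is the only part that is not driver-specific; everything touching the database is an assumption about the client stack.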
Bill Moran wrote:
> If you run a transaction with lots of server-side functions that use a
> lot of memory, this can trigger the OOM killer on Linux, causing the
> PostgreSQL backend to receive a SIGKILL and all the associated bad
> stuff.
>
> Tuning the OOM killer is not sufficient. No setting I've found for the
> OOM killer will guarantee that it won't SIGKILL a process that's
> essentially untenable anyway (because it's going to use more memory
> than actually exists on the system at some point). Additionally,
> "add more RAM" doesn't "solve" the problem; it only delays it until
> datasets scale up to even larger transactions that use even more memory.

I routinely set vm.overcommit_memory = 2 and configure vm.overcommit_ratio so that the kernel does not try to commit more memory than there is in the machine. That should get rid of the problem, of course at the risk of leaving some memory unused.

> This is particularly prevalent with PostGIS, because some PostGIS
> functions are very memory intensive, but I'd be willing to bet real
> money that I could trigger it with just about any stored procedure
> that allocates memory in such a way that it doesn't get reclaimed
> until the transaction completes.
[...]
> What I feel is the best way to mitigate the situation is to have some
> setting that limits the maximum RAM any backend can consume. Attempting
> to exceed this limit would cause an error and rollback for that
> particular backend without affecting other backends.
[...]
> My first question: does this setting exist somewhere and I'm simply not
> finding it for some reason?
>
> Assuming this doesn't exist (I haven't found it), my next question is
> whether there's a philosophical or technical reason that such a feature
> doesn't exist? Should I take this discussion to -hackers?

I don't think that there is such a setting. work_mem sets a limit per operation, but that is a soft limit that PostgreSQL server code can choose to ignore if it pleases.
Moreover, it does not limit the *total* memory a backend can use.

I'd delegate that problem to the operating system which, after all, should know best of all how much memory a process uses. And I don't see a big advantage in a PostgreSQL-generated error message over an "out of memory" error that is propagated from the operating system.

Of course, if there is no way to limit the amount of memory per process (excluding shared memory!), you have a point.

Yours,
Laurenz Albe
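With vm.overcommit_memory = 2, Linux caps committed memory at roughly swap + RAM * overcommit_ratio / 100 (the kernel also subtracts huge pages, ignored here), so the ratio has to be derived from the machine's actual RAM and swap sizes. A small sketch of that arithmetic; the 64 GiB / 8 GiB figures are made-up examples:

```python
# With vm.overcommit_memory = 2, the kernel's commit limit is roughly:
#     CommitLimit = swap + ram * overcommit_ratio / 100
# To keep the kernel from committing more than physical RAM, pick the
# largest integer ratio such that swap + ram * ratio / 100 <= ram.

def max_overcommit_ratio(ram_gib, swap_gib):
    """Largest integer vm.overcommit_ratio keeping CommitLimit <= RAM."""
    return int((ram_gib - swap_gib) * 100 // ram_gib)

def commit_limit_gib(ram_gib, swap_gib, ratio):
    """Approximate CommitLimit for the given sizes and ratio."""
    return swap_gib + ram_gib * ratio / 100

# Example (made-up sizes): 64 GiB RAM, 8 GiB swap.
ratio = max_overcommit_ratio(64, 8)   # 87 -> CommitLimit ~63.7 GiB
```

The resulting value would then go into vm.overcommit_ratio via sysctl; as Laurenz notes, anything between CommitLimit and physical RAM is deliberately left unused.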
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
> Bill Moran wrote:
>> What I feel is the best way to mitigate the situation is to have some
>> setting that limits the maximum RAM any backend can consume.

> I'd delegate that problem to the operating system which, after all,
> should know best of all how much memory a process uses.

I've had some success using ulimit in the past, although it does have the disadvantage that you have to impose the same limit on every PG process. (You set it before starting the postmaster and it inherits to every child process.) If memory serves, limiting with the -v switch works better than -d or -m on Linux; but I might be misremembering. Conceivably we could add code to let the ulimit be set per-process, if the use-case were strong enough.

To implement a limit inside PG, we'd have to add expensive bookkeeping to the palloc/pfree mechanism, and even that would be no panacea because it would fail to account for memory allocated directly from malloc. Hence, you could be pretty certain that it would be wildly inaccurate for sessions using third-party code such as PostGIS or Python. An OS-enforced limit definitely sounds better from here.

regards, tom lane
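The `ulimit -v` suggestion corresponds to the RLIMIT_AS resource limit, which child processes inherit just as Tom describes for the postmaster. The sketch below uses Python's stdlib `resource` module to stand in for the shell builtin and is POSIX/Linux-specific; nothing in it is PostgreSQL code, it only demonstrates the failure mode the limit buys you: a clean allocation error instead of a SIGKILL from the OOM killer.

```python
import os
import resource

def run_with_vmem_limit(limit_kib, fn):
    """Fork a child, apply the equivalent of `ulimit -v limit_kib` in it,
    and run fn.  Returns the child's exit status: 0 if fn succeeded,
    1 if the allocation failed with MemoryError (the graceful error we
    want instead of the OOM killer's SIGKILL)."""
    pid = os.fork()
    if pid == 0:
        # Child: cap total address space, like `ulimit -v` (KiB -> bytes).
        limit = limit_kib * 1024
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
        try:
            fn()
            os._exit(0)
        except MemoryError:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

# A 2 GiB allocation under a 1 GiB address-space cap fails cleanly with
# MemoryError rather than triggering the OOM killer:
status = run_with_vmem_limit(1024 * 1024, lambda: bytearray(2 * 1024**3))
```

For PostgreSQL itself the equivalent would be running `ulimit -v` in the shell that starts the postmaster, with the caveat Tom raises: every backend then inherits the same cap.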
On Wed, Feb 8, 2017 at 7:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at> writes:
>> Bill Moran wrote:
>>> What I feel is the best way to mitigate the situation is to have some
>>> setting that limits the maximum RAM any backend can consume.
>
>> I'd delegate that problem to the operating system which, after all,
>> should know best of all how much memory a process uses.
>
> I've had some success using ulimit in the past, although it does have
> the disadvantage that you have to impose the same limit on every PG
> process. (You set it before starting the postmaster and it inherits
> to every child process.) If memory serves, limiting with the -v switch
> works better than -d or -m on Linux; but I might be misremembering.
> Conceivably we could add code to let the ulimit be set per-process,
> if the use-case were strong enough.
>
> To implement a limit inside PG, we'd have to add expensive bookkeeping
> to the palloc/pfree mechanism, and even that would be no panacea because
> it would fail to account for memory allocated directly from malloc.
> Hence, you could be pretty certain that it would be wildly inaccurate
> for sessions using third-party code such as PostGIS or Python. An
> OS-enforced limit definitely sounds better from here.
Confirming what Tom said, with respect to the specific example in this thread, a large proportion of the allocations in memory hungry bits of PostGIS are in fact using bare malloc via the GEOS library.
P
On Wed, 08 Feb 2017 10:44:24 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Albe Laurenz <laurenz.albe@wien.gv.at> writes:
>> Bill Moran wrote:
>>> What I feel is the best way to mitigate the situation is to have some
>>> setting that limits the maximum RAM any backend can consume.
>
>> I'd delegate that problem to the operating system which, after all,
>> should know best of all how much memory a process uses.
>
> I've had some success using ulimit in the past, although it does have
> the disadvantage that you have to impose the same limit on every PG
> process. (You set it before starting the postmaster and it inherits
> to every child process.) If memory serves, limiting with the -v switch
> works better than -d or -m on Linux; but I might be misremembering.
> Conceivably we could add code to let the ulimit be set per-process,
> if the use-case were strong enough.

Thanks, Tom. I'm not sure why I didn't think to use this. Although part of the problem may be that most of the links that come up from a Google search on this topic don't seem to include this suggestion. Hopefully having it in the list archives will make the search easier for the next person who has this issue.

Does anyone know if there are any suggestions to this effect in the official documentation? If not, I'll try to make some time to submit a patch.

> To implement a limit inside PG, we'd have to add expensive bookkeeping
> to the palloc/pfree mechanism, and even that would be no panacea because
> it would fail to account for memory allocated directly from malloc.
> Hence, you could be pretty certain that it would be wildly inaccurate
> for sessions using third-party code such as PostGIS or Python. An
> OS-enforced limit definitely sounds better from here.

Unfortunate but understandable.

-- 
Bill Moran <wmoran@potentialtech.com>
Bill Moran <wmoran@potentialtech.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Bill Moran wrote:
>>> What I feel is the best way to mitigate the situation is to have some
>>> setting that limits the maximum RAM any backend can consume.
>
>> I've had some success using ulimit in the past,
>
> Hopefully having this in the list archives will make the search easier
> for the next person who has this issue. Does anyone know if there are
> any suggestions to this effect in the official documentation? If not,
> I'll try to make some time to submit a patch.

[ grepgrepgrep... ] There is not. There are several mentions of ulimit in the docs, but all of them point towards removing unreasonably tight default limits, not towards intentionally adding a limit. I think something in or near 18.4.2 "Resource Limits" might be appropriate.

regards, tom lane