Thread: Limit allocated memory per session
I'd like to propose adding a new GUC to limit the amount of memory a backend can allocate for its own use. The problem this addresses is that sometimes one needs to set work_mem fairly high to get good query plans for large joins. However, some complex queries will then use huge amounts of memory, so that one or a few of them will consume all the memory on the host and run it deep into swap, trigger the OOM killer, or worse.

I've attached a patch based on 8.4.1. It works by keeping track of the total memory allocated via malloc() to AllocBlocks (aset.c). If this is not shot down/up too badly I will rebase it on CVS and submit it for the next commit fest.

I would also like to propose a similar limit on temp space use. It is quite easy for an unintended Cartesian product to use hundreds of gigabytes of scratch space and cause other processes to fail due to lack of disk space. If this is not objectionable, I'll work on it too.

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
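To make the mechanism concrete, here is a minimal sketch of the accounting idea, with the caveat that this is not the attached patch: all names and structure below are invented for illustration. The point is that aset.c calls malloc() only when it needs a fresh AllocBlock, so the bookkeeping is one addition and one comparison per block rather than per palloc():

    /*
     * Hypothetical sketch (invented names; not the attached patch).
     * A per-backend counter tracks bytes malloc'd for AllocBlocks and is
     * checked against a GUC-style limit before each new block is grabbed.
     */
    #include <stdio.h>
    #include <stdlib.h>

    static size_t total_allocated = 0;   /* bytes malloc'd for AllocBlocks */
    static size_t backend_mem_limit = 0; /* limit in bytes; 0 = unlimited */

    static void *
    alloc_block_malloc(size_t blksize)
    {
        void *block;

        if (backend_mem_limit > 0 &&
            total_allocated + blksize > backend_mem_limit)
            return NULL;        /* caller raises an out-of-memory error */

        block = malloc(blksize);
        if (block != NULL)
            total_allocated += blksize;
        return block;
    }

    static void
    alloc_block_free(void *block, size_t blksize)
    {
        free(block);
        total_allocated -= blksize;
    }

    int
    main(void)
    {
        void *blk = alloc_block_malloc(8192);

        if (blk == NULL)
            printf("block allocation rejected by limit\n");
        else
            alloc_block_free(blk, 8192);
        return 0;
    }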
daveg <daveg@sonic.net> writes:
> I'd like to propose adding a new GUC to limit the amount of memory a backend
> can allocate for its own use.

Use ulimit.

regards, tom lane
daveg wrote:
> I'd like to propose adding a new GUC to limit the amount of memory a backend
> can allocate for its own use. The problem this addresses is that sometimes
> one needs to set work_mem fairly high to get good query plans for large
> joins. However, some complex queries will then use huge amounts of memory,
> so that one or a few of them will consume all the memory on the host and
> run it deep into swap, trigger the OOM killer, or worse.

Oh, BTW, did anyone get interested in adding the bits to disable the OOM killer for postmaster on the various Linux initscripts? It needs some games with /proc/<pid>/oom_adj and requires root privileges, but I think an initscript is in an excellent position to do it.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
>> I'd like to propose adding a new GUC to limit the amount of memory a backend
>> can allocate for its own use.
>
> Use ulimit.

What about platforms (Windows) that don't have ulimit?

--
Euler Taveira de Oliveira
http://www.timbira.com/
In response to Euler Taveira de Oliveira <euler@timbira.com>:
> Tom Lane wrote:
>> daveg <daveg@sonic.net> writes:
>>> I'd like to propose adding a new GUC to limit the amount of memory a backend
>>> can allocate for its own use.
>>
>> Use ulimit.
>
> What about platforms (Windows) that don't have ulimit?

I have a hard time believing that Windows doesn't have a ulimit equivalent. I don't want to degrade this thread into MS-bashing, but if that were the case, it would make Windows a pretty crippled OS.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Euler Taveira de Oliveira <euler@timbira.com> writes:
> Tom Lane wrote:
>> daveg <daveg@sonic.net> writes:
>>> I'd like to propose adding a new GUC to limit the amount of memory a backend
>>> can allocate for its own use.
>>
>> Use ulimit.
>
> What about platforms (Windows) that don't have ulimit?

Get a real operating system ;-)

Seriously, the proposed patch introduces overhead into a place that is already a known hot spot, in return for not much of anything. It will *not* bound backend memory use very accurately, because there is no way to track raw malloc() calls. And I think that 99% of users will not find it useful.

regards, tom lane
On Thu, Oct 1, 2009 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Euler Taveira de Oliveira <euler@timbira.com> writes:
>> Tom Lane wrote:
>>> daveg <daveg@sonic.net> writes:
>>>> I'd like to propose adding a new GUC to limit the amount of memory a backend
>>>> can allocate for its own use.
>>>
>>> Use ulimit.
>>
>> What about platforms (Windows) that don't have ulimit?
>
> Get a real operating system ;-)
>
> Seriously, the proposed patch introduces overhead into a place that is
> already a known hot spot, in return for not much of anything. It will
> *not* bound backend memory use very accurately, because there is no way
> to track raw malloc() calls. And I think that 99% of users will not
> find it useful.

What WOULD be useful is to find a way to configure work_mem per backend rather than per executor node. But that's a much harder problem.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> What WOULD be useful is to find a way to configure work_mem per
> backend rather than per executor node. But that's a much harder problem.

I think it's mostly a planner problem: how do you deal with the fact that that would make cost estimates for different sub-problems interrelated? The cost of a hash, for instance, depends a lot on how much memory you assume it can use.

regards, tom lane
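As a rough illustration of that dependence (a simplification, not the actual arithmetic in costsize.c): when the inner relation of a hash join does not fit in work_mem, the join is split into batches, and both inputs must then be written to and re-read from temp files, so the estimated cost jumps sharply right at the fit boundary:

    /*
     * Simplified sketch, not the real costsize.c model: once the inner
     * side of a hash join needs more than one batch, every page of both
     * inputs is written out and read back once, producing a cost cliff
     * at the point where the hash table no longer fits in memory.
     */
    #include <math.h>
    #include <stdio.h>

    static double
    rough_hashjoin_io_cost(double inner_bytes, double outer_bytes,
                           double work_mem_bytes, double seq_page_cost)
    {
        double pages = (inner_bytes + outer_bytes) / 8192.0;
        int    nbatch = (int) ceil(inner_bytes / work_mem_bytes);

        if (nbatch <= 1)
            return 0.0;                     /* everything stays in memory */
        return 2.0 * pages * seq_page_cost; /* write once, read once */
    }

    int
    main(void)
    {
        /* Same join (1GB inner, 4GB outer), two work_mem settings. */
        printf("work_mem 64MB: extra I/O cost %.0f\n",
               rough_hashjoin_io_cost(1e9, 4e9, 64e6, 1.0));
        printf("work_mem 2GB:  extra I/O cost %.0f\n",
               rough_hashjoin_io_cost(1e9, 4e9, 2e9, 1.0));
        return 0;
    }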
Robert Haas wrote:
> On Thu, Oct 1, 2009 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Euler Taveira de Oliveira <euler@timbira.com> writes:
>>> Tom Lane wrote:
>>>> daveg <daveg@sonic.net> writes:
>>>>> I'd like to propose adding a new GUC to limit the amount of memory a backend
>>>>> can allocate for its own use.
>>>> Use ulimit.
>>>>
>>> What about platforms (Windows) that don't have ulimit?
>> Get a real operating system ;-)
>>
>> Seriously, the proposed patch introduces overhead into a place that is
>> already a known hot spot, in return for not much of anything. It will
>> *not* bound backend memory use very accurately, because there is no way
>> to track raw malloc() calls. And I think that 99% of users will not
>> find it useful.
>
> What WOULD be useful is to find a way to configure work_mem per
> backend rather than per executor node. But that's a much harder problem.

I see. The tough problem is: how do we get per-backend memory usage accurately? Is relying on an OS-specific API the only way?

--
Euler Taveira de Oliveira
http://www.timbira.com/
Euler Taveira de Oliveira <euler@timbira.com> writes:
> I see. The tough problem is: how do we get per-backend memory usage
> accurately? Is relying on an OS-specific API the only way?

Given all the third-party libraries (perl, python, libxml2, yadda yadda) that can be in use and won't go through palloc, I think that this would have to be done at the OS level to be very meaningful.

The other problem is the one Robert touched on: what you actually *want* is something entirely different, namely for the backend to actively try to meet an overall target for its memory usage, rather than having queries fail ungracefully when they hit an arbitrary limit that the planner didn't even know about.

regards, tom lane
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What WOULD be useful is to find a way to configure work_mem per
>> backend rather than per executor node. But that's a much harder problem.
>
> I think it's mostly a planner problem: how do you deal with the fact
> that that would make cost estimates for different sub-problems
> interrelated? The cost of a hash, for instance, depends a lot on how
> much memory you assume it can use.

It could introduce some complexity, but you could track (subtract) the memory usage as you're walking up the tree. Also, you need to decide what to do when you have more than one node per level. :( How do you deal with priority in that case?

--
Euler Taveira de Oliveira
http://www.timbira.com/
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Oh, BTW, did anyone get interested in adding the bits to disable the OOM
> killer for postmaster on the various Linux initscripts? It needs some
> games with /proc/<pid>/oom_adj and requires root privileges, but I think
> an initscript is in an excellent position to do it.

I was imagining that this would be something for individual distros to tackle. It's probably not portable enough to go into the contrib/start-scripts examples. On the other hand, it'd make lots of sense to have the Fedora or Debian or whatever scripts do this, since they know what kernel version they're targeting. (If anyone wants to send me the fixes to make Fedora's script do this ...)

regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I was imagining that this would be something for individual distros
> to tackle. It's probably not portable enough to go into the
> contrib/start-scripts examples. On the other hand, it'd make lots
> of sense to have the Fedora or Debian or whatever scripts do this,
> since they know what kernel version they're targeting. (If anyone
> wants to send me the fixes to make Fedora's script do this ...)

I'm not exactly keen on Debian init scripts hacking kernel settings. Should it hack up the shared memory numbers too? This is not what I would consider 'init script' material for specific applications.

Thanks,

Stephen
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Oh, BTW, did anyone get interested in adding the bits to disable the OOM
>> killer for postmaster on the various Linux initscripts? It needs some
>> games with /proc/<pid>/oom_adj and requires root privileges, but I think
>> an initscript is in an excellent position to do it.
>
> I was imagining that this would be something for individual distros
> to tackle. It's probably not portable enough to go into the
> contrib/start-scripts examples.

Hmm? I think it should be just (as root)

    if [ -f /proc/$pid_of_postmaster/oom_adj ]; then
        echo -17 > /proc/$pid_of_postmaster/oom_adj
    fi

This is supported from 2.6.11 onwards AFAIK. If the kernel is older than that, the file would not exist and this would be a no-op.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Stephen Frost <sfrost@snowman.net> writes:
> I'm not exactly keen on Debian init scripts hacking kernel settings.
> Should it hack up the shared memory numbers too? This is not what I
> would consider 'init script' material for specific applications.

What was suggested was tweaking the oom_adj setting for the postmaster process only, not messing with any system-wide settings. Do you really find that unreasonable? The default OOM killer behavior is just about as unreasonable as can be :-(

regards, tom lane
On Thu, Oct 01, 2009 at 10:35:55AM -0400, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
>> I'd like to propose adding a new GUC to limit the amount of memory a backend
>> can allocate for its own use.
>
> Use ulimit.

That was my initial thought too. However, ulimit() is documented as superseded by setrlimit(), which has the option RLIMIT_DATA to limit the size of the data segment. Perfect!

Except, RLIMIT_DATA does not appear to work on Linux. The call succeeds and the new value can even be read back with getrlimit(), but it does not seem to do anything to actually limit the memory allocated. I tested this on SuSE 11 (kernel 2.6.25) and Ubuntu Intrepid (kernel 2.6.28).

Setting RLIMIT_AS to limit the total address space for a process works as expected. However, this seems undesirable for postgresql as it can also cause stack expansion to fail, which would then force a general restart. Also, this limit would interact with the buffer cache size setting, as it includes the shared address space as well.

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
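For reference, a minimal version of the test described above might look like the sketch below (illustrative values; not part of the proposed patch). The likely explanation for RLIMIT_DATA being a no-op here is that on Linux it historically constrained only brk()/sbrk() growth, while glibc malloc() serves large requests via mmap(), which that limit did not cover; RLIMIT_AS caps the whole address space, so it does bite:

    /*
     * Illustrative sketch: RLIMIT_AS bounds total address space, so a
     * malloc() larger than the cap should fail with ENOMEM. The same
     * test with RLIMIT_DATA typically changed nothing on these kernels,
     * since glibc satisfies large allocations with mmap().
     */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        struct rlimit rl;
        void   *p;

        /* Cap the address space at 512 MB (arbitrary test value). */
        rl.rlim_cur = 512UL * 1024 * 1024;
        rl.rlim_max = 512UL * 1024 * 1024;
        if (setrlimit(RLIMIT_AS, &rl) != 0)
        {
            perror("setrlimit");
            return 1;
        }

        /* 1 GB exceeds the cap, so this allocation should fail. */
        p = malloc(1024UL * 1024 * 1024);
        if (p == NULL)
            printf("malloc failed as expected: %s\n", strerror(errno));
        else
            printf("malloc unexpectedly succeeded\n");
        return 0;
    }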
On Thu, Oct 01, 2009 at 11:47:43AM -0400, Tom Lane wrote:
> Euler Taveira de Oliveira <euler@timbira.com> writes:
>> Tom Lane wrote:
>>> daveg <daveg@sonic.net> writes:
>>>> I'd like to propose adding a new GUC to limit the amount of memory a backend
>>>> can allocate for its own use.
>>>
>>> Use ulimit.
>
> Seriously, the proposed patch introduces overhead into a place that is
> already a known hot spot, in return for not much of anything.

The overhead is simply an integer addition and compare, with values that are likely already in processor caches. And this only occurs when we actually call malloc() to get a new block, not on every palloc(). So I suspect it will not be noticeable. However, I welcome any suggestion on how to test this and actually measure the overhead, if any. pgbench? Something else?

> It will *not* bound backend memory use very accurately, because there
> is no way to track raw malloc() calls. And I think that 99% of users
> will not find it useful.

The use case that motivated this is a client that runs many postgresql instances with a mostly batch/large-query workload. Some of the queries are code generated by an application and can be very complex. A few times a month one of these will run through 64GB of memory and OOM the host, so it seriously hurts production. Setting work_mem low enough to prevent this results in poor query performance. This client does not use any outside libraries that call malloc() directly.

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
On Thu, Oct 1, 2009 at 12:15 PM, Euler Taveira de Oliveira <euler@timbira.com> wrote:
> Robert Haas wrote:
>> On Thu, Oct 1, 2009 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Euler Taveira de Oliveira <euler@timbira.com> writes:
>>>> Tom Lane wrote:
>>>>> daveg <daveg@sonic.net> writes:
>>>>>> I'd like to propose adding a new GUC to limit the amount of memory a backend
>>>>>> can allocate for its own use.
>>>>> Use ulimit.
>>>>>
>>>> What about platforms (Windows) that don't have ulimit?
>>> Get a real operating system ;-)
>>>
>>> Seriously, the proposed patch introduces overhead into a place that is
>>> already a known hot spot, in return for not much of anything. It will
>>> *not* bound backend memory use very accurately, because there is no way
>>> to track raw malloc() calls. And I think that 99% of users will not
>>> find it useful.
>>
>> What WOULD be useful is to find a way to configure work_mem per
>> backend rather than per executor node. But that's a much harder problem.
>>
> I see. The tough problem is: how do we get per-backend memory usage
> accurately? Is relying on an OS-specific API the only way?

As I see it, this is really a planning problem, not an executor problem, so measuring ACTUAL memory usage is not really important: the problem is taking memory usage into account during planning. The difficulty with adjusting work_mem right now is that the correct value depends not only on the number of queries that are concurrently executing (which isn't a constant) but also on the number of sort/hash operations being performed per query (which is also not a constant). So if your queries become more complex, a value of work_mem that was previously OK may start to cause swapping, which encourages setting work_mem conservatively. But setting it conservatively can cause the planner to pick plans that save memory at a LARGE performance cost.

Fixing this isn't simple. Right now, when planning a particular joinrel, we only keep track of the best plans for each possible set of path keys, regardless of how much or how little memory they use. So if we do something naive, like just track the total amount of memory that each candidate path is forecast to use and avoid letting it go above some ceiling, query planning might fail altogether, because the lower-level joinrels use as much memory as they want while the higher-level nodes, which for some reason can't be done without memory, can't be planned. Or we might just end up with a badly suboptimal plan, because we pick a slightly cheaper plan lower down in the tree that uses a LOT more memory over a slightly more expensive one that uses much less. Later we'll wish we hadn't, but by that point it's too late.

Another possible angle of attack is to try to give the planner a range for work_mem rather than a hard limit. The planner would ordinarily construct paths as though the lower end of the range were the limit, but for a sufficiently large cost savings it would be willing to adopt a path that used more memory. Potentially this willingness could also be conditioned on the amount of memory used by the path so far, although that has the same problems described above in kind if not in degree. I'm not really sure whether something like this can be made to work; I'm not sure there's really enough information available when constructing paths for any sort of local decision-making to prove fruitful.
The other idea I have is to adopt a strategy where each plan node has upper and lower bounds on cost, as I previously suggested here with respect to index-only scans:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php

The idea would basically be to estimate the lower-bound cost of a sort based on the idea that we'll have the maximum possible amount of memory to work with (say, the budget for the whole query), and the upper-bound cost based on the idea that we'll have the minimum possible amount of memory (zero, or whatever the minimal amount is). We can also estimate the most memory we think we can usefully use (for example, a hash join with a smaller inner rel doesn't benefit from more memory than the amount required to hold the entire hash table in memory). After we complete the first round of planning, we look at the resulting paths and decide which sorts or hashes will get funded with how much memory.

I'm hand-waving a little bit here, because there may be a knapsack problem in here (which is NP-complete), since the cost as a function of memory probably has sharp cliffs with not much change in between them - certainly for hashing, and I suspect for sorting as well. But it might be that in practice N is small enough not to matter, or we might be able to find an approximation that is good enough that we can live with it.

Even if we can get past that hurdle, though, there are still all the caveats from the original email, principally that it's unclear that the necessary computations can be done without blowing planning time out of the water. Plus, if we used this strategy for multiple purposes, like positioning of heap fetch nodes and also allocation of work memory, there could be interdependencies that would turn the whole thing into a giant mess.

So to reiterate my first comment: a MUCH harder problem.

...Robert
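To make the hand-waving slightly more concrete, here is a toy greedy version of that "which sorts or hashes get funded" step. All structures and names are invented for illustration; the real planner works nothing like this. Each memory-hungry node advertises the cost it saves if granted its useful maximum of memory, and the query's budget goes to the best savings-per-byte first. A greedy pass is only an approximation of the knapsack problem mentioned above, but it shows the shape of the decision:

    /*
     * Toy sketch of greedy work-memory assignment (invented structures).
     * extra_cost is the node's upper-bound cost minus its lower-bound
     * cost; mem_wanted is the memory needed to reach the lower bound.
     * Nodes are funded in order of cost saved per byte of memory.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct
    {
        const char *label;
        double      extra_cost;
        size_t      mem_wanted;
        size_t      mem_granted;
    } MemNode;

    static int
    by_savings_per_byte(const void *a, const void *b)
    {
        const MemNode *na = a;
        const MemNode *nb = b;
        double ra = na->extra_cost / (double) na->mem_wanted;
        double rb = nb->extra_cost / (double) nb->mem_wanted;

        return (ra < rb) - (ra > rb);   /* sort descending */
    }

    static void
    assign_query_memory(MemNode *nodes, int n, size_t budget)
    {
        int i;

        qsort(nodes, n, sizeof(MemNode), by_savings_per_byte);
        for (i = 0; i < n; i++)
        {
            if (nodes[i].mem_wanted <= budget)
            {
                nodes[i].mem_granted = nodes[i].mem_wanted;
                budget -= nodes[i].mem_wanted;
            }
            /* else the node runs at its minimal-memory (spilling) cost */
        }
    }

    int
    main(void)
    {
        MemNode nodes[] = {
            {"HashJoin", 500.0, (size_t) 64 << 20, 0},
            {"Sort",     200.0, (size_t) 48 << 20, 0},
            {"HashAgg",  400.0, (size_t) 32 << 20, 0},
        };
        int i;

        assign_query_memory(nodes, 3, (size_t) 96 << 20);
        for (i = 0; i < 3; i++)
            printf("%-8s granted %zu MB\n", nodes[i].label,
                   nodes[i].mem_granted >> 20);
        return 0;
    }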