Thread: Understanding Postgres Memory Usage
$ free -h  # Before the query
             total       used       free     shared    buffers     cached
Mem:          7.8G       5.2G       2.6G       212M        90M       4.9G
-/+ buffers/cache:        248M       7.6G
Swap:           0B         0B         0B
$ free -h  # After the query
             total       used       free     shared    buffers     cached
Mem:          7.8G       5.3G       2.5G       212M        90M       4.9G
-/+ buffers/cache:        312M       7.5G
Swap:           0B         0B         0B
[I.K >> ] Are you asking about the increase in buffers/cache? AFAIK that memory is used by the kernel as a cache in front of block devices (an HDD, for example).
Postgres does not use this memory directly; it simply reads data from the block device, and the kernel caches it. A process can't be OOM-killed because of it.
You should configure your Postgres to NEVER exceed available RAM. You can use a tool like pgtune (http://pgtune.leopard.in.ua/) or calculate the limits manually.
I do not remember the exact formula, but it is something like "work_mem * max_connections + shared_buffers", and it should come to around 80% of your machine's RAM (minus RAM used by other processes and the kernel).
That will save you from OOM.
If you then hit a performance bottleneck, you can diagnose it with tools like "log_min_duration_statement", "track_io_timing", and system-provided tools.
Ilya Kazakevich
JetBrains
The Drive to Develop
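For reference, a quick way to pull those three settings from a live session and check the arithmetic (a sketch; work_mem, max_connections, and shared_buffers are real GUCs, but the 80%-of-RAM target is only the rule of thumb above, not an official limit):

    -- Read the settings named in the formula.
    SELECT current_setting('work_mem')        AS work_mem,
           current_setting('max_connections') AS max_connections,
           current_setting('shared_buffers')  AS shared_buffers;
    -- Worked example with conservative values:
    --   4MB * 100 + 512MB = ~912MB estimated peak,
    -- well under 80% of an 8GB machine (~6.4GB).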
Theron Luhn <theron@luhn.com> writes:
> I have an application that uses Postgres 9.3 as the primary datastore.
9.3.what? We do fix memory leaks from time to time ...
> Some of these queries use quite a bit of memory. I've observed a
> "high-water mark" behavior in memory usage: running a query increases the
> worker memory by many MBs (beyond shared buffers), but the memory is not
> released until the connection is closed.
Hm. I'm not familiar with smem, but assuming that that USS column
really is process-private space, that definitely looks bad.
If it's not an outright leak, it's probably consumption of cache space.
We cache stuff that we've read from system catalogs, so sessions that
touch lots of tables (like thousands) can grow due to that. Another
possible source of large cache consumption is calling lots-and-lots of
plpgsql functions.
If the same query, repeated over and over, causes memory to continue
to grow, I'd call it a leak (ie bug). If repeat executions consume
no additional memory then it's probably intentional caching behavior.
regards, tom lane
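One way to run that repeat-execution test from a single session, sketched in plpgsql (the table name is a placeholder for whatever the real query touches):

    -- Execute the suspect query many times in one backend, then compare the
    -- process's memory before and after. Growth that never plateaus suggests
    -- a leak; growth that levels off suggests caching.
    DO $$
    BEGIN
      FOR i IN 1..50 LOOP
        PERFORM count(*) FROM my_table;  -- stand-in for the suspect query
      END LOOP;
    END
    $$;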
Theron Luhn <theron@luhn.com> writes:
>> If it's not an outright leak, it's probably consumption of cache space.
>> We cache stuff that we've read from system catalogs, so sessions that
>> touch lots of tables (like thousands) can grow due to that. Another
>> possible source of large cache consumption is calling lots-and-lots of
>> plpgsql functions.
> I have a reasonable number of tables (around 50) and very few plpgsql
> functions.
Doesn't sound like a lot ...
>> If the same query, repeated over and over, causes memory to continue
>> to grow, I'd call it a leak (ie bug). If repeat executions consume
>> no additional memory then it's probably intentional caching behavior.
> Here's the results of that:
> https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13
> So kind of a combination of the two: Memory usage increases up to a
> certain point but then plateaus. So... cache? It's ~100MB increase,
> though, which seems an excessive amount. What could be taking up that much
> cache?
Hmm. I find it mighty suspicious that the USS, PSS, and RSS numbers are
all increasing to pretty much the same tune, ie from very little to circa
100MB. I think there is a decent chance that smem is not doing what it
says on the tin, and in fact is including shared memory consumption in
"USS". In which case the apparent leak just corresponds to the process
gradually touching more and more of the shared buffer arena. (If your
shared_buffers setting is not somewhere near 100MB, then this theory
breaks down.)
It would be worth using plain old top to watch this process. We have
enough experience with that to be pretty sure how to interpret its
numbers: "RES minus SHR" is the value to be worried about.
regards, tom lane
On Thu, Aug 25, 2016 at 1:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm. I find it mighty suspicious that the USS, PSS, and RSS numbers are
> all increasing to pretty much the same tune, ie from very little to circa
> 100MB. I think there is a decent chance that smem is not doing what it
> says on the tin, and in fact is including shared memory consumption in
> "USS". In which case the apparent leak just corresponds to the process
> gradually touching more and more of the shared buffer arena. (If your
> shared_buffers setting is not somewhere near 100MB, then this theory
> breaks down.)
I can't speak to every implementation of smem, but I have used it quite
a bit under SLES and Ubuntu, and it always seemed to do what it says --
USS is unshared (process-local) memory and PSS is that plus the
process's portion of shared memory. (The sum of differences between PSS
and USS == total shared memory.) RSS has the usual meaning.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
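To watch the right process in top, the backend's PID can be read from the session itself (pg_backend_pid() is a standard function):

    -- Find the PID of the backend serving this connection, then run
    -- "top -p <pid>" alongside the workload and track RES minus SHR.
    SELECT pg_backend_pid();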
Theron Luhn <theron@luhn.com> writes:
>> It would be worth using plain old top to watch this process. We have
>> enough experience with that to be pretty sure how to interpret its
>> numbers: "RES minus SHR" is the value to be worried about.
> Sure thing. https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13#file-top-txt
> RES - SHR is showing a similar increase to what smem is reporting.
Hm, yeah, and the VIRT column agrees --- so 100MB of non-shared
memory went somewhere. Seems like a lot.
If you have debug symbols installed for this build, you could try
doing
gdb /path/to/postgres processID
gdb> call MemoryContextStats(TopMemoryContext)
gdb> quit
(when the process has reached an idle but bloated state) and seeing what
gets printed to the process's stderr. (You need to have launched the
postmaster with its stderr directed to a file, not to /dev/null.)
That would provide a better clue about what's eating space.
regards, tom lane
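On 9.3 the gdb route is the only option; for anyone reading this on a modern release, PostgreSQL 14 and later expose the same dump from SQL:

    -- Logs the memory-context tree of a backend to the server log without
    -- a debugger (PostgreSQL 14 onward; not available on 9.3).
    SELECT pg_log_backend_memory_contexts(pg_backend_pid());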
On 8/25/2016 9:58 AM, Theron Luhn wrote:
>> I do not remember the exact formula, but it is something like
>> "work_mem * max_connections + shared_buffers", and it should come to
>> around 80% of your machine's RAM (minus RAM used by other processes
>> and the kernel). That will save you from OOM.

a single query can use multiple work_mem's if it's got subqueries,
joins, etc.

> My Postgres is configured with *very* conservative values. work_mem
> (4MB) * max_connections (100) + shared buffers (512MB) = ~1GB, yet
> Postgres managed to fill up a 4GB server. I'm seeing workers
> consuming hundreds of MBs of memory (and not releasing any of it
> until the connection closes), despite work_mem being 4MB.

are you doing queries that return large data sets?

--
john r pierce, recycling bits in santa cruz
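EXPLAIN makes the multiple-work_mem point concrete: each Sort, Hash, or HashAggregate node in the plan can claim its own allowance. A hypothetical example:

    -- One statement, several work_mem consumers; "orders" and "customers"
    -- are made-up tables.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT c.id, sum(o.total) AS spent
    FROM customers c
    JOIN orders o ON o.customer_id = c.id   -- Hash Join: one work_mem
    GROUP BY c.id                           -- aggregation: another
    ORDER BY spent DESC;                    -- Sort: another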
Theron Luhn <theron@luhn.com> writes:
> Okay, here's the output:
> https://gist.github.com/luhn/a39db625ba5eed90946dd4a196d12220
Hm, well the only thing there that looks even slightly out of the
ordinary is the amount of free space in TopMemoryContext itself:
TopMemoryContext: 3525712 total in 432 blocks; 3444272 free (12654 chunks); 81440 used
Normally, TopMemoryContext doesn't get to more than a few hundred K,
and in the cases I've seen where it does, it's usually been because of
leaky coding that was allocating stuff there and never cleaning it up.
But you've got no more than the typical amount of space still allocated
there, which seems to kill the "leak in TopMemoryContext" theory.
And in any case there is nowhere near 100MB accounted for by the whole
dump.
Are you using any other PLs besides plpgsql? We've seen cases where
bloat occurred within plpython or plperl, and wasn't visible in this
dump because those languages don't use PG's memory management code.
Or maybe some nonstandard extension?
If not that, then I'd have to speculate that the query you're running is
triggering some bug or otherwise pathological behavior. Can you put
together a self-contained test case?
regards, tom lane
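The PL and extension questions can be answered straight from the system catalogs (pg_language and pg_extension are standard):

    -- List installed procedural languages and extensions, to rule out
    -- plpython/plperl or a nonstandard extension allocating memory
    -- outside PostgreSQL's own memory contexts.
    SELECT lanname FROM pg_language ORDER BY lanname;
    SELECT extname, extversion FROM pg_extension ORDER BY extname;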
Theron Luhn <theron@luhn.com> writes:
> Okay, I got a semi-reproducible test case:
> https://gist.github.com/luhn/2b35a9b31255e3a6a2e6a06d1213dfc9
> The one caveat is that the memory rise only happens when using a
> HashAggregate query plan (included in the gist), which I can't find a way
> to get Postgres to reliably use.
OK, I can reproduce some memory bloat in 9.3, but not in 9.5 and up.
I believe this was fixed by commit b419865a8, which reduced the overhead
of running a lot of instances of array_agg() concurrently in a HashAgg
plan. I think your options are to live with it or upgrade. Or I guess
you could turn off enable_hashagg when using array_agg() plus GROUP BY,
though you'd want to remember to undo that whenever you do upgrade.
regards, tom lane
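A sketch of that last workaround, scoped with SET LOCAL so the planner setting cannot leak past the transaction (the table and columns are hypothetical):

    BEGIN;
    SET LOCAL enable_hashagg = off;  -- forces GroupAggregate over HashAggregate
    SELECT customer_id, array_agg(order_id) AS order_ids
    FROM orders
    GROUP BY customer_id;
    COMMIT;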