Thread: Understanding Postgres Memory Usage

Understanding Postgres Memory Usage

From: Theron Luhn
I have an application that uses Postgres 9.3 as the primary datastore.  Like any real-life application, it's not all roses: there are many ugly, convoluted, and inefficient queries.

Some of these queries use quite a bit of memory.  I've observed a "high-water mark" behavior in memory usage:  running a query increases the worker's memory by many MBs (beyond shared buffers), but the memory is not released until the connection is closed.  For example, here's the memory usage on my test server before and after running a query once and leaving the connection open.

$ free -h  # Before the query
             total       used       free     shared    buffers     cached
Mem:          7.8G       5.2G       2.6G       212M        90M       4.9G
-/+ buffers/cache:       248M       7.6G
Swap:           0B         0B         0B
$ free -h  # After the query
             total       used       free     shared    buffers     cached
Mem:          7.8G       5.3G       2.5G       212M        90M       4.9G
-/+ buffers/cache:       312M       7.5G
Swap:           0B         0B         0B
$ sudo -u postgres smem -trs uss
  PID User     Command                         Swap      USS      PSS      RSS
 8263 postgres postgres: postgres souschef        0   200204   203977   209540
 8133 postgres /usr/lib/postgresql/9.3/bin        0    50456    61090    74596
 8266 postgres /usr/bin/python /usr/bin/sm        0     5840     6261     7460
 8138 postgres postgres: autovacuum launch        0      776     1146     2968
 8139 postgres postgres: stats collector p        0      300      470     1872
 8135 postgres postgres: checkpointer proc        0      148      342     1880
 8137 postgres postgres: wal writer proces        0      140      322     1812
 8136 postgres postgres: writer process           0      132     6814    15140
-------------------------------------------------------------------------------
    8 1                                           0   257996   280422   315268

This is proving to be very troublesome on my production server because I use connection pooling (so connections remain open indefinitely) and the connection memory seems to rise without end, to the point where 25 open connections OOM'd a 4GB server.

So I have a couple of questions:  Is this high-water mark memory behavior expected?  If so, how can I identify the queries that are using lots of memory and driving the high-water mark upwards?

I understand that this post is rather vague; I didn't want to talk your ear off with specifics in case this is pretty basic, well-understood behavior.  If necessary, I can follow up with an email diving into the specifics of what I'm observing.

— Theron

Re: Understanding Postgres Memory Usage

From: Ilya Kazakevich

> $ free -h  # Before the query
>              total       used       free     shared    buffers     cached
> Mem:          7.8G       5.2G       2.6G       212M        90M       4.9G
> -/+ buffers/cache:       248M       7.6G
> Swap:           0B         0B         0B
> $ free -h  # After the query
>              total       used       free     shared    buffers     cached
> Mem:          7.8G       5.3G       2.5G       212M        90M       4.9G
> -/+ buffers/cache:       312M       7.5G
> Swap:           0B         0B         0B

[I.K >> ] Are you talking about the buffers/cache increase? AFAIK this memory is used by the kernel as a cache in front of block devices (an HDD, for example). Postgres does not use this memory directly; it simply reads data from the block device and the kernel caches it. A process can't be OOM-killed because of it.

I am sure you should configure your Postgres to NEVER exceed the available RAM. You may use a tool like pgtune (http://pgtune.leopard.in.ua/) or calculate it manually. I do not remember the exact formula, but it is something like "work_mem * max_connections + shared_buffers", and the total should stay around 80% of your machine's RAM (minus the RAM used by other processes and the kernel). That will save you from the OOM killer.

If you face a performance bottleneck after that, you can track it down with tools like log_min_duration_statement, track_io_timing, and system-provided tools.
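As a rough sanity check (just a sketch: the pg_settings query is standard, but adjust the database and connection options for your setup), you can pull the relevant settings and do the arithmetic from there:

$ sudo -u postgres psql -c "SELECT name, setting, unit FROM pg_settings WHERE name IN ('work_mem', 'max_connections', 'shared_buffers');"
$ # e.g. 4MB * 100 connections + 512MB of shared_buffers = ~912MB, which should sit well below ~80% of RAM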

Ilya Kazakevich

JetBrains
http://www.jetbrains.com
The Drive to Develop

Re: Understanding Postgres Memory Usage

From: Tom Lane
Theron Luhn <theron@luhn.com> writes:
> I have an application that uses Postgres 9.3 as the primary datastore.

9.3.which?  We do fix memory leaks from time to time ...

> Some of these queries use quite a bit of memory.  I've observed a
> "high-water mark" behavior in memory usage:  running a query increases the
> worker memory by many MBs (beyond shared buffers), but the memory is not
> released until the connection is closed.

Hm.  I'm not familiar with smem, but assuming that that USS column
really is process-private space, that definitely looks bad.

If it's not an outright leak, it's probably consumption of cache space.
We cache stuff that we've read from system catalogs, so sessions that
touch lots of tables (like thousands) can grow due to that.  Another
possible source of large cache consumption is calling lots-and-lots of
plpgsql functions.

If the same query, repeated over and over, causes memory to continue
to grow, I'd call it a leak (ie bug).  If repeat executions consume
no additional memory then it's probably intentional caching behavior.
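For what it's worth, a crude way to run that experiment (just a sketch; YOUR_QUERY and yourdb are placeholders) is to push the statement through a single session repeatedly and watch that backend from another terminal:

    for i in $(seq 1 20); do echo "YOUR_QUERY;"; done | psql yourdb > /dev/null
    sudo -u postgres smem -trs uss    # in a second terminal, repeat or run under watch(1)

If the backend's USS keeps climbing with every iteration, that smells like a leak; if it levels off after the first execution or two, it's more likely cache.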

            regards, tom lane


Re: Understanding Postgres Memory Usage

From: Theron Luhn
> 9.3.which?  We do fix memory leaks from time to time ...

9.3.14

> If it's not an outright leak, it's probably consumption of cache space.
> We cache stuff that we've read from system catalogs, so sessions that
> touch lots of tables (like thousands) can grow due to that.  Another
> possible source of large cache consumption is calling lots-and-lots of
> plpgsql functions.

I have a reasonable number of tables (around 50) and very few plpgsql functions.

> If the same query, repeated over and over, causes memory to continue
> to grow, I'd call it a leak (ie bug).  If repeat executions consume
> no additional memory then it's probably intentional caching behavior.

Here's the results of that:
https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13

So kind of a combination of the two:  memory usage increases up to a certain point but then plateaus.  So... cache?  It's a ~100MB increase, though, which seems like an excessive amount.  What could be taking up that much cache?


— Theron


Re: Understanding Postgres Memory Usage

From: Tom Lane
Theron Luhn <theron@luhn.com> writes:
>> If it's not an outright leak, it's probably consumption of cache space.
>> We cache stuff that we've read from system catalogs, so sessions that
>> touch lots of tables (like thousands) can grow due to that.  Another
>> possible source of large cache consumption is calling lots-and-lots of
>> plpgsql functions.

> I have a reasonable number of tables (around 50) and very few plpgsql
> functions.

Doesn't sound like a lot ...


>> If the same query, repeated over and over, causes memory to continue
>> to grow, I'd call it a leak (ie bug).  If repeat executions consume
>> no additional memory then it's probably intentional caching behavior.

> Here's the results of that:
> https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13

> So kind of a combination of the two:  Memory usage increases up to a
> certain point but then plateaus.  So... cache?  It's ~100MB increase,
> though, which seems an excessive amount.  What could be taking up that much
> cache?

Hmm.  I find it mighty suspicious that the USS, PSS, and RSS numbers are
all increasing to pretty much the same tune, ie from very little to circa
100MB.  I think there is a decent chance that smem is not doing what it
says on the tin, and in fact is including shared memory consumption in
"USS".  In which case the apparent leak just corresponds to the process
gradually touching more and more of the shared buffer arena.  (If your
shared_buffers setting is not somewhere near 100MB, then this theory
breaks down.)

It would be worth using plain old top to watch this process.  We have
enough experience with that to be pretty sure how to interpret its
numbers: "RES minus SHR" is the value to be worried about.
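In batch mode that is something like this (a sketch; 8263 just stands in for the backend's PID from your smem listing):

    top -b -n 1 -p 8263                        # RES minus SHR is the number to watch
    grep -E 'VmRSS|VmSize' /proc/8263/status   # the same figures straight from /proc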

            regards, tom lane


Re: Understanding Postgres Memory Usage

From: Kevin Grittner
On Thu, Aug 25, 2016 at 1:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Hmm.  I find it mighty suspicious that the USS, PSS, and RSS numbers are
> all increasing to pretty much the same tune, ie from very little to circa
> 100MB.  I think there is a decent chance that smem is not doing what it
> says on the tin, and in fact is including shared memory consumption in
> "USS".  In which case the apparent leak just corresponds to the process
> gradually touching more and more of the shared buffer arena.  (If your
> shared_buffers settings is not somewhere near 100MB, then this theory
> breaks down.)

I can't speak to every implementation of smem, but I have used it
quite a bit under SLES and Ubuntu, and it always seemed to do what
it says -- USS is unshared (process-local) memory and PSS is that
plus the process's portion of shared memory.  (The sum of
differences between PSS and USS == total shared memory.)  RSS has
the usual meaning.
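Taking the backend from the original smem listing as a worked example (PID 8263, values in kB):

    USS = 200204, PSS = 203977, RSS = 209540
    PSS - USS = 3773 kB    (that backend's proportional slice of shared pages)
    RSS - USS = 9336 kB    (total shared pages the backend has touched)

So if smem is reporting correctly, nearly all of that ~200MB really is private to the backend.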

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Understanding Postgres Memory Usage

From: Theron Luhn
> It would be worth using plain old top to watch this process.  We have
> enough experience with that to be pretty sure how to interpret its
> numbers: "RES minus SHR" is the value to be worried about.

Sure thing.  https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13#file-top-txt

RES - SHR is showing a similar increase to what smem is reporting.

— Theron


Re: Understanding Postgres Memory Usage

From: Tom Lane
Theron Luhn <theron@luhn.com> writes:
>> It would be worth using plain old top to watch this process.  We have
>> enough experience with that to be pretty sure how to interpret its
>> numbers: "RES minus SHR" is the value to be worried about.

> Sure thing.  https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13#file-top-txt

> RES - SHR is showing a similar increase to what smem is reporting.

Hm, yeah, and the VIRT column agrees --- so 100MB of non-shared
memory went somewhere.  Seems like a lot.

If you have debug symbols installed for this build, you could try
doing

    gdb /path/to/postgres processID
    gdb> call MemoryContextStats(TopMemoryContext)
    gdb> quit

(when the process has reached an idle but bloated state) and seeing what
gets printed to the process's stderr.  (You need to have launched the
postmaster with its stderr directed to a file, not to /dev/null.)
That would provide a better clue about what's eating space.
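If attaching interactively is awkward, the same call can be made in one shot (a sketch, with 8263 standing in for the bloated backend's PID):

    gdb -p 8263 -batch -ex 'call MemoryContextStats(TopMemoryContext)'

The context dump still goes to the backend's stderr rather than gdb's, so the note about the postmaster's stderr applies either way.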

            regards, tom lane


Re: Understanding Postgres Memory Usage

From: Theron Luhn

Okay, here's the output:
https://gist.github.com/luhn/a39db625ba5eed90946dd4a196d12220

— Theron


Re: Understanding Postgres Memory Usage

From: Theron Luhn
Hi Ilya,

> Are you talking about buffers/cache increased? AFAIK this memory is used by kernel as buffer before any block device (HDD for example).

If I'm reading the output correctly, buffers/cache do not increase.  I'm looking at the 248MB -> 312MB under the "used" column in the "-/+ buffers/cache" row.  This number excludes buffers/cache, so that can't explain the ~60MB increase.  "Shared" also remains the same (212MB), so shared buffers filling up can't explain the increase either.
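For reference, that row is just the line above it with the kernel's buffers and cache subtracted out; using the (rounded) figures from after the query:

    "-/+ buffers/cache" used = used - buffers - cached = 5.3G - 90M - 4.9G ≈ 312M

so the increase is in genuinely used memory, not page cache.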

> I do not remember exact formula, but it should be something like “work_mem*max_connections + shared_buffers” and it should be around 80% of your machine RAM (minus RAM used by other processes and kernel).  It will save you from OOM.

My Postgres is configured with *very* conservative values.  work_mem (4MB) * max_connections (100) + shared buffers (512MB) = ~1GB, yet Postgres managed to fill up a 4GB server.  I'm seeing workers consuming hundreds of MBs of memory (and not releasing any of it until the connection closes), despite work_mem being 4MB.


— Theron


Re: Understanding Postgres Memory Usage

From: John R Pierce
On 8/25/2016 9:58 AM, Theron Luhn wrote:
> > I do not remember exact formula, but it should be something like
> > “work_mem*max_connections + shared_buffers” and it should be around
> > 80% of your machine RAM (minus RAM used by other processes and
> > kernel).  It will save you from OOM.
>

a single query can use multiple work_mem's if it's got subqueries, joins,
etc.
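for example (a made-up query, just to illustrate), something like this can produce a plan with a Hash Join, a HashAggregate, and a Sort, and each of those nodes is allowed its own work_mem:

$ psql yourdb <<'SQL'
-- orders / order_items are hypothetical tables here
EXPLAIN SELECT o.customer_id, array_agg(i.item)
FROM orders o JOIN order_items i USING (order_id)
GROUP BY o.customer_id
ORDER BY count(*) DESC;
SQL

so a single connection can transiently use several multiples of work_mem.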

> My Postgres is configured with *very* conservative values.  work_mem
> (4MB) * max_connections (100) + shared buffers (512MB) = ~1GB, yet
> Postgres managed to fill up a 4GB server.  I'm seeing workers
> consuming hundreds of MBs of memory (and not releasing any of it until
> the connection closes), despite work_mem being 4MB.

are you doing queries that return large data sets?


--
john r pierce, recycling bits in santa cruz



Re: Understanding Postgres Memory Usage

From: Tom Lane
Theron Luhn <theron@luhn.com> writes:
> Okay, here's the output:
> https://gist.github.com/luhn/a39db625ba5eed90946dd4a196d12220

Hm, well the only thing there that looks even slightly out of the
ordinary is the amount of free space in TopMemoryContext itself:

TopMemoryContext: 3525712 total in 432 blocks; 3444272 free (12654 chunks); 81440 used

Normally, TopMemoryContext doesn't get to more than a few hundred K,
and in the cases I've seen where it does, it's usually been because of
leaky coding that was allocating stuff there and never cleaning it up.
But you've got no more than the typical amount of space still allocated
there, which seems to kill the "leak in TopMemoryContext" theory.
And in any case there is nowhere near 100MB accounted for by the whole
dump.

Are you using any other PLs besides plpgsql?  We've seen cases where
bloat occurred within plpython or plperl, and wasn't visible in this
dump because those languages don't use PG's memory management code.
Or maybe some nonstandard extension?
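A quick way to check, if you're not sure, is something like:

    psql yourdb -c "SELECT lanname FROM pg_language;"
    psql yourdb -c "SELECT extname, extversion FROM pg_extension;"

Anything beyond the built-in languages and standard extensions there would be worth mentioning.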

If not that, then I'd have to speculate that the query you're running is
triggering some bug or otherwise pathological behavior.  Can you put
together a self-contained test case?

            regards, tom lane


Re: Understanding Postgres Memory Usage

From: Theron Luhn
Okay, I got a semi-reproducible test case:
https://gist.github.com/luhn/2b35a9b31255e3a6a2e6a06d1213dfc9

The one caveat is that the memory rise only happens when using a HashAggregate query plan (included in the gist), which I can't find a way to get Postgres to reliably use.

If you need it, I could probably find another test case.



— Theron


Re: Understanding Postgres Memory Usage

From: Tom Lane
Theron Luhn <theron@luhn.com> writes:
> Okay, I got a semi-reproducible test case:
> https://gist.github.com/luhn/2b35a9b31255e3a6a2e6a06d1213dfc9

> The one caveat is that the memory rise only happens when using a
> HashAggregate query plan (included in the gist), which I can't find a way
> to get Postgres to reliably use.

OK, I can reproduce some memory bloat in 9.3, but not in 9.5 and up.
I believe this was fixed by commit b419865a8, which reduced the overhead
of running a lot of instances of array_agg() concurrently in a HashAgg
plan.  I think your options are to live with it or upgrade.  Or I guess
you could turn off enable_hashagg when using array_agg() plus GROUP BY,
though you'd want to remember to undo that whenever you do upgrade.
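If you go the enable_hashagg route, the session-level form is roughly this (a sketch, with the placeholder standing in for your real statement):

    psql yourdb -c "SET enable_hashagg = off; <your array_agg() + GROUP BY query>;"

A plain SET only lasts for that session, so a pooled application would have to issue it on each connection (or scope it with ALTER ROLE ... SET / ALTER DATABASE ... SET if you want it to stick more broadly).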

            regards, tom lane


Re: Understanding Postgres Memory Usage

From: Theron Luhn
I've done the upgrade to 9.5.  Memory bloat has dropped to a much more manageable level.  Most workers have an overhead of <20MB, with one outlier consuming 60MB.


— Theron
