Thread: Understanding Postgres Memory Usage
$ free -h  # Before the query
             total       used       free     shared    buffers     cached
Mem:          7.8G       5.2G       2.6G       212M        90M       4.9G
-/+ buffers/cache:        248M       7.6G
Swap:           0B         0B         0B
$ free -h  # After the query
             total       used       free     shared    buffers     cached
Mem:          7.8G       5.3G       2.5G       212M        90M       4.9G
-/+ buffers/cache:        312M       7.5G
Swap:           0B         0B         0B
[I.K >> ] Are you asking about the increase in buffers/cache? AFAIK that memory is used by the kernel as a cache in front of block devices (an HDD, for example).
Postgres does not use this memory directly; it simply reads data from the block device, and the kernel caches it. A process can't be OOM-killed because of it.
You should configure your Postgres to NEVER exceed available RAM. You can use a tool like pgtune (http://pgtune.leopard.in.ua/) or calculate the limits manually.
I do not remember the exact formula, but it is something like "work_mem * max_connections + shared_buffers", and it should come to around 80% of your machine's RAM (minus RAM used by other processes and the kernel).
That will save you from OOM.
If you then hit a performance bottleneck, you can diagnose it with tools like "log_min_duration_statement", "track_io_timing", and system-provided tools.
Ilya Kazakevich
JetBrains
The Drive to Develop
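For reference, a quick way to pull those three settings from a live session and check the arithmetic (a sketch; work_mem, max_connections, and shared_buffers are real GUCs, but the 80%-of-RAM target is only the rule of thumb above, not an official limit):

    -- Read the settings named in the formula.
    SELECT current_setting('work_mem')        AS work_mem,
           current_setting('max_connections') AS max_connections,
           current_setting('shared_buffers')  AS shared_buffers;
    -- Worked example with conservative values:
    --   4MB * 100 + 512MB = ~912MB estimated peak,
    -- well under 80% of an 8GB machine (~6.4GB).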
Theron Luhn <theron@luhn.com> writes:
> I have an application that uses Postgres 9.3 as the primary datastore.
9.3.what? We do fix memory leaks from time to time ...
> Some of these queries use quite a bit of memory. I've observed a
> "high-water mark" behavior in memory usage: running a query increases the
> worker memory by many MBs (beyond shared buffers), but the memory is not
> released until the connection is closed.
Hm. I'm not familiar with smem, but assuming that that USS column
really is process-private space, that definitely looks bad.
If it's not an outright leak, it's probably consumption of cache space.
We cache stuff that we've read from system catalogs, so sessions that
touch lots of tables (like thousands) can grow due to that. Another
possible source of large cache consumption is calling lots-and-lots of
plpgsql functions.
If the same query, repeated over and over, causes memory to continue
to grow, I'd call it a leak (ie bug). If repeat executions consume
no additional memory then it's probably intentional caching behavior.
regards, tom lane
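One way to run that repeat-execution test from a single session, sketched in plpgsql (the table name is a placeholder for whatever the real query touches):

    -- Execute the suspect query many times in one backend, then compare the
    -- process's memory before and after. Growth that never plateaus suggests
    -- a leak; growth that levels off suggests caching.
    DO $$
    BEGIN
      FOR i IN 1..50 LOOP
        PERFORM count(*) FROM my_table;  -- stand-in for the suspect query
      END LOOP;
    END
    $$;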
Theron Luhn <theron@luhn.com> writes:
>> If it's not an outright leak, it's probably consumption of cache space.
>> We cache stuff that we've read from system catalogs, so sessions that
>> touch lots of tables (like thousands) can grow due to that. Another
>> possible source of large cache consumption is calling lots-and-lots of
>> plpgsql functions.
> I have a reasonable number of tables (around 50) and very few plpgsql
> functions.
Doesn't sound like a lot ...
>> If the same query, repeated over and over, causes memory to continue
>> to grow, I'd call it a leak (ie bug). If repeat executions consume
>> no additional memory then it's probably intentional caching behavior.
> Here's the results of that:
> https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13
> So kind of a combination of the two: Memory usage increases up to a
> certain point but then plateaus. So... cache? It's ~100MB increase,
> though, which seems an excessive amount. What could be taking up that much
> cache?
Hmm. I find it mighty suspicious that the USS, PSS, and RSS numbers are
all increasing to pretty much the same tune, ie from very little to circa
100MB. I think there is a decent chance that smem is not doing what it
says on the tin, and in fact is including shared memory consumption in
"USS". In which case the apparent leak just corresponds to the process
gradually touching more and more of the shared buffer arena. (If your
shared_buffers setting is not somewhere near 100MB, then this theory
breaks down.)
It would be worth using plain old top to watch this process. We have
enough experience with that to be pretty sure how to interpret its
numbers: "RES minus SHR" is the value to be worried about.
regards, tom lane
On Thu, Aug 25, 2016 at 1:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm. I find it mighty suspicious that the USS, PSS, and RSS numbers are
> all increasing to pretty much the same tune, ie from very little to circa
> 100MB. I think there is a decent chance that smem is not doing what it
> says on the tin, and in fact is including shared memory consumption in
> "USS". In which case the apparent leak just corresponds to the process
> gradually touching more and more of the shared buffer arena. (If your
> shared_buffers setting is not somewhere near 100MB, then this theory
> breaks down.)
I can't speak to every implementation of smem, but I have used it quite
a bit under SLES and Ubuntu, and it always seemed to do what it says --
USS is unshared (process-local) memory and PSS is that plus the
process's portion of shared memory. (The sum of differences between PSS
and USS == total shared memory.) RSS has the usual meaning.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
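To watch the right process in top, the backend's PID can be read from the session itself (pg_backend_pid() is a standard function):

    -- Find the PID of the backend serving this connection, then run
    -- "top -p <pid>" alongside the workload and track RES minus SHR.
    SELECT pg_backend_pid();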
Theron Luhn <theron@luhn.com> writes:
>> It would be worth using plain old top to watch this process. We have
>> enough experience with that to be pretty sure how to interpret its
>> numbers: "RES minus SHR" is the value to be worried about.
> Sure thing. https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13#file-top-txt
> RES - SHR is showing a similar increase to what smem is reporting.
Hm, yeah, and the VIRT column agrees --- so 100MB of non-shared
memory went somewhere. Seems like a lot.
If you have debug symbols installed for this build, you could try
doing
gdb /path/to/postgres processID
gdb> call MemoryContextStats(TopMemoryContext)
gdb> quit
(when the process has reached an idle but bloated state) and seeing what
gets printed to the process's stderr. (You need to have launched the
postmaster with its stderr directed to a file, not to /dev/null.)
That would provide a better clue about what's eating space.
regards, tom lane
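On 9.3 the gdb route is the only option; for anyone reading this on a modern release, PostgreSQL 14 and later expose the same dump from SQL:

    -- Logs the memory-context tree of a backend to the server log without
    -- a debugger (PostgreSQL 14 onward; not available on 9.3).
    SELECT pg_log_backend_memory_contexts(pg_backend_pid());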
On 8/25/2016 9:58 AM, Theron Luhn wrote:
>> I do not remember the exact formula, but it is something like
>> "work_mem * max_connections + shared_buffers", and it should come to
>> around 80% of your machine's RAM (minus RAM used by other processes
>> and the kernel). That will save you from OOM.

a single query can use multiple work_mem's if it's got subqueries,
joins, etc.

> My Postgres is configured with *very* conservative values. work_mem
> (4MB) * max_connections (100) + shared buffers (512MB) = ~1GB, yet
> Postgres managed to fill up a 4GB server. I'm seeing workers
> consuming hundreds of MBs of memory (and not releasing any of it
> until the connection closes), despite work_mem being 4MB.

are you doing queries that return large data sets?

--
john r pierce, recycling bits in santa cruz
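EXPLAIN makes the multiple-work_mem point concrete: each Sort, Hash, or HashAggregate node in the plan can claim its own allowance. A hypothetical example:

    -- One statement, several work_mem consumers; "orders" and "customers"
    -- are made-up tables.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT c.id, sum(o.total) AS spent
    FROM customers c
    JOIN orders o ON o.customer_id = c.id   -- Hash Join: one work_mem
    GROUP BY c.id                           -- aggregation: another
    ORDER BY spent DESC;                    -- Sort: another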
Theron Luhn <theron@luhn.com> writes:
> Okay, here's the output:
> https://gist.github.com/luhn/a39db625ba5eed90946dd4a196d12220
Hm, well the only thing there that looks even slightly out of the
ordinary is the amount of free space in TopMemoryContext itself:
TopMemoryContext: 3525712 total in 432 blocks; 3444272 free (12654 chunks); 81440 used
Normally, TopMemoryContext doesn't get to more than a few hundred K,
and in the cases I've seen where it does, it's usually been because of
leaky coding that was allocating stuff there and never cleaning it up.
But you've got no more than the typical amount of space still allocated
there, which seems to kill the "leak in TopMemoryContext" theory.
And in any case there is nowhere near 100MB accounted for by the whole
dump.
Are you using any other PLs besides plpgsql? We've seen cases where
bloat occurred within plpython or plperl, and wasn't visible in this
dump because those languages don't use PG's memory management code.
Or maybe some nonstandard extension?
If not that, then I'd have to speculate that the query you're running is
triggering some bug or otherwise pathological behavior. Can you put
together a self-contained test case?
regards, tom lane
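The PL and extension questions can be answered straight from the system catalogs (pg_language and pg_extension are standard):

    -- List installed procedural languages and extensions, to rule out
    -- plpython/plperl or a nonstandard extension allocating memory
    -- outside PostgreSQL's own memory contexts.
    SELECT lanname FROM pg_language ORDER BY lanname;
    SELECT extname, extversion FROM pg_extension ORDER BY extname;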
Theron Luhn <theron@luhn.com> writes:
> Okay, I got a semi-reproducible test case:
> https://gist.github.com/luhn/2b35a9b31255e3a6a2e6a06d1213dfc9
> The one caveat is that the memory rise only happens when using a
> HashAggregate query plan (included in the gist), which I can't find a way
> to get Postgres to reliably use.
OK, I can reproduce some memory bloat in 9.3, but not in 9.5 and up.
I believe this was fixed by commit b419865a8, which reduced the overhead
of running a lot of instances of array_agg() concurrently in a HashAgg
plan. I think your options are to live with it or upgrade. Or I guess
you could turn off enable_hashagg when using array_agg() plus GROUP BY,
though you'd want to remember to undo that whenever you do upgrade.
regards, tom lane
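A sketch of that last workaround, scoped with SET LOCAL so the planner setting cannot leak past the transaction (the table and columns are hypothetical):

    BEGIN;
    SET LOCAL enable_hashagg = off;  -- forces GroupAggregate over HashAggregate
    SELECT customer_id, array_agg(order_id) AS order_ids
    FROM orders
    GROUP BY customer_id;
    COMMIT;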