Thread: PostgreSQL processes use a large amount of private memory on Windows
I have a product that uses PostgreSQL (v9.6) as its database on Windows, and we often observe that the "shared working set" memory reported for the individual connection processes is quite high (around 200 MB) while the "private working set" is relatively low (around 5 MB). I assume this is because the processes are using the shared buffers of Postgres, and in these cases everything works fine, as the total memory consumption on the system is quite low. But for some deployments we have observed that the "private working set" is around 200 MB as well, and in these cases the server quickly runs out of memory when the number of connections rises. I have never been able to reproduce this behavior myself.
So my question is: When does a postgres process forked for a connection use private memory instead of shared, and what can I do to avoid this?
The reference documentation (https://www.postgresql.org/docs/9.6/runtime-config-resource.html) describes the settings affecting memory usage, including the "shared_buffers" setting, which I can see has a clear impact on the "shared working set" utilization, but I find no mention of potential "private working set" consumers. This page also gives clues about performance tuning, but again there is no mention of private memory: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
All settings typically referred to (like "work_mem" and "temp_buffers") are at their default values, and I have a very hard time seeing how those settings could add up to memory in the magnitude of 200 MB. The "max_connections" setting is quite high (>1000), but there are nowhere near that many connections active in the system. The only way I have been able to trigger growth of processes in my tests is to extract large amounts of data, but when I do that, it is always the "shared working set" that grows; the "private working set" remains stable at around 5 MB.
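[Editorial note: the settings mentioned above can be confirmed directly on an affected server. A minimal sketch, assuming an ordinary psql connection to the database in question:

    -- Show the memory-related settings discussed in this thread, with units
    SELECT name, setting, unit, source
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'work_mem', 'temp_buffers',
                    'maintenance_work_mem', 'max_connections');

The "source" column shows whether a value comes from the default, postgresql.conf, or an ALTER SYSTEM/ALTER ROLE override.]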
Øystein Kolsrud wrote on 17.09.2020 at 11:19:
> All settings typically referred to (like "work_mem" and "temp_buffers") are
> at default settings, and I have a very hard time seeing how those settings
> could add up to memory in the magnitude of 200 MB. The "max_connections"
> setting is quite high (>1000), but there are nowhere near that many
> connections active in the system. The only way I have been able to trigger
> growth of processes in my tests is to extract large amounts of data, but
> when I do that, it is always the "shared working set" that grows; the
> "private working set" remains stable at around 5 MB.

It's not clear to me how many open (vs. "active") connections you have. An "active" connection is one that is currently _running_ a statement; an "open" connection is one that is "idle", i.e. not doing anything.

To my knowledge, a single backend process (= connection) will not release the memory it once allocated. The only way to free that memory is to physically close the connection. So if all your >1000 connections are currently idle/open (but not "active"), they will still consume the memory they allocated _when_ they were active. With that high a number of connections it isn't really surprising that you run out of memory at some point.
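[Editorial note: a quick way to see how many connections are open vs. actually active, sketched against pg_stat_activity as it exists in 9.6:

    -- Connections grouped by state: 'active', 'idle',
    -- 'idle in transaction', etc.
    SELECT state, count(*) AS connections
      FROM pg_stat_activity
     GROUP BY state
     ORDER BY connections DESC;

A large count of "idle" rows combined with growing per-process memory is consistent with the scenario described above.]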
Øystein Kolsrud <kolsrud@gmail.com> writes:
> So my question is: When does a postgres process forked for a connection use
> private memory instead of shared, and what can I do to avoid this?

The only significant long-term consumption of private memory is for caches. There are catalog caches, which can get large if the session accesses a whole lot of database objects (e.g., thousands of different tables). Some of the PLs maintain caches with parsed versions of any function that's been executed. (An ex-employer of mine had a lot of trouble in that regard, because they had hundreds of thousands of lines worth of plpgsql functions.)

There isn't any user-accessible knob for limiting the size of those caches. If you have a problem of that sort, about the only way to mitigate it is to use fewer backends so that the total memory consumption stays in bounds, or redesign your application. In some cases it might help to restart your sessions when they get too big, but that seems like at best a band-aid.

			regards, tom lane
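[Editorial note: to gauge whether the caches Tom describes could plausibly reach ~200 MB, one can check how many objects and how much PL/pgSQL source exist in the database. A rough sketch; these numbers are only upper bounds on what a single session might end up caching:

    -- How many relations a session could conceivably touch
    SELECT count(*) AS relations
      FROM pg_class
     WHERE relkind IN ('r', 'i', 'v', 'm');

    -- Total PL/pgSQL source volume that could feed a backend's
    -- function cache
    SELECT count(*) AS plpgsql_functions,
           pg_size_pretty(sum(length(prosrc))::bigint) AS total_source
      FROM pg_proc p
      JOIN pg_language l ON l.oid = p.prolang
     WHERE l.lanname = 'plpgsql';

If both numbers are small, the caches are an unlikely explanation and the investigation should look elsewhere.]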
On 2020-09-17 11:19:07 +0200, Øystein Kolsrud wrote:
> I have a product that uses PostgreSQL (v9.6) as database on Windows, and we
> often observe that the "shared working set" memory reported for the
> individual connection processes is quite high (around 200 MB), but the
> "private working set" is relatively low (around 5 MB). I assume this is due
> to the processes utilizing the shared buffers of Postgres, and in these
> cases everything works fine as the total memory consumption on the system is
> quite low. But for some deployments we have observed that the "private
> working set" is around 200 MB as well, and in these cases the server quickly
> runs out of memory when the number of connections rises. I have never been
> able to reproduce this behavior myself.
[...]
> All settings typically referred to (like "work_mem" and "temp_buffers") are
> at default settings, and I have a very hard time seeing how those settings
> could add up to memory in the magnitude of 200 MB.

The work_mem setting is per operation (e.g. sort, hash join, etc.), so a complex query may use a multiple of it. However, the default is just 4 MB, so a query would have to be very complex to use 50 times as much. Also, that memory is freed after the query (though "freed" does not necessarily mean "returned to the OS"; it can just mean "marked as available for reuse" - that depends on the C library and the OS).

What I would do:

* Set log_statement to all (warning: that can be a lot of log messages. It
  can also be a privacy/security hazard, depending on who has access to the
  server and how sensitive the queries are).

* Frequently (at least once per minute) record the size of all postgres
  processes. Send an alert if one of them is "too large".

This should give you a good idea what the processes were doing at the time they allocated that memory, so that you can reproduce the problem.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         | -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |    challenge!"
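[Editorial note: the first suggestion above can be applied without a restart. A sketch, assuming superuser access; the per-process size recording itself has to be done at the OS level (e.g. Task Manager or its command-line equivalents), but pg_stat_activity can be used to match those PIDs back to database activity:

    -- Log every statement; revert later with ALTER SYSTEM RESET log_statement
    ALTER SYSTEM SET log_statement = 'all';
    SELECT pg_reload_conf();

    -- Map backend PIDs (as shown in Task Manager) to what they are doing
    SELECT pid, usename, state, state_change, left(query, 60) AS query
      FROM pg_stat_activity
     ORDER BY pid;

It also helps if log_line_prefix includes %p, so each logged statement can be tied back to the process that grew.]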
On Thu, Sep 17, 2020 at 10:06:07AM -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Øystein Kolsrud <kolsrud@gmail.com> writes:
> > So my question is: When does a postgres process forked for a connection
> > use private memory instead of shared, and what can I do to avoid this?
>
> The only significant long-term consumption of private memory is for caches.
> There are catalog caches, which can get large if the session accesses a
> whole lot of database objects (e.g., thousands of different tables). Some
> of the PLs maintain caches with parsed versions of any function that's been
> executed. (An ex-employer of mine had a lot of trouble in that regard,
> because they had hundreds of thousands of lines worth of plpgsql functions.)
> There isn't any user-accessible knob for limiting the size of those caches.
> If you have a problem of that sort, about the only way to mitigate it is to
> use fewer backends so that the total memory consumption stays in bounds, or
> redesign your application. In some cases it might help to restart your
> sessions when they get too big, but that seems like at best a band-aid.
>
> 			regards, tom lane

For what it's worth, I have 171305 lines of plpgsql/sql functions and it hasn't caused any problem on a server with 4 GB of RAM - but with a small number of long-lived connections.

cheers,
raf
On 17/09/2020 15:06, Tom Lane wrote:
> Øystein Kolsrud <kolsrud@gmail.com> writes:
>> So my question is: When does a postgres process forked for a connection
>> use private memory instead of shared, and what can I do to avoid this?
>
> The only significant long-term consumption of private memory is for caches.
> There are catalog caches, which can get large if the session accesses a
> whole lot of database objects (e.g., thousands of different tables). Some
> of the PLs maintain caches with parsed versions of any function that's been
> executed. (An ex-employer of mine had a lot of trouble in that regard,
> because they had hundreds of thousands of lines worth of plpgsql functions.)
> There isn't any user-accessible knob for limiting the size of those caches.
> If you have a problem of that sort, about the only way to mitigate it is to
> use fewer backends so that the total memory consumption stays in bounds, or
> redesign your application. In some cases it might help to restart your
> sessions when they get too big, but that seems like at best a band-aid.
>
> 			regards, tom lane

Would executing DISCARD ALL release the PL cache?

Regards,
Chris Sterritt