Thread: Linux server connection process consumes all memory
Hi all,
This is my first post here, and as I wanted to play it safe my first stop was the novice list. My problem is as follows:
I have a simple select command that, for reasons specific to my application, I need to execute around 22 million times. It is in fact a prepared command from a .NET environment. I have used both ADO.NET PostgreSQL providers (the open-source Npgsql and the commercial one from Devart) and the behaviour is the same. Once I establish the connection with the corresponding connection object, a new process is created on the server, as expected. As I iterate, executing the select command over and over, the memory footprint of that process grows bigger and bigger. Sooner or later the server either collapses or becomes unworkable, as physical memory gets exhausted and the swap is heavily used. This is just a simple select, with a data reader that I make sure to close after each iteration. I can find no reason why the connection process on the server gets so out of control. Is it caching data? If so, I don't want it to. Do I need to close the connection every so often to free up the memory?
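For reference, the pattern I am using is roughly the following -- a simplified sketch rather than my actual code, with placeholder table, column and connection-string values:

' Simplified sketch of the access pattern (placeholder names, not the real schema).
Imports Npgsql
Imports NpgsqlTypes

Module LookupSketch
    Sub Main()
        ' Connection string values are placeholders.
        Using conn As New NpgsqlConnection("Server=dbhost;Database=ships;User Id=app;Password=secret")
            conn.Open()
            ' One command, prepared once, reused for every iteration.
            Using cmd As New NpgsqlCommand("SELECT lat, lon FROM positions WHERE msg_id = @id", conn)
                cmd.Parameters.Add("@id", NpgsqlDbType.Bigint)
                cmd.Prepare()
                For i As Long = 1 To 22000000
                    cmd.Parameters("@id").Value = i
                    ' The data reader is closed (disposed) at the end of each iteration.
                    Using reader As NpgsqlDataReader = cmd.ExecuteReader()
                        While reader.Read()
                            ' consume the row
                        End While
                    End Using
                Next
            End Using
        End Using
    End Sub
End Module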
Any help will be much appreciated
Kind Regards
Yiannis
On Saturday, December 03, 2011 02:45:11 PM Ioannis Anagnostopoulos wrote:
> When I start iterating executing the select command as time goes by the
> memory footprint of the process grows bigger and bigger. Sooner or later
> the server either collapses or becomes unworkable as the physical memory

It's not clear from this what process is continuing to grow. Is it the
PostgreSQL server process? Or your client process?
Hi, thank you for the answer. I am referring to the postgres process
on the Linux server corresponding to the connection I have established.
If you run top you will see quite a few postgres processes; one of them
belongs to the connection established by the client application. It is this
process that grows, consuming memory.
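In case it helps, this is roughly how I match my connection to the server process (a quick sketch; the second query uses the PostgreSQL 9.0 column names):

-- Run on the client's own connection: which backend PID serves it?
SELECT pg_backend_pid();

-- Or list all backends and what they are doing (9.0 column names),
-- then compare the PIDs with the PID column in top.
SELECT procpid, usename, client_addr, current_query
FROM pg_stat_activity;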
Kind regards
Yiannis
-----Original Message-----
From: pgsql-novice-owner@postgresql.org on behalf of Alan Hodgson
Sent: Sat 03/12/2011 23:58
To: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Linux server connection process consumes all memory
On Saturday, December 03, 2011 02:45:11 PM Ioannis Anagnostopoulos wrote:
> When I start iterating executing the select command as time goes by the
> memory footprint of the process grows bigger and bigger. Sooner or later
> the server either collapses or becomes unworkable as the physical memory
It's not clear from this what process is continuing to grow. Is it the
PostgreSQL server process? Or your client process?
"Ioannis Anagnostopoulos" <ioannis@anatec.com> writes: > This is my first post here and as I wanted to play it safe my first option was the novice list. My problem is as follows: > I have a simple select command that I need for some reason to execute it around 22million times. In fact it is a preparedcommand from a .net environment. I have used both ado.net postgres options (the freeware NPGSQL and the commercialfrom Devart) and the behaviour is the same. What happens is that once I establish the connection with the correspondingconnection object, a new process is created on the server as expected. When I start iterating executing theselect command as time goes by the memory footprint of the process grows bigger and bigger. Sooner or later the servereither collapses or becomes unworkable as the physical memory gets exhausted and the swap is heavily used. This isjust a simple select, with a data reader that I make sure to close after each iteration. I find no reason why the connectionprocess on the server gets so out of control. Is it caching data? If yes I don't want to. Do I need to close theconnection every so often to free up the memory? This problem is not resolvable with the amount of information you've provided. Your description makes it sound like you've run into a server-internal memory leak. While those are certainly possible, we don't find one very often, which is why Alan was skeptical about whether the memory growth was in the server process or not. It seems at least as likely that you've misdiagnosed what's happening. Also, even if there is a memory leak, a well-configured system will normally fail with an "out of memory" error before it gets into the kind of distress that you're describing. So at this point I'm wondering which PG version you're running, what non-default configuration settings you've selected, and how much RAM the box has got. It might also be useful to see the specific query you're running and the output that EXPLAIN gives for it. regards, tom lane
On Saturday, December 03, 2011 05:43:30 PM Ioannis Anagnostopoulos wrote:
> Hi thank you for the answer, I am referring to the postgres process
> on the Linux server corresponding to the connection I have established.
> After doing a TOP you may see quite a few postgres processes. One of them
> belongs to the connection established by the client application. It is this
> process that grows consuming memory.
>

Tom has posted a request for some follow-up information that might help track
this down.

One thought, though: is it possible you are creating a new prepared statement
for each query instead of re-using one? And then not releasing them? I did a
quick test, and creating millions of prepared statements will chew up a lot of
memory pretty quickly in the backend.
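Roughly, the effect my quick test exercised looks like this at the SQL level (a sketch; the table name is made up, and a driver would normally do this through the protocol rather than literal PREPARE statements):

-- Every PREPARE allocates backend-local memory that is kept until
-- DEALLOCATE or disconnect; issuing millions of these under new names
-- makes that session's backend grow steadily.
PREPARE lookup_000001 (bigint) AS SELECT * FROM positions WHERE msg_id = $1;
PREPARE lookup_000002 (bigint) AS SELECT * FROM positions WHERE msg_id = $1;
-- ... one per iteration ...

-- What has piled up in the current session:
SELECT name, statement, prepare_time FROM pg_prepared_statements;

-- Freeing them again:
DEALLOCATE lookup_000001;
DEALLOCATE ALL;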
Hello all, and thanks again for the effort. So here are the details:
This is a dual-core machine with 8GB of RAM running Ubuntu Server.
It runs PostgreSQL 9.0 with the PostGIS extension. It is true that
the Linux out-of-memory (OOM) killer was previously killing the connection, as
the connection process was indeed consuming the memory. To fight this we
followed the tuning guidelines and reduced shared_buffers to 25% of the
machine's physical memory (i.e. 2GB), while also setting effective_cache_size
to 6GB. After this we no longer run out of memory, but of course we still have
the issue described before.
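For reference, the relevant lines in postgresql.conf now look roughly like this (only the settings we touched; everything else is at its default):

# postgresql.conf (extract) -- machine has 8GB of RAM
shared_buffers = 2GB            # ~25% of physical memory
effective_cache_size = 6GB      # planner hint only; does not allocate memory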
Some background on this database: it is a "readonly" database containing
millions of rows of ship position reports. The main database, which receives
the data, updates/adds about 15,000 rows per minute, with a processing time of
1.16 sec per 1,000 lines. The "readonly" database we are working on holds a
month of data, so you can get an idea of how many rows its major tables
contain.
I am not sure if I can, but I will try to attach my server's configuration for
you to take a look. I am also attaching the vb.net code that is executed. In
case you are wondering about the code: you will see the same issue whether or
not the command is prepared. In fact, the only way to get rid of the issue is
to close and reopen the connection for every call/instance of this class. But
then, of course, as I have to query 22 million rows, my speed degrades (for
example, this code returns 1 row every 0.001 sec, while if I close and reopen
the connection to get rid of the memory issue it returns 1 row every 0.01
sec).
Thank you in advance
Yiannis
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Sun 04/12/2011 04:45
To: Ioannis Anagnostopoulos
Cc: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Linux server connection process consumes all memory
"Ioannis Anagnostopoulos" <ioannis@anatec.com> writes:
> This is my first post here and as I wanted to play it safe my first option was the novice list. My problem is as follows:
> I have a simple select command that I need for some reason to execute it around 22million times. In fact it is a prepared command from a .net environment. I have used both ado.net postgres options (the freeware NPGSQL and the commercial from Devart) and the behaviour is the same. What happens is that once I establish the connection with the corresponding connection object, a new process is created on the server as expected. When I start iterating executing the select command as time goes by the memory footprint of the process grows bigger and bigger. Sooner or later the server either collapses or becomes unworkable as the physical memory gets exhausted and the swap is heavily used. This is just a simple select, with a data reader that I make sure to close after each iteration. I find no reason why the connection process on the server gets so out of control. Is it caching data? If yes I don't want to. Do I need to close the connection every so often to free up the memory?
This problem is not resolvable with the amount of information you've
provided. Your description makes it sound like you've run into a
server-internal memory leak. While those are certainly possible, we
don't find one very often, which is why Alan was skeptical about whether
the memory growth was in the server process or not. It seems at least
as likely that you've misdiagnosed what's happening. Also, even if
there is a memory leak, a well-configured system will normally fail with
an "out of memory" error before it gets into the kind of distress that
you're describing.
So at this point I'm wondering which PG version you're running, what
non-default configuration settings you've selected, and how much RAM the
box has got. It might also be useful to see the specific query you're
running and the output that EXPLAIN gives for it.
regards, tom lane
[Attachment: server configuration and vb.net code referenced above]
Hello again, here is an extract of my TOP:
top - 10:38:57 up 95 days, 1:34, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 87 total, 2 running, 85 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8070900k total, 2797004k used, 5273896k free, 46532k buffers
Swap: 7811068k total, 5336k used, 7805732k free, 2477508k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18059 postgres 20 0 2186m 102m 99m S 0 1.3 0:03.19 postgres
17028 postgres 20 0 2180m 61m 60m S 0 0.8 0:08.06 postgres
11631 postgres 20 0 2181m 13m 13m S 0 0.2 0:15.20 postgres
PID 18059 is the connection process. It is already using 1.3% of memory and
its RES is already 102m. If I let it run for the rest of the day, it will go
up to 96% of Mem, free physical memory will drop to 50M (from the 5G available
at this stage), and then it will start swapping, eventually exhausting even
the swap.
Kind Regards
Yiannis
-----Original Message-----
From: pgsql-novice-owner@postgresql.org on behalf of Ioannis Anagnostopoulos
Sent: Sun 04/12/2011 09:38
To: Tom Lane
Cc: pgsql-novice@postgresql.org; ahodgson@simkin.ca
Subject: Re: [NOVICE] Linux server connection process consumes all memory
Hello all, and thanks again for the effort. So here are the details:
This is a dual core machine with 8G of Ram running Ubuntu server.
It runs Postgres 9.0 with its Postgis extension. Now it is true that
the linux's OutOfMemory Daemon killer was killing the connection before as
the connection process indeed was consuming the memory. To fight against this
we follow instructions and we reduced to 25% of the machine's physical memory the
shared_buffers (i.e 2GB) while we also set the effective_cache_size to 6GB.
After this we no longer run out of memory but of course we have the issue described
before.
Some background of this database, this is a "readonly" database which contain millions of
rows regarding shipping positions. The main database that receives the data
updates/adds about 15000 rows per minute with a processing time of 1000 lines per 1.16 sec.
Now imagine that the "readonly" database we are working on has monthly data so you can get
an idea of how many rows it contains in its major tables.
I am not sure if I can but I will try to attach my server's configuration for
you to take a look. I am also attaching the vb.net code that is executed. In case
that you are wondering for the code, you will have the same issue whether or not
the command is prepared or not. In fact the only way to get this issue resolved
is to close and open the connection for every call/instance of this class. But then
of course as I have to query 22million rows my speed degrades (for example this code
returns 1 row every 0.001 sec while If I close and open connection to get rid of the
memory issue it returns 1 row every 0.01 sec.
Thank you in advance
Yiannis
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Sun 04/12/2011 04:45
To: Ioannis Anagnostopoulos
Cc: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Linux server connection process consumes all memory
"Ioannis Anagnostopoulos" <ioannis@anatec.com> writes:
> This is my first post here and as I wanted to play it safe my first option was the novice list. My problem is as follows:
> I have a simple select command that I need for some reason to execute it around 22million times. In fact it is a prepared command from a .net environment. I have used both ado.net postgres options (the freeware NPGSQL and the commercial from Devart) and the behaviour is the same. What happens is that once I establish the connection with the corresponding connection object, a new process is created on the server as expected. When I start iterating executing the select command as time goes by the memory footprint of the process grows bigger and bigger. Sooner or later the server either collapses or becomes unworkable as the physical memory gets exhausted and the swap is heavily used. This is just a simple select, with a data reader that I make sure to close after each iteration. I find no reason why the connection process on the server gets so out of control. Is it caching data? If yes I don't want to. Do I need to close the connection every so often to free up the memory?
This problem is not resolvable with the amount of information you've
provided. Your description makes it sound like you've run into a
server-internal memory leak. While those are certainly possible, we
don't find one very often, which is why Alan was skeptical about whether
the memory growth was in the server process or not. It seems at least
as likely that you've misdiagnosed what's happening. Also, even if
there is a memory leak, a well-configured system will normally fail with
an "out of memory" error before it gets into the kind of distress that
you're describing.
So at this point I'm wondering which PG version you're running, what
non-default configuration settings you've selected, and how much RAM the
box has got. It might also be useful to see the specific query you're
running and the output that EXPLAIN gives for it.
regards, tom lane
Hello Alan,
From the code I posted I am pretty sure that my prepared statement is
a const/shared one, and thus the same across all instances of the class.
However, as I said, you can try it with a non-prepared command; the result
will be exactly the same.
What I tend to believe is that this is some kind of memory that is held ON
the server, via the connection, on behalf of the client application. That is
why, when the client exits, all memory on the server returns to normal. In the
Java world I saw something similar when the connection accumulated too many
"server notices" that were not consumed by the client; these notices had to be
cleared manually to bring memory usage down. To this day I have not seen
anything similar in the .NET world (in either flavour of ADO.NET driver, i.e.
the commercial one and Npgsql).
Kind Regards
Yiannis
-----Original Message-----
From: pgsql-novice-owner@postgresql.org on behalf of Alan Hodgson
Sent: Sun 04/12/2011 06:17
To: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Linux server connection process consumes all memory
On Saturday, December 03, 2011 05:43:30 PM Ioannis Anagnostopoulos wrote:
> Hi thank you for the answer, I am referring to the postgres process
> on the Linux server corresponding to the connection I have established.
> After doing a TOP you may see quite a few postgres processes. One of them
> belongs to the connection established by the client application. It is this
> process that grows consuming memory.
>
Tom has posted a request for some follow-up information that might help track
this down.
One thought, though, is it possible you are creating a new prepared statement
for each query instead of re-using one? And then not releasing them? I did a
quick test and creating millions of prepared statements will chew up a lot of
memory pretty quick in the backend.
"Ioannis Anagnostopoulos" <ioannis@anatec.com> writes: > I am not sure if I can but I will try to attach my server's configuration for > you to take a look. I am also attaching the vb.net code that is > executed. I don't know much of anything about VB, but I am filled with suspicion that something in the VB infrastructure you're using is not doing what you think. In particular, I'm betting that repeated executions of this code are in fact generating new prepared statements without deallocating old ones. You could investigate that theory by setting log_statement = all in the server configuration and then watching the server log to see exactly what SQL commands are actually getting sent. regards, tom lane
Hi Tom,
I am pretty sure that, at least in my example, there is only ONE
prepared statement, created once when the static variable is Nothing/NULL
during the first iteration. Thereafter the same prepared statement is
executed over and over again. As I said, it may well be a bug in the ADO.NET
driver. In any case I will set log_statement = all and report back with my
findings.
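(For completeness, the change amounts to this, followed by a reload:)

# postgresql.conf -- takes effect after a reload, e.g. "SELECT pg_reload_conf();"
log_statement = 'all'           # log every statement the server receives

With that in place, the log should show whether the same statement is executed repeatedly or a new one is created per iteration.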
Kind Regards
Yiannis
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Sun 04/12/2011 16:30
To: Ioannis Anagnostopoulos
Cc: pgsql-novice@postgresql.org; ahodgson@simkin.ca
Subject: Re: [NOVICE] Linux server connection process consumes all memory
"Ioannis Anagnostopoulos" <ioannis@anatec.com> writes:
> I am not sure if I can but I will try to attach my server's configuration for
> you to take a look. I am also attaching the vb.net code that is
> executed.
I don't know much of anything about VB, but I am filled with suspicion
that something in the VB infrastructure you're using is not doing what
you think. In particular, I'm betting that repeated executions of this
code are in fact generating new prepared statements without deallocating
old ones. You could investigate that theory by setting log_statement =
all in the server configuration and then watching the server log to see
exactly what SQL commands are actually getting sent.
regards, tom lane
Below are the forum discussions I have opened with Devart's ADO.NET driver and
the Npgsql (open-source) driver:

http://www.devart.com/forums/viewtopic.php?p=76192#76192
http://pgfoundry.org/forum/forum.php?thread_id=10481&forum_id=519

As I say there, I managed to reduce my queries to 55,000 from 22 million. The
server in this scenario is not running out of memory, but it is still apparent
that memory consumption is high (8.9%!!). So if 20 people try to run the same
query we are going to be back to square one. Let me assure you once more that
there is ONLY one prepared statement, created at the beginning and executed
over and over again. My question to the postgres people is whether there is
any bug that becomes apparent as a memory leak after a lot of executions of
the same statement. My next test will be to convert my .net code to PL/pgSQL
and execute it on the server without involving any client.

The top output below shows the "updated" query running. This time the load is
at 8.9% right at the end of the run. Nowhere near the 96%, but please keep in
mind that this happened when I reduced the iterations to 55,000.

top - 10:35:23 up 96 days, 1:30, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 87 total, 1 running, 86 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.5%us, 0.0%sy, 0.0%ni, 98.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8070900k total, 8017768k used, 53132k free, 56800k buffers
Swap: 7811068k total, 4336k used, 7806732k free, 7671980k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22181 postgres 20 0 2187m 701m 697m S 6 8.9 1:29.33 postgres

Kind Regards
Yiannis

On 04/12/2011 16:30, Tom Lane wrote:
> "Ioannis Anagnostopoulos"<ioannis@anatec.com> writes:
>> I am not sure if I can but I will try to attach my server's configuration for
>> you to take a look. I am also attaching the vb.net code that is
>> executed.
> I don't know much of anything about VB, but I am filled with suspicion
> that something in the VB infrastructure you're using is not doing what
> you think. In particular, I'm betting that repeated executions of this
> code are in fact generating new prepared statements without deallocating
> old ones. You could investigate that theory by setting log_statement =
> all in the server configuration and then watching the server log to see
> exactly what SQL commands are actually getting sent.
>
> regards, tom lane
>
On Monday, December 05, 2011 02:41:36 AM Ioannis Anagnostopoulos wrote:
> The top below show the "updated" query running. This time the load is at
> 8.9% right at the end of the run. Nowhere near the 96% but please
> keep in mind that this happened when I reduced to iterations to 55.000.
>

Reduce it to 3 or 4. Turn on statement logging per Tom's request. Empty the
logs. Restart PostgreSQL. Do one run. Let's see everything PostgreSQL is
actually getting from the client.

Also ... please don't top-post.

> On 04/12/2011 16:30, Tom Lane wrote:
> >
> > I don't know much of anything about VB, but I am filled with suspicion
> > that something in the VB infrastructure you're using is not doing what
> > you think. In particular, I'm betting that repeated executions of this
> > code are in fact generating new prepared statements without deallocating
> > old ones. You could investigate that theory by setting log_statement =
> > all in the server configuration and then watching the server log to see
> > exactly what SQL commands are actually getting sent.
> >
> > regards, tom lane
Hello,

I found this note in the PostgreSQL wiki
(http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server):

"Note that on Windows (and on PostgreSQL versions before 8.1), large values
for shared_buffers aren't as effective, and you may find better results
keeping it relatively low and using the OS cache more instead."

Can anyone tell me how the OS cache should be increased on a Windows server?

thx
Lukas
On Mon, Dec 05, 2011 at 10:24:17PM +0200, Lukas wrote:
> Hello,
>
> I found such note in wiki of Postgresql
> (http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server ):
> Note that on Windows (and on PostgreSQL versions before 8.1), large values
> for shared_buffers aren't as effective, and you may find better results
> keeping it relatively low and using the OS cache more instead.
>
> Can anyone tell how OS cache should be increased on windows server?
>
> thx
> Lukas
>

I think that, like Linux, it is not a tunable setting. The OS will use as much
memory as it can while still meeting the memory resource needs of the
applications running on the machine. The upshot is -- buy more memory. The
take-away from the above doc quote is to drop the amount of shared_buffers for
PostgreSQL, since you can control that.

Cheers,
Ken
On Mon, Dec 5, 2011 at 4:41 AM, Ioannis Anagnostopoulos <ioannis@anatec.com> wrote:
> Below is the forum discussion I have opened with Devart's ado.net driver and
> the NPGSQL (opensource) driver.
>
> http://www.devart.com/forums/viewtopic.php?p=76192#76192
> http://pgfoundry.org/forum/forum.php?thread_id=10481&forum_id=519
>
> As I say there, I managed to reduce my queries to 55.000 from 22 million.
> The server in this scenario is not running out of memory
> but it is still apparent that memory consumption is high (8.9%!!). So if 20
> people try to run the same query we are going to be back
> in square 1. For one more time let me assure that there is ONLY one prepared
> statement that is created at the beginning and is executed
> over and over again. My question to postgres people is IF there is any bug
> that becomes apparent as a memory leak after a lot of executions
> of the same statement. My next test will be to convert my .net code to
> PGSql and execute it on the server without involving any client.
>
> The top below show the "updated" query running. This time the load is at
> 8.9% right at the end of the run. Nowhere near the 96% but please
> keep in mind that this happened when I reduced to iterations to 55.000.
>
> top - 10:35:23 up 96 days, 1:30, 1 user, load average: 0.00, 0.01, 0.05
> Tasks: 87 total, 1 running, 86 sleeping, 0 stopped, 0 zombie
> Cpu(s): 1.5%us, 0.0%sy, 0.0%ni, 98.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 8070900k total, 8017768k used, 53132k free, 56800k buffers
> Swap: 7811068k total, 4336k used, 7806732k free, 7671980k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 22181 postgres 20 0 2187m 701m 697m S 6 8.9 1:29.33 postgres

You are misinterpreting what top is telling you (it's a pretty common
mistake). Most of the memory postgres is using as you can see is
inside SHR, or is shared memory that is shared between all backend
processes.

What have you set shared_buffers to? My money is on you having
oversubscribed your box.

merlin
Hello,

Well, according to the document it is not like that: "It's likely you will
have to increase the amount of memory your operating system allows you to
allocate at once..." I understand it as meaning I need to set that value
somewhere in the OS..?

I have a problem: our database is big, the server has 4GB of RAM, and about
70% of the RAM is free... At the same time I have some 20-30 busy connections
to Postgres, and it works slowly... To me it seems like a tuning problem..?

Lukas

> On Mon, Dec 05, 2011 at 10:24:17PM +0200, Lukas wrote:
>> Hello,
>>
>> I found such note in wiki of Postgresql
>> (http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server ):
>> Note that on Windows (and on PostgreSQL versions before 8.1), large
>> values
>> for shared_buffers aren't as effective, and you may find better results
>> keeping it relatively low and using the OS cache more instead.
>>
>> Can anyone tell how OS cache should be increased on windows server?
>>
>> thx
>> Lukas
>>
>
> I think that like Linux it is not a tunable setting. The OS will use
> as much memory as it can while still meeting the memory resource needs
> of the applications running on the machine. The upshot is -- buy more
> memory. The take-away from the above doc quote is to drop the amount
> of shared_buffers for PostgreSQL, since you can control that.
>
> Cheers,
> Ken
>
On Mon, Dec 5, 2011 at 4:41 AM, Ioannis Anagnostopoulos
<ioannis@anatec.com> wrote:
> Below is the forum discussion I have opened with Devart's ado.net driver and
> the NPGSQL (opensource) driver.
>
> http://www.devart.com/forums/viewtopic.php?p=76192#76192
> http://pgfoundry.org/forum/forum.php?thread_id=10481&forum_id=519
>
> As I say there, I managed to reduce my queries to 55.000 from 22 million.
> The server in this scenario is not running out of memory
> but it is still apparent that memory consumption is high (8.9%!!). So if 20
> people try to run the same query we are going to be back
> in square 1. For one more time let me assure that there is ONLY one prepared
> statement that is created at the beginning and is executed
> over and over again. My question to postgres people is IF there is any bug
> that becomes apparent as a memory leak after a lot of executions
> of the same statement. My next test will be to convert my .net code to
> PGSql and execute it on the server without involving any client.
>
> The top below show the "updated" query running. This time the load is at
> 8.9% right at the end of the run. Nowhere near the 96% but please
> keep in mind that this happened when I reduced to iterations to 55.000.
>
> top - 10:35:23 up 96 days, 1:30, 1 user, load average: 0.00, 0.01, 0.05
> Tasks: 87 total, 1 running, 86 sleeping, 0 stopped, 0 zombie
> Cpu(s): 1.5%us, 0.0%sy, 0.0%ni, 98.5%id, 0.0%wa, 0.0%hi, 0.0%si,
> 0.0%st
> Mem: 8070900k total, 8017768k used, 53132k free, 56800k buffers
> Swap: 7811068k total, 4336k used, 7806732k free, 7671980k cached
>
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 22181 postgres 20 0 2187m 701m 697m S 6 8.9 1:29.33 postgres
You are misinterpreting what top is telling you (it's a pretty common
mistake). Most of the memory postgres is using as you can see is
inside SHR, or is shared memory that is shared between all backend
processes.
Hello Merlin, thank you for the answer. Well, the example that you quote is
from a sunshine scenario where I had reduced the iterations to something more
down to earth and the process concluded. However, if you look at the other
examples I listed in an earlier post, you will see cases like this:
top - 11:46:05 up 85 days, 2:41, 1 user, load average: 0.67, 1.03, 1.19
Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.3%us, 0.7%sy, 0.0%ni, 69.8%id, 19.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8070900k total, 8017232k used, 53668k free, 1508k buffers
Swap: 7811068k total, 4283720k used, 3527348k free, 2088528k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15965 postgres 20 0 12.9g 6.4g 1.4g S 11 83.4 13:59.15 postgres
15961 postgres 20 0 4285m 1.8g 1.8g D 31 23.2 9:35.03 postgres
15827 postgres 20 0 4276m 52m 51m S 0 0.7 0:00.50 postgres
15830 postgres 20 0 4278m 25m 24m S 0 0.3 0:01.27 postgres
15959 postgres 20 0 4288m 4376 4288 S 0 0.1 0:00.25 postgres
15832 postgres 20 0 4279m 1388 888 S 0 0.0 0:00.19 postgres
This is an extract from top and below is an extract of iostat:
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 158.00 1802.00 1602.00 3604 3204
sdb 70.50 590.00 40.00 1180 80
sdc 16.00 528.00 0.00 1056 0
Please note that process 15965 is my connection process. In that example we have
VIRT = 12.9G
RES = 6.4G
SHR=1.4G
And swap of course is almost exhausted. Maybe you are right, but in the light
of this example can you provide any further explanation?

> What have you set shared_buffers to? My money is on you having
> oversubscribed your box.
>
> merlin

shared_buffers = 2GB, and according to
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
I am spot on, since my machine has 8G of RAM.

Thank you
Yiannis
On Tue, Dec 6, 2011 at 3:14 AM, Ioannis Anagnostopoulos <ioannis@anatec.com> wrote:
> On 05/12/2011 21:13, Merlin Moncure wrote:
>> You are misinterpreting what top is telling you (it's a pretty common
>> mistake). Most of the memory postgres is using as you can see is
>> inside SHR, or is shared memory that is shared between all backend
>> processes.
>
> Hello Merlin, thank you for the answer. Well the example that you quote is
> from a sunshine scenario where I had reduced the iterations to something more
> down to earth and the process concluded. However if you followed the other
> examples I listed in an earlier post you will see cases like this:
>
> top - 11:46:05 up 85 days, 2:41, 1 user, load average: 0.67, 1.03, 1.19
> Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
> Cpu(s): 10.3%us, 0.7%sy, 0.0%ni, 69.8%id, 19.2%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 8070900k total, 8017232k used, 53668k free, 1508k buffers
> Swap: 7811068k total, 4283720k used, 3527348k free, 2088528k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 15965 postgres 20 0 12.9g 6.4g 1.4g S 11 83.4 13:59.15 postgres
> 15961 postgres 20 0 4285m 1.8g 1.8g D 31 23.2 9:35.03 postgres
>
> [...]
>
> Please note that process 15965 is my connection process. In that example we
> have
> VIRT = 12.9G
> RES = 6.4G
> SHR = 1.4G

Interesting. Well, in your 'sunshine scenario', postgres non-shared memory
usage is basically zero -- growth of memory consumption in SHR does not in any
way suggest runaway growth -- so memory consumption is not in fact 8.9% but is
< 1%, and is basically a completely normal memory profile. This also tends to
rule out incremental memory consumption issues that are a general class of
problem when you are dealing with huge numbers of system objects, like
prepared statements, tables, schemas, etc. (but not records).

This means that something changed in your bigger job that caused memory
consumption to go completely out of whack... 5 gigs of resident memory is
definitely not normal. Given the above, my first suspicion is around work_mem
-- what's it set to? Also, are you using any third-party server-side modules
(like PostGIS)? C functions? pl/java? A more unlikely but possible scenario is
an absolutely pathological query plan that has gone totally haywire.

What we would need to know is exactly what is happening at the precise point
in time your resident memory starts running away from SHR. Helpful information
might include:

*) whether your program is continually processing queries or is stuck at a
   single query (the pg_stat_activity system view is your friend here)
*) if we luck out and a single query is doing it, we'd need to see the query
   and the plan (explain the query)
*) your settings for work_mem and maintenance_work_mem
*) anything else that's interesting... large numbers of tables/views/etc.?
*) You may want to consider changing your vm overcommit settings and/or
   reducing swap in order to get your server to more aggressively return OOM
   to postgres memory allocation. The specific error returned to postgres for
   an OOM would of course be very helpful.
*) Using the binary protocol? Custom C types? A forged/invalid datum size can
   cause postgres to grab a lot of memory (data corruption can also do this)
   -- but it would take more than one consecutive one to do it.

merlin
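(Concretely, the first and third checks above can be run roughly like this, with 15965 standing in for the runaway backend's PID from the top output:)

-- Is the backend cycling through queries or stuck on one?
-- (PostgreSQL 9.0 column names: procpid, current_query)
SELECT procpid, query_start, current_query
FROM pg_stat_activity
WHERE procpid = 15965;

-- The memory-related settings asked about:
SHOW work_mem;
SHOW maintenance_work_mem;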
Merlin Moncure <mmoncure@gmail.com> writes:
> *) You may want to consider changing your vm over commit settings
> and/or reducing swap in order to get your server to more aggressively
> return OOM to postgres memory allocation. The specific error returned
> to postgres for an OOM of course would be very helpful.

Yeah. I would try starting the postmaster under smaller ulimit settings
so that the kernel gives it ENOMEM before you start getting swapped.
When that happens, the backend will dump a memory map into the
postmaster log that would be very useful for seeing what is actually
happening here.

regards, tom lane
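(In practice, Tom's suggestion translates to something like the following; the limit value and the Ubuntu-style paths are assumptions, not from the thread:)

# Sketch: restart the postmaster under a per-process virtual-memory cap so a
# runaway backend gets ENOMEM (and dumps its memory-context map into the
# postmaster log) instead of dragging the whole box into swap.
sudo service postgresql stop
sudo -u postgres bash -c '
  ulimit -v 4194304                      # ~4 GB of virtual memory, in kB
  /usr/lib/postgresql/9.0/bin/pg_ctl \
      -D /var/lib/postgresql/9.0/main \
      -o "-c config_file=/etc/postgresql/9.0/main/postgresql.conf" \
      start
'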
On 06/12/2011 17:10, Tom Lane wrote:
> Merlin Moncure<mmoncure@gmail.com> writes:
>> *) You may want to consider changing your vm over commit settings
>> and/or reducing swap in order to get your server to more aggressively
>> return OOM to postgres memory allocation. The specific error returned
>> to postgres for an OOM of course would be very helpful.
> Yeah. I would try starting the postmaster under smaller ulimit settings
> so that the kernel gives it ENOMEM before you start getting swapped.
> When that happens, the backend will dump a memory map into the
> postmaster log that would be very useful for seeing what is actually
> happening here.
>
> regards, tom lane
>

Hello all,

I think I have solved the problem. Many thanks for the support and the time
you spent. The solution/bug/problem is as follows:

1. There was one connection that, as I described, was used IN A LOOP 22
   million times. This connection was assigned a PID x (on the Linux server).
2. Nested within this LOOP there was another connection that had been
   forgotten from past code, and the Linux server was assigning it a PID y.
3. PID y was of course also called 22 million times (since it was in the
   loop). However, it had a nasty bug and was constantly creating prepared
   commands (oops, my mistake). So PID y was creating 22 million prepared
   commands!
4. As I had no clue that PID y existed at all, when monitoring top on the
   server I was presented with the misbehaving PID y but was under the
   impression that it was PID x. In fact PID x was further down the list,
   happily doing its own job.

So the healthy PID x had a top signature as follows (note the small difference
between RES and SHR, and the magnitude in MB, as Merlin suggested):

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30475 postgres 20 0 2187m 746m 741m S 31 9.5 0:41.48 postgres

While the unhealthy PID y had a top signature like this (note VIRT at 12.9g
and RES at 6.4g against SHR at 1.4g -- a magnitude in GB!):

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15965 postgres 20 0 12.9g 6.4g 1.4g S 11 83.4 13:59.15 postgres

As I said, I had no clue about the existence of PID y, and since it was coming
up at the top of the list I had wrongly assumed it was PID x. It gets more
complicated by the fact that the test code I sent you, which should have been
working fine as it had no nested buggy loop, was mainly running from home over
the DSL line, so I never let it conclude its 22 million iterations (it would
still have been running!). Instead I was monitoring top, and since the memory
was going up I wrongly assumed I had the same issue (if I had let it run for
2-3 hours I would have noticed what Merlin suggested about the RES/SHR ratio).
So it was a misdiagnosis after all :)

I hope this explains everything.
Kind regards, and sorry for the misunderstanding.
Yiannis
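(In code terms, the difference between the healthy loop and the forgotten nested one boils down to roughly this; the class, table and column names below are invented for illustration and are not the actual code:)

Imports Npgsql
Imports NpgsqlTypes

Public Class PositionReader
    ' Buggy pattern (the forgotten nested connection): Prepare() sits in a
    ' per-object code path, so every one of the 22 million iterations creates
    ' another server-side prepared statement in that backend.
    Public Sub LookupBuggy(conn As NpgsqlConnection, id As Long)
        Dim cmd As New NpgsqlCommand("SELECT lat, lon FROM positions WHERE msg_id = @id", conn)
        cmd.Parameters.AddWithValue("@id", id)
        cmd.Prepare()                  ' called once per object -- 22 million times
        Using r As NpgsqlDataReader = cmd.ExecuteReader()
            While r.Read()
                ' consume the row
            End While
        End Using
    End Sub

    ' Healthy pattern: one shared command, prepared exactly once, reused after that.
    Private Shared sharedCmd As NpgsqlCommand = Nothing

    Public Sub LookupFixed(conn As NpgsqlConnection, id As Long)
        If sharedCmd Is Nothing Then
            sharedCmd = New NpgsqlCommand("SELECT lat, lon FROM positions WHERE msg_id = @id", conn)
            sharedCmd.Parameters.Add("@id", NpgsqlDbType.Bigint)
            sharedCmd.Prepare()
        End If
        sharedCmd.Parameters("@id").Value = id
        Using r As NpgsqlDataReader = sharedCmd.ExecuteReader()
            While r.Read()
                ' consume the row
            End While
        End Using
    End Sub
End Class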
On Dec 7, 2011 4:26 PM, "Ioannis Anagnostopoulos" <ioannis@anatec.com> wrote:
>
> On 06/12/2011 17:10, Tom Lane wrote:
>>
>> Merlin Moncure<mmoncure@gmail.com> writes:
>>>
>>> *) You may want to consider changing your vm over commit settings
>>> and/or reducing swap in order to get your server to more aggressively
>>> return OOM to postgres memory allocation. The specific error returned
>>> to postgres for an OOM of course would be very helpful.
>>
>> Yeah. I would try starting the postmaster under smaller ulimit settings
>> so that the kernel gives it ENOMEM before you start getting swapped.
>> When that happens, the backend will dump a memory map into the
>> postmaster log that would be very useful for seeing what is actually
>> happening here.
>>
>> regards, tom lane
>>
> Hello all,
>
> I think I have solved the problem. Many thanks for the support and the time you spend. The solution/bug/problem is as follows:
>
> 1. There was one connection that as I described was used IN A LOOP 22million times. This connection was assigned a PID x (on the linux server)
> 2. Nested within this LOOP there was another connection that had been forgotten from past code and the linux server was assigning to it a PID y
> 3. PID y was of course called also 22million times (since it was in the loop). However it had a nasty bug and it was creating constantly prepared commands! (opps my mistake). So PID y was creating 22million prepared commands!
> 4. As I had no clue that that there was at all PID y, monitoring the TOP on the server I was presented with the misbehaving PID y but I was of the impression that it was PID x. In fact PID x was below in the list happy doing its own job.
>
> So the healthy PID X had a top signature as follows (please note the difference between RES and SHR as well as the magnitude in Mb as Merlin suggested):
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 30475 postgres 20 0 2187m 746m 741m S 31 9.5 0:41.48 postgres
>
> While the unhealthy PID Y had a TOP signature (please note that RES memory is at 12.9g! and SHR 1.4g as well as the magnitude in Gb!):
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 15965 postgres 20 0 12.9g 6.4g 1.4g S 11 83.4 13:59.15 postgres
>
> As I said I had no clue about the existence of PID Y and since it was coming top at the TOP list I had wrongfully assumed that it was the PID X. It gets more complicated by the fact that the test code I sent you, which should have been working fine as it had no nested buggy loop, was mainly running from home over the DSL line thus I never let it conclude its 22million iterations (it would have been still running!) instead I was monitoring the TOP and since the memory was going UP I was wrongfully assuming that I had the same issue (if I had let it run for 2 -3 hours I would have noticed what Merlin suggested about RES/SHR ratio). So it was a misdiagnosis after all :)
>
> I hope this explains everything.
> Kind Regards and sorry for the misunderstanding.
All's well that ends well.
May I ask how you finally found the culprit? Through profiling or code review?
That said, I can't help admiring that PostgreSQL still survives 22 million prepared commands without committing suicide. Yes, it's severely impacted, but still survives.
Rgds,
When I compared the "demo program" (the one I thought was not working, because
I was not letting it run for long enough) with the actual code base, I
realised that the actual code base was running much slower than the demo even
though they were, to my mind, doing the same thing. A few "step into"s in the
debugger later, I found, somewhere deep down, the nasty .Prepare() call: it
was placed in a function that had once been declared static but was now called
for every object created (22 million of them). In fact it is a little more
complicated, as I was in the middle of a major refactoring, reducing the 22
million iterations to 55,000 by grouping and creating a few prepared
statements instead of executing a command 22 million times. So what I am still
not sure about is what will happen if I execute non-prepared statements 22
million times: will it crash, or will it survive now that I have removed the
buggy nested "prepares"?
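(On the SQL side, the grouping amounts to something like this; the table and column names are placeholders:)

-- Before: one round trip per key, ~22 million executions of this statement.
PREPARE pos_one (bigint) AS
    SELECT msg_id, lat, lon FROM positions WHERE msg_id = $1;

-- After: one statement fetching a whole batch of keys per call,
-- ~55,000 executions with roughly 400 keys in each array.
PREPARE pos_batch (bigint[]) AS
    SELECT msg_id, lat, lon FROM positions WHERE msg_id = ANY ($1);

EXECUTE pos_batch(ARRAY[1001, 1002, 1003]);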
Regards
On Wednesday, December 07, 2011 04:14:21 AM Ioannis Anagnostopoulos wrote:
> So what I am not sure
> still is what will happen if I execute 22million times non-prepared
> statements, will it crash or will it survive since I removed the buggy
> "prepared" that were nested.

PostgreSQL? It'll happily serve up queries for years if you aren't stacking up
prepared query objects in your session :p
On Wed, 07 Dec 2011 12:14:21 +0000 Ioannis Anagnostopoulos <ioannis@anatec.com> wrote:
> So what I am not sure
> still is what will happen if I execute 22million times non-prepared
> statements, will it crash or will it survive since I removed the buggy
> "prepared" that were nested.

It would take "some" time but it wouldn't crash at all -- just avoid doing
this kind of exercise daily on an SSD, it wouldn't like it. Think of Pg as the
same thing as Oracle, except it doesn't try to hijack your assets each time
you have a question to ask about it ;)

--
If you are a police dog, where's your badge?
-- Question James Thurber used to drive his German Shepherd crazy.