Re: Linux server connection process consumes all memory - Mailing list pgsql-novice
From | Ioannis Anagnostopoulos |
---|---|
Subject | Re: Linux server connection process consumes all memory |
Date | |
Msg-id | 4EDF30D6.20705@anatec.com |
In response to | Re: Linux server connection process consumes all memory (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Linux server connection process consumes all memory |
List | pgsql-novice |
On 06/12/2011 17:10, Tom Lane wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> *) You may want to consider changing your vm over commit settings
>> and/or reducing swap in order to get your server to more aggressively
>> return OOM to postgres memory allocation. The specific error returned
>> to postgres for an OOM of course would be very helpful.
> Yeah. I would try starting the postmaster under smaller ulimit settings
> so that the kernel gives it ENOMEM before you start getting swapped.
> When that happens, the backend will dump a memory map into the
> postmaster log that would be very useful for seeing what is actually
> happening here.
>
> regards, tom lane
>

Hello all,

I think I have solved the problem. Many thanks for the support and the time you spent. The solution/bug/problem was as follows:

1. There was one connection that, as I described, was used IN A LOOP 22 million times. The Linux server assigned this connection PID x.
2. Nested within this loop there was another connection, forgotten from past code, and the Linux server assigned it PID y.
3. PID y was of course also called 22 million times (since it was inside the loop). However, it had a nasty bug: it kept creating prepared statements (oops, my mistake). So PID y created 22 million prepared statements!
4. As I had no idea that PID y existed at all, when monitoring top on the server I was looking at the misbehaving PID y while believing it was PID x. In fact PID x was further down the list, happily doing its own job.

So the healthy PID x had a top signature like this (please note how close RES and SHR are, as well as the magnitude in MB, as Merlin suggested):

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    30475 postgres  20   0 2187m 746m 741m S   31  9.5   0:41.48 postgres

while the unhealthy PID y had this top signature (please note that VIRT is at 12.9g and RES at 6.4g against an SHR of only 1.4g, as well as the magnitude in GB!):

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    15965 postgres  20   0 12.9g 6.4g 1.4g S   11 83.4  13:59.15 postgres

As I said, I had no idea that PID y existed, and since it was at the top of the top list I wrongly assumed it was PID x. It was further complicated by the fact that the test code I sent you, which should have worked fine since it had no nested buggy loop, was mostly run from home over a DSL line, so I never let it complete its 22 million iterations (it would still be running!). Instead I was watching top, and since the memory was going UP I wrongly assumed I had the same issue (had I let it run for 2-3 hours I would have noticed what Merlin pointed out about the RES/SHR ratio). So it was a misdiagnosis after all :)

I hope this explains everything.

Kind regards, and sorry for the misunderstanding.

Yiannis
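Tom's suggestion of starting the postmaster under a smaller ulimit amounts to capping the process's address space (RLIMIT_AS) before it starts, so an oversized allocation fails with ENOMEM instead of pushing the machine into swap; from a shell this is usually `ulimit -v` before `pg_ctl start`. A minimal Python sketch of the same idea, assuming Linux (where RLIMIT_AS is enforced); `run_limited` and the 768 MiB cap are illustrative, and the demo child is a Python one-liner standing in for a real postmaster:

```python
import resource
import subprocess
import sys


def run_limited(cmd, limit_bytes):
    """Run cmd with its address space capped at limit_bytes (Linux only).

    Allocations beyond the cap fail with ENOMEM in the child instead of
    letting it grow into swap.
    """
    def cap():
        # Executed in the child after fork() and before exec().
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=cap, capture_output=True, text=True)


if __name__ == "__main__":
    # Hypothetical stand-in for a runaway backend: ask for 2 GiB under a 768 MiB cap.
    result = run_limited([sys.executable, "-c", "bytearray(1 << 31)"], 768 * 1024 ** 2)
    print("exit code:", result.returncode)
    print("MemoryError raised:", "MemoryError" in result.stderr)
```

In the Python child the kernel's ENOMEM surfaces as a `MemoryError`; in a PostgreSQL backend it surfaces as an "out of memory" error, together with the memory map Tom mentions in the postmaster log.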
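The bug in point 3 grows the backend because each server-side prepared statement stays in that session's memory until it is deallocated or the session ends, so creating a new one on every iteration adds up over 22 million passes. A minimal sketch of the usual fix in Python: prepare once outside the loop, EXECUTE inside it, and DEALLOCATE at the end. It assumes a DB-API 2.0 connection whose driver uses `%s` placeholders (psycopg2, for example); the table `positions`, its columns and the statement name `ins_pos` are hypothetical:

```python
# Prepare once, execute many times. Table, columns and statement name are
# hypothetical; any DB-API 2.0 driver using %s placeholders (e.g. psycopg2) fits.

PREPARE_SQL = (
    "PREPARE ins_pos (integer, double precision) AS "
    "INSERT INTO positions (id, speed) VALUES ($1, $2)"
)


def load_rows(conn, rows):
    """Insert (id, speed) pairs through a single server-side prepared statement.

    The buggy pattern issues a new PREPARE inside the loop; every such
    statement lives in the backend's memory until DEALLOCATE or disconnect.
    """
    cur = conn.cursor()
    cur.execute(PREPARE_SQL)  # once, outside the loop
    for row_id, speed in rows:
        cur.execute("EXECUTE ins_pos (%s, %s)", (row_id, speed))
    cur.execute("DEALLOCATE ins_pos")  # release the statement in the backend
    conn.commit()
```

The backend then holds one cached statement for the whole run, however many rows pass through the loop.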
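Merlin's RES/SHR rule of thumb can be checked mechanically. A backend's RES in top includes the shared buffer pages it has touched, which are also counted in SHR, so RES minus SHR roughly approximates the memory private to that backend. A minimal Python sketch, assuming top's default column order (PID, USER, PR, NI, VIRT, RES, SHR, ...) and its k/m/g/t suffixes; the function names and the 1 GiB threshold are illustrative choices, not anything from this thread:

```python
# Rough check of top output: RES - SHR approximates memory private to a
# PostgreSQL backend, because shared buffers it has touched appear in both.

UNITS = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}  # multipliers to KiB


def to_kib(field: str) -> float:
    """Convert a top memory field such as '746m', '6.4g' or '2187' to KiB."""
    suffix = field[-1].lower()
    if suffix in UNITS:
        return float(field[:-1]) * UNITS[suffix]
    return float(field)  # no suffix: top reports KiB


def private_kib(top_line: str) -> tuple[int, float]:
    """Return (pid, RES - SHR in KiB) for one line of top's default output."""
    cols = top_line.split()
    pid, res, shr = int(cols[0]), cols[5], cols[6]
    return pid, to_kib(res) - to_kib(shr)


def suspicious(lines: list[str], limit_kib: float = 1024 ** 2) -> list[int]:
    """PIDs whose RES - SHR exceeds limit_kib (1 GiB by default, an arbitrary cut-off)."""
    return [pid for pid, priv in map(private_kib, lines) if priv > limit_kib]


if __name__ == "__main__":
    sample = [
        "30475 postgres 20 0 2187m 746m 741m S 31 9.5 0:41.48 postgres",
        "15965 postgres 20 0 12.9g 6.4g 1.4g S 11 83.4 13:59.15 postgres",
    ]
    for line in sample:
        pid, priv = private_kib(line)
        print(f"PID {pid}: about {priv / 1024:.0f} MiB private")
    print("suspicious:", suspicious(sample))
```

On the two top lines quoted in this message it reports about 5 MiB private for PID 30475 and about 5120 MiB (5 GiB) for PID 15965, and flags only 15965, which is the distinction between the healthy and the runaway backend.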