Thread: memory issues when running with mod_perl

memory issues when running with mod_perl

From

Jonathan Vanasco

Date:

27 September 2006, 21:03:29

Someone posted an issue to the mod-perl list a few weeks ago about
their machine losing a ton of memory under a mod-perl2/apache/
postgres system - and only being able to reclaim it from reboots

A few weeks later I ran into some memory related problems, and
noticed a similar issue.  Starting / stopping the clients had no
effect on memory  (which was expected... the new ones just pulled in
from the shared cache )

But stopping the daemon didn't affect memory either.  running ipcs, i
saw the shared memory lock freed.  but the kernel never seemed to get
it back.  and then I'd run into swap.

I hit this on FreeBSD 6.x + pg 8.1.x; separate people on the list
had it under the 2.4 / 2.6 Linux kernels with 7.x and 8.x pgs.

someone just posted that they have the same issue on one machine
under  7.4.9, but not (yet) under 7.4.13

does anyone have some suggestions on how to test this to make sure
it's a pg issue?  it often takes a few days for pg to consume enough
memory for this behavior to set in.

Re: memory issues when running with mod_perl

From

Martijn van Oosterhout

Date:

28 September 2006, 10:24:57

On Wed, Sep 27, 2006 at 05:03:15PM -0400, Jonathan Vanasco wrote:
>
> Someone posted an issue to the mod-perl list a few weeks ago about
> their machine losing a ton of memory under a mod-perl2/apache/
> postgres system - and only being able to reclaim it from reboots

Are you sure you're looking at the right numbers? Disk cache should be
counted as part of free memory, for example.

Could you provide some actual output of your tests, so we can see
exactly what you mean?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: memory issues when running with mod_perl

From

Andreas Rieke

Date:

30 September 2006, 08:20:47

Martijn,

> Are you sure you're looking at the right numbers? Disk cache should be
> counted as part of free memory, for example.

I am the guy who posted the problem to mod_perl, and yes, I am quite
sure that we are talking about the right numbers. The best argument is
that the machine in fact starts swapping when memory is gone - and this
means there is neither free nor cached memory left.

> Could you provide some actual output of your tests, so we can see
> exactly what you mean?

Several examples have been discussed in the mod_perl mailing list, just
have a look at
http://mail-archives.apache.org/mod_mbox/perl-modperl/200609.mbox/browser
However, maybe my configuration is quite a good example as I am using
very current software packages.

Linux 2.6.13-15 (SuSE 10.0 i586)
Apache 2.0.59
Postgresql 8.1.4
mod_perl 2.0.2
perl 5.8.8
DBI 1.52
DBD::Pg 1.49
Embperl 2.2.0
OpenSSL 0.9.8b.

When I boot my machine with 1 G physical RAM and about 4 G of swap
memory, it uses about 300 M for the os, apache and pg, and the rest is
free. As usual, the free memory goes to cached memory after some time,
no reason to worry.
However, within two weeks, the cached memory becomes less and less, and
the machine starts to swap. When I stop apache at that time, I get some
100-150 M back; however, when I stop pg, I get nearly nothing back. On
the other hand, it does not cost me anything if I restart pg, whereas
apache of course takes back the 100-150 M. When swapping increases, the
machine gets slower and slower, until it no longer answers anything
except ping.

I thought about a kernel memory leak for a long time; however, in the
mod_perl mailing list we heard about someone reporting the same problem
in BSD, and for that reason, this is maybe the wrong track. Using ipcs, I
see that the postmaster uses 10 M of shared memory; why do I not see any
increase of free memory (or cache) when stopping pg?

Thanks in advance,

Andreas

P.S.: Jonathan has described nearly the same thing in the mod_perl list
on Wed, 06 Sep, 21:36.
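
One way to put numbers behind a report like this is to log free,
buffer and cached memory at regular intervals, so that a slow loss
shows up as a trend rather than an impression. A minimal sketch,
assuming a Linux box with the usual /proc/meminfo layout; the function
names parse_meminfo and reclaimable_kb are illustrative and do not
come from the thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def reclaimable_kb(info):
    """Free memory plus buffers and page cache.

    Assumption: these three fields approximate what the kernel could
    hand back to applications, as discussed in the thread.
    """
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
```

Reading /proc/meminfo once an hour from cron, passing its contents to
parse_meminfo, and appending reclaimable_kb plus SwapFree to a log
file would give the kind of actual output asked for earlier in the
thread.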

Re: memory issues when running with mod_perl

From

Tom Lane

Date:

30 September 2006, 16:28:35

Andreas Rieke <andreas.rieke@isl.de> writes:
> I am the guy who posted the problem to mod_perl, and yes, I am quite
> sure that we are talking about the right numbers. The best argument is
> that the machine in fact starts swapping when memory is gone - and this
> means there is neither free nor cached memory left.

Andreas, what it sounds like to me is a kernel memory leak probably
triggered by Postgres' use of SysV shared memory (which is not a heavily
used kernel feature these days, so bugs in it are hardly out of the
question).

A couple of facts that might help you narrow your theories:

1. When the postmaster starts up, it allocates one, count 'em one,
shared memory segment that is never thereafter changed in size.

2. When the postmaster shuts down, it issues a shmctl(IPC_RMID)
call against that segment.  The kernel should thereupon mark the
segment for destruction, and then actually destroy it when the
last process connected to it is gone.  In a normal shutdown that
would mean immediately (because the postmaster waits for all its
child processes to die first), but in an "immediate mode" shutdown
there might still be children alive at the instant of the shmctl.

Within this context, the only way to cause a memory leak is to
"kill -9" the postmaster instead of giving it a chance to exit
gracefully.  In that case the shmctl(IPC_RMID) never happens and
the memory segment isn't reclaimed.  However, if that were your
problem then the evidence would be real clear in "ipcs -m -a"
output: lots of postgres-owned segments with zero attached processes.
(There actually is code in the postmaster to try to find and
destroy such orphaned segments during postmaster restart, but
it's not 100% guaranteed to find everything.)

If the shared segment is no longer present according to ipcs,
and there are no postgres processes still running, then it's
simply not possible for it to be postgres' fault if memory has
not been reclaimed.  So you're looking at a kernel bug.

As to the nature of the bug ... we saw something similar in older
versions of OS X:
http://archives.postgresql.org/pgsql-general/2004-08/msg00972.php
Since Darwin is BSD-derived, an ancient common bug seems possible.
(BTW, I just repeated the above experiment in OS X 10.4.8, and see
no leak, so Apple did fix it somewhere along the line.)

Anyway I'd suggest trying to duplicate the problem without apache
by firing new backends rapidly as in the above message. If you can,
file a kernel bug report.

            regards, tom lane
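
The check described above (postgres-owned segments with zero attached
processes) can be automated by scanning the ipcs output. A minimal
sketch, assuming the Linux column layout of "ipcs -m" (key, shmid,
owner, perms, bytes, nattch, status); FreeBSD prints different
columns, so the field positions would need adjusting there. The
function name orphaned_segments is hypothetical.

```python
def orphaned_segments(ipcs_output, owner="postgres"):
    """Return (shmid, bytes) for segments owned by `owner` with nattch == 0.

    Assumes Linux "ipcs -m" data rows of the form:
        key shmid owner perms bytes nattch [status]
    """
    orphans = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Skip banners, the header row and blank lines.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, seg_owner, size, nattch = fields[1], fields[2], fields[4], fields[5]
        if seg_owner == owner and nattch == "0":
            orphans.append((int(shmid), int(size)))
    return orphans
```

An empty result while memory is still missing would be consistent
with the conclusion above that the segment itself is not what is
being leaked.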

Re: memory issues when running with mod_perl

From

Jonathan Vanasco

Date:

30 September 2006, 23:33:16

On Sep 30, 2006, at 12:28 PM, Tom Lane wrote:

> If the shared segment is no longer present according to ipcs,
> and there are no postgres processes still running, then it's
> simply not possible for it to be postgres' fault if memory has
> not been reclaimed.  So you're looking at a kernel bug.

that's got to be it then.  i've been running ipcs *hoping* to see
something, but it's all 'freed'.  the mem just disappears.

> As to the nature of the bug ... we saw something similar in older
> versions of OS X:
> http://archives.postgresql.org/pgsql-general/2004-08/msg00972.php

thanks for the link!

Re: memory issues when running with mod_perl

From

Andreas Rieke

Date:

01 October 2006, 13:57:19

Tom,

thanks for all the facts first.

Tom Lane wrote:

>If the shared segment is no longer present according to ipcs,
>and there are no postgres processes still running, then it's
>simply not possible for it to be postgres' fault if memory has
>not been reclaimed.  So you're looking at a kernel bug.
>
>
>
Before we switch this item to the linux kernel mailing list, let me add
two results and two more questions.

R1: First of all, I tried the loop from your older OS X problem:
    while true
    do
        psql -c "select count(*) from tenk1" regression
    done
Even after running the psql command more than a million times over
quite a small table with about 10 000 entries, I can NOT see any lost
memory. Thus, we have a different problem from the OS X people.

R2: After having a look at the linux kernel mailing list, it seems that
this problem is not yet known there.

So far, so good.

Q1: The first question is quite easy: Is there any way to tell pg NOT to
use shmem? Although I expect lower performance with that configuration,
I would like to try that out.

Q2: You say that pg allocates one shared memory block which is never
changed in size, and I can see with ipcs that we talk about 10 MBytes on
my machine (which uses the default configuration). I usually do not
kill -9 the postmaster, and even then the maximum loss of memory from
that cause seems to be 10 M.
However, my machine loses between 500 M and 800 M in two weeks, and
within that time, I restart pg only very few times, say 3-4 times.
Does pg allocate other shmem blocks? If there is really a kernel memory
problem in shmem, how can I lose so much memory?

Thanks in advance,

Andreas

Re: memory issues when running with mod_perl

From

"Fred Tyler"

Date:

01 October 2006, 14:47:25

> However, my machine loses between 500 M and 800 M in two weeks, and
> within that time, I restart pg only very few times, say 3-4 times.
> Does pg allocate other shmem blocks? If there is really a kernel memory
> problem in shmem, how can I lose so much memory?

This is the same thing I am seeing -- 500MB-1GB of memory lost every two
weeks -- and I don't restart pg at all. So, whatever is causing this
is not due to pg restarts. I am running the same software as everyone
else who has had this problem though: Apache, Postgres, and mod_perl
on Linux (I know there's the guy seeing it on BSD also).

As I mentioned on the mod_perl list, I'm seeing the loss on a machine
with ~350 vhosted domains all running a mod_perl CMS:

Apache 1.3.37
Postgres 7.4.9
Linux 2.6.12.6
mod_perl 1.29

However, I am not seeing any loss at all on another machine with ~100
vhosted domains running the same CMS, but with the following software:

Apache 1.3.37
Postgres 7.4.13
Linux 2.6.16.27
mod_perl 1.29

I cannot be certain that it's not just due to the lighter load (100 vs
350 vhosts), but I have not seen a single MB of lost memory on the
second machine, and am inclined to believe that the problem is fixed
with that setup (aside from the kernel and postgres, the two machines
are running the same software).

Tonight I am going to upgrade postgres on the first machine and see if
it makes any difference, but it'll be about a week before I know for
sure if memory is still being lost (it's such a slow leak that you
cannot tell with just a couple days).

Re: memory issues when running with mod_perl

From

Andreas Rieke

Date:

01 October 2006, 14:59:43

Fred,

Fred Tyler wrote:

> Tonight I am going to upgrade postgres on the first machine and see if
> it makes any difference, but it'll be about a week before I know for
> sure if memory is still being lost (it's such a slow leak that you
> cannot tell with just a couple days).

I use the latest 8.1.4 postgres software on my machine, this does not help.

You say that you are using Linux 2.6.12.6 on the machine with the
problem and Linux 2.6.16.27 on the one where you do not lose memory.
Are there any other differences (RAID, Reiser FS, DBI, DBD::Pg, Embperl,
physical memory or swap size, ...)?

Andreas

Re: memory issues when running with mod_perl

From

Tom Lane

Date:

01 October 2006, 15:57:02

Andreas Rieke <andreas.rieke@isl.de> writes:
> R1: First of all, I tried the loop from your older OS X problem:
>     while true
>     do
>         psql -c "select count(*) from tenk1" regression
>     done
> Even after running the psql command more than a million times over
> quite a small table with about 10 000 entries, I can NOT see any lost
> memory. Thus, we have a different problem from the OS X people.

OK, that kills the theory that the leak is triggered by subprocess exit.
Another thing that would be worth trying is to just stop and start the
postmaster a large number of times, to see if the leak occurs at
postmaster exit.

Also, do you have any problems with backends crashing (ie, forced
database restarts)?  That scenario should be equivalent to a postmaster
restart, but it might be worth trying a few cycles of kill -9'ing a
backend (not the postmaster) to check for a leak in that path.

> R2: After having a look at the linux kernel mailing list, it seems that
> this problem is not yet known there.

It's premature to complain to them until we have a clearly reproducible
way of causing the leak.

> Q1: The first question is quite easy: Is there any way to tell pg NOT to
> use shmem?

No.

> Does pg allocate other shmem blocks?

No.

            regards, tom lane
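
The stop/start experiment suggested above could be scripted roughly
as follows. This is only a sketch under stated assumptions: it uses
pg_ctl with a "fast" shutdown so that open client connections do not
block the stop, the data directory path is supplied by the caller,
and the names restart_commands and cycle_postmaster are made up for
illustration. The sample callback is a placeholder for whatever
memory reading is being tracked.

```python
import subprocess


def restart_commands(datadir):
    """pg_ctl invocations for one clean stop/start cycle (-w waits for completion)."""
    return [
        ["pg_ctl", "-D", datadir, "-w", "-m", "fast", "stop"],
        ["pg_ctl", "-D", datadir, "-w", "start"],
    ]


def cycle_postmaster(datadir, cycles, run=subprocess.check_call, sample=lambda: None):
    """Restart the postmaster `cycles` times.

    Returns a baseline sample followed by one sample after each cycle.
    `run` and `sample` are injectable so the loop can be dry-run.
    """
    samples = [sample()]
    for _ in range(cycles):
        for cmd in restart_commands(datadir):
            run(cmd)
        samples.append(sample())
    return samples
```

If the samples drift downward by roughly the size of the shared
segment per cycle, that would point at the postmaster-exit path; a
flat series would rule it out.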

Re: memory issues when running with mod_perl

From

"Fred Tyler"

Date:

01 October 2006, 16:06:13

> > Tonight I am going to upgrade postgres on the first machine and see if
> > it makes any difference, but it'll be about a week before I know for
> > sure if memory is still being lost (it's such a slow leak that you
> > cannot tell with just a couple days).
>
> I use the latest 8.1.4 postgres software on my machine, this does not help.

Actually, I was going to upgrade to 7.4.13 thinking that maybe some
bugfix got into the 7.4 series that didn't make it into the 8.1
series. That said, however, I think this is a long shot and I'm not
really expecting it to help. All my research a few months ago pointed
to this being a memory leak in the 2.6.12 kernel, and I've been
patiently waiting for a good time to upgrade the kernel, cross my
fingers, and hope everything is magically fixed. But since it came up
on the mod_perl list recently and I saw that others with very similar
configurations were having the same problems, it's renewed my interest
in trying to pinpoint the problem.


> You say that you are using Linux 2.6.12.6 on the machine with the
> problem and Linux 2.6.16.27 on the one where you do not lose memory.
> Are there any other differences (RAID, Reiser FS, DBI, DBD::Pg, Embperl,
> physical memory or swap

As far as software goes, they are identical with the exception of the
kernel version and postgres version. Both run reiserfs. No RAID. The
physical memory is the same (2GB). The differences are:

Leaky machine: 256MB swap, P4 2.8GHz
Non-leaky machine: 512MB swap, Pentium D (dual core) 3.0GHz.

What is your kernel version?

Re: memory issues when running with mod_perl

From

"Fred Tyler"

Date:

01 October 2006, 16:24:14

> OK, that kills the theory that the leak is triggered by subprocess exit.
> Another thing that would be worth trying is to just stop and start the
> postmaster a large number of times, to see if the leak occurs at
> postmaster exit.

It is not from the exit. I see the exact same problem and I never
restart postgres and it never crashes. It runs constantly and with no
crashes for 20-30 days until the box is out of memory and I have to
reboot.


> > R2: After having a look at the linux kernel mailing list, it seems that
> > this problem is not yet known there.

It is possible that it has already been fixed. I am seeing this memory
leak quite clearly on 2.6.12.6, but there's no evidence of it at all
on 2.6.16.27. The changelogs between those two versions show a lot of
bugfixes for memory leaks.

Also, that message from Will Glenn references a discussion where
someone else running 2.6.12 was having a memory leak issue.

Re: memory issues when running with mod_perl

From

Tom Lane

Date:

01 October 2006, 16:44:55

"Fred Tyler" <fredty8@gmail.com> writes:
>>> R2: After having a look at the linux kernel mailing list, it seems that
>>> this problem is not yet known there.

> It is possible that it has already been fixed. I am seeing this memory
> leak quite clearly on 2.6.12.6, but there's no evidence of it at all
> on 2.6.16.27. The changelogs between those two versions show a lot of
> bugfixes for memory leaks.

We should not forget the possibility that there's more than one bug
here.  The people who were seeing leakage on BSD might be facing a
different problem than what you are dealing with on Linux.  It'd be
worth trying the psql-in-a-tight-loop example on the BSD boxes that
are showing the problem.

            regards, tom lane

Re: memory issues when running with mod_perl

From

Andreas Rieke

Date:

01 October 2006, 17:06:18

Fred,

>
> What is your kernel version?

It's 2.6.13-15. Thus, if we have a kernel bug, the newest known leaky
version is 2.6.13-15, whereas the oldest fixed version should be 2.6.16.27.

As many people run pg on older kernel versions, I would expect many
others having memory problems in that case.

However, we are heading for a dead end. Although I am very sure that I can
stop the postmaster without getting the 10 M shmem back, my real problem
is worse, because I lose up to 800 M continuously within 14
days. Tom Lane raised the question whether there is more than one bug.

Any ideas?

Andreas

Re: memory issues when running with mod_perl

From

Jorge Godoy

Date:

01 October 2006, 18:48:42

Andreas Rieke <andreas.rieke@isl.de> writes:

> It's 2.6.13-15. Thus, if we have a kernel bug, the newest known leaky
> version is 2.6.13-15, whereas the oldest fixed version should be 2.6.16.27.

I have a few servers with 2.6.16.21 and I don't see the problem there either.

--
Jorge Godoy      <jgodoy@gmail.com>

Re: memory issues when running with mod_perl

From

Jonathan Vanasco

Date:

03 October 2006, 16:39:17

On Oct 1, 2006, at 11:56 AM, Tom Lane wrote:

> OK, that kills the theory that the leak is triggered by subprocess
> exit.
> Another thing that would be worth trying is to just stop and start the
> postmaster a large number of times, to see if the leak occurs at
> postmaster exit.

On FreeBSD I'm not seeing any leak on subprocess exit.   Multiple
psql clients just consume memory - often shared - then toss it back
nicely.  The consumed memory / available memory does grow - but it's
all accounted for, and within expected constraints.

I believe, however, I'm seeing a leak on postmaster exit.

Can someone suggest to me a SQL query I can loop a few thousand times
to drive up shared memory use?  Basically, to test I'd like to do
something like what was suggested in the archived osx thread:

    http://archives.postgresql.org/pgsql-general/2004-08/msg00972.php
    ====
    while true
    do
        psql -c "select count(*) from tenk1" regression
    done
    ====

except instead of relying on a leak to increase memory, I'd like a
rather intensive large function with a dataset to consume massive
amounts of ram.  I just can't think of any function to do that.

Re: memory issues when running with mod_perl

From

Tom Lane

Date:

03 October 2006, 16:43:56

Jonathan Vanasco <postgres@2xlp.com> writes:
> except instead of relying on a leak to increase memory, I'd like a
> rather intensive large function with a dataset to consume massive
> amounts of ram.  I just can't think of any function to do that.

Sort a big chunk of data with a high work_mem setting, eg

    select random() from generate_series(1,1000000) order by 1;

or

    select count(*) from
      (select random() from generate_series(1,1000000) order by 1) ss;

The former will drive psql's memory usage up too, the latter not.

            regards, tom lane
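
The second query above can be driven in a loop much like the tenk1
test earlier in the thread. A rough sketch, assuming a server recent
enough to have work_mem and generate_series (8.0 or later) and a
database named by the caller; the 256 MB work_mem value and the
function names sort_load_sql and run_sort_load are illustrative
choices, not recommendations from the thread.

```python
import subprocess


def sort_load_sql(rows=1000000, work_mem_kb=262144):
    """SQL that raises work_mem for the session, then sorts `rows` random values.

    work_mem_kb is an assumed value (256 MB); pick one that fits the test box.
    """
    return (
        "set work_mem = %d; "
        "select count(*) from "
        "(select random() from generate_series(1,%d) order by 1) ss;"
        % (work_mem_kb, rows)
    )


def run_sort_load(database, iterations, run=subprocess.check_call):
    """Run the sort query `iterations` times, starting one psql process per run."""
    sql = sort_load_sql()
    for _ in range(iterations):
        run(["psql", "-c", sql, database])
```

Because the outer count(*) returns a single row, psql's own memory
stays small and any growth seen during the loop should come from the
backend side.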

Re: memory issues when running with mod_perl

From

Jonathan Vanasco

Date:

03 October 2006, 19:34:26

On Oct 1, 2006, at 12:24 PM, Fred Tyler wrote:

> It is not from the exit. I see the exact same problem and I never
> restart postgres and it never crashes. It runs constantly and with no
> crashes for 20-30 days until the box is out of memory and I have to
> reboot.

my theory, which i hope to prove/disprove tonight, is that this is
what's happening to us:

    1. postgres is slurping all that memory because it should - there's
       probably a setting on your box that is letting it consume more
       shared memory than you want.
    2. some issue with the kernel/postgres is not truly freeing the shared
       memory when postmaster exits.

in other words, i think the leak isn't in those 20-30 days -- i think
that's likely a configuration issue.  but i think there is a leak
when you stop the daemon.

Re: memory issues when running with mod_perl

From

"Fred Tyler"

Date:

10 October 2006, 18:07:09

On 10/1/06, Fred Tyler <fredty8@gmail.com> wrote:
> > However, my machine loses between 500 M and 800 M in two weeks, and
> > within that time, I restart pg only very few times, say 3-4 times.
> > Does pg allocate other shmem blocks? If there is really a kernel memory
> > problem in shmem, how can I lose so much memory?
>
> This is the same thing I am seeing -- 500MB-1GB of memory lost every two
> weeks -- and I don't restart pg at all. So, whatever is causing this
> is not due to pg restarts. I am running the same software as everyone
> else who has had this problem though: Apache, Postgres, and mod_perl
> on Linux (I know there's the guy seeing it on BSD also).

For the record, after upgrading the leaky machine from postgres 7.4.9
to 7.4.13, the memory leak is still very much present. (My original
configuration details are quoted below.)

From a fresh reboot of the server, with no postgres restarts, crashes,
or anything abnormal, I'm seeing a memory loss of around 50-100MB per
day.

As things now stand, I have two machines with the exact same software
configuration, but one machine (the leaky one) is running kernel
2.6.12.6, and the other machine (non-leaky one) is running 2.6.16.27.

The hardware is slightly different (both servers are Intel, but one is
dual core), and the non-leaky machine has about 1/2 the load of the
leaky machine, but all evidence right now is pointing to a kernel leak
that was fixed somewhere between 2.6.12.6 and 2.6.16.27.

> As I mentioned on the mod_perl list, I'm seeing the loss on a machine
> with ~350 vhosted domains all running a mod_perl CMS:
>
> Apache 1.3.37
> Postgres 7.4.9
> Linux 2.6.12.6
> mod_perl 1.29
>
> However, I am not seeing any loss at all on another machine with ~100
> vhosted domains running the same CMS, but with the following software:
>
> Apache 1.3.37
> Postgres 7.4.13
> Linux 2.6.16.27
> mod_perl 1.29
>
> I cannot be certain that it's not just due to the lighter load (100 vs
> 350 vhosts), but I have not seen a single MB of lost memory on the
> second machine, and am inclined to believe that the problem is fixed
> with that setup (aside from the kernel and postgres, the two machines
> are running the same software).
>
> Tonight I am going to upgrade postgres on the first machine and see if
> it makes any difference, but it'll be about a week before I know for
> sure if memory is still being lost (it's such a slow leak that you
> cannot tell with just a couple days).
>
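
A leak this slow is easier to judge from a fitted trend than from
eyeballing daily readings. A minimal sketch of estimating the loss
rate with an ordinary least-squares slope over (day, used MB)
samples; the function name and the sample figures in the usage note
are hypothetical, chosen only to fall inside the 50-100MB per day
range reported above.

```python
def leak_rate_mb_per_day(samples):
    """Least-squares slope of used memory (MB) against elapsed time (days).

    `samples` is a list of (day, used_mb) pairs; the values are whatever
    the operator has logged, nothing here is measured by the function.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den
```

For example, readings of 300, 375, 450 and 525 MB on days 0 to 3 give
a slope of 75 MB per day. At that rate 2GB of RAM would be used up in
roughly 27 days (ignoring baseline usage and swap), which is in line
with the 20-30 days between reboots mentioned earlier in the thread.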