Thread: Re: Memory Leakage Problem
Please keep replies on list, this may help others in the future, and also, don't top post (i.e. put your responses after my responses). Thanks.

On Tue, 2005-12-06 at 20:16, Kathy Lo wrote:
> For a back-end database server running Postgresql 8.0.3, it's OK. But,
> this problem seriously affects the performance of my application
> server.
>
> I upgraded my application server from
>
> Redhat 7.3
> unixODBC 2.2.4
> Postgresql 7.2.1 with ODBC driver
>
> to
>
> Redhat 9.0
> unixODBC 2.2.11
> Postgresql 8.0.3
> psqlodbc-08.01.0101
> pg_autovacuum runs as background job
>
> Before upgrading, the application server runs perfectly. After
> upgrade, this problem appears.
>
> When the application server receives the request from a client, it
> will access the back-end database server using both simple and complex
> query. Then, it will create a database locally to store the matched
> rows for data processing. After some data processing, it will return
> the result to the requested client. If the client finishes browsing
> the result, it will drop the local database.

OK, there could be a lot of problems here. Are you actually doing "create database ..." for each of these things? I'm not sure that's a really good idea. Even create schema, which would be better, strikes me as not the best way to handle this.

> At the same time, this application server can serve many many clients
> so the application server has many many local databases at the same
> time.

Are you sure that you're better off with databases on your application server? You might be better off with either running these temp dbs on the backend server in the same cluster, or creating a cluster just for these jobs that is somewhat more conservative in its memory usage. I would lean towards doing this all on the backend server in one database using multiple schemas.

> After running the application server for a few days, the memory of the
> application server nearly used up and start to use the swap memory
> and, as a result, the application server runs very very slow and the
> users complain.

Could you provide us with your evidence that the memory is "used up"? What is the problem, and what you perceive as the problem, may not be the same thing. Is it the output of top / free? If so, could we see it, or whatever output is convincing you that you're running out of memory?

> I tested the application server without accessing the local database
> (not store matched rows). The testing program running in the
> application server just retrieved rows from the back-end database
> server and then returned to the requested client directly. The memory
> usage of the application server becomes normally and it can run for a
> long time.

Again, what you think is normal and what is actually normal may not be the same thing. Evidence. Please show us the output of top / free or whatever that is showing this.

> I found this problem after I upgrading the application server.
>
> On 12/7/05, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > On Tue, 2005-12-06 at 03:22, Kathy Lo wrote:
> > > Hi,
> > >
> > > In this program, it will access this database server using simple and
> > > complex (joining tables) SQL Select statement and retrieve the matched
> > > rows. For each access, it will connect the database and disconnect it.
> > >
> > > I found that the memory of the databaser server nearly used up (total 2G RAM).
> > >
> > > After I stop the program, the used memory did not free.
> >
> > Ummmm. What exactly do you mean? Can we see the output of top and / or
> > free? I'm guessing that what Tom said is right, you're just seeing a
> > normal state of how unix does things.
> >
> > If your output of free looks like this:
> >
> > -bash-2.05b$ free
> >              total       used       free     shared    buffers     cached
> > Mem:       6096912    6069588      27324          0     260728    5547264
> > -/+ buffers/cache:     261596    5835316
> > Swap:      4192880      16320    4176560
> >
> > Then that's normal.
> >
> > That's the output of free on a machine with 6 gigs that runs a reporting
> > database. Note that while it shows almost ALL the memory as used, it is
> > being used by the kernel, which is a good thing. Note that 5547264, or
> > about 90% of memory, is being used as kernel cache. That's a good thing.
> >
> > Note you can also get yourself in trouble with top. It's not uncommon
> > for someone to see a bunch of postgres processes each eating up 50 or
> > more megs of ram, and panic and think that they're running out of
> > memory, when, in fact, 44 meg for each of those processes is shared, and
> > the real usage per backend is 6 megs or less.
> >
> > Definitely grab yourself a good unix / linux sysadmin guide. The "in a
> > nutshell" books from O'Reilly are a good starting point.
>
> --
> Kathy Lo
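For readers following along, a small sketch of the check Scott is describing, assuming the traditional procps "free" output shown above: the figure that matters is free plus buffers plus cached (the "-/+ buffers/cache" line), not the raw "free" column on the "Mem:" line, since the kernel can reclaim its page cache whenever applications need the memory.

# Rough sketch: memory actually available to applications, i.e. the
# "free" column plus buffers and cache (columns 4, 6 and 7 of the Mem: row).
free | awk '/^Mem:/ { printf "available to apps: %d kB of %d kB total\n", $4 + $6 + $7, $2 }'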
On 12/8/05, Scott Marlowe <smarlowe@g2switchworks.com> wrote: > Please keep replies on list, this may help others in the future, and > also, don't top post (i.e. put your responses after my responses... > Thanks) > > On Tue, 2005-12-06 at 20:16, Kathy Lo wrote: > > For a back-end database server running Postgresql 8.0.3, it's OK. But, > > this problem seriously affects the performance of my application > > server. > > > > I upgraded my application server from > > > > Redhat 7.3 > > unixODBC 2.2.4 > > Postgresql 7.2.1 with ODBC driver > > > > to > > > > Redhat 9.0 > > unixODBC 2.2.11 > > Postgresql 8.0.3 > > psqlodbc-08.01.0101 > > pg_autovacuum runs as background job > > > > Before upgrading, the application server runs perfectly. After > > upgrade, this problem appears. > > > > When the application server receives the request from a client, it > > will access the back-end database server using both simple and complex > > query. Then, it will create a database locally to store the matched > > rows for data processing. After some data processing, it will return > > the result to the requested client. If the client finishes browsing > > the result, it will drop the local database. > > OK, there could be a lot of problems here. Are you actually doing > "create database ..." for each of these things? I'm not sure that's a > real good idea. Even create schema, which would be better, strikes me > as not the best way to handle this. > Actually, my program is written using C++ so I use "create database" SQL to create database. If not the best way, please tell me another method to create database in C++ program. > > At the same time, this application server can serve many many clients > > so the application server has many many local databases at the same > > time. > > Are you sure that you're better off with databases on your application > server? You might be better off with either running these temp dbs on > the backend server in the same cluster, or creating a cluster just for > these jobs that is somewhat more conservative in its memory usage. I > would lean towards doing this all on the backend server in one database > using multiple schemas. > Because the data are distributed in many back-end database servers (physically, in different hardware machines), I need to use Application server to temporarily store the data retrieved from different machines and then do the data processing. And, for security reason, all the users cannot directly access the back-end database servers. So, I use the database in application server to keep the result of data processing. > > After running the application server for a few days, the memory of the > > application server nearly used up and start to use the swap memory > > and, as a result, the application server runs very very slow and the > > users complain. > > Could you provide us with your evidence that the memory is "used up?" > What is the problem, and what you perceive as the problem, may not be > the same thing. Is it the output of top / free, and if so, could we see > it, or whatever output is convincing you you're running out of memory? > When the user complains the system becomes very slow, I use top to view the memory statistics. In top, I cannot find any processes that use so many memory. I just found that all the memory was used up and the Swap memory nearly used up. I said it is the problem because, before upgrading the application server, no memory problem even running the application server for 1 month. 
After upgrading the application server, this problem appears just after running the application server for 1 week. Why having this BIG difference between postgresql 7.2.1 on Redhat 7.3 and postgresql 8.0.3 on Redhat 9.0? I only upgrade the OS, postgresql, unixODBC and postgresql ODBC driver. The program I written IS THE SAME. > > I tested the application server without accessing the local database > > (not store matched rows). The testing program running in the > > application server just retrieved rows from the back-end database > > server and then returned to the requested client directly. The memory > > usage of the application server becomes normally and it can run for a > > long time. > > Again, what you think is normal, and what normal really are may not be > the same thing. Evidence. Please show us the output of top / free or > whatever that is showing this. > After I received the user's complain, I just use top to view the memory statistic. I forgot to save the output. But, I am running a test to get back the problem. So, after running the test, I will give you the output of the top/free. > > I found this problem after I upgrading the application server. > > > > On 12/7/05, Scott Marlowe <smarlowe@g2switchworks.com> wrote: > > > On Tue, 2005-12-06 at 03:22, Kathy Lo wrote: > > > > Hi, > > > > > > > > > > > In this program, it will access this database server using simple and > > > > complex (joining tables) SQL Select statement and retrieve the matched > > > > rows. For each access, it will connect the database and disconnect it. > > > > > > > > I found that the memory of the databaser server nearly used up (total > 2G > > > RAM). > > > > > > > > After I stop the program, the used memory did not free. > > > > > > Ummmm. What exactly do you mean? Can we see the output of top and / or > > > free? I'm guessing that what Tom said is right, you're just seeing a > > > normal state of how unix does things. > > > > > > If your output of free looks like this: > > > > > > -bash-2.05b$ free > > > total used free shared buffers cached > > > Mem:6096912 6069588 27324 0 260728 5547264 > > > -/+ buffers/cache: 261596 5835316 > > > Swap: 4192880 16320 4176560 > > > > > > Then that's normal. > > > > > > That's the output of free on a machine with 6 gigs that runs a reporting > > > database. Note that while it shows almost ALL the memory as used, it is > > > being used by the kernel, which is a good thing. Note that 5547264 or > > > about 90% of memory is being used as kernel cache. That's a good thing. > > > > > > Note you can also get yourself in trouble with top. It's not uncommon > > > for someone to see a bunch of postgres processes each eating up 50 or > > > more megs of ram, and panic and think that they're running out of > > > memory, when, in fact, 44 meg for each of those processes is shared, and > > > the real usage per backend is 6 megs or less. > > > > > > Definitely grab yourself a good unix / linux sysadmin guide. The "in a > > > nutshell" books from O'Reilley (sp?) are a good starting point. > > > > > > > > > -- > > Kathy Lo > -- Kathy Lo
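As an aside for readers wondering what the schema-based alternative Scott mentions might look like: a rough sketch follows, shown with psql for brevity, though the same SQL statements could just as well be sent over the ODBC connection the C++ program already holds. The database name appdb, the schema name scratch_12345 (derived, say, from a session id), and the table columns are all invented for the example.

# Hypothetical per-request scratch area using a schema instead of a database.
psql appdb <<'SQL'
CREATE SCHEMA scratch_12345;
CREATE TABLE scratch_12345.matched_rows (id integer, payload text);
-- ... load the rows fetched from the back-end servers and process them ...
DROP SCHEMA scratch_12345 CASCADE;
SQL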
On 12/8/05, Kathy Lo <kathy.lo.ky@gmail.com> wrote:
[snip]
> When the user complains the system becomes very slow, I use top to
> view the memory statistics.
> In top, I cannot find any processes that use so many memory. I just
> found that all the memory was used up and the Swap memory nearly used
> up.

Not to add fuel to the fire, but I'm seeing something similar to this on my 4xOpteron with 32GB of RAM running Pg 8.1RC1 on Linux (kernel 2.6.12). I don't see this happening on a similar box with 16GB of RAM running Pg 8.0.3. This is a lightly used box (until it goes into production), so it's not "out of memory", but the memory usage is climbing without any obvious culprit. To cut to the chase, here are some numbers for everyone to digest:

total gnu ps resident size
# ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";'
5810492

total gnu ps virtual size
# ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";'
10585400

total gnu ps "if all pages were dirtied and swapped" size
# ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";'
1970952

ipcs -m
# ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e2c1 1802240    postgres   600        176054272  26

(that's the entire ipcs -m output)

and the odd man out, free
# free
             total       used       free     shared    buffers     cached
Mem:      32752268   22498448   10253820          0     329776    8289360
-/+ buffers/cache:   13879312   18872956
Swap:     31248712        136   31248576

I guess dstat is getting its info from the same source as free, because:
# dstat -m 1
------memory-usage-----
_used _buff _cach _free
  13G  322M 8095M  9.8G

Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. reported-by-free memory usage, but I only noticed this after upgrading to 8.1. I'll collect any more info that anyone would like to see, just let me know. If anyone has any ideas on what is actually happening here I'd love to hear them!

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
Mike Rylander <mrylander@gmail.com> writes: > To cut to the chase, here are > some numbers for everyone to digest: > total gnu ps resident size > # ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";' > 5810492 > total gnu ps virual size > # ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";' > 10585400 > total gnu ps "if all pages were dirtied and swapped" size > # ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";' > 1970952 I wouldn't put any faith in those numbers at all, because you'll be counting the PG shared memory multiple times. On the Linux versions I've used lately, ps and top report a process' memory size as including all its private memory, plus all the pages of shared memory that it has touched since it started. So if you run say a seqscan over a large table in a freshly-started backend, the reported memory usage will ramp up from a couple meg to the size of your shared_buffer arena plus a couple meg --- but in reality the space used by the process is staying constant at a couple meg. Now, multiply that effect by N backends doing this at once, and you'll have a very skewed view of what's happening in your system. I'd trust the totals reported by free and dstat a lot more than summing per-process numbers from ps or top. > Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. > reported-by-free memory usage, but I only noticed this after upgrading > to 8.1. I don't know of any reason to think that 8.1 would act differently from older PG versions in this respect. regards, tom lane
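To put rough numbers on that for the figures Mike posted: ipcs -m shows a single 176054272-byte segment with 26 processes attached, so a crude correction is to subtract the extra copies of the shared arena from the summed RSS. A back-of-the-envelope sketch follows; it is only an approximation, since not every backend has touched the whole arena.

# Summed RSS minus (nattch - 1) copies of the shared memory segment
# reported by ipcs -m. The two constants are taken from the output above.
SHM_KB=$((176054272 / 1024))   # "bytes" column of ipcs -m
NATTCH=26                      # "nattch" column of ipcs -m
RSS_KB=$(ps ax -o rss | awk 'NR > 1 { sum += $1 } END { print sum }')
echo "adjusted resident total: $(( RSS_KB - (NATTCH - 1) * SHM_KB )) kB"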
On 12/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Mike Rylander <mrylander@gmail.com> writes: > > To cut to the chase, here are > > some numbers for everyone to digest: > > total gnu ps resident size > > # ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";' > > 5810492 > > total gnu ps virual size > > # ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";' > > 10585400 > > total gnu ps "if all pages were dirtied and swapped" size > > # ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";' > > 1970952 > > I wouldn't put any faith in those numbers at all, because you'll be > counting the PG shared memory multiple times. > > On the Linux versions I've used lately, ps and top report a process' > memory size as including all its private memory, plus all the pages > of shared memory that it has touched since it started. So if you run > say a seqscan over a large table in a freshly-started backend, the > reported memory usage will ramp up from a couple meg to the size of > your shared_buffer arena plus a couple meg --- but in reality the > space used by the process is staying constant at a couple meg. Right, I can definitely see that happening. Some backends are upwards of 200M, some are just a few since they haven't been touched yet. > > Now, multiply that effect by N backends doing this at once, and you'll > have a very skewed view of what's happening in your system. Absolutely ... > > I'd trust the totals reported by free and dstat a lot more than summing > per-process numbers from ps or top. > And there's the part that's confusing me: the numbers for used memory produced by free and dstat, after subtracting the buffers/cache amounts, are /larger/ than those that ps and top report. (top says the same thing as ps, on the whole.) > > Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. > > reported-by-free memory usage, but I only noticed this after upgrading > > to 8.1. > > I don't know of any reason to think that 8.1 would act differently from > older PG versions in this respect. > Neither can I, which is why I don't blame it. ;) I'm just reporting when/where I noticed the issue. > regards, tom lane > -- Mike Rylander mrylander@gmail.com GPLS -- PINES Development Database Developer http://open-ils.org
Mike Rylander wrote: >Right, I can definitely see that happening. Some backends are upwards >of 200M, some are just a few since they haven't been touched yet. > > >>Now, multiply that effect by N backends doing this at once, and you'll >>have a very skewed view of what's happening in your system. >> > >Absolutely ... > >>I'd trust the totals reported by free and dstat a lot more than summing >>per-process numbers from ps or top. >> > >And there's the part that's confusing me: the numbers for used memory >produced by free and dstat, after subtracting the buffers/cache >amounts, are /larger/ than those that ps and top report. (top says the >same thing as ps, on the whole.) > I'm seeing the same thing on one of our 8.1 servers. Summing RSS from `ps` or RES from `top` accounts for about 1 GB, but `free` says: total used free shared buffers cached Mem: 4060968 3870328 190640 0 14788 432048 -/+ buffers/cache: 3423492 637476 Swap: 2097144 175680 1921464 That's 3.4 GB/170 MB in RAM/swap, up from 2.7 GB/0 last Thursday, 2.2 GB/0 last Monday, or 1.9 GB after a reboot ten days ago. Stopping Postgres brings down the number, but not all the way -- it drops to about 2.7 GB, even though the next most memory-intensive process is `ntpd` at 5 MB. (Before Postgres starts, there's less than 30 MB of stuff running.) The only way I've found to get this box back to normal is to reboot it. >>>Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. >>>reported-by-free memory usage, but I only noticed this after upgrading >>>to 8.1. >>> >>I don't know of any reason to think that 8.1 would act differently from >>older PG versions in this respect. >> > >Neither can I, which is why I don't blame it. ;) I'm just reporting >when/where I noticed the issue. > I can't offer any explanation for why this server is starting to swap -- where'd the memory go? -- but I know it started after upgrading to PostgreSQL 8.1. I'm not saying it's something in the PostgreSQL code, but this server definitely didn't do this in the months under 7.4. Mike: is your system AMD64, by any chance? The above system is, as is another similar story I heard. --Will Glynn Freedom Healthcare
On 12/12/05, Will Glynn <wglynn@freedomhealthcare.org> wrote: > Mike Rylander wrote: > > >Right, I can definitely see that happening. Some backends are upwards > >of 200M, some are just a few since they haven't been touched yet. > > > > > >>Now, multiply that effect by N backends doing this at once, and you'll > >>have a very skewed view of what's happening in your system. > >> > > > >Absolutely ... > > > >>I'd trust the totals reported by free and dstat a lot more than summing > >>per-process numbers from ps or top. > >> > > > >And there's the part that's confusing me: the numbers for used memory > >produced by free and dstat, after subtracting the buffers/cache > >amounts, are /larger/ than those that ps and top report. (top says the > >same thing as ps, on the whole.) > > > > I'm seeing the same thing on one of our 8.1 servers. Summing RSS from > `ps` or RES from `top` accounts for about 1 GB, but `free` says: > > total used free shared buffers cached > Mem: 4060968 3870328 190640 0 14788 432048 > -/+ buffers/cache: 3423492 637476 > Swap: 2097144 175680 1921464 > > That's 3.4 GB/170 MB in RAM/swap, up from 2.7 GB/0 last Thursday, 2.2 > GB/0 last Monday, or 1.9 GB after a reboot ten days ago. Stopping > Postgres brings down the number, but not all the way -- it drops to > about 2.7 GB, even though the next most memory-intensive process is > `ntpd` at 5 MB. (Before Postgres starts, there's less than 30 MB of > stuff running.) The only way I've found to get this box back to normal > is to reboot it. > > >>>Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. > >>>reported-by-free memory usage, but I only noticed this after upgrading > >>>to 8.1. > >>> > >>I don't know of any reason to think that 8.1 would act differently from > >>older PG versions in this respect. > >> > > > >Neither can I, which is why I don't blame it. ;) I'm just reporting > >when/where I noticed the issue. > > > I can't offer any explanation for why this server is starting to swap -- > where'd the memory go? -- but I know it started after upgrading to > PostgreSQL 8.1. I'm not saying it's something in the PostgreSQL code, > but this server definitely didn't do this in the months under 7.4. > > Mike: is your system AMD64, by any chance? The above system is, as is > another similar story I heard. > It sure is. Gentoo with kernel version 2.6.12, built for x86_64. Looks like we have a contender for the common factor. :) > --Will Glynn > Freedom Healthcare > -- Mike Rylander mrylander@gmail.com GPLS -- PINES Development Database Developer http://open-ils.org
Mike Rylander <mrylander@gmail.com> writes: > On 12/12/05, Will Glynn <wglynn@freedomhealthcare.org> wrote: >> Mike: is your system AMD64, by any chance? The above system is, as is >> another similar story I heard. > It sure is. Gentoo with kernel version 2.6.12, built for x86_64. > Looks like we have a contender for the common factor. :) Please tell me you're *not* running a production database on Gentoo. regards, tom lane
>
>> It sure is. Gentoo with kernel version 2.6.12, built for x86_64.
>> Looks like we have a contender for the common factor. :)
>
> Please tell me you're *not* running a production database on Gentoo.
>
> regards, tom lane

You don't even want to know how many companies I know that are doing this very thing, and no, it was not my suggestion.

Joshua D. Drake
We're seeing memory problems on one of our postgres databases. We're using 7.4.6, and I suspect the kernel version is a key factor with this problem. One running under Redhat Linux 2.4.18-14smp #1 SMP and the other Debian Linux 2.6.8.1-4-686-smp #1 SMP The second Debian server is a replicated slave using Slony. We NEVER see any problems on the "older" Redhat (our master) DB, whereas the Debian slave database requires slony and postgres to be stopped every 2-3 weeks. This server just consumes more and more memory until it goes swap crazy and the load averages start jumping through the roof. Stopping the two services restores the server to some sort of normality - the load averages drop dramatically and remain low. But the memory is only fully recovered by a server reboot. Over time memory gets used up, until you get to the point where those services require another stop and start. Just my 2 cents... John Will Glynn wrote: > Mike Rylander wrote: > >> Right, I can definitely see that happening. Some backends are upwards >> of 200M, some are just a few since they haven't been touched yet. >> >> >>> Now, multiply that effect by N backends doing this at once, and you'll >>> have a very skewed view of what's happening in your system. >>> >> >> Absolutely ... >> >>> I'd trust the totals reported by free and dstat a lot more than summing >>> per-process numbers from ps or top. >>> >> >> And there's the part that's confusing me: the numbers for used memory >> produced by free and dstat, after subtracting the buffers/cache >> amounts, are /larger/ than those that ps and top report. (top says the >> same thing as ps, on the whole.) >> > > I'm seeing the same thing on one of our 8.1 servers. Summing RSS from > `ps` or RES from `top` accounts for about 1 GB, but `free` says: > > total used free shared buffers cached > Mem: 4060968 3870328 190640 0 14788 432048 > -/+ buffers/cache: 3423492 637476 > Swap: 2097144 175680 1921464 > > That's 3.4 GB/170 MB in RAM/swap, up from 2.7 GB/0 last Thursday, 2.2 > GB/0 last Monday, or 1.9 GB after a reboot ten days ago. Stopping > Postgres brings down the number, but not all the way -- it drops to > about 2.7 GB, even though the next most memory-intensive process is > `ntpd` at 5 MB. (Before Postgres starts, there's less than 30 MB of > stuff running.) The only way I've found to get this box back to normal > is to reboot it. > >>>> Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. >>>> reported-by-free memory usage, but I only noticed this after upgrading >>>> to 8.1. >>>> >>> I don't know of any reason to think that 8.1 would act differently from >>> older PG versions in this respect. >>> >> >> Neither can I, which is why I don't blame it. ;) I'm just reporting >> when/where I noticed the issue. >> > I can't offer any explanation for why this server is starting to swap -- > where'd the memory go? -- but I know it started after upgrading to > PostgreSQL 8.1. I'm not saying it's something in the PostgreSQL code, > but this server definitely didn't do this in the months under 7.4. > > Mike: is your system AMD64, by any chance? The above system is, as is > another similar story I heard. > > --Will Glynn > Freedom Healthcare > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
John Sidney-Woollett <johnsw@wardbrook.com> writes: > This server just consumes more and more memory until it goes swap crazy > and the load averages start jumping through the roof. *What* is consuming memory, exactly --- which processes? regards, tom lane
Sorry but I don't know how to determine that. We stopped and started postgres yesterday so the server is behaving well at the moment. top shows top - 07:51:48 up 34 days, 6 min, 1 user, load average: 0.00, 0.02, 0.00 Tasks: 85 total, 1 running, 84 sleeping, 0 stopped, 0 zombie Cpu(s): 0.6% us, 0.2% sy, 0.0% ni, 99.1% id, 0.2% wa, 0.0% hi, 0.0% si Mem: 1035612k total, 1030380k used, 5232k free, 48256k buffers Swap: 497972k total, 122388k used, 375584k free, 32716k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27852 postgres 16 0 17020 11m 14m S 1.0 1.2 18:00.34 postmaster 27821 postgres 15 0 16236 6120 14m S 0.3 0.6 1:30.68 postmaster 4367 root 16 0 2040 1036 1820 R 0.3 0.1 0:00.05 top 1 root 16 0 1492 148 1340 S 0.0 0.0 0:04.75 init 2 root RT 0 0 0 0 S 0.0 0.0 0:02.00 migration/0 3 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/0 4 root RT 0 0 0 0 S 0.0 0.0 0:04.78 migration/1 5 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/1 6 root RT 0 0 0 0 S 0.0 0.0 0:04.58 migration/2 7 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2 8 root RT 0 0 0 0 S 0.0 0.0 0:21.28 migration/3 9 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/3 10 root 5 -10 0 0 0 S 0.0 0.0 0:00.14 events/0 11 root 5 -10 0 0 0 S 0.0 0.0 0:00.04 events/1 12 root 5 -10 0 0 0 S 0.0 0.0 0:00.01 events/2 13 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 events/3 14 root 8 -10 0 0 0 S 0.0 0.0 0:00.00 khelper This server only has postgres and slon running on it. There is also postfix but it is only used to relay emails from the root account to another server - it isn't really doing anything (I hope). ps shows UID PID PPID C STIME TIME CMD root 1 0 0 Nov09 00:00:04 init [2] root 2 1 0 Nov09 00:00:02 [migration/0] root 3 1 0 Nov09 00:00:00 [ksoftirqd/0] root 4 1 0 Nov09 00:00:04 [migration/1] root 5 1 0 Nov09 00:00:00 [ksoftirqd/1] root 6 1 0 Nov09 00:00:04 [migration/2] root 7 1 0 Nov09 00:00:00 [ksoftirqd/2] root 8 1 0 Nov09 00:00:21 [migration/3] root 9 1 0 Nov09 00:00:00 [ksoftirqd/3] root 10 1 0 Nov09 00:00:00 [events/0] root 11 1 0 Nov09 00:00:00 [events/1] root 12 1 0 Nov09 00:00:00 [events/2] root 13 1 0 Nov09 00:00:00 [events/3] root 14 11 0 Nov09 00:00:00 [khelper] root 15 10 0 Nov09 00:00:00 [kacpid] root 67 11 0 Nov09 00:17:10 [kblockd/0] root 68 10 0 Nov09 00:00:52 [kblockd/1] root 69 11 0 Nov09 00:00:07 [kblockd/2] root 70 10 0 Nov09 00:00:09 [kblockd/3] root 82 1 1 Nov09 09:08:14 [kswapd0] root 83 11 0 Nov09 00:00:00 [aio/0] root 84 10 0 Nov09 00:00:00 [aio/1] root 85 11 0 Nov09 00:00:00 [aio/2] root 86 10 0 Nov09 00:00:00 [aio/3] root 222 1 0 Nov09 00:00:00 [kseriod] root 245 1 0 Nov09 00:00:00 [scsi_eh_0] root 278 1 0 Nov09 00:00:37 [kjournald] root 359 1 0 Nov09 00:00:00 udevd root 1226 1 0 Nov09 00:00:00 [kjournald] root 1229 10 0 Nov09 00:00:16 [reiserfs/0] root 1230 11 0 Nov09 00:00:08 [reiserfs/1] root 1231 10 0 Nov09 00:00:00 [reiserfs/2] root 1232 11 0 Nov09 00:00:00 [reiserfs/3] root 1233 1 0 Nov09 00:00:00 [kjournald] root 1234 1 0 Nov09 00:00:13 [kjournald] root 1235 1 0 Nov09 00:00:24 [kjournald] root 1583 1 0 Nov09 00:00:00 [pciehpd_event] root 1598 1 0 Nov09 00:00:00 [shpchpd_event] root 1669 1 0 Nov09 00:00:00 [khubd] daemon 2461 1 0 Nov09 00:00:00 /sbin/portmap root 2726 1 0 Nov09 00:00:10 /sbin/syslogd root 2737 1 0 Nov09 00:00:00 /sbin/klogd message 2768 1 0 Nov09 00:00:00 /usr/bin/dbus-daemon-1 --system root 2802 1 0 Nov09 00:04:38 [nfsd] root 2804 1 0 Nov09 00:03:32 [nfsd] root 2803 1 0 Nov09 00:04:58 [nfsd] root 2806 1 0 Nov09 00:04:40 [nfsd] root 2807 1 0 Nov09 00:04:41 [nfsd] root 2805 1 0 Nov09 00:03:51 
[nfsd] root 2808 1 0 Nov09 00:04:36 [nfsd] root 2809 1 0 Nov09 00:03:20 [nfsd] root 2811 1 0 Nov09 00:00:00 [lockd] root 2812 1 0 Nov09 00:00:00 [rpciod] root 2815 1 0 Nov09 00:00:00 /usr/sbin/rpc.mountd root 2933 1 0 Nov09 00:00:17 /usr/lib/postfix/master postfix 2938 2933 0 Nov09 00:00:11 qmgr -l -t fifo -u -c root 2951 1 0 Nov09 00:00:09 /usr/sbin/sshd root 2968 1 0 Nov09 00:00:00 /sbin/rpc.statd root 2969 1 0 Nov09 00:01:41 /usr/sbin/xinetd -pidfile /var/r root 2980 1 0 Nov09 00:00:07 /usr/sbin/ntpd -p /var/run/ntpd. root 2991 1 0 Nov09 00:00:01 /sbin/mdadm -F -m root -s daemon 3002 1 0 Nov09 00:00:00 /usr/sbin/atd root 3013 1 0 Nov09 00:00:03 /usr/sbin/cron root 3029 1 0 Nov09 00:00:00 /sbin/getty 38400 tty1 root 3031 1 0 Nov09 00:00:00 /sbin/getty 38400 tty2 root 3032 1 0 Nov09 00:00:00 /sbin/getty 38400 tty3 root 3033 1 0 Nov09 00:00:00 /sbin/getty 38400 tty4 root 3034 1 0 Nov09 00:00:00 /sbin/getty 38400 tty5 root 3035 1 0 Nov09 00:00:00 /sbin/getty 38400 tty6 postgres 27806 1 0 Dec12 00:00:00 /usr/local/pgsql/bin/postmaster postgres 27809 27806 0 Dec12 00:00:00 postgres: stats buffer process postgres 27810 27809 0 Dec12 00:00:00 postgres: stats collector proces postgres 27821 27806 0 Dec12 00:01:30 postgres: postgres bp_live postgres 27842 1 0 Dec12 00:00:00 /usr/local/pgsql/bin/slon -d 1 b postgres 27844 27842 0 Dec12 00:00:00 /usr/local/pgsql/bin/slon -d 1 b postgres 27847 27806 0 Dec12 00:00:50 postgres: postgres bp_live postgres 27852 27806 1 Dec12 00:18:00 postgres: postgres bp_live postgres 27853 27806 0 Dec12 00:00:33 postgres: postgres bp_live postgres 27854 27806 0 Dec12 00:00:18 postgres: postgres bp_live root 32735 10 0 05:35 00:00:00 [pdflush] postfix 2894 2933 0 07:04 00:00:00 pickup -l -t fifo -u -c root 3853 10 0 07:37 00:00:00 [pdflush] All I know is that stopping postgres brings the server back to normality. Stopping slon on its own is not enough. John Tom Lane wrote: > John Sidney-Woollett <johnsw@wardbrook.com> writes: > >>This server just consumes more and more memory until it goes swap crazy >>and the load averages start jumping through the roof. > > > *What* is consuming memory, exactly --- which processes? > > regards, tom lane
On Mon, Dec 12, 2005 at 08:31:52PM -0800, Joshua D. Drake wrote: > > > >>It sure is. Gentoo with kernel version 2.6.12, built for x86_64. > >>Looks like we have a contender for the common factor. :) > >> > > > >Please tell me you're *not* running a production database on Gentoo. > > > > > > regards, tom lane > > > You don't even want to know how many companies I know that are doing > this very thing and no, it was not my suggestion. "Like the annoying teenager next door with a 90hp import sporting a 6 foot tall bolt-on wing, Gentoo users are proof that society is best served by roving gangs of armed vigilantes, dishing out swift, cold justice with baseball bats..." http://funroll-loops.org/
John Sidney-Woollett <johnsw@wardbrook.com> writes: > Tom Lane wrote: >> *What* is consuming memory, exactly --- which processes? > Sorry but I don't know how to determine that. Try "ps auxw", or some other incantation if you prefer, so long as it includes some statistics about process memory use. What you showed us is certainly not helpful. regards, tom lane
Tom Lane said:
> John Sidney-Woollett <johnsw@wardbrook.com> writes:
>> Tom Lane wrote:
>>> *What* is consuming memory, exactly --- which processes?
>
>> Sorry but I don't know how to determine that.
>
> Try "ps auxw", or some other incantation if you prefer, so long as it
> includes some statistics about process memory use. What you showed us
> is certainly not helpful.

At the moment not one process's VSZ is over 16 MB, with the exception of one of the slon processes, which is at 66 MB.

I'll run this over the next few days, especially as the server starts bogging down, to see if it identifies the culprit.

Is it possible to grab memory outside of a process's space? Or would a leak always show up as an ever-increasing VSZ?

Thanks

John
On Tue, 2005-12-13 at 09:13, Tom Lane wrote: > John Sidney-Woollett <johnsw@wardbrook.com> writes: > > Tom Lane wrote: > >> *What* is consuming memory, exactly --- which processes? > > > Sorry but I don't know how to determine that. > > Try "ps auxw", or some other incantation if you prefer, so long as it > includes some statistics about process memory use. What you showed us > is certainly not helpful. Or run top and hit M while it's running, and it'll sort according to what uses the most memory.
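For readers following along, a non-interactive equivalent of the two suggestions above, assuming GNU ps and the usual coreutils (RSS is the sixth column of "ps aux" output, in kilobytes):

# Show the 20 processes with the largest resident set, largest first.
ps auxw | sort -k6 -rn | head -20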
"John Sidney-Woollett" <johnsw@wardbrook.com> writes: > Is it possible to grab memory outsize of a processes space? Not unless there's a kernel bug involved. regards, tom lane
On Tue, Dec 13, 2005 at 04:37:42PM -0000, John Sidney-Woollett wrote:
> I'll run this over the next few days, especially as the server starts
> bogging down, to see if it identifies the culprit.
>
> Is it possible to grab memory outside of a process's space? Or would a
> leak always show up as an ever-increasing VSZ?

The only way to know what a process can access is by looking in /proc/<pid>/maps. This lists all the memory ranges a process can access. The thing about postgres is that each backend dies when the connection closes, so only a handful of processes are going to be around long enough to cause a problem.

The ones you need to look at are the mappings with a zero inode, excluding the shared memory segment. A diff between two days might tell you which segments are growing. The comparison must be for exactly the same process to be meaningful.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
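For anyone wanting to try Martijn's suggestion, a rough sketch follows. PID is assumed to be one of the long-lived postgres or slon processes, and the snapshot file names and dates are just examples.

# Snapshot the anonymous (zero-inode) mappings of one long-lived process;
# the inode is the fifth field of /proc/<pid>/maps, and the SysV shared
# memory segment is excluded explicitly for good measure.
awk '$5 == 0 && $0 !~ /SYSV/' /proc/$PID/maps > maps.$PID.$(date +%Y%m%d)
# A day or two later, take another snapshot and compare the two:
diff maps.$PID.20051213 maps.$PID.20051214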
Martijn

Thanks for the tip. Since the connections on this server are from slon, I'm hoping that they hang around for a *long* time, long enough to take a look at what is going on.

John

Martijn van Oosterhout wrote:
> On Tue, Dec 13, 2005 at 04:37:42PM -0000, John Sidney-Woollett wrote:
>> I'll run this over the next few days, especially as the server starts
>> bogging down, to see if it identifies the culprit.
>>
>> Is it possible to grab memory outside of a process's space? Or would a
>> leak always show up as an ever-increasing VSZ?
>
> The only way to know what a process can access is by looking in
> /proc/<pid>/maps. This lists all the memory ranges a process can
> access. The thing about postgres is that each backend dies when the
> connection closes, so only a handful of processes are going to be
> around long enough to cause a problem.
>
> The ones you need to look at are the mappings with a zero inode,
> excluding the shared memory segment. A diff between two days might
> tell you which segments are growing. The comparison must be for
> exactly the same process to be meaningful.
>
> Have a nice day,