Thread: Re: Memory Leakage Problem
Please keep replies on list, this may help others in the future, and also, don't top post (i.e. put your responses after my responses). Thanks.

On Tue, 2005-12-06 at 20:16, Kathy Lo wrote:
> For a back-end database server running Postgresql 8.0.3, it's OK. But,
> this problem seriously affects the performance of my application
> server.
>
> I upgraded my application server from
>
> Redhat 7.3
> unixODBC 2.2.4
> Postgresql 7.2.1 with ODBC driver
>
> to
>
> Redhat 9.0
> unixODBC 2.2.11
> Postgresql 8.0.3
> psqlodbc-08.01.0101
> pg_autovacuum runs as background job
>
> Before upgrading, the application server runs perfectly. After
> upgrade, this problem appears.
>
> When the application server receives the request from a client, it
> will access the back-end database server using both simple and complex
> query. Then, it will create a database locally to store the matched
> rows for data processing. After some data processing, it will return
> the result to the requested client. If the client finishes browsing
> the result, it will drop the local database.

OK, there could be a lot of problems here. Are you actually doing "create database ..." for each of these things? I'm not sure that's a really good idea. Even create schema, which would be better, strikes me as not the best way to handle this.

> At the same time, this application server can serve many many clients
> so the application server has many many local databases at the same
> time.

Are you sure that you're better off with databases on your application server? You might be better off with either running these temp dbs on the backend server in the same cluster, or creating a cluster just for these jobs that is somewhat more conservative in its memory usage. I would lean towards doing this all on the backend server in one database using multiple schemas.

> After running the application server for a few days, the memory of the
> application server nearly used up and start to use the swap memory
> and, as a result, the application server runs very very slow and the
> users complain.

Could you provide us with your evidence that the memory is "used up"? What is the problem, and what you perceive as the problem, may not be the same thing. Is it the output of top / free? If so, could we see it, or whatever output is convincing you that you're running out of memory?

> I tested the application server without accessing the local database
> (not store matched rows). The testing program running in the
> application server just retrieved rows from the back-end database
> server and then returned to the requested client directly. The memory
> usage of the application server becomes normally and it can run for a
> long time.

Again, what you think is normal and what is actually normal may not be the same thing. Evidence. Please show us the output of top / free or whatever that is showing this.

> I found this problem after I upgrading the application server.
>
> On 12/7/05, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > On Tue, 2005-12-06 at 03:22, Kathy Lo wrote:
> > > Hi,
> > >
> > > In this program, it will access this database server using simple and
> > > complex (joining tables) SQL Select statement and retrieve the matched
> > > rows. For each access, it will connect the database and disconnect it.
> > >
> > > I found that the memory of the databaser server nearly used up (total 2G RAM).
> > >
> > > After I stop the program, the used memory did not free.
> >
> > Ummmm. What exactly do you mean? Can we see the output of top and / or
> > free? I'm guessing that what Tom said is right, you're just seeing a
> > normal state of how unix does things.
> >
> > If your output of free looks like this:
> >
> > -bash-2.05b$ free
> >              total       used       free     shared    buffers     cached
> > Mem:       6096912    6069588      27324          0     260728    5547264
> > -/+ buffers/cache:     261596    5835316
> > Swap:      4192880      16320    4176560
> >
> > Then that's normal.
> >
> > That's the output of free on a machine with 6 gigs that runs a reporting
> > database. Note that while it shows almost ALL the memory as used, it is
> > being used by the kernel, which is a good thing. Note that 5547264, or
> > about 90% of memory, is being used as kernel cache. That's a good thing.
> >
> > Note you can also get yourself in trouble with top. It's not uncommon
> > for someone to see a bunch of postgres processes each eating up 50 or
> > more megs of ram, and panic and think that they're running out of
> > memory, when, in fact, 44 meg for each of those processes is shared, and
> > the real usage per backend is 6 megs or less.
> >
> > Definitely grab yourself a good unix / linux sysadmin guide. The "in a
> > nutshell" books from O'Reilly are a good starting point.
>
> --
> Kathy Lo
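For readers following along, a small sketch of the check Scott is describing, assuming the traditional procps "free" output shown above: the figure that matters is free plus buffers plus cached (the "-/+ buffers/cache" line), not the raw "free" column on the "Mem:" line, since the kernel can reclaim its page cache whenever applications need the memory.

# Rough sketch: memory actually available to applications, i.e. the
# "free" column plus buffers and cache (columns 4, 6 and 7 of the Mem: row).
free | awk '/^Mem:/ { printf "available to apps: %d kB of %d kB total\n", $4 + $6 + $7, $2 }'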
On 12/8/05, Scott Marlowe <smarlowe@g2switchworks.com> wrote: > Please keep replies on list, this may help others in the future, and > also, don't top post (i.e. put your responses after my responses... > Thanks) > > On Tue, 2005-12-06 at 20:16, Kathy Lo wrote: > > For a back-end database server running Postgresql 8.0.3, it's OK. But, > > this problem seriously affects the performance of my application > > server. > > > > I upgraded my application server from > > > > Redhat 7.3 > > unixODBC 2.2.4 > > Postgresql 7.2.1 with ODBC driver > > > > to > > > > Redhat 9.0 > > unixODBC 2.2.11 > > Postgresql 8.0.3 > > psqlodbc-08.01.0101 > > pg_autovacuum runs as background job > > > > Before upgrading, the application server runs perfectly. After > > upgrade, this problem appears. > > > > When the application server receives the request from a client, it > > will access the back-end database server using both simple and complex > > query. Then, it will create a database locally to store the matched > > rows for data processing. After some data processing, it will return > > the result to the requested client. If the client finishes browsing > > the result, it will drop the local database. > > OK, there could be a lot of problems here. Are you actually doing > "create database ..." for each of these things? I'm not sure that's a > real good idea. Even create schema, which would be better, strikes me > as not the best way to handle this. > Actually, my program is written using C++ so I use "create database" SQL to create database. If not the best way, please tell me another method to create database in C++ program. > > At the same time, this application server can serve many many clients > > so the application server has many many local databases at the same > > time. > > Are you sure that you're better off with databases on your application > server? You might be better off with either running these temp dbs on > the backend server in the same cluster, or creating a cluster just for > these jobs that is somewhat more conservative in its memory usage. I > would lean towards doing this all on the backend server in one database > using multiple schemas. > Because the data are distributed in many back-end database servers (physically, in different hardware machines), I need to use Application server to temporarily store the data retrieved from different machines and then do the data processing. And, for security reason, all the users cannot directly access the back-end database servers. So, I use the database in application server to keep the result of data processing. > > After running the application server for a few days, the memory of the > > application server nearly used up and start to use the swap memory > > and, as a result, the application server runs very very slow and the > > users complain. > > Could you provide us with your evidence that the memory is "used up?" > What is the problem, and what you perceive as the problem, may not be > the same thing. Is it the output of top / free, and if so, could we see > it, or whatever output is convincing you you're running out of memory? > When the user complains the system becomes very slow, I use top to view the memory statistics. In top, I cannot find any processes that use so many memory. I just found that all the memory was used up and the Swap memory nearly used up. I said it is the problem because, before upgrading the application server, no memory problem even running the application server for 1 month. 
After upgrading the application server, this problem appears just after running the application server for 1 week. Why having this BIG difference between postgresql 7.2.1 on Redhat 7.3 and postgresql 8.0.3 on Redhat 9.0? I only upgrade the OS, postgresql, unixODBC and postgresql ODBC driver. The program I written IS THE SAME. > > I tested the application server without accessing the local database > > (not store matched rows). The testing program running in the > > application server just retrieved rows from the back-end database > > server and then returned to the requested client directly. The memory > > usage of the application server becomes normally and it can run for a > > long time. > > Again, what you think is normal, and what normal really are may not be > the same thing. Evidence. Please show us the output of top / free or > whatever that is showing this. > After I received the user's complain, I just use top to view the memory statistic. I forgot to save the output. But, I am running a test to get back the problem. So, after running the test, I will give you the output of the top/free. > > I found this problem after I upgrading the application server. > > > > On 12/7/05, Scott Marlowe <smarlowe@g2switchworks.com> wrote: > > > On Tue, 2005-12-06 at 03:22, Kathy Lo wrote: > > > > Hi, > > > > > > > > > > > In this program, it will access this database server using simple and > > > > complex (joining tables) SQL Select statement and retrieve the matched > > > > rows. For each access, it will connect the database and disconnect it. > > > > > > > > I found that the memory of the databaser server nearly used up (total > 2G > > > RAM). > > > > > > > > After I stop the program, the used memory did not free. > > > > > > Ummmm. What exactly do you mean? Can we see the output of top and / or > > > free? I'm guessing that what Tom said is right, you're just seeing a > > > normal state of how unix does things. > > > > > > If your output of free looks like this: > > > > > > -bash-2.05b$ free > > > total used free shared buffers cached > > > Mem:6096912 6069588 27324 0 260728 5547264 > > > -/+ buffers/cache: 261596 5835316 > > > Swap: 4192880 16320 4176560 > > > > > > Then that's normal. > > > > > > That's the output of free on a machine with 6 gigs that runs a reporting > > > database. Note that while it shows almost ALL the memory as used, it is > > > being used by the kernel, which is a good thing. Note that 5547264 or > > > about 90% of memory is being used as kernel cache. That's a good thing. > > > > > > Note you can also get yourself in trouble with top. It's not uncommon > > > for someone to see a bunch of postgres processes each eating up 50 or > > > more megs of ram, and panic and think that they're running out of > > > memory, when, in fact, 44 meg for each of those processes is shared, and > > > the real usage per backend is 6 megs or less. > > > > > > Definitely grab yourself a good unix / linux sysadmin guide. The "in a > > > nutshell" books from O'Reilley (sp?) are a good starting point. > > > > > > > > > -- > > Kathy Lo > -- Kathy Lo
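As an aside for readers wondering what the schema-based alternative Scott mentions might look like: a rough sketch follows, shown with psql for brevity, though the same SQL statements could just as well be sent over the ODBC connection the C++ program already holds. The database name appdb, the schema name scratch_12345 (derived, say, from a session id), and the table columns are all invented for the example.

# Hypothetical per-request scratch area using a schema instead of a database.
psql appdb <<'SQL'
CREATE SCHEMA scratch_12345;
CREATE TABLE scratch_12345.matched_rows (id integer, payload text);
-- ... load the rows fetched from the back-end servers and process them ...
DROP SCHEMA scratch_12345 CASCADE;
SQL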
On 12/8/05, Kathy Lo <kathy.lo.ky@gmail.com> wrote:
[snip]
> When the user complains the system becomes very slow, I use top to
> view the memory statistics.
> In top, I cannot find any processes that use so many memory. I just
> found that all the memory was used up and the Swap memory nearly used
> up.

Not to add fuel to the fire, but I'm seeing something similar to this on my 4xOpteron with 32GB of RAM running Pg 8.1RC1 on Linux (kernel 2.6.12). I don't see this happening on a similar box with 16GB of RAM running Pg 8.0.3. This is a lightly used box (until it goes into production), so it's not "out of memory", but the memory usage is climbing without any obvious culprit. To cut to the chase, here are some numbers for everyone to digest:

total gnu ps resident size
# ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";'
5810492

total gnu ps virtual size
# ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";'
10585400

total gnu ps "if all pages were dirtied and swapped" size
# ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";'
1970952

ipcs -m
# ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e2c1 1802240    postgres   600        176054272  26

(that's the entire ipcs -m output)

and the odd man out, free
# free
             total       used       free     shared    buffers     cached
Mem:      32752268   22498448   10253820          0     329776    8289360
-/+ buffers/cache:   13879312   18872956
Swap:     31248712        136   31248576

I guess dstat is getting its info from the same source as free, because:
# dstat -m 1
------memory-usage-----
_used _buff _cach _free
  13G  322M 8095M  9.8G

Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. reported-by-free memory usage, but I only noticed this after upgrading to 8.1. I'll collect any more info that anyone would like to see, just let me know. If anyone has any ideas on what is actually happening here I'd love to hear them!

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
Mike Rylander <mrylander@gmail.com> writes: > To cut to the chase, here are > some numbers for everyone to digest: > total gnu ps resident size > # ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";' > 5810492 > total gnu ps virual size > # ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";' > 10585400 > total gnu ps "if all pages were dirtied and swapped" size > # ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";' > 1970952 I wouldn't put any faith in those numbers at all, because you'll be counting the PG shared memory multiple times. On the Linux versions I've used lately, ps and top report a process' memory size as including all its private memory, plus all the pages of shared memory that it has touched since it started. So if you run say a seqscan over a large table in a freshly-started backend, the reported memory usage will ramp up from a couple meg to the size of your shared_buffer arena plus a couple meg --- but in reality the space used by the process is staying constant at a couple meg. Now, multiply that effect by N backends doing this at once, and you'll have a very skewed view of what's happening in your system. I'd trust the totals reported by free and dstat a lot more than summing per-process numbers from ps or top. > Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. > reported-by-free memory usage, but I only noticed this after upgrading > to 8.1. I don't know of any reason to think that 8.1 would act differently from older PG versions in this respect. regards, tom lane
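To put rough numbers on that for the figures Mike posted: ipcs -m shows a single 176054272-byte segment with 26 processes attached, so a crude correction is to subtract the extra copies of the shared arena from the summed RSS. A back-of-the-envelope sketch follows; it is only an approximation, since not every backend has touched the whole arena.

# Summed RSS minus (nattch - 1) copies of the shared memory segment
# reported by ipcs -m. The two constants are taken from the output above.
SHM_KB=$((176054272 / 1024))   # "bytes" column of ipcs -m
NATTCH=26                      # "nattch" column of ipcs -m
RSS_KB=$(ps ax -o rss | awk 'NR > 1 { sum += $1 } END { print sum }')
echo "adjusted resident total: $(( RSS_KB - (NATTCH - 1) * SHM_KB )) kB"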
On 12/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Mike Rylander <mrylander@gmail.com> writes: > > To cut to the chase, here are > > some numbers for everyone to digest: > > total gnu ps resident size > > # ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";' > > 5810492 > > total gnu ps virual size > > # ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";' > > 10585400 > > total gnu ps "if all pages were dirtied and swapped" size > > # ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";' > > 1970952 > > I wouldn't put any faith in those numbers at all, because you'll be > counting the PG shared memory multiple times. > > On the Linux versions I've used lately, ps and top report a process' > memory size as including all its private memory, plus all the pages > of shared memory that it has touched since it started. So if you run > say a seqscan over a large table in a freshly-started backend, the > reported memory usage will ramp up from a couple meg to the size of > your shared_buffer arena plus a couple meg --- but in reality the > space used by the process is staying constant at a couple meg. Right, I can definitely see that happening. Some backends are upwards of 200M, some are just a few since they haven't been touched yet. > > Now, multiply that effect by N backends doing this at once, and you'll > have a very skewed view of what's happening in your system. Absolutely ... > > I'd trust the totals reported by free and dstat a lot more than summing > per-process numbers from ps or top. > And there's the part that's confusing me: the numbers for used memory produced by free and dstat, after subtracting the buffers/cache amounts, are /larger/ than those that ps and top report. (top says the same thing as ps, on the whole.) > > Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. > > reported-by-free memory usage, but I only noticed this after upgrading > > to 8.1. > > I don't know of any reason to think that 8.1 would act differently from > older PG versions in this respect. > Neither can I, which is why I don't blame it. ;) I'm just reporting when/where I noticed the issue. > regards, tom lane > -- Mike Rylander mrylander@gmail.com GPLS -- PINES Development Database Developer http://open-ils.org
Mike Rylander wrote: >Right, I can definitely see that happening. Some backends are upwards >of 200M, some are just a few since they haven't been touched yet. > > >>Now, multiply that effect by N backends doing this at once, and you'll >>have a very skewed view of what's happening in your system. >> > >Absolutely ... > >>I'd trust the totals reported by free and dstat a lot more than summing >>per-process numbers from ps or top. >> > >And there's the part that's confusing me: the numbers for used memory >produced by free and dstat, after subtracting the buffers/cache >amounts, are /larger/ than those that ps and top report. (top says the >same thing as ps, on the whole.) > I'm seeing the same thing on one of our 8.1 servers. Summing RSS from `ps` or RES from `top` accounts for about 1 GB, but `free` says: total used free shared buffers cached Mem: 4060968 3870328 190640 0 14788 432048 -/+ buffers/cache: 3423492 637476 Swap: 2097144 175680 1921464 That's 3.4 GB/170 MB in RAM/swap, up from 2.7 GB/0 last Thursday, 2.2 GB/0 last Monday, or 1.9 GB after a reboot ten days ago. Stopping Postgres brings down the number, but not all the way -- it drops to about 2.7 GB, even though the next most memory-intensive process is `ntpd` at 5 MB. (Before Postgres starts, there's less than 30 MB of stuff running.) The only way I've found to get this box back to normal is to reboot it. >>>Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. >>>reported-by-free memory usage, but I only noticed this after upgrading >>>to 8.1. >>> >>I don't know of any reason to think that 8.1 would act differently from >>older PG versions in this respect. >> > >Neither can I, which is why I don't blame it. ;) I'm just reporting >when/where I noticed the issue. > I can't offer any explanation for why this server is starting to swap -- where'd the memory go? -- but I know it started after upgrading to PostgreSQL 8.1. I'm not saying it's something in the PostgreSQL code, but this server definitely didn't do this in the months under 7.4. Mike: is your system AMD64, by any chance? The above system is, as is another similar story I heard. --Will Glynn Freedom Healthcare
On 12/12/05, Will Glynn <wglynn@freedomhealthcare.org> wrote: > Mike Rylander wrote: > > >Right, I can definitely see that happening. Some backends are upwards > >of 200M, some are just a few since they haven't been touched yet. > > > > > >>Now, multiply that effect by N backends doing this at once, and you'll > >>have a very skewed view of what's happening in your system. > >> > > > >Absolutely ... > > > >>I'd trust the totals reported by free and dstat a lot more than summing > >>per-process numbers from ps or top. > >> > > > >And there's the part that's confusing me: the numbers for used memory > >produced by free and dstat, after subtracting the buffers/cache > >amounts, are /larger/ than those that ps and top report. (top says the > >same thing as ps, on the whole.) > > > > I'm seeing the same thing on one of our 8.1 servers. Summing RSS from > `ps` or RES from `top` accounts for about 1 GB, but `free` says: > > total used free shared buffers cached > Mem: 4060968 3870328 190640 0 14788 432048 > -/+ buffers/cache: 3423492 637476 > Swap: 2097144 175680 1921464 > > That's 3.4 GB/170 MB in RAM/swap, up from 2.7 GB/0 last Thursday, 2.2 > GB/0 last Monday, or 1.9 GB after a reboot ten days ago. Stopping > Postgres brings down the number, but not all the way -- it drops to > about 2.7 GB, even though the next most memory-intensive process is > `ntpd` at 5 MB. (Before Postgres starts, there's less than 30 MB of > stuff running.) The only way I've found to get this box back to normal > is to reboot it. > > >>>Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. > >>>reported-by-free memory usage, but I only noticed this after upgrading > >>>to 8.1. > >>> > >>I don't know of any reason to think that 8.1 would act differently from > >>older PG versions in this respect. > >> > > > >Neither can I, which is why I don't blame it. ;) I'm just reporting > >when/where I noticed the issue. > > > I can't offer any explanation for why this server is starting to swap -- > where'd the memory go? -- but I know it started after upgrading to > PostgreSQL 8.1. I'm not saying it's something in the PostgreSQL code, > but this server definitely didn't do this in the months under 7.4. > > Mike: is your system AMD64, by any chance? The above system is, as is > another similar story I heard. > It sure is. Gentoo with kernel version 2.6.12, built for x86_64. Looks like we have a contender for the common factor. :) > --Will Glynn > Freedom Healthcare > -- Mike Rylander mrylander@gmail.com GPLS -- PINES Development Database Developer http://open-ils.org
Mike Rylander <mrylander@gmail.com> writes: > On 12/12/05, Will Glynn <wglynn@freedomhealthcare.org> wrote: >> Mike: is your system AMD64, by any chance? The above system is, as is >> another similar story I heard. > It sure is. Gentoo with kernel version 2.6.12, built for x86_64. > Looks like we have a contender for the common factor. :) Please tell me you're *not* running a production database on Gentoo. regards, tom lane
>
>> It sure is. Gentoo with kernel version 2.6.12, built for x86_64.
>> Looks like we have a contender for the common factor. :)
>
> Please tell me you're *not* running a production database on Gentoo.
>
> regards, tom lane

You don't even want to know how many companies I know that are doing this very thing, and no, it was not my suggestion.

Joshua D. Drake
We're seeing memory problems on one of our postgres databases. We're using 7.4.6, and I suspect the kernel version is a key factor with this problem. One running under Redhat Linux 2.4.18-14smp #1 SMP and the other Debian Linux 2.6.8.1-4-686-smp #1 SMP The second Debian server is a replicated slave using Slony. We NEVER see any problems on the "older" Redhat (our master) DB, whereas the Debian slave database requires slony and postgres to be stopped every 2-3 weeks. This server just consumes more and more memory until it goes swap crazy and the load averages start jumping through the roof. Stopping the two services restores the server to some sort of normality - the load averages drop dramatically and remain low. But the memory is only fully recovered by a server reboot. Over time memory gets used up, until you get to the point where those services require another stop and start. Just my 2 cents... John Will Glynn wrote: > Mike Rylander wrote: > >> Right, I can definitely see that happening. Some backends are upwards >> of 200M, some are just a few since they haven't been touched yet. >> >> >>> Now, multiply that effect by N backends doing this at once, and you'll >>> have a very skewed view of what's happening in your system. >>> >> >> Absolutely ... >> >>> I'd trust the totals reported by free and dstat a lot more than summing >>> per-process numbers from ps or top. >>> >> >> And there's the part that's confusing me: the numbers for used memory >> produced by free and dstat, after subtracting the buffers/cache >> amounts, are /larger/ than those that ps and top report. (top says the >> same thing as ps, on the whole.) >> > > I'm seeing the same thing on one of our 8.1 servers. Summing RSS from > `ps` or RES from `top` accounts for about 1 GB, but `free` says: > > total used free shared buffers cached > Mem: 4060968 3870328 190640 0 14788 432048 > -/+ buffers/cache: 3423492 637476 > Swap: 2097144 175680 1921464 > > That's 3.4 GB/170 MB in RAM/swap, up from 2.7 GB/0 last Thursday, 2.2 > GB/0 last Monday, or 1.9 GB after a reboot ten days ago. Stopping > Postgres brings down the number, but not all the way -- it drops to > about 2.7 GB, even though the next most memory-intensive process is > `ntpd` at 5 MB. (Before Postgres starts, there's less than 30 MB of > stuff running.) The only way I've found to get this box back to normal > is to reboot it. > >>>> Now, I'm not blaming Pg for the apparent discrepancy in calculated vs. >>>> reported-by-free memory usage, but I only noticed this after upgrading >>>> to 8.1. >>>> >>> I don't know of any reason to think that 8.1 would act differently from >>> older PG versions in this respect. >>> >> >> Neither can I, which is why I don't blame it. ;) I'm just reporting >> when/where I noticed the issue. >> > I can't offer any explanation for why this server is starting to swap -- > where'd the memory go? -- but I know it started after upgrading to > PostgreSQL 8.1. I'm not saying it's something in the PostgreSQL code, > but this server definitely didn't do this in the months under 7.4. > > Mike: is your system AMD64, by any chance? The above system is, as is > another similar story I heard. > > --Will Glynn > Freedom Healthcare > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
John Sidney-Woollett <johnsw@wardbrook.com> writes: > This server just consumes more and more memory until it goes swap crazy > and the load averages start jumping through the roof. *What* is consuming memory, exactly --- which processes? regards, tom lane
Sorry but I don't know how to determine that. We stopped and started postgres yesterday so the server is behaving well at the moment. top shows top - 07:51:48 up 34 days, 6 min, 1 user, load average: 0.00, 0.02, 0.00 Tasks: 85 total, 1 running, 84 sleeping, 0 stopped, 0 zombie Cpu(s): 0.6% us, 0.2% sy, 0.0% ni, 99.1% id, 0.2% wa, 0.0% hi, 0.0% si Mem: 1035612k total, 1030380k used, 5232k free, 48256k buffers Swap: 497972k total, 122388k used, 375584k free, 32716k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27852 postgres 16 0 17020 11m 14m S 1.0 1.2 18:00.34 postmaster 27821 postgres 15 0 16236 6120 14m S 0.3 0.6 1:30.68 postmaster 4367 root 16 0 2040 1036 1820 R 0.3 0.1 0:00.05 top 1 root 16 0 1492 148 1340 S 0.0 0.0 0:04.75 init 2 root RT 0 0 0 0 S 0.0 0.0 0:02.00 migration/0 3 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/0 4 root RT 0 0 0 0 S 0.0 0.0 0:04.78 migration/1 5 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/1 6 root RT 0 0 0 0 S 0.0 0.0 0:04.58 migration/2 7 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2 8 root RT 0 0 0 0 S 0.0 0.0 0:21.28 migration/3 9 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/3 10 root 5 -10 0 0 0 S 0.0 0.0 0:00.14 events/0 11 root 5 -10 0 0 0 S 0.0 0.0 0:00.04 events/1 12 root 5 -10 0 0 0 S 0.0 0.0 0:00.01 events/2 13 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 events/3 14 root 8 -10 0 0 0 S 0.0 0.0 0:00.00 khelper This server only has postgres and slon running on it. There is also postfix but it is only used to relay emails from the root account to another server - it isn't really doing anything (I hope). ps shows UID PID PPID C STIME TIME CMD root 1 0 0 Nov09 00:00:04 init [2] root 2 1 0 Nov09 00:00:02 [migration/0] root 3 1 0 Nov09 00:00:00 [ksoftirqd/0] root 4 1 0 Nov09 00:00:04 [migration/1] root 5 1 0 Nov09 00:00:00 [ksoftirqd/1] root 6 1 0 Nov09 00:00:04 [migration/2] root 7 1 0 Nov09 00:00:00 [ksoftirqd/2] root 8 1 0 Nov09 00:00:21 [migration/3] root 9 1 0 Nov09 00:00:00 [ksoftirqd/3] root 10 1 0 Nov09 00:00:00 [events/0] root 11 1 0 Nov09 00:00:00 [events/1] root 12 1 0 Nov09 00:00:00 [events/2] root 13 1 0 Nov09 00:00:00 [events/3] root 14 11 0 Nov09 00:00:00 [khelper] root 15 10 0 Nov09 00:00:00 [kacpid] root 67 11 0 Nov09 00:17:10 [kblockd/0] root 68 10 0 Nov09 00:00:52 [kblockd/1] root 69 11 0 Nov09 00:00:07 [kblockd/2] root 70 10 0 Nov09 00:00:09 [kblockd/3] root 82 1 1 Nov09 09:08:14 [kswapd0] root 83 11 0 Nov09 00:00:00 [aio/0] root 84 10 0 Nov09 00:00:00 [aio/1] root 85 11 0 Nov09 00:00:00 [aio/2] root 86 10 0 Nov09 00:00:00 [aio/3] root 222 1 0 Nov09 00:00:00 [kseriod] root 245 1 0 Nov09 00:00:00 [scsi_eh_0] root 278 1 0 Nov09 00:00:37 [kjournald] root 359 1 0 Nov09 00:00:00 udevd root 1226 1 0 Nov09 00:00:00 [kjournald] root 1229 10 0 Nov09 00:00:16 [reiserfs/0] root 1230 11 0 Nov09 00:00:08 [reiserfs/1] root 1231 10 0 Nov09 00:00:00 [reiserfs/2] root 1232 11 0 Nov09 00:00:00 [reiserfs/3] root 1233 1 0 Nov09 00:00:00 [kjournald] root 1234 1 0 Nov09 00:00:13 [kjournald] root 1235 1 0 Nov09 00:00:24 [kjournald] root 1583 1 0 Nov09 00:00:00 [pciehpd_event] root 1598 1 0 Nov09 00:00:00 [shpchpd_event] root 1669 1 0 Nov09 00:00:00 [khubd] daemon 2461 1 0 Nov09 00:00:00 /sbin/portmap root 2726 1 0 Nov09 00:00:10 /sbin/syslogd root 2737 1 0 Nov09 00:00:00 /sbin/klogd message 2768 1 0 Nov09 00:00:00 /usr/bin/dbus-daemon-1 --system root 2802 1 0 Nov09 00:04:38 [nfsd] root 2804 1 0 Nov09 00:03:32 [nfsd] root 2803 1 0 Nov09 00:04:58 [nfsd] root 2806 1 0 Nov09 00:04:40 [nfsd] root 2807 1 0 Nov09 00:04:41 [nfsd] root 2805 1 0 Nov09 00:03:51 
[nfsd] root 2808 1 0 Nov09 00:04:36 [nfsd] root 2809 1 0 Nov09 00:03:20 [nfsd] root 2811 1 0 Nov09 00:00:00 [lockd] root 2812 1 0 Nov09 00:00:00 [rpciod] root 2815 1 0 Nov09 00:00:00 /usr/sbin/rpc.mountd root 2933 1 0 Nov09 00:00:17 /usr/lib/postfix/master postfix 2938 2933 0 Nov09 00:00:11 qmgr -l -t fifo -u -c root 2951 1 0 Nov09 00:00:09 /usr/sbin/sshd root 2968 1 0 Nov09 00:00:00 /sbin/rpc.statd root 2969 1 0 Nov09 00:01:41 /usr/sbin/xinetd -pidfile /var/r root 2980 1 0 Nov09 00:00:07 /usr/sbin/ntpd -p /var/run/ntpd. root 2991 1 0 Nov09 00:00:01 /sbin/mdadm -F -m root -s daemon 3002 1 0 Nov09 00:00:00 /usr/sbin/atd root 3013 1 0 Nov09 00:00:03 /usr/sbin/cron root 3029 1 0 Nov09 00:00:00 /sbin/getty 38400 tty1 root 3031 1 0 Nov09 00:00:00 /sbin/getty 38400 tty2 root 3032 1 0 Nov09 00:00:00 /sbin/getty 38400 tty3 root 3033 1 0 Nov09 00:00:00 /sbin/getty 38400 tty4 root 3034 1 0 Nov09 00:00:00 /sbin/getty 38400 tty5 root 3035 1 0 Nov09 00:00:00 /sbin/getty 38400 tty6 postgres 27806 1 0 Dec12 00:00:00 /usr/local/pgsql/bin/postmaster postgres 27809 27806 0 Dec12 00:00:00 postgres: stats buffer process postgres 27810 27809 0 Dec12 00:00:00 postgres: stats collector proces postgres 27821 27806 0 Dec12 00:01:30 postgres: postgres bp_live postgres 27842 1 0 Dec12 00:00:00 /usr/local/pgsql/bin/slon -d 1 b postgres 27844 27842 0 Dec12 00:00:00 /usr/local/pgsql/bin/slon -d 1 b postgres 27847 27806 0 Dec12 00:00:50 postgres: postgres bp_live postgres 27852 27806 1 Dec12 00:18:00 postgres: postgres bp_live postgres 27853 27806 0 Dec12 00:00:33 postgres: postgres bp_live postgres 27854 27806 0 Dec12 00:00:18 postgres: postgres bp_live root 32735 10 0 05:35 00:00:00 [pdflush] postfix 2894 2933 0 07:04 00:00:00 pickup -l -t fifo -u -c root 3853 10 0 07:37 00:00:00 [pdflush] All I know is that stopping postgres brings the server back to normality. Stopping slon on its own is not enough. John Tom Lane wrote: > John Sidney-Woollett <johnsw@wardbrook.com> writes: > >>This server just consumes more and more memory until it goes swap crazy >>and the load averages start jumping through the roof. > > > *What* is consuming memory, exactly --- which processes? > > regards, tom lane
On Mon, Dec 12, 2005 at 08:31:52PM -0800, Joshua D. Drake wrote: > > > >>It sure is. Gentoo with kernel version 2.6.12, built for x86_64. > >>Looks like we have a contender for the common factor. :) > >> > > > >Please tell me you're *not* running a production database on Gentoo. > > > > > > regards, tom lane > > > You don't even want to know how many companies I know that are doing > this very thing and no, it was not my suggestion. "Like the annoying teenager next door with a 90hp import sporting a 6 foot tall bolt-on wing, Gentoo users are proof that society is best served by roving gangs of armed vigilantes, dishing out swift, cold justice with baseball bats..." http://funroll-loops.org/
John Sidney-Woollett <johnsw@wardbrook.com> writes: > Tom Lane wrote: >> *What* is consuming memory, exactly --- which processes? > Sorry but I don't know how to determine that. Try "ps auxw", or some other incantation if you prefer, so long as it includes some statistics about process memory use. What you showed us is certainly not helpful. regards, tom lane
Tom Lane said:
> John Sidney-Woollett <johnsw@wardbrook.com> writes:
>> Tom Lane wrote:
>>> *What* is consuming memory, exactly --- which processes?
>
>> Sorry but I don't know how to determine that.
>
> Try "ps auxw", or some other incantation if you prefer, so long as it
> includes some statistics about process memory use. What you showed us
> is certainly not helpful.

At the moment not one process's VSZ is over 16 MB, with the exception of one of the slon processes, which is at 66 MB.

I'll run this over the next few days, especially as the server starts bogging down, to see if it identifies the culprit.

Is it possible to grab memory outside of a process's space? Or would a leak always show up as an ever-increasing VSZ?

Thanks

John
On Tue, 2005-12-13 at 09:13, Tom Lane wrote: > John Sidney-Woollett <johnsw@wardbrook.com> writes: > > Tom Lane wrote: > >> *What* is consuming memory, exactly --- which processes? > > > Sorry but I don't know how to determine that. > > Try "ps auxw", or some other incantation if you prefer, so long as it > includes some statistics about process memory use. What you showed us > is certainly not helpful. Or run top and hit M while it's running, and it'll sort according to what uses the most memory.
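For readers following along, a non-interactive equivalent of the two suggestions above, assuming GNU ps and the usual coreutils (RSS is the sixth column of "ps aux" output, in kilobytes):

# Show the 20 processes with the largest resident set, largest first.
ps auxw | sort -k6 -rn | head -20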
"John Sidney-Woollett" <johnsw@wardbrook.com> writes: > Is it possible to grab memory outsize of a processes space? Not unless there's a kernel bug involved. regards, tom lane
On Tue, Dec 13, 2005 at 04:37:42PM -0000, John Sidney-Woollett wrote:
> I'll run this over the next few days, especially as the server starts
> bogging down, to see if it identifies the culprit.
>
> Is it possible to grab memory outside of a process's space? Or would a
> leak always show up as an ever-increasing VSZ?

The only way to know what a process can access is by looking in /proc/<pid>/maps. This lists all the memory ranges a process can access. The thing about postgres is that each backend dies when the connection closes, so only a handful of processes are going to be around long enough to cause a problem.

The ones you need to look at are the mappings with a zero inode, excluding the shared memory segment. A diff between two days might tell you which segments are growing. The comparison must be for exactly the same process to be meaningful.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
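For anyone wanting to try Martijn's suggestion, a rough sketch follows. PID is assumed to be one of the long-lived postgres or slon processes, and the snapshot file names and dates are just examples.

# Snapshot the anonymous (zero-inode) mappings of one long-lived process;
# the inode is the fifth field of /proc/<pid>/maps, and the SysV shared
# memory segment is excluded explicitly for good measure.
awk '$5 == 0 && $0 !~ /SYSV/' /proc/$PID/maps > maps.$PID.$(date +%Y%m%d)
# A day or two later, take another snapshot and compare the two:
diff maps.$PID.20051213 maps.$PID.20051214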
Martijn

Thanks for the tip. Since the connections on this server are from slon, I'm hoping that they hang around for a *long* time, long enough to take a look at what is going on.

John

Martijn van Oosterhout wrote:
> On Tue, Dec 13, 2005 at 04:37:42PM -0000, John Sidney-Woollett wrote:
>> I'll run this over the next few days, especially as the server starts
>> bogging down, to see if it identifies the culprit.
>>
>> Is it possible to grab memory outside of a process's space? Or would a
>> leak always show up as an ever-increasing VSZ?
>
> The only way to know what a process can access is by looking in
> /proc/<pid>/maps. This lists all the memory ranges a process can
> access. The thing about postgres is that each backend dies when the
> connection closes, so only a handful of processes are going to be
> around long enough to cause a problem.
>
> The ones you need to look at are the mappings with a zero inode,
> excluding the shared memory segment. A diff between two days might
> tell you which segments are growing. The comparison must be for
> exactly the same process to be meaningful.
>
> Have a nice day,