Thread: Still problems with memory swapping and server load
Hi! I'm still "being hosed over big time" as Curt Sampson put it. It's still the same machine and database: 1GB RAM, 4xPIII550Xeon, dumpall.sql is ~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all starts with a humble 8MB swap being used (I expect that's just the empty swap with nothing in it but some system overhead). Then after a short time, memory usage climbs slow but continuously until it hits physical RAM ceiling and starts using swap - with not very nice results for the database. Swap sometimes amounts to 200MB or more. I altered postgresql.conf-settings as recommended: max_connections = 128 shared_buffers = 32768 sort_mem = 8192 (16384 or 32768 didn't help either) wal_files = 32 wal_buffers = 32 fsync = false Everything else is commented out resp. on default settings. This is what top gives me: 2:41pm up 6 days, 3:51, 3 users, load average: 4.41, 2.71, 1.89 114 processes: 105 sleeping, 9 running, 0 zombie, 0 stopped CPU0 states: 60.2% user, 14.3% system, 0.0% nice, 24.5% idle CPU1 states: 81.0% user, 14.5% system, 0.0% nice, 4.0% idle CPU2 states: 77.0% user, 12.3% system, 0.0% nice, 10.2% idle CPU3 states: 71.1% user, 13.3% system, 0.0% nice, 15.0% idle Mem: 1029400K av, 1023264K used, 6136K free, 0K shrd, 7176K buff Swap: 2097136K av, 49656K used, 2047480K free 819052K cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 6848 postgres 16 0 247M 247M 246M R 93.6 24.6 4:06 postmaster 6928 postgres 9 0 13836 13M 13016 S 25.8 1.3 0:13 postmaster 6926 postgres 9 0 14572 14M 13756 S 23.8 1.4 0:13 postmaster 6920 postgres 10 0 14296 13M 13476 R 21.1 1.3 0:13 postmaster 6874 postgres 9 0 49408 48M 43168 S 19.8 4.7 3:57 postmaster 6911 postgres 9 0 66840 65M 65728 S 19.4 6.4 0:01 postmaster 6917 postgres 9 0 9108 9104 8204 R 19.4 0.8 0:13 postmaster 6875 postgres 11 0 41564 40M 35324 R 18.7 4.0 3:31 postmaster 6927 postgres 10 0 14148 13M 13328 R 17.4 1.3 0:12 postmaster 6889 postgres 9 0 24844 24M 23632 S 15.8 2.4 0:17 postmaster 6903 postgres 12 0 204M 204M 203M R 10.2 20.3 0:27 postmaster 6837 postgres 9 0 251M 251M 250M S 9.8 25.0 0:37 postmaster 5929 postgres 15 0 940 884 668 R 8.9 0.0 8:23 top 6934 root 16 0 976 976 732 R 8.0 0.0 0:07 top 6852 postgres 9 0 227M 227M 226M R 7.8 22.6 0:12 postmaster 6897 postgres 9 0 14988 14M 13948 S 6.0 1.4 0:01 postmaster 6838 postgres 9 0 18364 17M 17304 S 5.6 1.7 0:04 postmaster 6845 postgres 9 0 52344 51M 50916 S 3.6 5.0 0:09 postmaster 6834 postgres 9 0 25456 24M 24356 S 3.0 2.4 0:26 postmaster 6894 postgres 9 0 247M 247M 246M S 2.1 24.6 0:27 postmaster 6907 postgres 9 0 16020 15M 14992 S 1.8 1.5 0:03 postmaster 6904 postgres 9 0 16604 16M 15528 S 1.0 1.6 0:13 postmaster 4799 root 9 0 1820 1444 1300 S 0.1 0.1 0:07 sshd 6893 postgres 9 0 18396 17M 17332 S 0.1 1.7 0:07 postmaster 6916 postgres 9 0 8940 8936 8020 S 0.1 0.8 0:08 postmaster cat /proc/sys/kernel/shmmax is 323380838 and ipcs -m returns: ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status 0x00000000 32768 root 600 1056768 3 dest 0x00000000 98305 root 600 33554432 3 dest 0x00000000 131074 wwwrun 600 368644 3 dest 0x0052e2c1 2457603 postgres 600 274554880 54 0x07021999 229380 root 644 1104 2 Any ideas? Some other way out than "get more RAM"? How effective can I expect more RAM to be in my situation - wouldn't it just clutter up and overflow into swap like the first 1GB? How come system-processingtime gets so high? 
I followed the other suggestions and tracked slow queries using logging -
by now there's hardly anything left with more than 0.5 s of elapsed
execution time, but the backends still seem to run amok.

As always, very grateful for any help.

Regards, Markus
On Wed, Jun 26, 2002 at 02:50:56PM +0200, Markus Wollny wrote:
> Hi!
>
> I'm still "being hosed over big time" as Curt Sampson put it. It's still
> the same machine and database: 1GB RAM, 4xPIII550Xeon, dumpall.sql is
> ~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all starts with
> a humble 8MB swap being used (I expect that's just the empty swap with
> nothing in it but some system overhead). Then after a short time, memory
> usage climbs slow but continuously until it hits physical RAM ceiling
> and starts using swap - with not very nice results for the database.
> Swap sometimes amounts to 200MB or more.

Well, there's your problem. As soon as you swap, all performance goes out
the window. Looking below, there are five processes using over 200 MB of
RAM each. Look up the PIDs in the log file to see what query each one is
running. Sounds like you have one whopping big query dragging everything
down.

>   PID USER     PRI NI  SIZE   RSS SHARE STAT %CPU %MEM  TIME COMMAND
>  6848 postgres  16  0  247M  247M  246M R    93.6 24.6  4:06 postmaster
>  6903 postgres  12  0  204M  204M  203M R    10.2 20.3  0:27 postmaster
>  6837 postgres   9  0  251M  251M  250M S     9.8 25.0  0:37 postmaster
>  6852 postgres   9  0  227M  227M  226M R     7.8 22.6  0:12 postmaster
>  6894 postgres   9  0  247M  247M  246M S     2.1 24.6  0:27 postmaster

> Any ideas? Some other way out than "get more RAM"? How effective can I
> expect more RAM to be in my situation - wouldn't it just clutter up and
> overflow into swap like the first 1GB? How come system-processingtime
> gets so high? I followed the other suggestions and tracked slow queries
> using logging - now there's hardly anything left with >0.5s execution
> time elapsed, but the backends still seem to run amok.

Well, that first postmaster there seems to have been going for four
minutes already. Postgres has various stats functions that let you work
out what query each backend is currently executing. Are you sure you
don't have an unconstrained join or something?
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
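Concretely, one way to use those stats functions - a minimal sketch,
assuming a 7.2-era setup with stats_command_string = true in
postgresql.conf (it is off by default), where pg_stat_activity maps each
backend PID to its current query; "yourdb" is a placeholder:

    # show what every backend is running right now; procpid matches
    # the PID column in top (e.g. the fat 6848 backend)
    psql -d yourdb -c "SELECT procpid, usename, current_query FROM pg_stat_activity;"

With the stats collector disabled, current_query simply comes back empty.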
"Markus Wollny" <Markus.Wollny@computec.de> writes: > I'm still "being hosed over big time" as Curt Sampson put it. It's still > the same machine and database: 1GB RAM, 4xPIII550Xeon, dumpall.sql is > ~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all starts with > a humble 8MB swap being used (I expect that's just the empty swap with > nothing in it but some system overhead). Then after a short time, memory > usage climbs slow but continuously until it hits physical RAM ceiling > and starts using swap - with not very nice results for the database. It sort of looks like you are seeing a memory-leak problem. I thought we'd largely eliminated that class of trouble in recent releases, but maybe there's still one or two left. Can you identify the exact query or queries that cause individual backends' memory usage to grow? regards, tom lane
On Wed, 26 Jun 2002, Markus Wollny wrote:

> the same machine and database: 1GB RAM, 4xPIII550Xeon, dumpall.sql is
> ~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all starts with
> a humble 8MB swap being used (I expect that's just the empty swap with
> nothing in it but some system overhead). Then after a short time, memory
> usage climbs slow but continuously until it hits physical RAM ceiling
> and starts using swap - with not very nice results for the database.
> Swap sometimes amounts to 200MB or more.

Also use "vmstat", "systat vmstat" or whatever your system's equivalent
is to see just how much swapping you're actually doing. I wouldn't be
surprised to see some unused programs being pushed out to swap while you
do a lot of I/O, but if you're pushing stuff out to swap and bringing it
back in on a regular basis, you've still got problems. Also remember that
your OS may count reading a program binary at startup as "page in"
activity, so don't get too worried about that unless you also see
page-out activity.

> max_connections = 128
> shared_buffers = 32768
> sort_mem = 8192 (16384 or 32768 didn't help either)
> wal_files = 32
> wal_buffers = 32
> fsync = false

That looks good. Nothing should be using terribly much memory now.

> Mem: 1029400K av, 1023264K used, 6136K free, 0K shrd, 7176K buff

OK, with only 7176K allocated to buffers, you've definitely got some
programs eating up your RAM, I'd say. Looking at your postmasters below,
sorted by size:

>   PID USER     PRI NI  SIZE   RSS SHARE STAT %CPU %MEM  TIME COMMAND
>  6837 postgres   9  0  251M  251M  250M S     9.8 25.0  0:37 postmaster
>  6894 postgres   9  0  247M  247M  246M S     2.1 24.6  0:27 postmaster
>  6848 postgres  16  0  247M  247M  246M R    93.6 24.6  4:06 postmaster
>  6852 postgres   9  0  227M  227M  226M R     7.8 22.6  0:12 postmaster
>  6903 postgres  12  0  204M  204M  203M R    10.2 20.3  0:27 postmaster
>  6911 postgres   9  0 66840   65M 65728 S    19.4  6.4  0:01 postmaster
>  6845 postgres   9  0 52344   51M 50916 S     3.6  5.0  0:09 postmaster
>  6874 postgres   9  0 49408   48M 43168 S    19.8  4.7  3:57 postmaster
>  6875 postgres  11  0 41564   40M 35324 R    18.7  4.0  3:31 postmaster
>  6834 postgres   9  0 25456   24M 24356 S     3.0  2.4  0:26 postmaster
>  6889 postgres   9  0 24844   24M 23632 S    15.8  2.4  0:17 postmaster
>  6893 postgres   9  0 18396   17M 17332 S     0.1  1.7  0:07 postmaster
>  6838 postgres   9  0 18364   17M 17304 S     5.6  1.7  0:04 postmaster
>  6904 postgres   9  0 16604   16M 15528 S     1.0  1.6  0:13 postmaster
>  6907 postgres   9  0 16020   15M 14992 S     1.8  1.5  0:03 postmaster
>  6897 postgres   9  0 14988   14M 13948 S     6.0  1.4  0:01 postmaster
>  6926 postgres   9  0 14572   14M 13756 S    23.8  1.4  0:13 postmaster
>  6920 postgres  10  0 14296   13M 13476 R    21.1  1.3  0:13 postmaster
>  6927 postgres  10  0 14148   13M 13328 R    17.4  1.3  0:12 postmaster
>  6928 postgres   9  0 13836   13M 13016 S    25.8  1.3  0:13 postmaster
>  6917 postgres   9  0  9108  9104  8204 R    19.4  0.8  0:13 postmaster
>  6916 postgres   9  0  8940  8936  8020 S     0.1  0.8  0:08 postmaster
>  4799 root       9  0  1820  1444  1300 S     0.1  0.1  0:07 sshd
>  6934 root      16  0   976   976   732 R     8.0  0.0  0:07 top
>  5929 postgres  15  0   940   884   668 R     8.9  0.0  8:23 top

Some of your backends are getting pretty darn big. I wonder what they're
doing? It can't be sort memory at this point. But as you can see, those
five 200-250MB backends are killing you. If you can figure out what
they're doing, and put a stop to that memory usage, that would help you.
Alternatively, perhaps just dropping another 1-2 GB of RAM into the
machine would fix your problem.
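On Linux, the vmstat check is a two-minute job - a minimal sketch (column
names per the procps vmstat; "si"/"so" are swap-in/swap-out in kB/s):

    # ten samples at two-second intervals
    vmstat 2 10
    # an occasional blip in "si" is usually just binaries being paged
    # back in; sustained non-zero values in BOTH "si" and "so" mean
    # pages are cycling through swap, i.e. real thrashing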
Also, for this kind of thing it's better to provide "ps aux" or "ps -ef"
output than top output, unless you're sure that the display above covers
all of the processes.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
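For instance, a quick way to capture that and rank it by memory (the
column numbers assume Linux procps "ps aux" output, where %MEM is column
4 and RSS is column 6):

    ps aux | sort -rn -k6 | head -25   # biggest resident sets first
    ps aux | sort -rn -k4 | head -25   # or rank by %MEM instead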
Hi!

I don't know exactly how to find the offending queries. All I was able to
come up with is to check the top output, nail down the PID, and then scan
the logfile for queries - but of course there are lots of queries under
that same PID over time. How do I determine the details? Right now all I
could see was that all the queries were using the
begin|declare sql_cursor|fetch|close sql_cursor|end pattern, induced by
the ODBC driver, I presume. How do I pinpoint the specific offender?

Regards, Markus

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, June 26, 2002 16:59
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Still problems with memory swapping and server
> load
>
> "Markus Wollny" <Markus.Wollny@computec.de> writes:
> > I'm still "being hosed over big time" as Curt Sampson put it. It's
> > still the same machine and database: 1GB RAM, 4xPIII550Xeon,
> > dumpall.sql is ~300MB (see "[GENERAL] Urgent: Tuning strategies?").
> > It all starts with a humble 8MB swap being used (I expect that's just
> > the empty swap with nothing in it but some system overhead). Then
> > after a short time, memory usage climbs slow but continuously until
> > it hits physical RAM ceiling and starts using swap - with not very
> > nice results for the database.
>
> It sort of looks like you are seeing a memory-leak problem. I thought
> we'd largely eliminated that class of trouble in recent releases, but
> maybe there's still one or two left. Can you identify the exact query
> or queries that cause individual backends' memory usage to grow?
>
> 			regards, tom lane
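One possible recipe for isolating a single backend's statements, assuming
7.2-era logging parameters (they were renamed in later releases) and that
the postmaster's stderr is redirected to a logfile; the paths here are
placeholders:

    # in postgresql.conf, then signal the postmaster to re-read it:
    #   log_pid           = true
    #   log_timestamp     = true
    #   debug_print_query = true
    kill -HUP `head -1 /path/to/data/postmaster.pid`

    # once a backend balloons in top, pull its statements out by PID:
    grep '\[6848\]' /path/to/postmaster.log

With log_pid enabled each log line carries the backend PID in brackets,
so one grep per PID separates the interleaved sessions.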
Hi,

-- Markus Wollny <Markus.Wollny@computec.de> wrote:

> Mem: 1029400K av, 1023264K used, 6136K free, 0K shrd, 7176K buff

You are still using too much RAM: there are only about 7 MB left for OS
caching, and that is really not enough!

> shared_buffers = 32768

256 MB of shared memory; this might be OK, as long as you don't run other
large applications on the box (e.g. a big mod_perl-enabled Apache).

> sort_mem = 8192 (16384 or 32768 didn't help either)

Each sort can take up to 8 MB of RAM; if you have queries which each need,
say, 3 big sorts and 30 of them run in parallel, that takes ~720 MB of
RAM. Try to reduce a) the memory consumption of other applications, and
b) the memory consumption of postgres: if you have a lot of big sorts,
try to reduce them in your application or lower sort_mem; also, 20000
shared buffers is usually enough. Sort spill files should also NOT live
on a RAID 5 array (slow(er) writes).

In my experience, the OS should have at least 200 MB left over for disk
cache on a 1 GB machine.

Ciao
  Alvar

Just my ideas ... :-)

--
// Sign! http://www.odem.org/informationsfreiheit/
// Internet by phone: http://www.teletrust.info/
// The freest medium? http://www.odem.org/insert_coin/
// Blaster: http://www.assoziations-blaster.de/
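Worth noting that sort_mem does not have to change globally: it can be
lowered for just the sort-heavy sessions (this works in 7.2). A sketch,
with "yourdb", "bigtable" and "x" as hypothetical placeholders:

    # cap sort memory at 4 MB for this one session only
    psql -d yourdb -c "SET sort_mem = 4096; SELECT * FROM bigtable ORDER BY x;"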
-- Curt Sampson <cjs@cynic.net> wrote:

> Some of your backends are getting pretty darn big. I wonder what
> they're doing? It can't be sort memory at this point. But as you
> can see, those five 200-250MB backends are killing you.

No, not really: they are using shared memory:

>>   PID USER     PRI NI  SIZE   RSS SHARE STAT %CPU %MEM  TIME COMMAND
>>  6837 postgres   9  0  251M  251M  250M S     9.8 25.0  0:37 postmaster
>>  6894 postgres   9  0  247M  247M  246M S     2.1 24.6  0:27 postmaster
>>  6848 postgres  16  0  247M  247M  246M R    93.6 24.6  4:06 postmaster
>>  6852 postgres   9  0  227M  227M  226M R     7.8 22.6  0:12 postmaster
>>  6903 postgres  12  0  204M  204M  203M R    10.2 20.3  0:27 postmaster
>>  6911 postgres   9  0 66840   65M 65728 S    19.4  6.4  0:01 postmaster
>>  6845 postgres   9  0 52344   51M 50916 S     3.6  5.0  0:09 postmaster
>>  6874 postgres   9  0 49408   48M 43168 S    19.8  4.7  3:57 postmaster
>>  6875 postgres  11  0 41564   40M 35324 R    18.7  4.0  3:31 postmaster
>>  6834 postgres   9  0 25456   24M 24356 S     3.0  2.4  0:26 postmaster
>>  6889 postgres   9  0 24844   24M 23632 S    15.8  2.4  0:17 postmaster
>>  6893 postgres   9  0 18396   17M 17332 S     0.1  1.7  0:07 postmaster
[...]

So it seems that some other processes are eating the memory.

Ciao
  Alvar

--
// Sign! http://www.odem.org/informationsfreiheit/
// Internet by phone: http://www.teletrust.info/
// The freest medium? http://www.odem.org/insert_coin/
// Blaster: http://www.assoziations-blaster.de/
Alvar Freude <alvar@a-blast.org> writes:
> -- Curt Sampson <cjs@cynic.net> wrote:
>> Some of your backends are getting pretty darn big. I wonder what
>> they're doing? It can't be sort memory at this point. But as you
>> can see, those five 200-250MB backends are killing you.

> no, not really: they use shared memory:

No, because all the backends will be accessing the *exact same* shared
memory; it's impossible for some to be using more of it than others.
Since we see process sizes ranging from 17M to 251M, it's a pretty safe
bet that the latter guys are actually chewing up a lot of private space.
I don't know what the SHARE column means in that top display, but I
wouldn't trust it.

			regards, tom lane
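For what it's worth, if one does take RSS minus SHARE as a rough estimate
of private memory, the numbers from the earlier top output cut both ways:

    6874: 49408K - 43168K  ~  6 MB private
    6837:  251M -  250M    ~  1 MB private

Taken at face value that would make even the 251M backend almost entirely
shared pages (Linux top counts touched shared-memory pages in both RSS and
SHARE, so RSS grows as a backend touches more of the 256 MB segment) - but,
as Tom says, the SHARE figure itself may not be trustworthy.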
On Wed, 26 Jun 2002, Alvar Freude wrote:

> > sort_mem = 8192 (16384 or 32768 didn't help either)
>
> Each sort can take up to 8 MB RAM; if you have some queries which need
> e.g. 3 big sorts and have 30 of them in parallel, this takes ~720 MB RAM.

Not quite. As I mentioned earlier, postgres 7.2 actually allocates
between 2.5 and three times that amount of memory. So in your example,
you'd have 30 * 3 * 8 MB * 2.5, or more like 1.8 GB of memory. :-)

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC