Thread: Still problems with memory swapping and server load

Still problems with memory swapping and server load

From
"Markus Wollny"
Date:
Hi!

I'm still "being hosed over big time" as Curt Sampson put it. It's still
the same machine and database: 1GB RAM, 4xPIII550Xeon, dumpall.sql is
~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all starts with
a humble 8MB swap being used (I expect that's just the empty swap with
nothing in it but some system overhead). Then after a short time, memory
usage climbs slow but continuously until it hits physical RAM ceiling
and starts using swap - with not very nice results for the database.
Swap sometimes amounts to 200MB or more.

I altered postgresql.conf-settings as recommended:

max_connections = 128
shared_buffers = 32768
sort_mem = 8192 (16384 or 32768 didn't help either)
wal_files = 32
wal_buffers = 32
fsync = false

Everything else is commented out resp. on default settings.

This is what top gives me:
  2:41pm  up 6 days,  3:51,  3 users,  load average: 4.41, 2.71, 1.89
114 processes: 105 sleeping, 9 running, 0 zombie, 0 stopped
CPU0 states: 60.2% user, 14.3% system,  0.0% nice, 24.5% idle
CPU1 states: 81.0% user, 14.5% system,  0.0% nice,  4.0% idle
CPU2 states: 77.0% user, 12.3% system,  0.0% nice, 10.2% idle
CPU3 states: 71.1% user, 13.3% system,  0.0% nice, 15.0% idle
Mem:  1029400K av, 1023264K used,    6136K free,       0K shrd,    7176K
buff
Swap: 2097136K av,   49656K used, 2047480K free                  819052K
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 6848 postgres  16   0  247M 247M  246M R    93.6 24.6   4:06 postmaster
 6928 postgres   9   0 13836  13M 13016 S    25.8  1.3   0:13 postmaster
 6926 postgres   9   0 14572  14M 13756 S    23.8  1.4   0:13 postmaster
 6920 postgres  10   0 14296  13M 13476 R    21.1  1.3   0:13 postmaster
 6874 postgres   9   0 49408  48M 43168 S    19.8  4.7   3:57 postmaster
 6911 postgres   9   0 66840  65M 65728 S    19.4  6.4   0:01 postmaster
 6917 postgres   9   0  9108 9104  8204 R    19.4  0.8   0:13 postmaster
 6875 postgres  11   0 41564  40M 35324 R    18.7  4.0   3:31 postmaster
 6927 postgres  10   0 14148  13M 13328 R    17.4  1.3   0:12 postmaster
 6889 postgres   9   0 24844  24M 23632 S    15.8  2.4   0:17 postmaster
 6903 postgres  12   0  204M 204M  203M R    10.2 20.3   0:27 postmaster
 6837 postgres   9   0  251M 251M  250M S     9.8 25.0   0:37 postmaster
 5929 postgres  15   0   940  884   668 R     8.9  0.0   8:23 top
 6934 root      16   0   976  976   732 R     8.0  0.0   0:07 top
 6852 postgres   9   0  227M 227M  226M R     7.8 22.6   0:12 postmaster
 6897 postgres   9   0 14988  14M 13948 S     6.0  1.4   0:01 postmaster
 6838 postgres   9   0 18364  17M 17304 S     5.6  1.7   0:04 postmaster
 6845 postgres   9   0 52344  51M 50916 S     3.6  5.0   0:09 postmaster
 6834 postgres   9   0 25456  24M 24356 S     3.0  2.4   0:26 postmaster
 6894 postgres   9   0  247M 247M  246M S     2.1 24.6   0:27 postmaster
 6907 postgres   9   0 16020  15M 14992 S     1.8  1.5   0:03 postmaster
 6904 postgres   9   0 16604  16M 15528 S     1.0  1.6   0:13 postmaster
 4799 root       9   0  1820 1444  1300 S     0.1  0.1   0:07 sshd
 6893 postgres   9   0 18396  17M 17332 S     0.1  1.7   0:07 postmaster
 6916 postgres   9   0  8940 8936  8020 S     0.1  0.8   0:08 postmaster

cat /proc/sys/kernel/shmmax is 323380838 and ipcs -m returns:
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status

0x00000000 32768      root      600        1056768    3          dest

0x00000000 98305      root      600        33554432   3          dest

0x00000000 131074     wwwrun    600        368644     3          dest

0x0052e2c1 2457603    postgres  600        274554880  54

0x07021999 229380     root      644        1104       2


Any ideas? Some other way out than "get more RAM"? How effective can I
expect more RAM to be in my situation - wouldn't it just clutter up and
overflow into swap like the first 1GB? How come system-processingtime
gets so high? I followed the other suggestions and tracked slow queries
using logging - now there's hardly anything left with >0.5s execution
time elapsed, but the backends still seem to run amok.

As always very greatful for any help.

Regardy,

Markus



Re: Still problems with memory swapping and server load

From
Martijn van Oosterhout
Date:
On Wed, Jun 26, 2002 at 02:50:56PM +0200, Markus Wollny wrote:
> Hi!
>
> I'm still "being hosed over big time" as Curt Sampson put it. It's still
> the same machine and database: 1GB RAM, 4xPIII550Xeon, dumpall.sql is
> ~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all starts with
> a humble 8MB swap being used (I expect that's just the empty swap with
> nothing in it but some system overhead). Then after a short time, memory
> usage climbs slow but continuously until it hits physical RAM ceiling
> and starts using swap - with not very nice results for the database.
> Swap sometimes amounts to 200MB or more.

Well, there's your problam. As soon as you swap, all performance goes out
the window. Looking below there is five processes using over 200MB of RAM
each. Lookup the PIDs in the log file to see what query it is. Sound's like
you have one whopping big query dragging everything down.

>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>  6848 postgres  16   0  247M 247M  246M R    93.6 24.6   4:06 postmaster
>  6903 postgres  12   0  204M 204M  203M R    10.2 20.3   0:27 postmaster
>  6837 postgres   9   0  251M 251M  250M S     9.8 25.0   0:37 postmaster
>  6852 postgres   9   0  227M 227M  226M R     7.8 22.6   0:12 postmaster
>  6894 postgres   9   0  247M 247M  246M S     2.1 24.6   0:27 postmaster

> Any ideas? Some other way out than "get more RAM"? How effective can I
> expect more RAM to be in my situation - wouldn't it just clutter up and
> overflow into swap like the first 1GB? How come system-processingtime
> gets so high? I followed the other suggestions and tracked slow queries
> using logging - now there's hardly anything left with >0.5s execution
> time elapsed, but the backends still seem to run amok.

Well, that first postmaster there seems to have been going for 4 minutes
already. Postgres has various stats functions to allow you to work out what
query each backend is currently executing. Are you sure you don't have an
unconstrained join or something?

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.



Re: Still problems with memory swapping and server load

From
Tom Lane
Date:
"Markus Wollny" <Markus.Wollny@computec.de> writes:
> I'm still "being hosed over big time" as Curt Sampson put it. It's still
> the same machine and database: 1GB RAM, 4xPIII550Xeon, dumpall.sql is
> ~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all starts with
> a humble 8MB swap being used (I expect that's just the empty swap with
> nothing in it but some system overhead). Then after a short time, memory
> usage climbs slow but continuously until it hits physical RAM ceiling
> and starts using swap - with not very nice results for the database.

It sort of looks like you are seeing a memory-leak problem.  I thought
we'd largely eliminated that class of trouble in recent releases, but
maybe there's still one or two left.  Can you identify the exact query
or queries that cause individual backends' memory usage to grow?

            regards, tom lane



Re: Still problems with memory swapping and server load

From
Curt Sampson
Date:
On Wed, 26 Jun 2002, Markus Wollny wrote:

> the same machine and database: 1GB RAM, 4xPIII550Xeon, dumpall.sql is
> ~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all starts with
> a humble 8MB swap being used (I expect that's just the empty swap with
> nothing in it but some system overhead). Then after a short time, memory
> usage climbs slow but continuously until it hits physical RAM ceiling
> and starts using swap - with not very nice results for the database.
> Swap sometimes amounts to 200MB or more.

Also use "vmstat", "systat vmstat" or whatever your system's
equivalant is to see just how much swapping you're doing. I wouldn't
be surprised to see some unused programs being pushed out to swap
as you do a lot of I/O, but if you're pushing stuff out to swap
and bringing it back in on a regular basis, you've still got
problems.

Also, remember, your OS may consider reading a program binary when
you run a program to be "page in" activity, so don't get to worried
about that, unless you also see page out activity.

> max_connections = 128
> shared_buffers = 32768
> sort_mem = 8192 (16384 or 32768 didn't help either)
> wal_files = 32
> wal_buffers = 32
> fsync = false

That looks good. Nothing should be using terribly much memory now.

> Mem:  1029400K av, 1023264K used,    6136K free,    0K shrd,    7176K buff

Ok, with only 7176K allocated to buffers, you've definitely got
some programs eating up your RAM, I'd say. Looking at your postmasters
below, sorted by size:

>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>  6837 postgres   9   0  251M 251M  250M S     9.8 25.0   0:37 postmaster
>  6894 postgres   9   0  247M 247M  246M S     2.1 24.6   0:27 postmaster
>  6848 postgres  16   0  247M 247M  246M R    93.6 24.6   4:06 postmaster
>  6852 postgres   9   0  227M 227M  226M R     7.8 22.6   0:12 postmaster
>  6903 postgres  12   0  204M 204M  203M R    10.2 20.3   0:27 postmaster
>  6911 postgres   9   0 66840  65M 65728 S    19.4  6.4   0:01 postmaster
>  6845 postgres   9   0 52344  51M 50916 S     3.6  5.0   0:09 postmaster
>  6874 postgres   9   0 49408  48M 43168 S    19.8  4.7   3:57 postmaster
>  6875 postgres  11   0 41564  40M 35324 R    18.7  4.0   3:31 postmaster
>  6834 postgres   9   0 25456  24M 24356 S     3.0  2.4   0:26 postmaster
>  6889 postgres   9   0 24844  24M 23632 S    15.8  2.4   0:17 postmaster
>  6893 postgres   9   0 18396  17M 17332 S     0.1  1.7   0:07 postmaster
>  6838 postgres   9   0 18364  17M 17304 S     5.6  1.7   0:04 postmaster
>  6904 postgres   9   0 16604  16M 15528 S     1.0  1.6   0:13 postmaster
>  6907 postgres   9   0 16020  15M 14992 S     1.8  1.5   0:03 postmaster
>  6897 postgres   9   0 14988  14M 13948 S     6.0  1.4   0:01 postmaster
>  6926 postgres   9   0 14572  14M 13756 S    23.8  1.4   0:13 postmaster
>  6920 postgres  10   0 14296  13M 13476 R    21.1  1.3   0:13 postmaster
>  6927 postgres  10   0 14148  13M 13328 R    17.4  1.3   0:12 postmaster
>  6928 postgres   9   0 13836  13M 13016 S    25.8  1.3   0:13 postmaster
>  6917 postgres   9   0  9108 9104  8204 R    19.4  0.8   0:13 postmaster
>  6916 postgres   9   0  8940 8936  8020 S     0.1  0.8   0:08 postmaster
>  4799 root       9   0  1820 1444  1300 S     0.1  0.1   0:07 sshd
>  6934 root      16   0   976  976   732 R     8.0  0.0   0:07 top
>  5929 postgres  15   0   940  884   668 R     8.9  0.0   8:23 top

Some of your backends are getting pretty darn big. I wonder what
they're doing? It can't be sort memory at this point. But as you
can see, those five 200-250MB backends are killing you.

If you can figure out what they're doing, and put a stop to that
memory usage, that would help you. Alternatively, perhaps just
dropping another 1-2 GB of RAM in the machine would fix your problem.

Also, for this kind of thing, it's better to provide a "ps aux" or
"ps -ef" than a top, unless you're sure that that display above is
all of the processes.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC




Re: Still problems with memory swapping and server load

From
"Markus Wollny"
Date:
Hi!

I don't know exactly how to find the offending queries. All I was able
to come up with is check top-output, nail down the pid and then scan
over the logfile to get some queries - but of course there's lots of
queries using this very pid subsequently. How do I determine the
details? Right now all I could see was that all the queries where using
the begin|declare sql_cursor|fetch|close sql_cursor|end-pattern induced
by the odbc-driver, I presume. How do I pinpoint the specific offender?

Regards,

Markus

> -----Ursprüngliche Nachricht-----
> Von: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Gesendet: Mittwoch, 26. Juni 2002 16:59
> An: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Betreff: Re: [GENERAL] Still problems with memory swapping and server
> load
>
>
> "Markus Wollny" <Markus.Wollny@computec.de> writes:
> > I'm still "being hosed over big time" as Curt Sampson put
> it. It's still
> > the same machine and database: 1GB RAM, 4xPIII550Xeon,
> dumpall.sql is
> > ~300MB (see "[GENERAL] Urgent: Tuning strategies?"). It all
> starts with
> > a humble 8MB swap being used (I expect that's just the
> empty swap with
> > nothing in it but some system overhead). Then after a short
> time, memory
> > usage climbs slow but continuously until it hits physical
> RAM ceiling
> > and starts using swap - with not very nice results for the database.
>
> It sort of looks like you are seeing a memory-leak problem.  I thought
> we'd largely eliminated that class of trouble in recent releases, but
> maybe there's still one or two left.  Can you identify the exact query
> or queries that cause individual backends' memory usage to grow?
>
>             regards, tom lane
>



Re: Still problems with memory swapping and server load

From
Alvar Freude
Date:
Hi,

-- Markus Wollny <Markus.Wollny@computec.de> wrote:

> Mem:  1029400K av, 1023264K used,    6136K free,       0K shrd,    7176K
> buff

you use still too much RAM, there are only 7 MB left for OS caching, this
is really not enough!


> shared_buffers = 32768

256 MB shared memory; this might OK, if you don't use other large
applications (e.g. a big mod_perl enabled Apache).


> sort_mem = 8192 (16384 or 32768 didn't help either)

Each sort can take up to 8 MB RAM; if you have some queries which needs
e.g. 3 big sorts and have 30 from this in parallel, this takes ~720 MB RAM.


Try to reduce a) memory consumption of other applications, b) Memory
consumption of postgres: if you have a lot of big sorts, try to reduce them
in your application or reduce sort_mem; also 20000 shared buffers is mostly
enough.

Also, sorting buffers on disk should be NOT an a RAID 5 array (slow(er)
writes).


With my experience, the BIOS should at least have 200 MB Cache on a 1 GB
Machine.


Ciao
  Alvar, Just my ideas ... :-)

--
// Unterschreiben!      http://www.odem.org/informationsfreiheit/
// Internet am Telefon: http://www.teletrust.info/
// Das freieste Medium? http://www.odem.org/insert_coin/
// Blaster:             http://www.assoziations-blaster.de/




Re: Still problems with memory swapping and server load

From
Alvar Freude
Date:

-- Curt Sampson <cjs@cynic.net> wrote:

> Some of your backends are getting pretty darn big. I wonder what
> they're doing? It can't be sort memory at this point. But as you
> can see, those five 200-250MB backends are killing you.

no, not really: they use shared memory:


>>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>>  6837 postgres   9   0  251M 251M  250M S     9.8 25.0   0:37 postmaster
>>  6894 postgres   9   0  247M 247M  246M S     2.1 24.6   0:27 postmaster
>>  6848 postgres  16   0  247M 247M  246M R    93.6 24.6   4:06 postmaster
>>  6852 postgres   9   0  227M 227M  226M R     7.8 22.6   0:12 postmaster
>>  6903 postgres  12   0  204M 204M  203M R    10.2 20.3   0:27 postmaster
>>  6911 postgres   9   0 66840  65M 65728 S    19.4  6.4   0:01 postmaster
>>  6845 postgres   9   0 52344  51M 50916 S     3.6  5.0   0:09 postmaster
>>  6874 postgres   9   0 49408  48M 43168 S    19.8  4.7   3:57 postmaster
>>  6875 postgres  11   0 41564  40M 35324 R    18.7  4.0   3:31 postmaster
>>  6834 postgres   9   0 25456  24M 24356 S     3.0  2.4   0:26 postmaster
>>  6889 postgres   9   0 24844  24M 23632 S    15.8  2.4   0:17 postmaster
>>  6893 postgres   9   0 18396  17M 17332 S     0.1  1.7   0:07 postmaster
[...]

So it seems that some other processes eat the memory.


Ciao
  Alvar


--
// Unterschreiben!      http://www.odem.org/informationsfreiheit/
// Internet am Telefon: http://www.teletrust.info/
// Das freieste Medium? http://www.odem.org/insert_coin/
// Blaster:             http://www.assoziations-blaster.de/




Re: Still problems with memory swapping and server load

From
Tom Lane
Date:
Alvar Freude <alvar@a-blast.org> writes:
> -- Curt Sampson <cjs@cynic.net> wrote:
>> Some of your backends are getting pretty darn big. I wonder what
>> they're doing? It can't be sort memory at this point. But as you
>> can see, those five 200-250MB backends are killing you.

> no, not really: they use shared memory:

No, because all the backends will be accessing the *exact same* shared
memory; it's impossible for some to be using more than others.  Since we
see process sizes ranging from 17M to 251M, it's a pretty safe bet that
the latter guys are actually chewing a lot of private space.  I don't
know what the "share" column means in that ps display, but I wouldn't
trust it.

            regards, tom lane



Re: Still problems with memory swapping and server load

From
Curt Sampson
Date:
On Wed, 26 Jun 2002, Alvar Freude wrote:

> > sort_mem = 8192 (16384 or 32768 didn't help either)
>
> Each sort can take up to 8 MB RAM; if you have some queries which needs
> e.g. 3 big sorts and have 30 from this in parallel, this takes ~720 MB RAM.

Not quite. As I mentioned earlier, postgres 7.2 actually allocates
between 2.5 and three times that amount of memory. So in your
example, you'd have 90 * 3 * 8 * 2.5 or more like 5 GB of memory. :-)

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC