Thread: postgresql + apache under heavy load

postgresql + apache under heavy load

From

Alex Madon

Date:

21 January 2004, 10:05:25

Hello,
I am testing a web application (using the DBX PHP function to call a
Postgresql backend).
I have 375Mb RAM on my test home box.
I ran ab (apache benchmark) to test the behaviour of the application
under heavy load.
When increasing the number of requests, all my memory is filled, and the
Linux server begins to cache and remains frozen.

ab -n 100 -c 10 http://localsite/testscript
behaves OK.

If I increases to
ab -n 1000 -c 100 http://localsite/testscript
I get this memory problem.

If I eliminate the connection to the (UNIX) socket of Postgresql, the
script behaves well even under very high load (and of course with much
less time spent per request).

I tried to change some parameters in postgresql.conf
max_connections = 32
to max_connections = 8

and

shared_buffers = 64
to shared_buffers = 16

without success.

I tried to use pmap on httpd and postmaster Process ID but don't get
much help.

Does anybody have some idea to help to debug/understand/solve this
issue? Any feedback is appreciated.
To me, it would not be a problem if the box is very slow under heavy
load (DoS like), but I really dislike having my box out of service after
such a DoS attack.
I am looking for a way to limit the memory used by postgres.

Thanks
Alex

Re: postgresql + apache under heavy load

From

"scott.marlowe"

Date:

21 January 2004, 11:58:47

On Wed, 21 Jan 2004, Alex Madon wrote:

> Hello,
> I am testing a web application (using the DBX PHP function to call a
> Postgresql backend).

I'm not familiar with DBX.  Is that connection pooling or what?

> I have 375Mb RAM on my test home box.
> I ran ab (apache benchmark) to test the behaviour of the application
> under heavy load.
> When increasing the number of requests, all my memory is filled, and the
> Linux server begins to cache and remains frozen.

Are you SURE all your memory is in use?  What exactly does top say about
things like cached and buff memory (I'm assuming you're on linux, any
differences in top on another OS would be minor.)  If the kernel still
shows a fair bit of cached and buff memory, your memory is not getting all
used up.

> ab -n 100 -c 10 http://localsite/testscript
> behaves OK.

Keep in mind, this is 10 simo users beating the machine continuously.
that's functionally equivalent to about 100 to 200 people running through
pages as fast as people can.

> If I increases to
> ab -n 1000 -c 100 http://localsite/testscript
> I get this memory problem.

Where's the break point?  Just wondering.  Does it show up at 20, 40, 60,
80, or only at 100?  If so, that's really not bad.

> If I eliminate the connection to the (UNIX) socket of Postgresql, the
> script behaves well even under very high load (and of course with much
> less time spent per request).

Of course, the database is the most expensive part of an application,
CPU/Memory wise, written on apache/php

> I tried to change some parameters in postgresql.conf
> max_connections = 32
> to max_connections = 8

Wrong direction.  The number of connections postgresql CAN create costs
very little.  The number of connections it does create, still, costs very
little.  Have you checked to see if ab is getting valid pages, and not
"connection failed, too many connections already open" pages?

> shared_buffers = 64
> to shared_buffers = 16

Way the wrong way.  Shared buffers are the max memory all the backends
together share.  The old setting was 512k ram, now you're down to 128k.
while 128k would be a lot of memory for a Commodore 128, for a machine
with 384 meg ram, it's nothing.  Since this is a TOTAL shared memory
setting, not a per process thing, you can hand it a good chunk of ram and
not usually worry about it.  Set it to 512 and just leave it.  That's only
4 megs of shared memory, if your machine is running that low, other things
have gone wrong.

> without success.
>
> I tried to use pmap on httpd and postmaster Process ID but don't get
> much help.
>
> Does anybody have some idea to help to debug/understand/solve this
> issue? Any feedback is appreciated.
> To me, it would not be a problem if the box is very slow under heavy
> load (DoS like), but I really dislike having my box out of service after
> such a DoS attack.

Does it not come back?  That's bad.

> I am looking for a way to limit the memory used by postgres.

Don't it's likely not using too much.

What does top say is the highest memory user?

Re: postgresql + apache under heavy load

From

"Joshua D. Drake"

Date:

21 January 2004, 12:08:39

Alex Madon wrote:

> Hello,
> I am testing a web application (using the DBX PHP function to call a
> Postgresql backend).
> I have 375Mb RAM on my test home box.
> I ran ab (apache benchmark) to test the behaviour of the application
> under heavy load.
> When increasing the number of requests, all my memory is filled, and
> the Linux server begins to cache and remains frozen.
>
> ab -n 100 -c 10 http://localsite/testscript
> behaves OK.
>
> If I increases to
> ab -n 1000 -c 100 http://localsite/testscript
> I get this memory problem.


We would need a lot more information. What version of Linux? What
version of the Kernel? What is your shmmax settting?
What is your sort_mem setting? Did you use top to see where the hang up?
Are there any messages in /var/log/messages?

Sincerely,

Joshua D. Drake



>
> If I eliminate the connection to the (UNIX) socket of Postgresql, the
> script behaves well even under very high load (and of course with much
> less time spent per request).
>
> I tried to change some parameters in postgresql.conf
> max_connections = 32
> to max_connections = 8
>
> and
>
> shared_buffers = 64
> to shared_buffers = 16
>
> without success.
>
> I tried to use pmap on httpd and postmaster Process ID but don't get
> much help.
>
> Does anybody have some idea to help to debug/understand/solve this
> issue? Any feedback is appreciated.
> To me, it would not be a problem if the box is very slow under heavy
> load (DoS like), but I really dislike having my box out of service
> after such a DoS attack.
> I am looking for a way to limit the memory used by postgres.
>
> Thanks
> Alex
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>      joining column's datatypes do not match



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL

Re: postgresql + apache under heavy load

From

Richard Huxton

Date:

21 January 2004, 12:58:25

On Wednesday 21 January 2004 14:11, Alex Madon wrote:
> Hello,
> I am testing a web application (using the DBX PHP function to call a
> Postgresql backend).
> I have 375Mb RAM on my test home box.
[10 connections is fine, 100 is not]

> I tried to change some parameters in postgresql.conf
> max_connections = 32
> to max_connections = 8

Are you saying you had more than 8 connections open simultaneously? Are you
sure you restarted PG so that it noticed the new values? You can check config
settings with "show all;" from psql, or "show <setting>".

You'll want to use the "top" command to show the amount of memory each process
is using and then check the configuration/tuning articles at the following
URL:

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

First step is to make sure your changes are being detected. Then, I'd guess
you want to set:
  max_connections
  shared_buffers
  sort_mem
  vacuum_mem (less important)
and then adjust effective_cache_size so it matches your normal load.

--
  Richard Huxton
  Archonet Ltd

Re: postgresql + apache under heavy load

From

Ericson Smith

Date:

21 January 2004, 13:08:55

Could be problem be that PHP is not using connection efficiently?
Apache KeepAlive with PHP, is a dual edged sword with you holding the
blade :-)

If I am not mistaken, what happens is that a connection is kept alive
because Apache believes that other requests will come in from the client
who made the initial  connection. So 10 concurrent connections are fine,
but they are not released timely enough with 100 concurrent connections.
The system ends up waiting around for other KeepAlive connections to
timeout before Apache allows others to come in. We had this exact
problem in an environment with millions of impressions per day going to
the database. Because of the nature of our business, we were able to
disable KeepAlive and the load immediately dropped (concurrent
connection on the Postgresql database also dropped sharply). We also
turned off PHP persistent connections to the database.

The drawback is that connections are built up and torn down all the
time, and with Postgresql, it is sort of expensive. But thats a fraction
of the expense of having KeepAlive on.

Warmest regards,
Ericson Smith
Tracking Specialist/DBA
+-----------------------+--------------------------------------+
| http://www.did-it.com | "Crush my enemies, see then driven   |
| eric@did-it.com       | before me, and hear the lamentations |
| 516-255-0500          | of their women." - Conan             |
+-----------------------+--------------------------------------+

Alex Madon wrote:

> Hello,
> I am testing a web application (using the DBX PHP function to call a
> Postgresql backend).
> I have 375Mb RAM on my test home box.
> I ran ab (apache benchmark) to test the behaviour of the application
> under heavy load.
> When increasing the number of requests, all my memory is filled, and
> the Linux server begins to cache and remains frozen.
>
> ab -n 100 -c 10 http://localsite/testscript
> behaves OK.
>
> If I increases to
> ab -n 1000 -c 100 http://localsite/testscript
> I get this memory problem.
>
> If I eliminate the connection to the (UNIX) socket of Postgresql, the
> script behaves well even under very high load (and of course with much
> less time spent per request).
>
> I tried to change some parameters in postgresql.conf
> max_connections = 32
> to max_connections = 8
>
> and
>
> shared_buffers = 64
> to shared_buffers = 16
>
> without success.
>
> I tried to use pmap on httpd and postmaster Process ID but don't get
> much help.
>
> Does anybody have some idea to help to debug/understand/solve this
> issue? Any feedback is appreciated.
> To me, it would not be a problem if the box is very slow under heavy
> load (DoS like), but I really dislike having my box out of service
> after such a DoS attack.
> I am looking for a way to limit the memory used by postgres.
>
> Thanks
> Alex
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>      joining column's datatypes do not match
>

Attachment

eric.vcf

Re: postgresql + apache under heavy load

From

Alex Madon

Date:

21 January 2004, 15:12:59

Hello Scott,
Thank you for your answer.

>I'm not familiar with DBX.  Is that connection pooling or what?
>
>
I could not find this information, sorry.

>Are you SURE all your memory is in use?  What exactly does top say about
>things like cached and buff memory (I'm assuming you're on linux, any
>differences in top on another OS would be minor.)  If the kernel still
>shows a fair bit of cached and buff memory, your memory is not getting all
>used up.
>
Well my xosview show that caching begin at a concurrency of 40.
At 80 my cache begins to be filled completely, so machine having big
problems.

>>If I increases to
>>ab -n 1000 -c 100 http://localsite/testscript
>>I get this memory problem.
>>
>>
>
>Where's the break point?  Just wondering.  Does it show up at 20, 40, 60,
>80, or only at 100?  If so, that's really not bad.
>
 Here is some results (I kept -n 100 an just vraied the -c option)
--c 1
Failed requests:        0
Time per request:       322.096 [ms] (mean, across all concurrent requests)

-c 2
Failed requests:        0
Time per request:       374.220 [ms] (mean, across all concurrent requests)

-c 10
Failed requests:        68
 (Connect: 0, Length: 68, Exceptions: 0)
Time per request:       314.779 [ms] (mean, across all concurrent requests)

-c 20
Failed requests:        68
Time per request:       369.290 [ms] (mean, across all concurrent requests)

-c 30
Failed requests:        43
Time per request:       441.947 [ms] (mean, across all concurrent requests)

=====Here begins caching to disk====

-c 40
Failed requests:        65
Time per request:       528.829 [ms] (mean, across all concurrent requests)

-c 50
Failed requests:        66
Time per request:       993.674 [ms] (mean, across all concurrent requests)

For a higher concurrency, the cache is completly filled, and have to
reboot the machine.
(I didn't leave the system caching forever, just press to reboot
button)... could be interesting to wait to see if the systems recovers
after a while

>>To me, it would not be a problem if the box is very slow under heavy
>>load (DoS like), but I really dislike having my box out of service after
>>such a DoS attack.
>>
>>
>
>Does it not come back?  That's bad.
>
>
see above

thanks
Alex

Re: postgresql + apache under heavy load

From

"scott.marlowe"

Date:

21 January 2004, 15:28:50

On Wed, 21 Jan 2004, Alex Madon wrote:

> Hello Scott,
> Thank you for your answer.
>
> >I'm not familiar with DBX.  Is that connection pooling or what?
> >
> >
> I could not find this information, sorry.
>
> >Are you SURE all your memory is in use?  What exactly does top say about
> >things like cached and buff memory (I'm assuming you're on linux, any
> >differences in top on another OS would be minor.)  If the kernel still
> >shows a fair bit of cached and buff memory, your memory is not getting all
> >used up.
> >
> Well my xosview show that caching begin at a concurrency of 40.
> At 80 my cache begins to be filled completely, so machine having big
> problems.

I think you're confusing what I meant.  Caching is good.  Swapping is bad.
Having a large amount of cache is a good thing.  It means the OS is
caching all your data in memory for faster access.

> >>If I increases to
> >>ab -n 1000 -c 100 http://localsite/testscript
> >>I get this memory problem.
> >>
> >>
> >
> >Where's the break point?  Just wondering.  Does it show up at 20, 40, 60,
> >80, or only at 100?  If so, that's really not bad.
> >
>  Here is some results (I kept -n 100 an just vraied the -c option)
> --c 1
> Failed requests:        0
> Time per request:       322.096 [ms] (mean, across all concurrent requests)
>
> -c 2
> Failed requests:        0
> Time per request:       374.220 [ms] (mean, across all concurrent requests)
>
> -c 10
> Failed requests:        68
>  (Connect: 0, Length: 68, Exceptions: 0)
> Time per request:       314.779 [ms] (mean, across all concurrent requests)

OK, there's a problem, you're getting failed requests at -c 10, which
means you likely have postgresql configured in the wrong
direction.  configure postgresql to use more memory (sort_mem can be set
to about 8 megs without a lot of issues on most boxes, going higher may
use up all your memory in certain situations (high concurrency)).

> For a higher concurrency, the cache is completly filled, and have to
> reboot the machine.

No, you should NEVER have to reboot a unix box.  period.  filled cache,
again, is a GOOD THING.   not bad.

> (I didn't leave the system caching forever, just press to reboot
> button)... could be interesting to wait to see if the systems recovers
> after a while

Yes, please do.  Also, show us a save of top while under load.

I'm betting your machine has plenty of memory, and is not using it
effectively, due to postgresql being too conservatively configured.

Re: postgresql + apache under heavy load

From

Alex Madon

Date:

21 January 2004, 15:40:00

Hello Joshua,
Thank you for your reply.
Joshua D. Drake wrote:

>
>
> We would need a lot more information. What version of Linux?

 uname -a
Linux daube 2.4.20-8 #1 Thu Mar 13 17:18:24 EST 2003 i686 athlon i386
GNU/Linux

> What version of the Kernel? What is your shmmax settting?

cat /proc/sys/kernel/shmmax
33554432

>
> What is your sort_mem setting?

I didn't change the postgresql.conf settings:
#sort_mem = 1024                # min 64, size in KB

> Did you use top to see where the hang up? Are there any messages in
> /var/log/messages?

Well as I said before the box is almost out of control: disk is caching
intensively; I run X Windows and the mouse can not point a shell... very
bad. The only thing I see is that cache is filling quickly with
xosview... and then X become frozen (or better said extremely slow).

Abou the logs:
I sent the PHP error messages to a file, and yes there are errors:
pg_connect(): Unable to connect to PostgreSQL server: FATAL:
Non-superuser connection limit exceeded
or
pg_connect(): Unable to connect to PostgreSQL server: FATAL:  Sorry, too
many clients already

Thanks
Alex

Re: postgresql + apache under heavy load

From

Alex Madon

Date:

21 January 2004, 15:43:45

Hello Richard,
Thank you for your answer.
Richard Huxton wrote:

>Are you saying you had more than 8 connections open simultaneously?
>
Well, I don't know how to find that out.
What I did is issuing several tomes a ps aux and I never saw more than
6-7 postmaster SELECT.

>Are you
>sure you restarted PG so that it noticed the new values? You can check config
>settings with "show all;" from psql, or "show <setting>".
>
Yes I restart it. The show command outputs the correct value (8).

>
>You'll want to use the "top" command to show the amount of memory each process
>

A typical output (in a concurrency of 20, no cache) is:
 ps aux | grep postgres
postgres  2332  0.0  0.0  8804  328 ?        R    18:54   0:01
/usr/bin/postmaster -p 5432 -d2
postgres  2334  0.0  0.0  9792   68 ?        S    18:54   0:00 postgres:
stats buffer process
postgres  2335  0.0  0.0  8828  200 ?        S    18:54   0:00 postgres:
stats collector process
postgres  4386  0.0  0.2  4312  956 pts/3    S    19:22   0:00 -bash
postgres  4871  0.0  0.5  9480 2304 ?        S    20:36   0:00 postgres:
user db [local] SELECT
postgres  4873  0.0  0.2  8816 1032 ?        R    20:36   0:00 postgres:
user db [local] startup
myuser    4877  0.0  0.1  3572  624 pts/4    S    20:36   0:00 grep postgres
postgres  4878  0.0  0.5  9220 2228 ?        R    20:36   0:00 postgres:
user db [local] SELECT
postgres  4879  0.0  0.5  9204 2016 ?        R    20:36   0:00 postgres:
user db [local] SELECT
---------------------------top-----------------------------
114 processes: 99 sleeping, 12 running, 3 zombie, 0 stopped
CPU states:  91.8% user   8.1% system   0.0% nice   0.0% iowait   0.0% idle
Mem:   384580k av,  316328k used,   68252k free,       0k shrd,   25424k
buff
                    253976k actv,   36916k in_d,    4704k in_c
Swap:  265064k av,   64788k used,  200276k free                   71132k
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 4914 apache    16   0  9016 8416  2552 S     6.7  2.1   0:00   0 httpd
 4832 apache    16   0  9016 8416  2552 S     6.3  2.1   0:01   0 httpd
 4915 apache    16   0  9016 8416  2552 S     5.9  2.1   0:00   0 httpd
 4917 apache    16   0  9016 8416  2552 S     5.9  2.1   0:00   0 httpd
 4919 apache    16   0  9020 8420  2536 S     5.9  2.1   0:00   0 httpd
 4774 apache    16   0  9016 8416  2552 S     5.7  2.1   0:02   0 httpd
 4896 apache    16   0  9060 8460  2568 S     5.7  2.1   0:00   0 httpd
 4908 apache    15   0  9016 8416  2552 S     5.7  2.1   0:00   0 httpd
 4909 apache    16   0  9016 8416  2552 S     5.7  2.1   0:00   0 httpd
 4658 apache    16   0  9136 8536  2568 S     5.5  2.2   0:04   0 httpd
 4921 apache    16   0  9016 8416  2552 S     5.5  2.1   0:00   0 httpd
 2581 root      16   0 14492 4544  1252 R     5.3  1.1   2:26   0 X
 4795 apache    16   0  9104 8504  2568 S     5.3  2.2   0:02   0 httpd
 4796 apache    16   0  9080 8480  2568 S     5.3  2.2   0:01   0 httpd
 4782 apache    16   0  8924 8324  2568 R     3.5  2.1   0:02   0 httpd
 2612 madona    15   0  4524 4136  2380 S     1.5  1.0   0:18   0 metacity
 4656 apache    15   0  9084 8484  2568 S     1.3  2.2   0:03   0 httpd
 4950 postgres  25   0     0    0     0 Z     1.1  0.0   0:00   0
postmaster <defunct>
 3812 madona    15   0 44728  42M 17460 S     0.7 11.2   3:21   0
mozilla-bin
 4947 postgres  25   0  2540 2392  1688 S     0.7  0.6   0:00   0 postmaster
 4952 postgres  25   0  2812 2664  1872 R     0.7  0.6   0:00   0 postmaster
 4610 madona    15   0  7460 7460  2152 R     0.5  1.9   0:00   0 xterm
 4904 madona    15   0  1108 1108   856 R     0.3  0.2   0:00   0 top
 4954 postgres  24   0  1916 1768  1244 R     0.1  0.4   0:00   0 postmaster
 4959 postgres  25   0  1596 1448   940 S     0.1  0.3   0:00   0 postmaster
 4961 postgres  25   0   984  824   640 R     0.1  0.2   0:00   0 postmaster
    1 root      15   0    88   60    40 S     0.0  0.0   0:04   0 init



Thanks
Alex

Re: postgresql + apache under heavy load

From

Alex Madon

Date:

21 January 2004, 15:46:06

Hello Ericson,
Thank you for your reply.
Ericson Smith wrote:

> Could be problem be that PHP is not using connection efficiently?
> Apache KeepAlive with PHP, is a dual edged sword with you holding the
> blade :-)

I turned off the KeepAlive option in httpd.conf

[
I think keepalive is not used by default by "ab" and that apche uses it
only on static content)-- see the last paragraph of:
http://httpd.apache.org/docs/keepalive.html
]
and set
pgsql.allow_persistent = Off
in php.ini,
it didn't work for me.
thanks
Alex

>
> If I am not mistaken, what happens is that a connection is kept alive
> because Apache believes that other requests will come in from the
> client who made the initial  connection. So 10 concurrent connections
> are fine, but they are not released timely enough with 100 concurrent
> connections. The system ends up waiting around for other KeepAlive
> connections to timeout before Apache allows others to come in. We had
> this exact problem in an environment with millions of impressions per
> day going to the database. Because of the nature of our business, we
> were able to disable KeepAlive and the load immediately dropped
> (concurrent connection on the Postgresql database also dropped
> sharply). We also turned off PHP persistent connections to the database.
>
> The drawback is that connections are built up and torn down all the
> time, and with Postgresql, it is sort of expensive. But thats a
> fraction of the expense of having KeepAlive on.
>
> Warmest regards, Ericson Smith
> Tracking Specialist/DBA
> +-----------------------+--------------------------------------+
> | http://www.did-it.com | "Crush my enemies, see then driven   |
> | eric@did-it.com       | before me, and hear the lamentations |
> | 516-255-0500          | of their women." - Conan             |
> +-----------------------+--------------------------------------+
>
>
> Alex Madon wrote:
>
>> Hello,
>> I am testing a web application (using the DBX PHP function to call a
>> Postgresql backend).
>> I have 375Mb RAM on my test home box.
>> I ran ab (apache benchmark) to test the behaviour of the application
>> under heavy load.
>> When increasing the number of requests, all my memory is filled, and
>> the Linux server begins to cache and remains frozen.
>>
>> ab -n 100 -c 10 http://localsite/testscript
>> behaves OK.
>>
>> If I increases to
>> ab -n 1000 -c 100 http://localsite/testscript
>> I get this memory problem.
>>
>> If I eliminate the connection to the (UNIX) socket of Postgresql, the
>> script behaves well even under very high load (and of course with
>> much less time spent per request).
>>
>> I tried to change some parameters in postgresql.conf
>> max_connections = 32
>> to max_connections = 8
>>
>> and
>>
>> shared_buffers = 64
>> to shared_buffers = 16
>>
>> without success.
>>
>> I tried to use pmap on httpd and postmaster Process ID but don't get
>> much help.
>>
>> Does anybody have some idea to help to debug/understand/solve this
>> issue? Any feedback is appreciated.
>> To me, it would not be a problem if the box is very slow under heavy
>> load (DoS like), but I really dislike having my box out of service
>> after such a DoS attack.
>> I am looking for a way to limit the memory used by postgres.
>>
>> Thanks
>> Alex
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: the planner will ignore your desire to choose an index scan if
>> your
>>      joining column's datatypes do not match
>>

Re: postgresql + apache under heavy load

From

Alex Madon

Date:

21 January 2004, 16:12:07

Hello Scott,
Thank you for your reply.

 >
 >I think you're confusing what I meant.  Caching is good.  Swapping is
bad.
 >Having a large amount of cache is a good thing.  It means the OS is
 >caching all your data in memory for faster access.
 >
OK; Sorry about my approximative language. I was meaning, caching to
disk... so better said as you point out is swapping.
So my system is filling the RAM (OK to me) and then begins to fill all
the Swap.
When the swap is used at near 100%, the system becomes extremely
difficult to control (not actually crashing, but accessing disk very
noisily).

 >
 >OK, there's a problem, you're getting failed requests at -c 10, which
 >means you likely have postgresql configured in the wrong
 >direction.  configure postgresql to use more memory (sort_mem can be set
 >to about 8 megs without a lot of issues on most boxes, going higher may
 >use up all your memory in certain situations (high concurrency)).
 >
Thank you!
I setting sort_mem to 8 Mb and max_connections = 32, I have zero failed
with a c=20 and 1 failed with a c=30!
Decreasing max_connections to 8 gave me a lot of failed as before.

 >
 >No, you should NEVER have to reboot a unix box.  period.  filled cache,
 >again, is a GOOD THING.   not bad.
 >
sorry, I was meaning swapping.


 >>button)... could be interesting to wait to see if the systems recovers
 >>after a while
 >
 >
 >Yes, please do.  Also, show us a save of top while under load.

Well, increasing the sort_mem and max_connections was very good to have
less failed, but this does not solve the memory limitation.
Perhaps that comes from my superficial understanding of concurrency.

Does anybody have a simple formula to know how much memory a ab
benchmark will need in this case (assuming one knows the memory needed
by one apache process (A) and the memory used by one postgres process (P)?

I would say, issuing a ab -n N -c C (asuming there is no shared library
between apache and postgres)
memory needed= CxA + CxP
(independent of N?)

Just another pedestrian question: what usually happens to a linux box
when swap is exhausted?
Is the parent process killed? Could this crash the whole machine?

My conclusion up to now, is that this box does not stand more than 40
concurrent connections.
To stand more concurrent connections, I'd need to increase the RAM or
redesign my software, e.g. using a PHP cache, static pages being served
much faster (almost by a factor 100...)
Thanks
Alex

Re: postgresql + apache under heavy load

From

Robert Treat

Date:

21 January 2004, 16:17:36

On Wed, 2004-01-21 at 13:48, Alex Madon wrote:
> Hello Scott,
> Thank you for your answer.
>
> >I'm not familiar with DBX.  Is that connection pooling or what?
> >
> >
> I could not find this information, sorry.
>

It's generic database access functions written in C. It's not connection
pooling.

http://www.php.net/manual/en/ref.dbx.php

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Re: postgresql + apache under heavy load

From

Richard Huxton

Date:

21 January 2004, 16:28:28

On Wednesday 21 January 2004 19:40, Alex Madon wrote:
Alex - I'm copying scott in on this since he asked for details on top too.
>
> Richard Huxton wrote:
> >Are you saying you had more than 8 connections open simultaneously?
>
> Well, I don't know how to find that out.
> What I did is issuing several tomes a ps aux and I never saw more than
> 6-7 postmaster SELECT.

Good enough for me - that's what I'd have done.

> >Are you
> >sure you restarted PG so that it noticed the new values? You can check
> > config settings with "show all;" from psql, or "show <setting>".
>
> Yes I restart it. The show command outputs the correct value (8).

Good - I don't think PG is using up your system memory, then (but we'll see).

> ---------------------------top-----------------------------
> 114 processes: 99 sleeping, 12 running, 3 zombie, 0 stopped
> CPU states:  91.8% user   8.1% system   0.0% nice   0.0% iowait   0.0% idle
> Mem:   384580k av,  316328k used,   68252k free,       0k shrd,   25424k
> buff
>                     253976k actv,   36916k in_d,    4704k in_c
> Swap:  265064k av,   64788k used,  200276k free                   71132k
> cached

If I'm decoding this through the email line wrapping you've got 71MB of disk
cached in RAM and 25MB used as i/o buffers (which is a good thing). I'm a
little puzzled as to why your swap is so high, but as long as it isn't
growing rapidly there shouldn't be a problem.

>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>  4914 apache    16   0  9016 8416  2552 S     6.7  2.1   0:00   0 httpd
>  4832 apache    16   0  9016 8416  2552 S     6.3  2.1   0:01   0 httpd

OK - so each httpd process is using about 8500KB with 2500KB shared = 6MB
each.

> 4947 postgres  25   0  2540 2392  1688 S     0.7  0.6   0:00   0
> postmaster
> 4952 postgres  25   0  2812 2664  1872 R     0.7  0.6   0:00   0
> postmaster
> 4954 postgres  24   0  1916 1768  1244 R     0.1  0.4   0:00   0 postmaster
> 4959 postgres  25   0  1596 1448   940 S     0.1  0.3   0:00   0 postmaster
> 4961 postgres  25   0   984  824   640 R     0.1  0.2   0:00   0 postmaster

Here we've got what appears to be about 1MB per PG backend. Since you've only
got 8, that should be fine.

[I noticed you were running X and mozilla (on top of gnome?). If you don't
need them, close them - that can free up to 64MB, depending on what you're
running]

I don't think you've got a RAM shortage, I think your settings are too small
and you're running out of I/O bandwidth. If you run "vmstat 5", what does it
show - you're interested in the i/o section: blocks in (bi) and out (bo) as
well as memory.
Save your readings, then follow the advice in that GeneralBits link I posted
earlier and see if that makes a difference.

--
  Richard Huxton
  Archonet Ltd

Re: postgresql + apache under heavy load

From

"scott.marlowe"

Date:

21 January 2004, 16:56:39

On Wed, 21 Jan 2004, Alex Madon wrote:

> >You'll want to use the "top" command to show the amount of memory each process
> >
>
> A typical output (in a concurrency of 20, no cache) is:
>  ps aux | grep postgres
> postgres  2332  0.0  0.0  8804  328 ?        R    18:54   0:01
> /usr/bin/postmaster -p 5432 -d2
> postgres  2334  0.0  0.0  9792   68 ?        S    18:54   0:00 postgres:
> stats buffer process
> postgres  2335  0.0  0.0  8828  200 ?        S    18:54   0:00 postgres:
> stats collector process
> postgres  4386  0.0  0.2  4312  956 pts/3    S    19:22   0:00 -bash
> postgres  4871  0.0  0.5  9480 2304 ?        S    20:36   0:00 postgres:
> user db [local] SELECT
> postgres  4873  0.0  0.2  8816 1032 ?        R    20:36   0:00 postgres:
> user db [local] startup
> myuser    4877  0.0  0.1  3572  624 pts/4    S    20:36   0:00 grep postgres
> postgres  4878  0.0  0.5  9220 2228 ?        R    20:36   0:00 postgres:
> user db [local] SELECT
> postgres  4879  0.0  0.5  9204 2016 ?        R    20:36   0:00 postgres:
> user db [local] SELECT
> ---------------------------top-----------------------------
> 114 processes: 99 sleeping, 12 running, 3 zombie, 0 stopped
> CPU states:  91.8% user   8.1% system   0.0% nice   0.0% iowait   0.0% idle
> Mem:   384580k av,  316328k used,   68252k free,       0k shrd,   25424k
> buff
>                     253976k actv,   36916k in_d,    4704k in_c
> Swap:  265064k av,   64788k used,  200276k free                   71132k
> cached
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>  4914 apache    16   0  9016 8416  2552 S     6.7  2.1   0:00   0 httpd
>  4832 apache    16   0  9016 8416  2552 S     6.3  2.1   0:01   0 httpd
>  4915 apache    16   0  9016 8416  2552 S     5.9  2.1   0:00   0 httpd
>  4917 apache    16   0  9016 8416  2552 S     5.9  2.1   0:00   0 httpd
>  4919 apache    16   0  9020 8420  2536 S     5.9  2.1   0:00   0 httpd
>  4774 apache    16   0  9016 8416  2552 S     5.7  2.1   0:02   0 httpd
>  4896 apache    16   0  9060 8460  2568 S     5.7  2.1   0:00   0 httpd
>  4908 apache    15   0  9016 8416  2552 S     5.7  2.1   0:00   0 httpd
>  4909 apache    16   0  9016 8416  2552 S     5.7  2.1   0:00   0 httpd
>  4658 apache    16   0  9136 8536  2568 S     5.5  2.2   0:04   0 httpd
>  4921 apache    16   0  9016 8416  2552 S     5.5  2.1   0:00   0 httpd
>  2581 root      16   0 14492 4544  1252 R     5.3  1.1   2:26   0 X
>  4795 apache    16   0  9104 8504  2568 S     5.3  2.2   0:02   0 httpd
>  4796 apache    16   0  9080 8480  2568 S     5.3  2.2   0:01   0 httpd
>  4782 apache    16   0  8924 8324  2568 R     3.5  2.1   0:02   0 httpd
>  2612 madona    15   0  4524 4136  2380 S     1.5  1.0   0:18   0 metacity
>  4656 apache    15   0  9084 8484  2568 S     1.3  2.2   0:03   0 httpd
>  4950 postgres  25   0     0    0     0 Z     1.1  0.0   0:00   0
> postmaster <defunct>
>  3812 madona    15   0 44728  42M 17460 S     0.7 11.2   3:21   0
> mozilla-bin
>  4947 postgres  25   0  2540 2392  1688 S     0.7  0.6   0:00   0 postmaster
>  4952 postgres  25   0  2812 2664  1872 R     0.7  0.6   0:00   0 postmaster
>  4610 madona    15   0  7460 7460  2152 R     0.5  1.9   0:00   0 xterm
>  4904 madona    15   0  1108 1108   856 R     0.3  0.2   0:00   0 top
>  4954 postgres  24   0  1916 1768  1244 R     0.1  0.4   0:00   0 postmaster
>  4959 postgres  25   0  1596 1448   940 S     0.1  0.3   0:00   0 postmaster
>  4961 postgres  25   0   984  824   640 R     0.1  0.2   0:00   0 postmaster
>     1 root      15   0    88   60    40 S     0.0  0.0   0:04   0 init

OK, the memory usage is NOT your problem.  You have crashing postgresql
backends, see the <defunct> and the listing of 3 zombie processes in top?
That's bad.  You very likely have bad memory in your box, or possibly
other hardware problems.  It could be software, if you've got a really odd
configuration (i.e. compiling apache against one ldap sdk, and php against
another can make apache start failing, something similar may be happening
in postgresql, but I doubt it.)

You should go to www.memtest86.com and download their free tester and see
if your machine has any bad ram.

The reason the machine will get really slow when this happens is that the
postgresql database has to reset all the cache on all backends when one
crashes to prevent corruption.

The fact that X goes unresponsive is simply a load issue.  Have a little
more patience.  Even with bad memory under it, linux / unix will usually
come back from the brink once the load goes away.

You need to make sure logging is enabled on postgresql and then you can
look at the logs for clues as to why the backends are crashing.

Re: postgresql + apache under heavy load

From

"scott.marlowe"

Date:

21 January 2004, 17:04:33

On Wed, 21 Jan 2004, Richard Huxton wrote:

> [I noticed you were running X and mozilla (on top of gnome?). If you don't
> need them, close them - that can free up to 64MB, depending on what you're
> running]

Note that if you've got a 64 Meg memory video card, and X uses 4 megs of
system ram, it will show, in linux, as using 68 Megs of ram.  Turning off
X, however, will only free up the 4 megs of system ram, not the 64 on the
video card.  so, it's not uncommon for folks to see a modern video card in
their system, look at top, and go "geeze, X is a total memory hog" when in
fact it's just that linux / top count the video memory as part of the
memory allocated to X.

That said, I'd at least run a less heavyweight window manager like
afterstep or twm, not gnome or kde on a box like that one.

> I don't think you've got a RAM shortage, I think your settings are too small
> and you're running out of I/O bandwidth. If you run "vmstat 5", what does it
> show - you're interested in the i/o section: blocks in (bi) and out (bo) as
> well as memory.

Also, look at si/so (swap in and out) to monitor swap activity.  If those
numbers are low, then swapping isn't an issue...

Re: postgresql + apache under heavy load

From

Tom Lane

Date:

21 January 2004, 17:35:40

"scott.marlowe" <scott.marlowe@ihs.com> writes:
> OK, the memory usage is NOT your problem.  You have crashing postgresql
> backends, see the <defunct> and the listing of 3 zombie processes in top?
> That's bad.  You very likely have bad memory in your box, or possibly
> other hardware problems.

While that could be true, the presence of a few zombie processes hardly
proves it.  What that says to me is that the box is so loaded that the
postmaster isn't able to reap dead children instantaneously --- that is,
what we see is top running between the time that a backend quits and the
time the postmaster next gets to run.

While we don't have a lot of data for determining why the overload,
the top output:

>> CPU states:  91.8% user   8.1% system   0.0% nice   0.0% iowait   0.0% idle

sure looks like the bottleneck is CPU cycles, not disk (and not RAM
either, since RAM shortage would lead to swapping and hence disk waits).
Furthermore, the listing looks like it is the httpd processes that are
sucking CPU, not Postgres.  I think this is actually not Postgres'
problem at all, but some inefficiency in the site's Web code.

            regards, tom lane

Re: postgresql + apache under heavy load

From

Alex Madon

Date:

21 January 2004, 18:14:27

Well the top issued was corresponding to a concurrency of 20 (so a level
where swap is not sollicited).
With a concurrency of 60, swap is very high

Here is a "movie" of how swap is maanged (a snapshot every 5 sec)
Swap:  265064k av,  110592k used,  154472k free                   48384k
cached
Swap:  265064k av,  110592k used,  154472k free                   48404k
cached
Swap:  265064k av,  107064k used,  158000k free                   48452k
cached
Swap:  265064k av,  104816k used,  160248k free                   48484k
cached
Swap:  265064k av,  246496k used,   18568k free                   31044k
cached
Swap:  265064k av,  190364k used,   74700k free                   31356k
cached
Swap:  265064k av,  186512k used,   78552k free                   31480k
cached
Swap:  265064k av,  181592k used,   83472k free                   31588k
cached
Swap:  265064k av,  173924k used,   91140k free                   31692k
cached
Swap:  265064k av,  168384k used,   96680k free                   31864k
cached
Swap:  265064k av,  165444k used,   99620k free                   31872k
cached
Swap:  265064k av,  163084k used,  101980k free                   32004k
cached
Swap:  265064k av,  160592k used,  104472k free                   32124k
cached
Swap:  265064k av,  158116k used,  106948k free                   32136k
cached
Swap:  265064k av,  155692k used,  109372k free                   32212k
cached
Swap:  265064k av,  153008k used,  112056k free                   32252k
cached


and the parallel with Mem
Mem:   384580k av,  202952k used,  181628k free,       0k shrd,   20876k
buff
Swap:  265064k av,  110592k used,  154472k free                   48384k
cached
Mem:   384580k av,  219072k used,  165508k free,       0k shrd,   20880k
buff
Swap:  265064k av,  110592k used,  154472k free                   48404k
cached
Mem:   384580k av,  284076k used,  100504k free,       0k shrd,   20888k
buff
Swap:  265064k av,  107064k used,  158000k free                   48452k
cached
Mem:   384580k av,  353520k used,   31060k free,       0k shrd,   20892k
buff
Swap:  265064k av,  104816k used,  160248k free                   48484k
cached
Mem:   384580k av,  380108k used,    4472k free,       0k shrd,    6352k
buff
Swap:  265064k av,  246496k used,   18568k free                   31044k
cached
Mem:   384580k av,  210008k used,  174572k free,       0k shrd,    6372k
buff
Swap:  265064k av,  190364k used,   74700k free                   31356k
cached
Mem:   384580k av,  206024k used,  178556k free,       0k shrd,    6380k
buff
Swap:  265064k av,  186512k used,   78552k free                   31480k
cached
Mem:   384580k av,  210008k used,  174572k free,       0k shrd,    6396k
buff
Swap:  265064k av,  181592k used,   83472k free                   31588k
cached
Mem:   384580k av,  201612k used,  182968k free,       0k shrd,    6408k
buff
Swap:  265064k av,  173924k used,   91140k free                   31692k
cached
Mem:   384580k av,  192464k used,  192116k free,       0k shrd,    6420k
buff
Swap:  265064k av,  168384k used,   96680k free                   31864k
cached
Mem:   384580k av,  190504k used,  194076k free,       0k shrd,    6428k
buff
Swap:  265064k av,  165444k used,   99620k free                   31872k
cached
Mem:   384580k av,  190956k used,  193624k free,       0k shrd,    6444k
buff
Swap:  265064k av,  163084k used,  101980k free                   32004k
cached
Mem:   384580k av,  193244k used,  191336k free,       0k shrd,    6452k
buff
Swap:  265064k av,  160592k used,  104472k free                   32124k
cached
Mem:   384580k av,  195924k used,  188656k free,       0k shrd,    6460k
buff
Swap:  265064k av,  158116k used,  106948k free                   32136k
cached
Mem:   384580k av,  198428k used,  186152k free,       0k shrd,    6468k
buff
Swap:  265064k av,  155692k used,  109372k free                   32212k
cached


One can see that at the maximum feeling of swap (74700k free swap), the
full picture is:


 22:51:54  up  3:58,  6 users,  load average: 47.38, 18.53, 7.79
131 processes: 130 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   5.3% user   3.0% system   0.0% nice   0.0% iowait  91.6% idle
Mem:   384580k av,  210008k used,  174572k free,       0k shrd,    6372k
buff
                    158748k actv,   14556k in_d,    1412k in_c
Swap:  265064k av,  190364k used,   74700k free                   31356k
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
    7 root      16   0     0    0     0 SW    1.2  0.0   0:07   0
kscand/Normal
    5 root      15   0     0    0     0 SW    1.0  0.0   0:01   0 kswapd
 7050 apache    15   0  8016 5896  1580 D     1.0  1.5   0:00   0 httpd
 3870 madona    15   0  6540 1440   472 D     0.6  0.3   0:07   0 xterm
 7032 apache    15   0  8336 7568  1980 S     0.6  1.9   0:00   0 httpd
 7051 apache    15   0  4784 1640   280 D     0.6  0.4   0:00   0 httpd
 2581 root      15   0 15928 1452   704 S     0.5  0.3   5:40   0 X
 6985 madona    16   0   788  732   476 R     0.5  0.1   0:00   0 top
 7023 apache    15   0  7956 7160  1736 S     0.4  1.8   0:00   0 httpd
 7025 apache    15   0  7944 6816  1584 D     0.4  1.7   0:00   0 httpd
 7027 apache    15   0  7808 6976  1588 D     0.4  1.8   0:00   0 httpd
 7052 apache    15   0  6616 3584   404 D     0.3  0.9   0:00   0 httpd
 7142 postgres  15   0  3420 3268  2460 S     0.3  0.8   0:00   0 postmaster
 7020 apache    15   0  8092 5880  1980 D     0.2  1.5   0:00   0 httpd
 7034 apache    15   0  8380 7660  2036 S     0.2  1.9   0:00   0 httpd
 2626 madona    15   0  1316  396   292 S     0.1  0.1   0:00   0 magicdev
 7001 apache    15   0  8016 5892  1580 D     0.1  1.5   0:00   0 httpd
 7008 apache    15   0  8016 5784  1580 D     0.1  1.5   0:00   0 httpd
 7024 apache    15   0  7944 6664  1584 D     0.1  1.7   0:00   0 httpd
 7054 apache    15   0  8016 5636  1580 D     0.1  1.4   0:00   0 httpd
    1 root      15   0    88   60    40 S     0.0  0.0   0:04   0 init
    2 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    3 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kapmd
    4 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0
ksoftirqd_CPU0
    9 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 bdflush
    6 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kscand/DMA
    8 root      15   0     0    0     0 SW    0.0  0.0   0:00   0
kscand/HighMem
   10 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kupdated
   11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0
mdrecoveryd
   15 root      15   0     0    0     0 SW    0.0  0.0   0:04   0 kjournald
   72 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 khubd
 2066 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 eth1
 2117 root      15   0   112   60    36 S     0.0  0.0   0:00   0 syslogd
 2121 root      15   0    52    0     0 SW    0.0  0.0   0:00   0 klogd
 2139 rpc       15   0    72    0     0 SW    0.0  0.0   0:00   0 portmap
 2158 rpcuser   25   0    84    0     0 SW    0.0  0.0   0:00   0 rpc.statd
 2208 root      25   0    52    4     0 S     0.0  0.0   0:00   0 apmd
 2246 root      25   0   240    4     0 S     0.0  0.0   0:00   0 sshd
 2260 root      15   0   120    4     0 S     0.0  0.0   0:00   0 xinetd
 2269 root      15   0    88   56    36 S     0.0  0.0   0:00   0 gpm
 2279 root      15   0  1436  216   152 S     0.0  0.0   0:00   0 httpd
 2348 root      15   0   124  108    64 S     0.0  0.0   0:00   0 crond
 2359 root      15   0   852  496   228 S     0.0  0.1   0:00   0 cupsd
 2419 xfs       15   0  2696  188   188 S     0.0  0.0   0:32   0 xfs
 2428 root      24   0   616    4     0 S     0.0  0.0   0:00   0 smbd
 2432 root      15   0   824  356   260 S     0.0  0.0   0:00   0 nmbd
 2450 daemon    15   0    64   12    12 S     0.0  0.0   0:00   0 atd
 2458 root      16   0   224    0     0 SW    0.0  0.0   0:00   0 login
 2459 root      22   0    48    4     0 S     0.0  0.0   0:00   0 mingetty
 2460 root      22   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2461 root      22   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2462 root      22   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2463 root      22   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2464 madona    16   0   356    0     0 SW    0.0  0.0   0:00   0 bash
 2569 madona    21   0   132    4     0 S     0.0  0.0   0:00   0 startx
 2580 madona    20   0    72    0     0 SW    0.0  0.0   0:00   0 xinit
 2590 madona    15   0  3412  240   236 S     0.0  0.0   0:01   0
gnome-session
 2592 madona    15   0  5452  424   160 S     0.0  0.1   0:00   0 gconfd-2
 2604 madona    15   0   236   72    40 S     0.0  0.0   0:00   0 ssh-agent
 2608 madona    17   0   488    4     0 S     0.0  0.0   0:00   0
bonobo-activati
 2610 madona    15   0   384  204   124 S     0.0  0.0   0:00   0 fam
 2612 madona    16   0  4032 1916  1912 S     0.0  0.4   0:38   0 metacity
 2614 madona    15   0  1620  268   264 S     0.0  0.0   0:00   0
gnome-settings-
 2623 madona    15   0   720  396   288 S     0.0  0.1   0:01   0
xscreensaver
 2633 madona    15   0  1276  688   528 S     0.0  0.1   0:00   0
pam-panel-icon
 2634 root      15   0   152  136    88 S     0.0  0.0   0:00   0
pam_timestamp_c
 2655 madona    15   0  5284 2428  1916 D     0.0  0.6   0:15   0
gnome-panel
 2657 madona    15   0  2556 1176   804 S     0.0  0.3   0:01   0
multiload-apple
 2659 madona    15   0  1808  452   448 S     0.0  0.1   0:00   0
notification-ar
 2662 madona    15   0  5444  164   160 S     0.0  0.0   0:01   0 xterm
 2664 madona    15   0   360    4     0 S     0.0  0.0   0:00   0 bash
 3015 madona    15   0  9100  248   244 S     0.0  0.0   0:03   0 xterm
 3017 madona    15   0   384   28    24 S     0.0  0.0   0:00   0 bash
 3812 madona    15   0 35292 4908  4568 S     0.0  1.2   8:02   0
mozilla-bin
 3872 madona    22   0   352    4     0 S     0.0  0.0   0:00   0 bash
 3902 root      22   0   168    0     0 SW    0.0  0.0   0:00   0 su
 3905 root      15   0   392   32    28 S     0.0  0.0   0:00   0 bash
 4306 madona    15   0  5604  252   248 S     0.0  0.0   0:01   0 xterm
 4308 madona    22   0   352    4     0 S     0.0  0.0   0:00   0 bash
 4338 root      21   0   168    0     0 SW    0.0  0.0   0:00   0 su
 4341 root      22   0   360    4     0 S     0.0  0.0   0:00   0 bash
 4385 root      24   0   172    0     0 SW    0.0  0.0   0:00   0 su
 4386 postgres  15   0   312   28    24 S     0.0  0.0   0:00   0 bash
 4610 madona    15   0  5728  384   380 S     0.0  0.0   0:00   0 xterm
 4612 madona    15   0   384   28    24 S     0.0  0.0   0:00   0 bash
 6636 apache    15   0  6868  840   404 D     0.0  0.2   0:03   0 httpd
 6820 postgres  15   0   808  560   468 S     0.0  0.1   0:00   0 postmaster
 6821 postgres  15   0  1296  496   276 S     0.0  0.1   0:00   0 postmaster
 6822 postgres  15   0   744  460   372 S     0.0  0.1   0:00   0 postmaster
 6828 root      15   0   140  100    64 S     0.0  0.0   0:00   0 tail
 6854 apache    15   0  6888  652   424 S     0.0  0.1   0:01   0 httpd
 6867 apache    15   0  6816  740   352 D     0.0  0.1   0:01   0 httpd
 6869 apache    15   0  6960 1084   640 D     0.0  0.2   0:00   0 httpd
 6875 apache    15   0  6856  780   392 D     0.0  0.2   0:01   0 httpd
 6882 apache    15   0  6700  436   236 S     0.0  0.1   0:01   0 httpd
 6890 apache    15   0  7020 1184   700 D     0.0  0.3   0:01   0 httpd
 6896 apache    15   0  6744 1084   424 D     0.0  0.2   0:01   0 httpd
 6986 madona    15   0   816  592   180 S     0.0  0.1   0:00   0 ab
 6993 apache    15   0  7084 1404   860 D     0.0  0.3   0:00   0 httpd
 7002 apache    15   0  8016 2576  1580 D     0.0  0.6   0:00   0 httpd
 7004 apache    15   0  8016 3104  1580 D     0.0  0.8   0:00   0 httpd
 7007 postgres  15   0  3000 2576  2048 D     0.0  0.6   0:00   0 postmaster
 7009 apache    15   0  8016 2956  1580 D     0.0  0.7   0:00   0 httpd
 7011 apache    16   0  7956 7164  1736 S     0.0  1.8   0:00   0 httpd
 7012 apache    15   0  8024 5364  1588 D     0.0  1.3   0:00   0 httpd
 7021 apache    16   0  8376 7676  2032 S     0.0  1.9   0:00   0 httpd
 7022 apache    15   0  8016 5140  1580 D     0.0  1.3   0:00   0 httpd
 7029 apache    15   0  8336 7584  1980 S     0.0  1.9   0:00   0 httpd
 7036 apache    15   0  8380 7632  2036 S     0.0  1.9   0:00   0 httpd
 7053 apache    15   0  8016 4420  1580 D     0.0  1.1   0:00   0 httpd
 7056 apache    15   0  8148 6292  1712 D     0.0  1.6   0:00   0 httpd
 7059 apache    15   0  8240 7192  2020 S     0.0  1.8   0:00   0 httpd
 7061 apache    15   0  8236 6436  1800 D     0.0  1.6   0:00   0 httpd
 7062 apache    15   0  8240 7604  2020 S     0.0  1.9   0:00   0 httpd
 7066 apache    16   0  8240 7544  2020 S     0.0  1.9   0:00   0 httpd
 7068 postgres  15   0  3216 2580  2272 D     0.0  0.6   0:00   0 postmaster
 7069 postgres  15   0  2908 2108  1960 S     0.0  0.5   0:00   0 postmaster
 7070 postgres  15   0  2436 2104  1740 S     0.0  0.5   0:00   0 postmaster
 7071 postgres  15   0  1884 1212  1188 S     0.0  0.3   0:00   0 postmaster
 7079 postgres  15   0  3396 3244  2448 S     0.0  0.8   0:00   0 postmaster
 7082 postgres  17   0  3412 3260  2460 S     0.0  0.8   0:00   0 postmaster
 7086 postgres  16   0  3412 3232  2460 S     0.0  0.8   0:00   0 postmaster
 7087 postgres  15   0  3424 3160  2468 S     0.0  0.8   0:00   0 postmaster
 7091 postgres  15   0  3420 3264  2464 S     0.0  0.8   0:00   0 postmaster
 7096 postgres  17   0  3412 3140  2456 S     0.0  0.8   0:00   0 postmaster
 7097 postgres  15   0  3436 3196  2476 S     0.0  0.8   0:00   0 postmaster
 7102 postgres  15   0  3436 3220  2480 S     0.0  0.8   0:00   0 postmaster
 7112 postgres  15   0  3432 3280  2472 S     0.0  0.8   0:00   0 postmaster
 7123 postgres  15   0  3440 3288  2476 S     0.0  0.8   0:00   0 postmaster
 7124 postgres  15   0  3428 3276  2468 S     0.0  0.8   0:00   0 postmaster
 7125 postgres  15   0  3424 3272  2464 S     0.0  0.8   0:00   0 postmaster



After that swap and ram usage decrease.... and the ab ends up with a
time out.
For the gnome and X, this is just my development machine. On the web
server itself, no X of course.

I had also a closer look at postgresql logs and make logs more verbose.
I saw a query taking 16 sec (it take normally few millisec):
[7104]   LOG:  connection received: host=[local]
[7104]   LOG:  connection authorized: user=myuser database=mydb
[7104]   LOG:  query: begin; select getdatabaseencoding(); commit
[7104]   LOG:  duration: 0.002152 sec
[7104]   LOG:  query: select ---- myselect instruction -----
[7104]   LOG:  duration: 16.561325 sec
[7104]   LOG:  query: select oid,typname from pg_type
[7104]   LOG:  duration: 15.498828 sec
[6820]   DEBUG:  child process (pid 7104) exited with exit code 0


Thanks everybody,
Alex


Tom Lane wrote:

>"scott.marlowe" <scott.marlowe@ihs.com> writes:
>
>
>>OK, the memory usage is NOT your problem.  You have crashing postgresql
>>backends, see the <defunct> and the listing of 3 zombie processes in top?
>>That's bad.  You very likely have bad memory in your box, or possibly
>>other hardware problems.
>>
>>
>
>While that could be true, the presence of a few zombie processes hardly
>proves it.  What that says to me is that the box is so loaded that the
>postmaster isn't able to reap dead children instantaneously --- that is,
>what we see is top running between the time that a backend quits and the
>time the postmaster next gets to run.
>
>While we don't have a lot of data for determining why the overload,
>the top output:
>
>
>
>>>CPU states:  91.8% user   8.1% system   0.0% nice   0.0% iowait   0.0% idle
>>>
>>>
>
>sure looks like the bottleneck is CPU cycles, not disk (and not RAM
>either, since RAM shortage would lead to swapping and hence disk waits).
>Furthermore, the listing looks like it is the httpd processes that are
>sucking CPU, not Postgres.  I think this is actually not Postgres'
>problem at all, but some inefficiency in the site's Web code.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>
>

Re: postgresql + apache under heavy load

From

"scott.marlowe"

Date:

21 January 2004, 19:52:34

On Wed, 21 Jan 2004, Alex Madon wrote:

> One can see that at the maximum feeling of swap (74700k free swap), the
> full picture is:
>
>
>  22:51:54  up  3:58,  6 users,  load average: 47.38, 18.53, 7.79
> 131 processes: 130 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states:   5.3% user   3.0% system   0.0% nice   0.0% iowait  91.6% idle
> Mem:   384580k av,  210008k used,  174572k free,       0k shrd,    6372k
> buff
>                     158748k actv,   14556k in_d,    1412k in_c
> Swap:  265064k av,  190364k used,   74700k free                   31356k
> cached
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>     7 root      16   0     0    0     0 SW    1.2  0.0   0:07   0
> kscand/Normal
>     5 root      15   0     0    0     0 SW    1.0  0.0   0:01   0 kswapd
>  7050 apache    15   0  8016 5896  1580 D     1.0  1.5   0:00   0 httpd
>  3870 madona    15   0  6540 1440   472 D     0.6  0.3   0:07   0 xterm
>  7032 apache    15   0  8336 7568  1980 S     0.6  1.9   0:00   0 httpd
>  7051 apache    15   0  4784 1640   280 D     0.6  0.4   0:00   0 httpd
>  2581 root      15   0 15928 1452   704 S     0.5  0.3   5:40   0 X
>  6985 madona    16   0   788  732   476 R     0.5  0.1   0:00   0 top
>  7023 apache    15   0  7956 7160  1736 S     0.4  1.8   0:00   0 httpd
>  7025 apache    15   0  7944 6816  1584 D     0.4  1.7   0:00   0 httpd
>  7027 apache    15   0  7808 6976  1588 D     0.4  1.8   0:00   0 httpd
>  7052 apache    15   0  6616 3584   404 D     0.3  0.9   0:00   0 httpd

this is really strange.  You've got 170Megs free memory, yet are going
into a swapstorm.  I had this problem with older kernels under rh7.2
(2.4.7 and 2.4.9) when accessing really large files but they went away
with the latest one I'm running, which is 2.4.20.

Are any of the underlying tables really large and maybe being seq scanned?
I'm strictly guessing here.  anyone else have any ideas?  I'm stumped.

Re: postgresql + apache under heavy load

From

Alvaro Herrera

Date:

21 January 2004, 20:14:22

On Wed, Jan 21, 2004 at 04:47:57PM -0700, scott.marlowe wrote:
> On Wed, 21 Jan 2004, Alex Madon wrote:
>
> >   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
> >     7 root      16   0     0    0     0 SW    1.2  0.0   0:07   0
> > kscand/Normal
> >     5 root      15   0     0    0     0 SW    1.0  0.0   0:01   0 kswapd
> >  7050 apache    15   0  8016 5896  1580 D     1.0  1.5   0:00   0 httpd
> >  3870 madona    15   0  6540 1440   472 D     0.6  0.3   0:07   0 xterm
> >  7032 apache    15   0  8336 7568  1980 S     0.6  1.9   0:00   0 httpd
> >  7051 apache    15   0  4784 1640   280 D     0.6  0.4   0:00   0 httpd
> >  2581 root      15   0 15928 1452   704 S     0.5  0.3   5:40   0 X
> >  6985 madona    16   0   788  732   476 R     0.5  0.1   0:00   0 top
> >  7023 apache    15   0  7956 7160  1736 S     0.4  1.8   0:00   0 httpd
> >  7025 apache    15   0  7944 6816  1584 D     0.4  1.7   0:00   0 httpd
> >  7027 apache    15   0  7808 6976  1588 D     0.4  1.8   0:00   0 httpd
> >  7052 apache    15   0  6616 3584   404 D     0.3  0.9   0:00   0 httpd
>
> Are any of the underlying tables really large and maybe being seq scanned?
> I'm strictly guessing here.  anyone else have any ideas?  I'm stumped.

'D' processes are probably I/O bound, but why would webserver processes
be so busy reading from disk?  'postmaster' processes had 'S' status
AFAIR, so the load is probably elsewhere, not Postgres.

My guess would be that the Apache processes are doing something
completely unexpected.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)

Re: postgresql + apache under heavy load

From

Richard Huxton

Date:

21 January 2004, 21:54:36

On Wednesday 21 January 2004 22:20, Alex Madon wrote:
> Well the top issued was corresponding to a concurrency of 20 (so a level
> where swap is not sollicited).
> With a concurrency of 60, swap is very high
>
> Here is a "movie" of how swap is maanged (a snapshot every 5 sec)
[used sits around 110MB, spikes to 240MB then gradually drops back to 153MB
just under a minute later]

> and the parallel with Mem
[buffer memory plumets from 20MB down to 6MB, then stabilises around 6.5MB]

OK - I'm guessing the two spikes in the figures are as the ab requests hit,
spawning http daemons and postgres backends.

> One can see that at the maximum feeling of swap (74700k free swap), the
> full picture is:
>
>
>  22:51:54  up  3:58,  6 users,  load average: 47.38, 18.53, 7.79
> 131 processes: 130 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states:   5.3% user   3.0% system   0.0% nice   0.0% iowait  91.6% idle

Although Tom pointed to the high CPU in a previous post, here you're not
suffering on CPU. Run a "vmstat 5" while this is happening - that'll show us
the details.

--
  Richard Huxton
  Archonet Ltd