Thread: Troubles with performances

Troubles with performances

From: Guillaume Lémery
Hi,


I use PostgreSQL with a web server which receives 200 simultaneous HTTP
requests.
For each HTTP request, I have about 5 SELECT queries and 3 UPDATE ones.

The queries have been optimized and I have indexes on my tables...
The biggest table has more than 500,000 records.

And I have big trouble: PostgreSQL doesn't seem to be able to handle so
many queries at the same time.

The server is a dual-processor 400 MHz Netfinity with 1 GB of RAM...

What can I do?

Thanx in advance

Guillaume


Re: Troubles with performances

From: Guillaume Lémery
>> I use PostgreSQL with a web server which receives 200 simultaneous
>> HTTP requests.
>
>
> With what opts do you start postmaster?
-F to disable fsync()
I don't know if I have to use the -B and -S options to change the memory
use...

>> For each HTTP request, I have about 5 SELECT queries and 3 UPDATE ones.
> Do you use BEGIN/END to run queries in one transaction, when appropriate?
Yep...
In fact, I have 3 SELECTs and the 3 UPDATEs in a PL/pgSQL function...



RE: Troubles with performances

From: "Mikheev, Vadim"
> I use PostgreSQL with a web server which receives 200 simultaneous
> HTTP requests.

With what opts do you start postmaster?

> For each HTTP request, I have about 5 SELECT queries and 3 UPDATE ones.

Do you use BEGIN/END to run queries in one transaction, when appropriate?

Vadim

RE: Troubles with performances

From: "Mikheev, Vadim"
> >> I use PostgreSQL with a web server which receives 200 simultaneous
> >> HTTP requests.
> >
> > With what opts do you start postmaster?
> -F to disable fsync()
> I don't know if I have to use the -B and -S options to change
> the memory use...

With 200 simultaneous connections I would recommend using
-N 256 -B 16384

Vadim
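
For reference, -B counts 8 kB disk buffers and -N the maximum number of
backends, so -B 16384 asks the kernel for roughly 16384 x 8 kB = 128 MB of
shared memory, plus a bit more for PostgreSQL's other shared structures.
A minimal sketch of a matching 7.0.x startup line - the data directory is
illustrative, and fsync is disabled through -o as Guillaume described:

    postmaster -D /usr/local/pgsql/data -N 256 -B 16384 -o "-F" &

The kernel's shared memory limit (SHMMAX) has to allow a segment at least
that large, which is where the trouble further down this thread comes from.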

RE: Troubles with performances

From: "Mikheev, Vadim"
> I checked the Linux FAQ and the only thing related to this
> issue says that I don't have shared memory turned on,
> which I'm pretty sure I do, considering it's running..  ;)

The FAQ is wrong - you don't have *enough* shared memory -:)
I don't know how to increase the shmem segment size in Linux.

Vadim

RE: Troubles with performances

From: Warren Vanichuk
> > >> I use PostgreSQL with a web server which receives 200 simultaneous
> > >> HTTP requests.
> > >
> > > With what opts do you start postmaster?
> > -F to disable fsync()
> > I don't know if I have to use the -B and -S options to change
> > the memory use...
>
> With 200 simultaneous connections I would recommend using
> -N 256 -B 16384

As we have about the same load, and I am always interested in more
performance, I increased my -B to the value you suggested, but I get the
following error:

IpcMemoryCreate: shmget failed (Invalid argument) key=5432001, size=143193088, permission=600
This type of error is usually caused by an improper
shared memory or System V IPC semaphore configuration.
For more information, see the FAQ and platform-specific
FAQ's in the source directory pgsql/doc or on our
web site at http://www.postgresql.org.
FATAL 1:  ShmemCreate: cannot create region

I checked the Linux FAQ and the only thing related to this issue says that I
don't have shared memory turned on, which I'm pretty sure I do, considering
it's running..  ;)

This is a PostgreSQL 7.0.3 installed on a Debian (Stable) box, running Linux
2.2.18.  The machine has 1GB of memory, and is compiled for 32,768 file
descriptors.  The startup options I am using for PostgreSQL are :

-o -F -B 2048 -N 512 -S 4096

Opinions or suggestions? :)

Sincerely, Warren
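
For what it's worth, the numbers line up: -B 16384 alone asks for
16384 x 8192 bytes = 134,217,728 bytes of buffer space, and the remaining
shared structures (presumably scaled by -N 512) bring the request up to the
143,193,088 bytes shown in the error. A stock Linux 2.2 kernel ships with
SHMMAX = 0x2000000 = 33,554,432 bytes, so shmget() rejects the oversized
segment with EINVAL - the "Invalid argument" above.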


Re: Troubles with performances

From: Lamar Owen
Warren Vanichuk wrote:
> IpcMemoryCreate: shmget failed (Invalid argument) key=5432001, size=143193088, permission=600

> -o -F -B 2048 -N 512 -S 4096

OK, you have 2048 8K buffers, and up to 512 backends.  If you're trying
to set SortSize to 4096, you need quotes around it and a -o, AFAIK, as
the -S postmaster option sets silent mode.

Further along,

According to the shmget man page:

       The following are limits on shared memory segment
       resources affecting a shmget call:

       SHMALL     System-wide maximum of shared memory pages:
                  policy dependent.

       SHMMAX     Maximum size in bytes for a shared memory
                  segment: implementation dependent (currently 4M).

       SHMMIN     Minimum size in bytes for a shared memory
                  segment: implementation dependent (currently 1
                  byte, though PAGE_SIZE is the effective minimum
                  size).

       SHMMNI     System-wide maximum number of shared memory
                  segments: implementation dependent (currently
                  4096).

       The implementation has no specific limits for the per-process
       maximum number of shared memory segments (SHMSEG).

And from the linux kernel source, includes/asm-i386/shmparam.h:
/*
 * Keep _SHM_ID_BITS as low as possible since SHMMNI depends on it and
 * there is a static array of size SHMMNI.
 */
#define _SHM_ID_BITS    9
#define SHM_ID_MASK     ((1<<_SHM_ID_BITS)-1)

#define SHM_IDX_SHIFT   (_SHM_ID_BITS)
#define _SHM_IDX_BITS   15
#define SHM_IDX_MASK    ((1<<_SHM_IDX_BITS)-1)

/*
 * _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on the i386 and
 * SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS).
 */

#define SHMMAX 0x2000000                /* max shared seg size (bytes) */
/* Try not to change the default shipped SHMMAX - people rely on it */

-------

Hmmmm....

Looks pretty difficult to change.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

Re: Troubles with performances

From: "Oliver Elphick"
"Mikheev, Vadim" wrote:
  >The FAQ is wrong - you don't have *enough* shared memory -:)
  >I don't know how to increase the shmem segment size in Linux.

If you have  /proc/sys/kernel/shmmax, try (as root):

   echo new_bigger_value > /proc/sys/kernel/shmmax

If it won't allow you to do that, you may have to recompile the kernel
instead.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47  6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
                 ========================================
     "For the eyes of the LORD run to and fro throughout the
      whole earth, to show himself strong in the behalf of
      them whose heart is perfect toward him..."
                                   II Chronicles 16:9



RE: Troubles with performances

From: KuroiNeko
> The FAQ is wrong - you don't have *enough* shared memory -:)
> I don't know how to increase the shmem segment size in Linux.

 asm/shmparam.h - though actually, changing it there is not recommended.




Re: Troubles with performances

From: Jan Wieck
Lamar Owen wrote:
>
> /*
>  * _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on the i386 and
>  * SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS).
>  */
>
> #define SHMMAX 0x2000000                /* max shared seg size (bytes) */
> /* Try not to change the default shipped SHMMAX - people rely on it */
>
> -------
>
> Hmmmm....
>
> Looks pretty difficult to change.

    echo 134217728 >/proc/sys/kernel/shmmax

    sets the maximum allowed segment size to 128 MB. Shared
    memory under Linux can be swapped, so if you have 256 MB of
    physical memory and run 3 postmasters, you can start all of them
    with 64 MB (or more); while they're idle they'll most likely
    consume some swap space, but when it pours, they'll grab the
    memory they need.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
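
Putting the numbers from this thread together: the failed request was for
143,193,088 bytes, so SHMMAX has to be at least that large before -B 16384
with -N 512 will start. A sketch, using 144 MB as an illustrative round
figure (any value at or above the requested segment size will do):

    # check the current limit, in bytes
    cat /proc/sys/kernel/shmmax
    # raise it to 144 MB = 150994944 bytes
    echo 150994944 >/proc/sys/kernel/shmmax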





Re: Troubles with performances

From: Lamar Owen
Jan Wieck wrote:
> Lamar Owen wrote:
> > Looks pretty difficult to change.

>     echo 134217728 >/proc/sys/kernel/shmmax

>     sets the maximum allowed segment size to 128 MB. Shared
>     memory under Linux can be swapped, so if you have 256 MB of

One of those well-documented /proc deals.

Might have guessed. :-) Everything else seems to work that way.... :-)
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

Re: Troubles with performances

From: Martijn van Oosterhout
Lamar Owen wrote:
>
> Warren Vanichuk wrote:
> > IpcMemoryCreate: shmget failed (Invalid argument) key=5432001, size=143193088, permission=600
>
> > -o -F -B 2048 -N 512 -S 4096
>
> Ok, you have 2048 8K buffers, and up to 512 backends.  If you're trying
> to set SortSize to 4096, you need quotes around it and a -o, AFAIK, as
> the -S postmaster option sets silent mode.
>
> Further along,
>
> According to the shmget man page:
>
[snip]

I think these are the ones you're looking for:

kleptog/~>grep . `find /proc/sys -name '*shm*'`
/proc/sys/kernel/shmall:4194304
/proc/sys/kernel/shmmax:33554432

Just echo new values into there to change them...
--
Martijn van Oosterhout <kleptog@cupid.suninternet.com>
http://cupid.suninternet.com/~kleptog/
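
One caveat: values written into /proc this way are lost at reboot. A hedged
sketch of making the change persistent, assuming your procps provides
sysctl(8) and /etc/sysctl.conf; otherwise the echo can simply go into a
local boot script:

    # /etc/sysctl.conf - read at boot, or applied immediately with: sysctl -p
    kernel.shmmax = 150994944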

Re: Troubles with performances

From: Guillaume Lémery
> A few quick thoughts:
>   200 simultaneous queries sounds like a lot
> to me, and suggests you might be CPU-bound.
Yes, in 'top' CPU usage is 99%...

> Here are the things you should try:
>
> 1) Put PostgreSQL on a separate server from your
>    Web server/application logic.  If you are CPU-bound,
>    then this can greatly improve your throughput.
I don't really think my web server uses a lot of CPU, because my
application is written as a .so module and it only issues queries to my
database.

> 2) After doing #1, watch the CPU load on both database
>    and web server machines.  If your web server is the
>    bottleneck, try running multiple web servers.
I was thinking of having 2 web servers per database server, but if you
say that 200 simultaneous queries is already a lot... what about 400? :-(

> 3) After trying #1, if your database is the bottleneck
>    and your data will all fit into RAM, try redesigning
>    your web application to run entirely from memory.
>    With this design, you would only do UPDATEs on each
>    page; SELECTs would be done directly from data
>    you've stored in memory.  This dramatically reduces
>    database traffic.  However, it makes it hard to
>    run multiple web servers and also creates problems if
>    you have other applications updating the database.
I don't know how to put the data in memory to increase performance...
What should I do?

> This assumes, of course, that you've carefully studied
> and optimized the performance of your code.  In my experience,
> application performance is usually the bottleneck, not
> the database.  Each of the above strategies is appropriate
> in different circumstances.
Well, given all the performance problems, I have tried to optimize my code
everywhere I could.
All I can do now is change the way I send the queries to the database...

Thank you for all the advice.

Guillaume


Re: Troubles with performances

From: Lincoln Yeoh
At 07:07 PM 1/18/01 +0100, Guillaume Lémery wrote:
>I use PostgreSQL with a web server which receives 200 simultaneous HTTP
>requests.
>For each HTTP request, I have about 5 SELECT queries and 3 UPDATE ones.

Just a shot in the dark:
Are you opening and closing a database connection for each query? If you
are, I suggest you don't, and instead use persistent database connections,
or some form of connection pooling.

How many http connections per second are you getting?

If it's not many connections per second, but they are taking a long time to
complete, there might be ways of reducing the number of simultaneous queries.

For example you could use buffering, aka an "HTTP accelerator" - e.g. put a
webcache in front of your webserver. The idea is that your app (and
database) can just spit out the results to the webcache at 100 Mbps and not
wait for the remote client, which is probably <<2 Mbps (and 50-500 ms away)
and will take 50 times longer or more. The webcache will buffer the
results and trickle them to the client.

However do note that some webcaches (e.g. squid) only buffer up to 8KB
before blocking (not sure if you can change that). You need a webcache
which can completely buffer your big and popular dynamic webpages (possibly
about 50-100KB). Apache mod_proxy can actually be configured to buffer
more, but I haven't really tested it in detail.

More info about your environment and configuration/architecture could be
helpful. e.g. what are you using for the stressed parts - mod_perl,
fast-cgi, php, cgi-bin, apache module.

Cheerio,
Link.


Re: Troubles with performances

From: Alexander Jerusalem
The really big sites scale by employing two strategies:

* Parallelize everything. That is, design the application so that each
important part can run on multiple machines at the same time. For
low-budget projects this is easy to achieve at the web and application
logic tiers but hard for databases. Databases with built-in parallelism,
like Oracle, cost a fortune.

* Cache everything. For example, if your data changes infrequently, or
changes need not be visible immediately, you can send back a cached web
page instead of querying the database every time. Servlet engines like
Resin (www.caucho.com) do that out of the box.

If it is as you said, that the database is really the bottleneck (which is
rare, by the way), I would go for caching. If caching is not possible
because you have very frequent updates and all queries must include the
most recent data, you can simulate database parallelism by writing all
changes to two database servers in a distributed transaction. You will need
an XA-capable transaction manager for this (in Java this is cheap and easy
because you can use an open-source EJB server like JBoss or a low-cost
application server like Orion).

Alexander Jerusalem
ajeru@gmx.net
vknn



Re: Troubles with performances

From: Guillaume Lémery
> Just a shot in the dark:
> Are you opening and closing a database connection for each query? If you
> are, I suggest you don't, and instead use persistent database connections,
> or some form of connection pooling.
Whoops, I forgot to say that of course I use connection pooling.
Just 5 connections are pooled.

> How many http connections per second are you getting?
200

> If it's not many connections per second, but they are taking a long time to
> complete, there might be ways of reducing the number of simultaneous queries.
>
> However do note that some webcaches (e.g. squid) only buffer up to 8KB
> before blocking (not sure if you can change that). You need a webcache
> which can completely buffer your big and popular dynamic webpages (possibly
> about 50-100KB). Apache mod_proxy can actually be configured to buffer
> more, but I haven't really tested it in detail.
I don't handle dynamic pages, but only HTTP redirects, so I think I
don't need cache...

> More info about your environment and configuration/architecture could be
> helpful. e.g. what are you using for the stressed parts - mod_perl,
> fast-cgi, php, cgi-bin, apache module.
I do not use PHP or CGI because they are too slow.
I built an Apache module. I'd like to have response in 200ms max,
because it's an application for banners.


Re: Re: Troubles with performances

From: Tom Lane
Guillaume Lémery <glemery@comclick.com> writes:
> Whoops, I forgot to say that of course I use connection pooling.
> Just 5 connections are pooled.

>> How many http connections per second are you getting?
> 200

Uh, so all of those http connections have to wait till one of the five
database connections is free?  Maybe that's your bottleneck.  Try
increasing the number of DB connections to something consistent with
the traffic level.

            regards, tom lane

Re: Troubles with performances

From: Guillaume Lémery
>> Whoops, I forgot to say that of course I use connection pooling.
>> Just 5 connections are pooled.
>>> How many http connections per second are you getting?
>> 200
>
> Uh, so all of those http connections have to wait till one of the five
> database connections is free?  Maybe that's your bottleneck.  Try
> increasing the number of DB connections to something consistent with
> the traffic level.

Oddly, increasing the number of pooled connections doesn't increase
performance, and if I create too many connections (e.g. 15 or 20),
performance decreases.

I don't know why... all my queries access the same table and sometimes
the same record (a problem with locking??)

Thanx.

Guillaume


Re: Troubles with performances

From: Tim Kientzle
> > How many http connections per second are you getting?
> 200

Huh?  200 connections per second is very different
from 200 simultaneous connections.  Which is it?

> I don't handle dynamic pages, but only HTTP redirects, so I think I
> don't need cache...

> I do not use PHP or CGI because they are too slow.
> I built an Apache module. I'd like to have response in 200ms max,
> because it's an application for banners.

That should be easy if you simplify your code a LOT.
1) Configure Apache to limit to 10 child processes.
   (No, that's not a mis-print.)  By limiting the number
   of child processes, you limit CPU load and help ensure
   fast response.  You also limit total memory use.
   (To really optimize, drop Apache and find a good
   single-threaded/single-process web server; modify
   that to build a custom web server just for your app.)
2) When your module is initialized, open the database and
   read _everything_ into memory.  If you're building
   a banner redirect system, then you probably only have at most
   a few megabytes of data, so just store it all in memory
   at startup.  From there, just look things up in memory.
3) Don't write logging information to the database; write
   it to a file.  (Designing a good logging system is
   tricky to do well.  Log _files_ are easier to understand
   and manage and faster to write than trying to send log
   data to a database.  If you need summary information such
   as hit totals in the database, have a separate program
   periodically scan the log files to generate such data.)
The net result of 2 and 3 is that you won't ever touch
the database during normal operation.  Logging to files
is extremely fast (one disk write for each transaction)
and keeping your banner data in memory ensures that you
can generate responses very quickly.  You should be
able to consistently generate responses in under 10ms with
this kind of design.    (Under 1ms if you do everything
exactly right.)  The only drawback is that a change in
your database data won't immediately impact what's being
served; you can deal with this within Apache by setting
a limit on the number of hits served per child.  That
will help encourage Apache child processes to be restarted
fairly regularly.

> Oddly, increasing the number of pooled connections doesn't increase
> performance, and if I create too many connections (e.g. 15 or 20),
> performance decreases.

Of course.  Remember that Apache is a forking web server; you've
got five pooled connections for _every_process_.  That's way too
many; a single process handles only one request at a time, so
you only need to pool one connection in each process.  By pooling
more connections in each process, you're just asking PostgreSQL
to keep a LOT of connections open, which is just a needless
drain on the system.
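
A hedged sketch of the Apache 1.3 httpd.conf directives that correspond to
Tim's suggestions - a hard cap on the number of children (point 1) and
periodic child recycling so freshly updated banner data gets reloaded when
a child restarts; the numbers are illustrative only:

    StartServers         10
    MaxClients           10      # never more than 10 children -> bounded CPU/memory
    MaxRequestsPerChild  10000   # recycle children regularly so they re-read their data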

Re: Troubles with performances

From: Lincoln Yeoh
At 02:39 PM 1/22/01 +0100, Guillaume Lémery wrote:
>Whoops, I forgot to say that of course I use connection pooling.
>Just 5 connections are pooled.
>
>> How many http connections per second are you getting?
>200

So you're having 200 simultaneous connections, AND 200 connections per
second (aka hits per second)?

Wow, that's quite a high load for your system (2-CPU 400 MHz x86). 5 SELECTs
and 3 UPDATEs too!

I only managed up to about 100 hits per second on my test phonebook search
app using FastCGI and Perl with just a single LIKE select. Benchmarked using
ab with ab -n 10000 -c 200 on a Dell PowerEdge 1300 single-CPU 500 MHz 128 MB
machine. However, I haven't done any tweaking yet.

Have you tried apachebench on your app? e.g.
ab -n 10000 -c 200 "http://yourserver/UriToBannerApp"

What do you get?

Is your banner app:
http://comtrack.comclick.com/cgi-bin/aff_bandeau.cgi

>I don't handle dynamic pages, but only HTTP redirects, so I think I
>don't need cache...

>I do not use PHP or CGI because they are too slow.
>I built an Apache module. I'd like to have response in 200ms max,
>because it's an application for banners.

Actually in that case the cache might still be useful. Since you're doing
redirects they should fit nicely into almost any decent cache's buffer.

If the same URL always gives the same redirect, that makes things even
better - you can then tell the cache to keep the response and not recheck
URLs until they are 10 minutes old (or longer).
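
In HTTP terms that just means the module's redirect responses should carry
caching headers the webcache will honour; a rough sketch of such a response
(the status line, URL and lifetime are illustrative):

    HTTP/1.1 302 Found
    Location: http://ads.example.com/click/12345
    Cache-Control: max-age=600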

Your webapp then talks to the cache as fast as it can.
The cache talks to the client at whatever speed the client can manage.

With this, a _single_ webapp instance can often handle 10 or more clients.

BTW, PHP etc. can definitely respond in under 200 ms, but never mind.

Cheerio,
Link.