Re: Performance Bottleneck - Mailing list pgsql-performance

From: Alex Hayward
Subject: Re: Performance Bottleneck
Msg-id: Pine.LNX.4.58.0408101521200.423@sphinx.mythic-beasts.com
In response to: Re: Performance Bottleneck ("Matt Clark" <matt@ymogen.net>)
List: pgsql-performance
On Sun, 8 Aug 2004, Matt Clark wrote:

> > And this is exactly where the pgpool advantage lies. Especially with
> > TPC-W, Apache is serving a mix of PHP (or whatever CGI technique is
> > used) and static content like images. Since the 200+ Apache kids serve
> > any of that content at random, and the emulated browsers very much
> > encourage it to ramp up to MaxClients children by using up to 4
> > concurrent image connections, one ends up with MaxClients DB
> > connections that are all used relatively infrequently. In contrast,
> > pgpool results in fewer, more active DB connections, which is better
> > for performance.
>
> There are two well-worn and very mature techniques for dealing with the
> issue of web apps using one DB connection per apache process, both of which
> work extremely well and attack the issue at its source.
>
> 1)    Use a front-end caching proxy like Squid as an accelerator.  Static
> content will be served by the accelerator 99% of the time.  Additionally,
> large pages can be served immediately to the accelerator by Apache, which
> can then go on to serve another request without waiting for the end user's
> dial-up connection to pull the data down.  Massive speedup, fewer apache
> processes needed.

Squid also takes away the work of doing SSL (presuming you're running it
on a different machine). Unfortunately it doesn't support HTTP/1.1, which
means that most generated pages (those that don't set Content-Length) end
up forcing Squid to close and then reopen the connection to the web
server.
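The accelerator setup itself is only a handful of squid.conf lines.
Something like the following for Squid 2.5 (hostnames, ports and
certificate paths are placeholders, and I'm going from memory rather
than a tested config):

  # Terminate SSL at the accelerator (needs Squid built with --enable-ssl)
  https_port 443 cert=/etc/squid/server.crt key=/etc/squid/server.key
  http_port 80

  # Pass everything on to the real web server
  httpd_accel_host backend.example.com
  httpd_accel_port 80
  httpd_accel_single_host on
  httpd_accel_uses_host_header on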

Because you no longer need to worry about keeping Apache processes around
to dribble data to people on the wrong end of modems, you can reduce
MaxClients quite a bit (to, say, 10 or 20 per web server). This keeps the
number of PostgreSQL connections down. I'd guess that beyond some point
increasing MaxClients actually reduces performance, because you end up
running queries in parallel rather than queueing the requests and running
them serially.
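
In httpd.conf terms that's just something along these lines (the numbers
are illustrative, not a recommendation):

  # Apache prefork: keep the process pool small, since Squid now deals
  # with slow clients and most of the static content.
  MaxClients       20
  KeepAlive        On
  KeepAliveTimeout 5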

I've also had some problems when Squid had a large number of connections
open (several thousand), though that may have been because of my
half_closed_clients setting. Squid 3 coped a lot better when I tried it
(quite a few months ago now, on FreeBSD with the kqueue system call) but
crashed under some (admittedly synthetic) conditions.

> I'm sure pgpool and the like have their place, but being band-aids for
> poorly configured websites probably isn't the best use for them.

You still have periods of time when the web servers are busy using their
CPUs to generate HTML rather than waiting for database queries. This is
especially true if you cache a lot of data somewhere on the web servers
themselves (which, in my experience, reduces the database load a great
deal). If you REALLY need to reduce the number of connections (because you
have a large number of web servers doing a lot of computation, say) then
pgpool might still be useful.
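
The kind of web-server-side caching I mean is nothing fancy. A rough
sketch in Python (purely illustrative; the TTL and the per-process dict
are just one way to do it, not anything from a real setup):

  import time

  _cache = {}      # (sql, params) -> (expiry, rows)
  CACHE_TTL = 60   # seconds; tune to how stale the data may get

  def cached_query(conn, sql, params=()):
      # conn is any DB-API connection (psycopg, MySQLdb, ...).
      # Serve repeated read-only queries from a per-process cache
      # so most of them never reach PostgreSQL at all.
      key = (sql, params)
      hit = _cache.get(key)
      if hit and hit[0] > time.time():
          return hit[1]
      cur = conn.cursor()
      cur.execute(sql, params)
      rows = cur.fetchall()
      cur.close()
      _cache[key] = (time.time() + CACHE_TTL, rows)
      return rows

Each Apache/CGI process keeps its own copy of the cache, which is crude
but cuts the query rate dramatically for read-mostly pages.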
