Re: Scalability in postgres - Mailing list pgsql-performance

From Robert Haas
Subject Re: Scalability in postgres
Msg-id 603c8f070906041129t5b1011ft5b2c174dc04578d6@mail.gmail.com
In response to Re: Scalability in postgres  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
On Thu, Jun 4, 2009 at 2:04 PM, Scott Carey <scott@richrelevance.com> wrote:
> To clarify if needed:
>
> I'm not saying the two issues are unrelated.  I'm saying that the
> relationship between connection pooling and a database is multi-dimensional,
> and the scalability improvement does not have a hard dependency on
> connection pooling.
>
> On one spectrum, you have the raw performance improvement by caching
> connections so they do not need to be created and destroyed frequently.
> This is a universal benefit to all databases, though some have higher
> overhead of connection creation than others.  Any book on databases
> mentioning connection pools will list this benefit.
>
> On another spectrum, a connection pool can act as a concurrency throttle.
> The benefit of such a thing varies greatly from database to database, but
> the trend for each DB out there has been to solve this issue internally and
> not trust client or third party tools to prevent concurrency/scalability
> related disasters.
>
> The latter should be treated separately; a solution to it does not have to
> address the connection creation/destruction efficiency -- almost all clients
> these days can do that part, and third-party tools are simpler if they only
> have to meet that goal and not also try to reduce idle connection count.
>
> So a fix to the connection scalability issues only optionally involves what
> most would call connection pooling.
>
> -------
> Postgres' MVCC nature has something to do with it, but I'm sure there are
> ways to significantly improve the current situation.  Locks and processor
> cache-line behavior on larger SMP systems are often strangely behaving
> beasts.
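
The quoted distinction between caching connections and throttling concurrency can be sketched as two independent pieces that are only composed at the end. This is a minimal, hypothetical illustration in plain Python threads, not how any real pooler (or PostgreSQL itself) is implemented: `ConnectionCache`, `ConcurrencyThrottle`, `db_session` and the `factory` callable (a stand-in for a driver's connect function) are all assumed names.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List


class ConnectionCache:
    """Keeps released connections and hands them out again instead of
    opening a new one (in PostgreSQL, a new backend process) per request."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory          # hypothetical stand-in for a driver's connect()
        self._idle: List[Any] = []
        self._lock = threading.Lock()
        self.created = 0                 # how many real connections were opened

    def acquire(self) -> Any:
        with self._lock:
            if self._idle:
                return self._idle.pop()  # reuse an already-open connection
            self.created += 1
        return self._factory()           # open outside the lock; connecting is slow

    def release(self, conn: Any) -> None:
        with self._lock:
            self._idle.append(conn)


class ConcurrencyThrottle:
    """Caps how many callers may use the database at once. It knows nothing
    about connections, so it works with or without ConnectionCache."""

    def __init__(self, max_active: int) -> None:
        self._sem = threading.BoundedSemaphore(max_active)

    @contextmanager
    def slot(self) -> Iterator[None]:
        self._sem.acquire()              # blocks while max_active callers are inside
        try:
            yield
        finally:
            self._sem.release()


@contextmanager
def db_session(cache: ConnectionCache, throttle: ConcurrencyThrottle) -> Iterator[Any]:
    """Composes the two concerns: wait for a slot, then borrow a cached connection."""
    with throttle.slot():
        conn = cache.acquire()
        try:
            yield conn
        finally:
            cache.release(conn)          # returned to the cache before the slot is freed
```

With `ConcurrencyThrottle(max_active=4)`, any number of threads calling `db_session` see at most four sessions active at once, and the cache never opens more than four connections, because a connection is only borrowed while a slot is held. Dropping the throttle leaves pure caching; dropping the cache leaves a pure concurrency limit.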

I think in the particular case of PostgreSQL, the only suggestions I've
heard for improving performance with very large numbers of
simultaneous connections are: (1) connection caching, not so much
because of the overhead of creating the connection as because each new
connection means a whole new process whose private caches start out
cold; (2) finding a way to reduce ProcArrayLock contention; and (3)
reducing the cost of deriving a snapshot.  I think (2) and (3) are
related, but I'm not sure how closely.  As far as I know, Simon is the
only one who has submitted a patch in this area, and I don't think it's
unfair to say that that patch mostly nibbles around the edges of the
problem.  There was a discussion a few months ago about possible
changes to the lock modes of ProcArrayLock, based, I believe, on some
ideas from Tom (it might have been Heikki), but I don't think anyone
has coded or tested it.
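
Point (3), the cost of deriving a snapshot, can be illustrated with a toy model. At the time of this thread, PostgreSQL's `GetSnapshotData` walked the shared proc array while holding ProcArrayLock in shared mode, so every snapshot visits one slot per connected backend, idle or not. The sketch below is a deliberately simplified, hypothetical model of that scan: `Backend`, `Snapshot`, `derive_snapshot` and `is_visible` are illustrative names, there is no real lock, and it ignores subtransactions, the commit log, and the per-backend xmin bookkeeping the real code does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """One slot in a simplified proc array."""
    xid: Optional[int] = None            # None: no transaction id assigned (e.g. idle)


@dataclass
class Snapshot:
    xmin: int                            # every xid below this has finished
    xmax: int                            # every xid at or above this is not yet visible
    xip: List[int]                       # xids that were still running
    slots_scanned: int                   # work done to build this snapshot


def derive_snapshot(proc_array: List[Backend], next_xid: int) -> Snapshot:
    # The real server runs its equivalent loop while holding ProcArrayLock;
    # this model only shows that the scan length grows with the backend count.
    xip: List[int] = []
    xmin = next_xid
    for backend in proc_array:           # visits every slot, idle or not
        if backend.xid is None:
            continue
        xip.append(backend.xid)
        xmin = min(xmin, backend.xid)
    return Snapshot(xmin=xmin, xmax=next_xid, xip=xip, slots_scanned=len(proc_array))


def is_visible(xid: int, snap: Snapshot) -> bool:
    # Simplification: treats every finished xid as committed; the real
    # check also consults the commit log and handles subtransactions.
    if xid >= snap.xmax:
        return False
    if xid < snap.xmin:
        return True
    return xid not in snap.xip
```

With 100 slots of which only 3 hold running xids, `derive_snapshot` still scans all 100; doubling the number of idle connections doubles `slots_scanned` without changing the resulting snapshot, which is one reason idle connections are not free.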

We probably won't be able to make significant improvements in this
area unless someone comes up with some new, good ideas.  I agree with
you that there are probably ways to significantly improve the current
situation, but I'm not sure anyone has yet worked out, with any real
specificity, what they are.

...Robert
