Re: Scalability in postgres - Mailing list pgsql-performance

From: Kevin Grittner
Subject: Re: Scalability in postgres
Msg-id: 4A28E56702000025000275EA@gw.wicourts.gov
In response to: Re: Scalability in postgres (Mark Mielke <mark@mark.mielke.cc>)
List: pgsql-performance

Mark Mielke <mark@mark.mielke.cc> wrote:
> Kevin Grittner wrote:
>> James Mansion <james@mansionfamily.plus.com> wrote:
>>> Kevin Grittner wrote:
>>>
>>>> Sure, but the architecture of those products is based around all
>>>> the work being done by "engines" which try to establish affinity
>>>> to different CPUs, and loop through the various tasks to be done.
>>>> You don't get a context switch storm because you normally have
>>>> the number of engines set at or below the number of CPUs.
>>>>
>>> This is just misleading at best.
>>
>> What part?  Last I checked, Sybase ASE and SQL Server worked as I
>> described.  Those are the products I was describing.  Or is it
>> misleading to say that you aren't likely to get a context switch
>> storm if you keep your active thread count at or below the number
>> of CPUs?
>
> Context switch storm is about how the application and runtime
> implements concurrent accesses to shared resources, not about the
> potentials of the operating system.

I'm really not following how that's responsive to my questions or
points, at all.  You're making pretty basic and obvious points about
other ways to avoid the problem, but the fact is that the other
databases people point to as examples of handling large numbers of
connections have (so far at least) been ones which solve the problem
in ways other than what people seem to be proposing.  That doesn't
mean that the techniques used by these other products are the only way
to solve the issue, or even that they are the best ways; but it does
mean that pointing to those other products doesn't prove anything
about what lock optimization is likely to buy us.

> For example, if threads all spin every time a condition or event is
> raised, then yes, a context switch storm probably occurs if there are
> thousands of threads. But it doesn't have to work that way. At its
> very simplest, this is the difference between "wake one thread"
> (which is then responsible for waking the next thread) vs "wake all
> threads". This isn't necessarily the best solution - but it is one
> alternative. Other solutions might involve waking the *right*
> thread.  For example, if I know that a particular thread is waiting
> on my change and it has the highest priority - perhaps I only need
> to wake that one thread. Or, if I know that 10 threads are waiting
> on my results and can act on it, I only need to wake these specific
> 10 threads. Any system which actually wakes all threads will
> probably exhibit scaling limitations.
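
In concrete terms, the distinction Mark draws is the one between
pthread_cond_signal() and pthread_cond_broadcast().  A minimal sketch
of the pattern (illustration only, not PostgreSQL code):

/*
 * "Wake one thread" vs. "wake all threads" with POSIX condition
 * variables.  pthread_cond_broadcast() wakes every waiter; they all
 * contend for the mutex and most go straight back to sleep -- the
 * thundering herd behind a context switch storm.
 * pthread_cond_signal() wakes exactly one.
 */
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static int pending_items = 0;

void enqueue_work(void)
{
    pthread_mutex_lock(&lock);
    pending_items++;
    /* One new item, so wake exactly one waiter; broadcasting here
     * would wake thousands of threads for a single item. */
    pthread_cond_signal(&work_ready);
    pthread_mutex_unlock(&lock);
}

void *worker(void *arg)
{
    (void) arg;
    for (;;)
    {
        pthread_mutex_lock(&lock);
        while (pending_items == 0)
            pthread_cond_wait(&work_ready, &lock);
        pending_items--;
        pthread_mutex_unlock(&lock);
        /* ... process one item outside the lock ... */
    }
    return NULL;
}

The "wake the *right* thread" variants Mark mentions amount to keeping
one condition variable (or semaphore) per waiter and signaling only
the chosen one.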

I would be surprised if any of this is not obvious to all on the list.

>>> I'm sorry, but (in particular) UNIX systems have routinely
>>> managed large numbers of runnable processes where the run queue
>>> lengths are long without such an issue.
>>>
>> Well, the OP is looking at tens of thousands of connections.  If we
>> have a process per connection, how many tens of thousands can we
>> handle before we get into problems with exhausting possible pid
>> numbers (if nothing else)?
>
> This depends on whether it is 16-bit pid numbers or 32-bit pid
> numbers. I believe Linux supports 32-bit pid numbers, although I'm
> not up to date on what the default configurations are for all systems
> in use today. In particular, Linux 2.6 added support for the O(1)
> task scheduler, with the express requirement of supporting hundreds
> of thousands of (mostly idle) threads. The support exists. Is it
> activated or in proper use? I don't know.

Interesting.  I'm running the latest SuSE Enterprise on a 64-bit
system with 128 GB RAM and 16 CPUs, yet my pids and port numbers are
16-bit.  Since I only use a tiny fraction of the available numbers
using current techniques, I don't need to look at this yet, but I'll
keep it in mind.
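
On Linux, the ceiling is a runtime tunable: /proc/sys/kernel/pid_max
defaults to 32768 (which is where the 16-bit-looking limit comes
from), and 64-bit kernels accept values up to 2^22.  A quick check,
assuming /proc is mounted:

/* Read the kernel's pid ceiling on Linux; assumes a /proc filesystem. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
    long pid_max;

    if (f == NULL || fscanf(f, "%ld", &pid_max) != 1)
    {
        perror("pid_max");
        return 1;
    }
    fclose(f);
    /* Default is 32768; 64-bit kernels allow up to 4194304 (2^22). */
    printf("pid_max = %ld\n", pid_max);
    return 0;
}

Raising it is a one-line sysctl change (kernel.pid_max), so pid
exhaustion by itself shouldn't be the blocker.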

>> I know that if you do use a large number of threads, you have to be
>> pretty adaptive.  In our Java app that pulls data from 72 sources and
>> replicates it to eight, plus feeding it to filters which determine
>> what publishers for interfaces might be interested, the Sun JVM does
>> very poorly, but the IBM JVM handles it nicely.  It seems they use
>> very different techniques for the monitors on objects which
>> synchronize the activity of the threads, and the IBM technique does
>> well when no one monitor is dealing with a very large number of
>> blocking threads.  They got complaints from people who had thousands
>> of threads blocking on one monitor, so they now keep a count and
>> switch techniques for an individual monitor if the count gets too
>> high.
>>
> Could be, and if so then the Sun JVM should really address the problem.

I wish they would.

> However, having thousands of threads waiting on one monitor probably
> isn't a scalable solution, regardless of whether the JVM is able to
> optimize around your usage pattern or not. Why have thousands of
> threads waiting on one monitor? That's a bit insane. :-)

Agreed.  We weren't the ones complaining to IBM.  :-)
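
The per-monitor counting trick described above generalizes beyond
JVMs.  A hypothetical sketch of its shape in C, using a pthread mutex
underneath (not how any JVM actually implements monitors):

/*
 * Track how many threads are contending on a lock and change strategy
 * when the count grows.  Lightly contended, spin briefly (a short
 * spin is cheaper than a context switch); crowded, block in the
 * kernel immediately so waiters don't burn CPU.
 *
 * Initialize as: adaptive_lock l = { PTHREAD_MUTEX_INITIALIZER, 0 };
 */
#include <pthread.h>

#define SPIN_TRIES 100
#define CROWDED      8    /* contender count that disables spinning */

typedef struct
{
    pthread_mutex_t mutex;
    volatile int    contenders;
} adaptive_lock;

void adaptive_lock_acquire(adaptive_lock *l)
{
    __sync_fetch_and_add(&l->contenders, 1);

    /* Racy read, but it's only a heuristic. */
    if (l->contenders < CROWDED)
    {
        for (int i = 0; i < SPIN_TRIES; i++)
        {
            if (pthread_mutex_trylock(&l->mutex) == 0)
            {
                __sync_fetch_and_sub(&l->contenders, 1);
                return;
            }
        }
    }
    /* Crowded, or the spin didn't pay off: sleep until it's free. */
    pthread_mutex_lock(&l->mutex);
    __sync_fetch_and_sub(&l->contenders, 1);
}

void adaptive_lock_release(adaptive_lock *l)
{
    pthread_mutex_unlock(&l->mutex);
}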

>> Perhaps something like that (or some other new approach) might
>> mitigate the effects of tens of thousands of processes competing for
>> a few resources, but it fundamentally seems unwise to turn those
>> loose to compete if requests can be queued in some way.
>
> An alternative approach might be: 1) Idle processes not currently
> running a transaction do not need to be consulted for their snapshot
> (and other related expenses) - if they are idle for a period of time,
> they "unregister" from the actively used processes list - if they
> become active again, they "register" in the actively used process
> list, and 2) Processes could be reusable across different connections
> - they could stick around for a period after disconnect, and make
> themselves available again to serve the next connection.
>
> Still heavy-weight in terms of memory utilization, but cheap in terms
> of other impacts. Without the cost of connection "pooling" in the
> sense of requests always being indirect through a proxy of some sort.

Just guessing here, but I would expect the cost of such forwarding to
be pretty insignificant compared to the cost of even parsing the
query, much less running it.  That would be especially true if the
pool was integrated into the DBMS in a way similar to what was
described as the Oracle default.
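
Mark's "register/unregister" idea is the part that attacks the
O(connections) cost head-on, so it may be worth restating concretely.
A hypothetical sketch (PostgreSQL's real bookkeeping is the PGPROC
array in shared memory, and a cross-process version would need a
process-shared lock rather than a plain pthread mutex):

/*
 * A shared list of backends that are actually in a transaction, so
 * snapshot computation scans only active entries rather than every
 * connection.  Initialize nactive to 0 and the lock with
 * PTHREAD_MUTEX_INITIALIZER.
 */
#include <pthread.h>

#define MAX_BACKENDS 10000

typedef struct
{
    pthread_mutex_t lock;
    int             active[MAX_BACKENDS]; /* backend ids in a transaction */
    int             nactive;
} active_registry;

/* Called when an idle backend begins a transaction. */
void registry_register(active_registry *r, int backend_id)
{
    pthread_mutex_lock(&r->lock);
    r->active[r->nactive++] = backend_id;
    pthread_mutex_unlock(&r->lock);
}

/* Called at transaction end (or after an idle timeout). */
void registry_unregister(active_registry *r, int backend_id)
{
    pthread_mutex_lock(&r->lock);
    for (int i = 0; i < r->nactive; i++)
    {
        if (r->active[i] == backend_id)
        {
            r->active[i] = r->active[--r->nactive];
            break;
        }
    }
    pthread_mutex_unlock(&r->lock);
}

Snapshot cost then scales with transactions in flight rather than
open connections, so thousands of idle connections add almost nothing
at snapshot time.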

-Kevin
