Re: Scalability in postgres - Mailing list pgsql-performance

From	Mark Mielke
Subject	Re: Scalability in postgres
Date	June 4, 2009 20:03:51
Msg-id	4A285305.2070103@mark.mielke.cc
In response to	Re: Scalability in postgres ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses	Re: Scalability in postgres Re: Scalability in postgres Re: Scalability in postgres
List	pgsql-performance

Kevin Grittner wrote:

James Mansion <james@mansionfamily.plus.com> wrote:

Kevin Grittner wrote:

Sure, but the architecture of those products is based around all
the work being done by "engines" which try to establish affinity to
different CPUs, and loop through the various tasks to be done.  You
don't get a context switch storm because you normally have the
number of engines set at or below the number of CPUs.  The down
side is that they spend a lot of time spinning around queue access
to see if anything has become available to do -- which causes them
not to play nice with other processes on the same box.

This is just misleading at best.

 
What part?  Last I checked, Sybase ASE and SQL Server worked as I
described.  Those are the products I was describing.  Or is it
misleading to say that you aren't likely to get a context switch storm
if you keep your active thread count at or below the number of CPUs?

A context switch storm is about how the application and runtime implement concurrent access to shared resources, not about the capabilities of the operating system. For example, if all threads spin every time a condition or event is raised, then yes, a context switch storm probably occurs if there are thousands of threads. But it doesn't have to work that way. At its very simplest, this is the difference between "wake one thread" (which is then responsible for waking the next thread) and "wake all threads". This isn't necessarily the best solution, but it is one alternative. Other solutions might involve waking the *right* thread. For example, if I know that a particular thread is waiting on my change and it has the highest priority, perhaps I only need to wake that one thread. Or, if I know that 10 threads are waiting on my results and can act on them, I only need to wake those specific 10 threads. Any system which actually wakes all threads will probably exhibit scaling limitations.
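To make the "wake one" versus "wake all" difference concrete, here is a minimal Python sketch. It is not taken from any of the products discussed; the function name `run` and the bookkeeping fields are made up for illustration. A producer hands out one unit of work at a time to a set of sleeping threads and counts how often a thread is woken. The producer waits for all threads to go back to sleep before posting the next item, purely so that the count is repeatable.

```python
import threading


def run(n_waiters: int, n_items: int, wake_all: bool) -> int:
    """Return how many thread wakeups it took to hand out n_items."""
    lock = threading.Lock()
    work = threading.Condition(lock)   # waiters sleep here
    quiet = threading.Condition(lock)  # producer sleeps here until waiters settle
    state = {"items": 0, "asleep": 0, "wakeups": 0, "stop": False}

    def waiter() -> None:
        with lock:
            while True:
                while state["items"] == 0 and not state["stop"]:
                    state["asleep"] += 1
                    quiet.notify()          # tell the producer we are asleep
                    work.wait()
                    state["wakeups"] += 1   # counted on every return from wait()
                if state["stop"]:
                    return
                state["items"] -= 1         # take the single unit of work

    def all_settled() -> bool:
        return state["asleep"] == n_waiters and state["items"] == 0

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for t in threads:
        t.start()

    with lock:
        for _ in range(n_items):
            quiet.wait_for(all_settled)
            state["items"] = 1
            if wake_all:
                state["asleep"] = 0         # every sleeper is released
                work.notify_all()
            else:
                state["asleep"] -= 1        # exactly one sleeper is released
                work.notify()
        quiet.wait_for(all_settled)
        wakeups = state["wakeups"]          # snapshot before the shutdown wakeup
        state["stop"] = True
        work.notify_all()

    for t in threads:
        t.join()
    return wakeups


if __name__ == "__main__":
    print("wake one:", run(8, 5, wake_all=False))
    print("wake all:", run(8, 5, wake_all=True))
```

With 8 waiters and 5 items, the wake-one variant costs one wakeup per item, while the wake-all variant wakes every sleeper for every item, and all but one of them find nothing to do and go back to sleep.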

The operating system itself only needs to keep threads in the run queue if they have work to do. Having thousands of idle threads does not need to cost *any* CPU time, if they're kept in an idle thread collection separate from the run queue.

I'm sorry, but (in particular) UNIX systems have routinely
managed large numbers of runnable processes where the run queue
lengths are long without such an issue.

Well, the OP is looking at tens of thousands of connections.  If we
have a process per connection, how many tens of thousands can we
handle before we get into problems with exhausting possible pid
numbers (if nothing else)?

This depends on whether the system uses 16-bit or 32-bit pid numbers. I believe Linux supports 32-bit pid numbers, although I'm not up to date on what the default configurations are for all systems in use today. In particular, Linux 2.6 added the O(1) task scheduler, with the express requirement of supporting hundreds of thousands of (mostly idle) threads. The support exists. Is it activated or in proper use? I don't know.
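For anyone who wants to check their own box rather than guess, here is a small Python sketch. It assumes a Linux system that exposes the configured pid ceiling at `/proc/sys/kernel/pid_max`; the helper name `read_pid_max` is made up, and on other systems it simply returns `None`.

```python
from pathlib import Path
from typing import Optional


def read_pid_max(path: str = "/proc/sys/kernel/pid_max") -> Optional[int]:
    """Return the configured pid ceiling, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file contents.
        return None


if __name__ == "__main__":
    limit = read_pid_max()
    if limit is None:
        print("pid_max not available on this system")
    else:
        print("pid_max:", limit)
```

The value reported is a configuration setting, so it tells you what the running system allows rather than what the kernel could support.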

I know that if you do use a large number of threads, you have to be
pretty adaptive.  In our Java app that pulls data from 72 sources and
replicates it to eight, plus feeding it to filters which determine
what publishers for interfaces might be interested, the Sun JVM does
very poorly, but the IBM JVM handles it nicely.  It seems they use
very different techniques for the monitors on objects which
synchronize the activity of the threads, and the IBM technique does
well when no one monitor is dealing with a very large number of
blocking threads.  They got complaints from people who had thousands
of threads blocking on one monitor, so they now keep a count and
switch techniques for an individual monitor if the count gets too
high.

Could be, and if so then the Sun JVM should really address the problem. However, having thousands of threads waiting on one monitor probably isn't a scalable solution, regardless of whether the JVM is able to optimize around your usage pattern or not. Why have thousands of threads waiting on one monitor? That's a bit insane. :-)

You should really only have 1X or 2X as many threads as there are CPUs waiting on one monitor. Anything beyond that is waste. The idle threads can be pooled away, and only activated (with individual monitors which can be far more easily and effectively optimized) when the other threads become busy.
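One way to read the 1X or 2X rule is as a bounded worker pool: requests queue up, and only a CPU-proportional number of threads ever compete for them. The Python sketch below uses the standard `ThreadPoolExecutor`; the function name `run_bounded`, the factor of 2, and the trivial stand-in workload are assumptions for illustration only.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def run_bounded(n_requests: int, threads_per_cpu: int = 2):
    """Serve n_requests with at most threads_per_cpu * CPUs worker threads."""
    workers = threads_per_cpu * (os.cpu_count() or 1)
    lock = threading.Lock()
    running = 0
    peak = 0

    def handle(request_id: int) -> int:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)   # highest concurrency observed so far
        result = request_id * 2         # stand-in for real work
        with lock:
            running -= 1
        return result

    # Requests beyond the worker count wait in the executor's queue
    # instead of each getting a thread of their own.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(handle, range(n_requests)))
    return workers, peak, results


if __name__ == "__main__":
    workers, peak, results = run_bounded(1000)
    print("workers:", workers, "peak concurrency:", peak, "served:", len(results))
```

However many requests are submitted, the observed peak concurrency never exceeds the worker count.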

Perhaps something like that (or some other new approach) might
mitigate the effects of tens of thousands of processes competing for
a few resources, but it fundamentally seems unwise to turn those
loose to compete if requests can be queued in some way.

An alternative approach might be: 1) Idle processes not currently running a transaction do not need to be consulted for their snapshot (and other related expenses). If they are idle for a period of time, they "unregister" from the list of actively used processes; if they become active again, they "register" in that list again. 2) Processes could be reusable across different connections: they could stick around for a period after disconnect and make themselves available again to serve the next connection.
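As a rough illustration of the bookkeeping this would involve, here is a toy Python model. It is not how PostgreSQL is implemented; the class `BackendRegistry`, its method names, and the explicit `now` timestamps are all hypothetical.

```python
from typing import Dict, List


class BackendRegistry:
    """Hypothetical bookkeeping; not PostgreSQL's actual data structures."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.last_active: Dict[int, float] = {}  # registered processes only
        self.spare: List[int] = []               # disconnected but reusable
        self.next_id = 1

    def connect(self) -> int:
        """Reuse a spare process if one exists, otherwise start a new one."""
        if self.spare:
            return self.spare.pop()
        backend_id = self.next_id
        self.next_id += 1
        return backend_id

    def disconnect(self, backend_id: int) -> None:
        """Keep the process around so the next connection can reuse it."""
        self.last_active.pop(backend_id, None)
        self.spare.append(backend_id)

    def begin_transaction(self, backend_id: int, now: float) -> None:
        self.last_active[backend_id] = now        # "register"

    def expire_idle(self, now: float) -> None:
        for backend_id, seen in list(self.last_active.items()):
            if now - seen >= self.idle_timeout:
                del self.last_active[backend_id]  # "unregister"

    def snapshot_participants(self) -> List[int]:
        """Only registered processes need to be consulted for a snapshot."""
        return sorted(self.last_active)


if __name__ == "__main__":
    reg = BackendRegistry(idle_timeout=10.0)
    a, b, c = reg.connect(), reg.connect(), reg.connect()
    reg.begin_transaction(a, now=0.0)
    reg.begin_transaction(b, now=8.0)
    reg.expire_idle(now=12.0)
    print(reg.snapshot_participants())
```

In this model the cost of taking a snapshot scales with the number of recently active processes, not with the number of open connections.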

This is still heavy-weight in terms of memory utilization, but cheap in terms of other impacts. It also avoids the cost of connection "pooling" in the sense of requests always being indirected through a proxy of some sort.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>
