Re: Scalability in postgres - Mailing list pgsql-performance

From: david@lang.hm
Subject: Re: Scalability in postgres
Date:
Msg-id: alpine.DEB.1.10.0906041755590.7953@asgard
In response to: Re: Scalability in postgres (Mark Mielke <mark@mark.mielke.cc>)
Responses: Re: Scalability in postgres
List: pgsql-performance
On Thu, 4 Jun 2009, Mark Mielke wrote:

> Kevin Grittner wrote:
>> James Mansion <james@mansionfamily.plus.com> wrote:
>
>> I know that if you do use a large number of threads, you have to be
>> pretty adaptive.  In our Java app that pulls data from 72 sources and
>> replicates it to eight, plus feeding it to filters which determine
>> what publishers for interfaces might be interested, the Sun JVM does
>> very poorly, but the IBM JVM handles it nicely.  It seems they use
>> very different techniques for the monitors on objects which
>> synchronize the activity of the threads, and the IBM technique does
>> well when no one monitor is dealing with a very large number of
>> blocking threads.  They got complaints from people who had thousands
>> of threads blocking on one monitor, so they now keep a count and
>> switch techniques for an individual monitor if the count gets too
>> high.
>>
> Could be, and if so then Sun JVM should really address the problem. However,
> having thousands of threads waiting on one monitor probably isn't a scalable
> solution, regardless of whether the JVM is able to optimize around your usage
> pattern or not. Why have thousands of threads waiting on one monitor? That's
> a bit insane. :-)
>
> You should really only have as 1X or 2X many threads as there are CPUs
> waiting on one monitor. Beyond that is waste. The idle threads can be pooled
> away, and only activated (with individual monitors which can be far more
> easily and effectively optimized) when the other threads become busy.

sometimes the decrease in client-side complexity makes it worthwhile to
'brute force' things.

this actually works well for the vast majority of services (including many
databases).

the question is how much complexity (if any) it adds to postgres to handle
this condition better, and what those changes are.
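to make Mark's bounded-pool suggestion above concrete, here's a rough sketch in java (all names are made up; this is not code from any real server or JVM): instead of letting thousands of threads block on a single monitor, the requests sit in a queue and only a small pool, sized to 1-2x the CPU count, ever contends for the shared resource.

```java
import java.util.concurrent.*;

// Sketch of the bounded-pool idea: queue the requests, let a small
// fixed pool drain them. Only `workers` threads contend at once, no
// matter how many requests are waiting.
public class BoundedPool {
    // Submit `requests` tasks to `workers` threads and sum the results.
    static long process(int requests, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CompletionService<Integer> done = new ExecutorCompletionService<>(pool);
        for (int i = 0; i < requests; i++) {
            final int id = i;
            done.submit(() -> id * 2); // stand-in for real request work
        }
        long sum = 0;
        for (int i = 0; i < requests; i++) sum += done.take().get();
        pool.shutdown();
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int workers = 2 * Runtime.getRuntime().availableProcessors();
        System.out.println(process(10_000, workers)); // 99990000
    }
}
```

the point is that queueing 10,000 requests is cheap; it's 10,000 threads all parked on one monitor that stresses the lock implementation.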

>> Perhaps something like that (or some other new approach) might
>> mitigate the effects of tens of thousands of processes competing
>> for a few resources, but it fundamentally seems unwise to turn those
>> loose to compete if requests can be queued in some way.
>>
>
> An alternative approach might be: 1) Idle processes not currently running a
> transaction do not need to be consulted for their snapshot (and other related
> expenses) - if they are idle for a period of time, they "unregister" from the
> actively used processes list - if they become active again, they "register"
> in the actively used process list,

how expensive is this register/unregister process? if it's cheap enough,
do it all the time and avoid the complexity of adding another config option
to tweak.
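as a rough illustration of why it could be cheap enough to do unconditionally, here's a hypothetical sketch in java (this is not PostgreSQL's actual ProcArray code, just the shape of the idea): if joining and leaving the "active" set is a single concurrent-set operation, a backend can do it around every transaction, and snapshot work then scales with the active backends rather than the total.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical register/unregister sketch: idle backends drop out of
// the set that snapshot computation has to consult.
public class ActiveList {
    private final Set<Integer> active = ConcurrentHashMap.newKeySet();

    void register(int pid)   { active.add(pid); }    // at transaction start
    void unregister(int pid) { active.remove(pid); } // at commit / going idle

    // Snapshot cost now scales with *active* backends, not total ones.
    int snapshotSize() { return active.size(); }

    public static void main(String[] args) {
        ActiveList list = new ActiveList();
        for (int pid = 1; pid <= 1000; pid++) list.register(pid);
        // 990 of the 1000 backends go idle and drop out
        for (int pid = 11; pid <= 1000; pid++) list.unregister(pid);
        System.out.println(list.snapshotSize()); // prints 10
    }
}
```
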

> and 2) Processes could be reusable across
> different connections - they could stick around for a period after
> disconnect, and make themselves available again to serve the next connection.

depending on what criteria you have for the re-use, this could be a
significant win (if you manage to re-use the per-process cache much), but
this is far more complex.
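the re-use criteria question can be sketched like this (hypothetical names, not postgres internals): a disconnecting "backend" parks itself keyed by the database/user pair it was serving, so only a matching new connection adopts it with its per-process caches still warm; re-use across a different database/user would have to flush those caches, which is where the extra complexity comes in.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of backend re-use keyed on exact-match criteria.
public class BackendPool {
    static class Backend {
        final String key; // database + user this backend was serving
        final Map<String, Object> catalogCache = new HashMap<>();
        Backend(String key) { this.key = key; }
    }

    private final Map<String, ArrayDeque<Backend>> parked = new HashMap<>();

    void park(Backend b) {
        parked.computeIfAbsent(b.key, k -> new ArrayDeque<>()).push(b);
    }

    // Re-use only on an exact criteria match; otherwise start fresh.
    Backend acquire(String key) {
        ArrayDeque<Backend> q = parked.get(key);
        return (q != null && !q.isEmpty()) ? q.pop() : new Backend(key);
    }

    public static void main(String[] args) {
        BackendPool pool = new BackendPool();
        Backend b = pool.acquire("db1/alice");
        b.catalogCache.put("pg_class", "cached rows");
        pool.park(b); // client disconnects; backend sticks around
        Backend reused = pool.acquire("db1/alice");
        System.out.println(reused.catalogCache.containsKey("pg_class")); // true
    }
}
```
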

> Still heavy-weight in terms of memory utilization, but cheap in terms of
> other impacts. Without the cost of connection "pooling" in the sense of
> requests always being indirect through a proxy of some sort.

it would seem to me that the cost of making the extra hop through the
external pooler would be significantly more than the overhead of idle
processes marking themselves as such so that they don't get consulted for
MVCC decisions.

David Lang
