Re: Scalability in postgres - Mailing list pgsql-performance

From: Mark Mielke
Subject: Re: Scalability in postgres
Date:
Msg-id: 4A2892DD.3090809@mark.mielke.cc
In response to: Re: Scalability in postgres (david@lang.hm)
Responses: Re: Scalability in postgres (david@lang.hm)
List: pgsql-performance
david@lang.hm wrote:
> On Thu, 4 Jun 2009, Mark Mielke wrote:
>> You should really only have 1X or 2X as many threads as there are
>> CPUs waiting on one monitor. Beyond that is waste. The idle threads
>> can be pooled away, and only activated (with individual monitors,
>> which can be far more easily and effectively optimized) when the
>> other threads become busy.
> sometimes the decrease in complexity in the client makes it worthwhile
> to 'brute force' things.
> this actually works well for the vast majority of services (including
> many databases)
> the question is how much complexity (if any) it adds to postgres to
> handle this condition better, and what those changes are.

Sure. Locks that are not generally contended, for example, don't deserve
the extra complexity. Locks that have any expected frequency of a
"context storm" though, probably make good candidates.

>> An alternative approach might be: 1) Idle processes not currently
>> running a transaction do not need to be consulted for their snapshot
>> (and other related expenses) - if they are idle for a period of time,
>> they "unregister" from the actively used processes list - if they
>> become active again, they "register" in the actively used process list,
> how expensive is this register/unregister process? if it's cheap
> enough do it all the time and avoid the complexity of having another
> config option to tweak.

Not really relevant if you look at the "idle for a period of time". An
active process would not unregister/register. An inactive process,
though, once it is no longer in a transaction and has been idle for a
time many times greater than the cost of unregister + register, would
free the other processes from having to take it into account, allowing
for better scaling. For example, let's say it doesn't unregister itself
until it has been idle for 5 seconds.
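To make the idea concrete, here is a minimal sketch (Python, purely
illustrative; the names ProcRegistry, IDLE_TIMEOUT, etc. are my own
invention, not anything in the PostgreSQL source):

```python
import threading
import time

class ProcRegistry:
    """Illustrative sketch of the "actively used processes list" idea.

    Backends register whenever they do work; a periodic sweep drops any
    backend that has been idle longer than IDLE_TIMEOUT, so that
    snapshot computation only consults the registered entries.
    """
    IDLE_TIMEOUT = 5.0  # seconds of idleness before dropping out

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}  # pid -> timestamp of last activity

    def register(self, pid):
        # Cheap enough to call on every transaction start.
        with self._lock:
            self._active[pid] = time.monotonic()

    def sweep_idle(self, now=None):
        # Unregister backends that have been idle past the timeout.
        now = time.monotonic() if now is None else now
        with self._lock:
            for pid, last in list(self._active.items()):
                if now - last > self.IDLE_TIMEOUT:
                    del self._active[pid]

    def snapshot_participants(self):
        # Only these backends need to be consulted for a snapshot.
        with self._lock:
            return sorted(self._active)
```

An active backend just keeps calling register(); one that goes quiet
for more than 5 seconds disappears from snapshot_participants() until
it registers again.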

>> and 2) Processes could be reusable across different connections -
>> they could stick around for a period after disconnect, and make
>> themselves available again to serve the next connection.
> depending on what criteria you have for the re-use, this could be a
> significant win (if you manage to re-use the per process cache much).
> but this is far more complex.

Does it need to be? From a naive perspective - what's the benefit of a
PostgreSQL process dying, and a new connection getting a new PostgreSQL
process? I suppose bugs in PostgreSQL don't have the opportunity to
affect later connections, but overall, this seems like an unnecessary
cost. I was thinking of either: 1) The Apache model, where a PostgreSQL
process waits on accept(), or 2) When the PostgreSQL process is done, it
does connection cleanup and then it waits for a file descriptor to be
transferred to it through IPC and just starts over using it. Too hand
wavy? :-)
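For option 2, the descriptor hand-off itself is standard Unix machinery
(SCM_RIGHTS ancillary data over a Unix-domain socket). A rough Python
sketch, where one socketpair stands in for the pooling IPC channel and
another for the accepted client connection (the function names are
mine; requires Python 3.9+ on a Unix platform):

```python
import socket

def pass_fd(channel, fd):
    # Ship one byte of payload plus the descriptor as ancillary data
    # (SCM_RIGHTS under the hood).
    socket.send_fds(channel, [b'F'], [fd])

def receive_fd(channel):
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    return fds[0]

# Demo: a "dispatcher" hands an accepted connection to a pooled "worker".
ipc_send, ipc_recv = socket.socketpair()  # the IPC channel
client, accepted = socket.socketpair()    # stands in for accept()

pass_fd(ipc_send, accepted.fileno())
conn = socket.socket(fileno=receive_fd(ipc_recv))  # worker adopts the fd

client.sendall(b'SELECT 1')
request = conn.recv(8)  # the worker now reads directly from the client
```

Once the worker adopts the descriptor, the dispatcher is out of the
data path entirely, which is the point: no per-request proxy hop.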

>> Still heavy-weight in terms of memory utilization, but cheap in terms
>> of other impacts. Without the cost of connection "pooling" in the
>> sense of requests always being indirect through a proxy of some sort.
> it would seem to me that the cost of making the extra hop through the
> external pooler would be significantly more than the overhead of idle
> processes marking themselves as such so that they don't get consulted
> for MVCC decisions

They're separate ideas to be considered separately on the complexity vs
benefit merit.

For the first - I think we already have an "external pooler", in the
sense of the master process which forks to manage a connection, so it
already involves a possible context switch to transfer control. I guess
the question is whether or not we can do better than fork(). In
multi-threaded programs, it's definitely possible to outdo fork using
thread pools. Does the same remain true of a multi-process program that
communicates using IPC? I'm not completely sure, although I believe
Apache does achieve this by having the working processes do accept()
rather than some master process that spawns off new processes on each
connection. Apache re-uses the process.

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

