
From: Robert Haas
Subject: Re: bg worker: general purpose requirements
Msg-id: AANLkTinGDWzMQpsTuTCGQnm6sY4aiysbm5vBL6-O-Ci_@mail.gmail.com
In response to: Re: bg worker: general purpose requirements  (Markus Wanner <markus@bluegap.ch>)
Responses: Re: bg worker: general purpose requirements
List: pgsql-hackers
On Tue, Sep 21, 2010 at 4:23 AM, Markus Wanner <markus@bluegap.ch> wrote:
> On 09/21/2010 02:49 AM, Robert Haas wrote:
>> OK.  At least for me, what is important is not only how many GUCs
>> there are but how likely they are to require tuning and how easy it
>> will be to know what the appropriate value is.  It seems fairly easy
>> to tune the maximum number of background workers, and it doesn't seem
>> hard to tune an idle timeout, either.  Both of those are pretty
>> straightforward trade-offs between, on the one hand, consuming more
>> system resources, and on the other hand, better throughput and/or
>> latency.
>
> Hm.. I thought of it the other way around. It's more obvious and direct
> for me to determine a min and max of the number of parallel jobs I want
> to perform at once. Based on the number of spindles, CPUs and/or nodes
> in the cluster (in case of Postgres-R). Admittedly, not necessarily per
> database, but at least overall.

Wait, are we in violent agreement here?  An overall limit on the
number of parallel jobs is exactly what I think *does* make sense.
It's the other knobs I find odd.

> I wouldn't know what to set a timeout to. And you didn't make a good
> argument for any specific value so far. Nor did you offer a reasoning
> for how to find one. It's certainly very workload and feature specific.

I think my basic contention is that it doesn't matter very much, so
any reasonable value should be fine.  I think 5 minutes will be good
enough for 99% of cases.  But if you find that this leaves too many
extra backends around and you start to run out of file descriptors or
your ProcArray gets too full, then you might want to drop it down.
Conversely, if you want to fine-tune your system for sudden load
spikes, you could raise it.
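
Just to illustrate the kind of thing I have in mind, here is a
deliberately crude sketch of such a loop.  It is only an illustration,
not anything from the patch; worker_idle_timeout, get_next_job, and
run_job are invented names:

/*
 * Toy sketch of the idle-timeout behaviour being discussed: the worker
 * polls for new work and exits once it has been idle longer than the
 * timeout.  All names here are invented for illustration.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static int worker_idle_timeout = 300;   /* seconds; the "5 minutes" default */

/* Stand-in for asking the coordinator for work; always answers "no" here. */
static bool
get_next_job(void)
{
    return false;
}

/* Stand-in for executing a background job. */
static void
run_job(void)
{
    printf("running a job\n");
}

static void
worker_main_loop(void)
{
    time_t  idle_since = time(NULL);

    for (;;)
    {
        if (get_next_job())
        {
            run_job();
            idle_since = time(NULL);    /* any job resets the idle clock */
        }
        else if (time(NULL) - idle_since >= worker_idle_timeout)
        {
            break;                      /* idle too long: exit, free the slot */
        }
        else
            sleep(1);                   /* a real worker would wait on a latch */
    }
}

int
main(void)
{
    worker_main_loop();                 /* returns after worker_idle_timeout */
    return 0;
}

Whether the timeout is 60 seconds or 300 only changes how long an idle
worker keeps holding its slot before giving it back.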

> I'd consider the case of min_spare_background_workers * number of
> databases > max_background_workers to be a configuration error, about
> which the coordinator should warn.

The number of databases isn't a configuration parameter.  Ideally,
users shouldn't have to reconfigure the system because they create
more databases.  For example, with min_spare_background_workers = 2 and
max_background_workers = 100, creating the 51st database would silently
put the system into exactly that error state.

>> I think we need to look for a way to eliminate the maximum number of
>> workers per database, too.
>
> Okay, might make sense, yes.
>
> Dropping both of these per-database GUCs, we'd simply end up with
> max_background_workers around all the time.
>
> A timeout would mainly help to limit the max amount of time workers sit
> around idle. I fail to see how that's more helpful than the proposed
> min/max. Quite the opposite, it's impossible to get any useful guarantees.
>
> It assumes that the workload remains the same over time, but doesn't
> cope well with sudden spikes and changes in the workload.

I guess we differ on the meaning of "cope well"...  being able to spin
up 18 workers in one second seems very fast to me.  How many do you
expect to ever need?!!

> Unlike the
> proposed min/max combination, which forks new bgworkers in advance, even
> if the database already uses lots of them. And after the spike, it
> quickly reduces the number of spare bgworkers to a certain max. While
> not perfect, it's definitely more adaptive to the workload (at least in
> the usual case of having only few databases).
>
> Maybe we need a more sophisticated algorithm in the coordinator. For
> example, measuring the average number of concurrent jobs per database over
> time and adjusting the number of idle backends according to that, the
> current workload and the max_background_workers, or some such. The
> min/max GUCs were simply easier to implement, but I'm open to a more
> sophisticated thing.

Possibly, but I'm still having a hard time understanding why you need
all the complexity you already have.  The way I'd imagine doing this
is:

1. If a new job arrives, and there is an idle worker available for the
correct database, then allocate that worker to that job.  Stop.
2. Otherwise, if the number of background workers is less than the
maximum number allowable, then start a new worker for the appropriate
database and allocate it to the new job.  Stop.
3. Otherwise, if there is at least one idle background worker, kill it
and start a new one for the correct database.  Allocate that new
worker to the new job.  Stop.
4. Otherwise, you're already at the maximum number of background
workers and they're all busy.  Wait until some worker finishes a job,
and then try again beginning with step 1.

When a worker finishes a job, it hangs around for a few minutes to see
whether it gets assigned a new job (as per #1) and, if not, exits.
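
To make that concrete, here is a rough, throwaway sketch of steps 1
through 4.  None of this is from the bgworker patch; the WorkerSlot
array, the MAX_BACKGROUND_WORKERS constant, and the start_worker /
stop_worker stubs are all invented for illustration:

/*
 * Rough sketch of the allocation policy in steps 1-4 above.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_BACKGROUND_WORKERS 8    /* stands in for the pool-size GUC */

typedef struct WorkerSlot
{
    bool    in_use;     /* a live worker occupies this slot */
    bool    busy;       /* the worker is currently running a job */
    int     dbid;       /* database the worker is connected to */
} WorkerSlot;

static WorkerSlot pool[MAX_BACKGROUND_WORKERS];

/* Stand-in for forking a backend and connecting it to "dbid". */
static void
start_worker(WorkerSlot *slot, int dbid)
{
    slot->in_use = true;
    slot->dbid = dbid;
    printf("started worker for db %d\n", dbid);
}

/* Stand-in for telling an idle worker to exit. */
static void
stop_worker(WorkerSlot *slot)
{
    printf("stopped idle worker for db %d\n", slot->dbid);
    slot->in_use = false;
}

/* Returns the slot assigned to the job, or -1: all busy, caller retries. */
static int
assign_worker(int jobdb)
{
    int     idle_other = -1;

    /* 1. An idle worker already in the right database wins outright. */
    for (int i = 0; i < MAX_BACKGROUND_WORKERS; i++)
    {
        if (pool[i].in_use && !pool[i].busy)
        {
            if (pool[i].dbid == jobdb)
            {
                pool[i].busy = true;
                return i;
            }
            idle_other = i;
        }
    }

    /* 2. Below the overall limit: start a new worker for this database. */
    for (int i = 0; i < MAX_BACKGROUND_WORKERS; i++)
    {
        if (!pool[i].in_use)
        {
            start_worker(&pool[i], jobdb);
            pool[i].busy = true;
            return i;
        }
    }

    /* 3. At the limit: recycle an idle worker from some other database. */
    if (idle_other >= 0)
    {
        stop_worker(&pool[idle_other]);
        start_worker(&pool[idle_other], jobdb);
        pool[idle_other].busy = true;
        return idle_other;
    }

    /* 4. Everything is busy: wait for a job to finish, then try again. */
    return -1;
}

int
main(void)
{
    assign_worker(1);       /* no workers yet: step 2 starts one for db 1 */
    pool[0].busy = false;   /* that job finishes */
    assign_worker(1);       /* step 1 reuses the now-idle worker */
    assign_worker(2);       /* step 2 starts a second worker for db 2 */
    return 0;
}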

Although there are other tunables that can be exposed, I would expect,
in this design, that the only thing most people would need to adjust
would be the maximum pool size.

It seems (to me) like your design is being driven by start-up latency,
which I just don't understand.  Sure, 50 ms to start up a worker isn't
fantastic, but the idea is that it won't happen much because there
will probably already be a worker in that database from previous
activity.  The only exception is when there's a sudden surge of
activity.  But I don't think that's the case to optimize for.  If a
database hasn't had any activity in a while, I think it's better to
reclaim the memory and file descriptors and ProcArray slots that we're
spending on it so that the rest of the system can run faster.  If that
means it takes an extra fraction of a second to respond at some later
point, I can live with that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

