From: Markus Wanner
Subject: Re: bg worker: general purpose requirements
Msg-id: 4C98CFCD.9070403@bluegap.ch
In response to: Re: bg worker: general purpose requirements (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 09/21/2010 03:46 PM, Robert Haas wrote:
> Wait, are we in violent agreement here?  An overall limit on the
> number of parallel jobs is exactly what I think *does* make sense.
> It's the other knobs I find odd.

Note that the max setting I've been talking about here is the maximum
number of *idle* workers allowed; it does not include busy bgworkers.
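
To be explicit about the semantics, the check would look roughly like
this (a hypothetical sketch; these names are invented for this mail,
they're not the patch's actual identifiers):

#include <stdbool.h>

/* Only idle workers count against the limit; busy ones never do. */
bool
pool_over_limit(int num_idle, int num_busy, int max_idle)
{
    (void) num_busy;            /* busy workers deliberately ignored */
    return num_idle > max_idle;
}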

> I guess we differ on the meaning of "cope well"...  being able to spin
> up 18 workers in one second seems very fast to me.  

Well, it's obviously use-case dependent. For Postgres-R (and sync
replication in general), people are very sensitive to latency. There's
network latency already; adding 50ms of latency for no good reason is
not going to make these people happy.

> How many do you expect to ever need?!!

Again, very different. For Postgres-R, easily a couple dozen. The same
applies to parallel querying with multiple concurrent parallel queries.

> Possibly, but I'm still having a hard time understanding why you need
> all the complexity you already have.

To make sure we only pay the startup cost on very rare occasions, and
not every time the workload changes a bit (or fails to line up with an
arbitrary timeout).

(BTW, the min/max approach is hardly any more complex than a timeout.
It doesn't even need a syscall.)
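
Roughly what I have in mind, sketched in C (a minimal sketch; all of
these names are made up for this mail and don't match the actual
patch):

#include <stdio.h>

typedef struct WorkerPool
{
    int min_idle;    /* always pre-fork up to this many idle workers */
    int max_idle;    /* reclaim idle workers beyond this limit */
    int num_idle;    /* currently idle workers */
    int num_busy;    /* busy workers; never counted against max_idle */
} WorkerPool;

/* Stubs standing in for the real fork/terminate machinery. */
static void fork_idle_worker(WorkerPool *p)      { p->num_idle++; }
static void terminate_idle_worker(WorkerPool *p) { p->num_idle--; }

/*
 * Called whenever a job starts or finishes.  The decision is plain
 * arithmetic on two counters: no timer, and thus no syscall involved.
 */
static void
adjust_pool(WorkerPool *p)
{
    while (p->num_idle > p->max_idle)
        terminate_idle_worker(p);
    while (p->num_idle < p->min_idle)
        fork_idle_worker(p);
}

int
main(void)
{
    /* nine workers just went idle after a burst; trim back to max */
    WorkerPool pool = { .min_idle = 2, .max_idle = 5,
                        .num_idle = 9, .num_busy = 0 };

    adjust_pool(&pool);
    printf("idle=%d\n", pool.num_idle);    /* prints idle=5 */
    return 0;
}

A surplus worker gets terminated as soon as the idle count exceeds
max, and a burst of new jobs finds at least min workers already
forked; that's where the latency savings come from.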

> It seems (to me) like your design is being driven by start-up latency,
> which I just don't understand.  Sure, 50 ms to start up a worker isn't
> fantastic, but the idea is that it won't happen much because there
> will probably already be a worker in that database from previous
> activity.  The only exception is when there's a sudden surge of
> activity.

I'm less optimistic about the consistency of the workload.

> But I don't think that's the case to optimize for.  If a
> database hasn't had any activity in a while, I think it's better to
> reclaim the memory and file descriptors and ProcArray slots that we're
> spending on it so that the rest of the system can run faster.

Absolutely, that's what I call a change in workload. The min/max
approach is certainly faster at reclaiming unused workers than a
timeout, but (depending on the max setting) it doesn't necessarily ever
go down to zero: with a max of five idle workers, for instance, up to
five of them can linger indefinitely once the load is gone.

Regards

Markus Wanner

