Re: [HACKERS] Unportable implementation of background worker start - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [HACKERS] Unportable implementation of background worker start
Msg-id 20170421151941.45njrtwykn5dd476@alvherre.pgsql
In response to Re: [HACKERS] Unportable implementation of background worker start  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Unportable implementation of background worker start  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [HACKERS] Unportable implementation of background worker start  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:

> After sleeping and thinking more, I've realized that the
> slow-bgworker-start issue actually exists on *every* platform, it's just
> harder to hit when select() is interruptible.  But consider the case
> where multiple bgworker-start requests arrive while ServerLoop is
> actively executing (perhaps because a connection request just came in).
> The postmaster has signals blocked, so nothing happens for the moment.
> When we go around the loop and reach
> 
>             PG_SETMASK(&UnBlockSig);
> 
> the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
> bgworker start requests, and services just one of them.  Then control
> returns and proceeds to
> 
>             selres = select(nSockets, &rmask, NULL, NULL, &timeout);
> 
> But now there's no interrupt pending.  So the remaining start requests
> do not get serviced until (a) some other postmaster interrupt arrives,
> or (b) the one-minute timeout elapses.  They could be waiting a while.
> 
> Bottom line is that any request for more than one bgworker at a time
> faces a non-negligible risk of suffering serious latency.

Interesting.  It's hard to hit, for sure.
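
To make the sequence concrete, here is a toy model in C of what Tom
describes above.  It is only a sketch: ToyPostmaster, request_workers and
loop_iteration are made-up names, not postmaster code, and the signal is
reduced to a single pending flag on the assumption that several SIGUSR1s
arriving while signals are blocked collapse into one delivery.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not postmaster code. */
typedef struct
{
    int  pending_requests;  /* bgworker start requests not yet serviced */
    bool signal_pending;    /* a SIGUSR1 is waiting to be delivered */
} ToyPostmaster;

/* Requests arrive while signals are blocked: the signal just stays pending. */
static void
request_workers(ToyPostmaster *pm, int n)
{
    assert(n >= 0);
    pm->pending_requests += n;
    if (n > 0)
        pm->signal_pending = true;
}

/*
 * One trip around the loop: unblock signals, let the handler run if a
 * signal is pending, then fall into "select".  Returns true if the loop
 * would now sleep with start requests still unserviced.
 */
static bool
loop_iteration(ToyPostmaster *pm, bool start_all)
{
    if (pm->signal_pending)
    {
        pm->signal_pending = false;     /* one delivery, however many were sent */
        if (start_all)
            pm->pending_requests = 0;   /* handler services every request */
        else if (pm->pending_requests > 0)
            pm->pending_requests--;     /* handler services just one */
    }
    return pm->pending_requests > 0;
}
```

With three requests queued and start_all false, one iteration leaves two
requests unserviced and no signal pending, which is the state in which
select() would sleep until another interrupt or the timeout.  With
start_all true, nothing is left over.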

> I'm coming back to the idea that at least in the back branches, the
> thing to do is allow maybe_start_bgworker to start multiple workers.
>
> Is there any actual evidence for the claim that that might have
> bad side effects?

Well, I ran tests with a few tens of thousands of sample workers, and the
neglect of other things (such as connection requests) was visible, but
that's probably not a scenario many servers run often at present.  I
don't strongly object to the idea of removing the "return" in older
branches, since it's evidently a problem.  However, as bgworkers start
to be used more, I think we should definitely have some protection.  In
a system with a large number of workers available for parallel queries,
it seems possible for a high-velocity server to get stuck in the loop
for some time.  (I haven't actually verified this, though.  My
experiments were with the early kind, static bgworkers.)
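
One possible shape for such protection, purely as a sketch and not a
proposed patch: cap how many workers get started per pass, and leave a
flag set when some are left over so the main loop knows to come back
soon.  MAX_STARTS_PER_PASS, ToyWorkerQueue and start_some_workers are
hypothetical names, and the cap of 100 is an arbitrary value picked for
illustration.

```c
#include <stdbool.h>

#define MAX_STARTS_PER_PASS 100     /* hypothetical cap, arbitrary value */

/* Toy stand-in for the list of registered-but-not-started workers. */
typedef struct
{
    int  waiting;               /* workers still waiting to be started */
    bool start_worker_needed;   /* tells the main loop to return promptly */
} ToyWorkerQueue;

/*
 * Start at most MAX_STARTS_PER_PASS workers and return how many were
 * started.  If any are left, the flag stays set so that other work
 * (such as connection requests) can be handled before the next batch.
 */
static int
start_some_workers(ToyWorkerQueue *q)
{
    int started = 0;

    while (q->waiting > 0 && started < MAX_STARTS_PER_PASS)
    {
        q->waiting--;           /* stands in for actually forking a worker */
        started++;
    }
    q->start_worker_needed = (q->waiting > 0);
    return started;
}
```

In a real loop the flag would presumably also have to shorten the select()
timeout, so that the leftover workers are not the ones waiting out the
full minute.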

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


