
Re: [HACKERS] Unportable implementation of background worker start - Mailing list pgsql-hackers

From	Alvaro Herrera
Subject	Re: [HACKERS] Unportable implementation of background worker start
Date	April 21, 2017 21:19:41
Msg-id	20170421151941.45njrtwykn5dd476@alvherre.pgsql
In response to	Re: [HACKERS] Unportable implementation of background worker start (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [HACKERS] Unportable implementation of background worker start (Tom Lane <tgl@sss.pgh.pa.us>) Re: [HACKERS] Unportable implementation of background worker start (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers


Tom Lane wrote:

> After sleeping and thinking more, I've realized that the
> slow-bgworker-start issue actually exists on *every* platform, it's just
> harder to hit when select() is interruptible.  But consider the case
> where multiple bgworker-start requests arrive while ServerLoop is
> actively executing (perhaps because a connection request just came in).
> The postmaster has signals blocked, so nothing happens for the moment.
> When we go around the loop and reach
> 
>             PG_SETMASK(&UnBlockSig);
> 
> the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
> bgworker start requests, and services just one of them.  Then control
> returns and proceeds to
> 
>             selres = select(nSockets, &rmask, NULL, NULL, &timeout);
> 
> But now there's no interrupt pending.  So the remaining start requests
> do not get serviced until (a) some other postmaster interrupt arrives,
> or (b) the one-minute timeout elapses.  They could be waiting awhile.
> 
> Bottom line is that any request for more than one bgworker at a time
> faces a non-negligible risk of suffering serious latency.

Interesting.  It's hard to hit, for sure.

> I'm coming back to the idea that at least in the back branches, the
> thing to do is allow maybe_start_bgworker to start multiple workers.
>
> Is there any actual evidence for the claim that that might have
> bad side effects?

Well, I ran tests with a few tens of thousands of sample workers, and the
neglect of other things (such as connection requests) was visible, but
that's probably not a scenario many servers currently run often.  I
don't strongly object to the idea of removing the "return" in older
branches, since it's evidently a problem.  However, as bgworkers start
to be used more, I think we should definitely have some protection.  In
a system with a large number of workers available for parallel queries,
it seems possible for a high-velocity server to get stuck in the loop
for some time.  (I haven't actually verified this, though.  My
experiments were with the early kind: static bgworkers.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
