Re: [HACKERS] Unportable implementation of background worker start - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Unportable implementation of background worker start
Date
Msg-id 17795.1492663813@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Unportable implementation of background worker start  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Unportable implementation of background worker start  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [HACKERS] Unportable implementation of background worker start  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> Hm.  Do you have a more-portable alternative?

>> I was thinking in a WaitEventSet from latch.c.

My first reaction was that that sounded like a lot more work than removing
two lines from maybe_start_bgworker and adjusting some comments.  But on
closer inspection, the slow-bgworker-start issue isn't the only problem
here.  On a machine that restarts select()'s timeout after an interrupt,
as (at least) HPUX does, the postmaster will actually never iterate
ServerLoop's loop except immediately after receiving a new connection
request.  The stream of interrupts from the autovac launcher is alone
sufficient to prevent the initial 60-second timeout from ever elapsing.
So if there are no new connection requests for awhile, none of the
housekeeping actions in ServerLoop get done.

Most of those actions are concerned with restarting failed background
tasks, which is something we could get by without --- it's unlikely
that those tasks would fail without causing a database-wide restart,
and then there really isn't going to be much need for them until at least
one new connection request has arrived.  But the last step in that loop is
concerned with touching the sockets and lock files to prevent aggressive
/tmp cleaners from removing them, and that's something that can't be let
slide, or we might as well not have the logic at all.

I've confirmed experimentally that on my HPUX box, a postmaster not
receiving new connections for an hour or more in fact fails to update
the mod times on the sockets and lock files.  So that problem isn't
hypothetical.

So it's looking to me like we do need to do something about this, and
ideally back-patch it all the way.  But WaitEventSet doesn't exist
before 9.6.  Do we have the stomach for back-patching that into
stable branches?
        regards, tom lane



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Interval for launching the table sync worker
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables