Re: [HACKERS] Why does logical replication launcher exit with exitcode 1? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] Why does logical replication launcher exit with exitcode 1?
Msg-id 20170802002028.srtnekv4qirzob6c@alap3.anarazel.de
In response to Re: [HACKERS] Why does logical replication launcher exit with exitcode 1?  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [HACKERS] Why does logical replication launcher exit with exitcode 1?  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On 2017-08-02 12:14:18 +1200, Thomas Munro wrote:
> On Wed, Aug 2, 2017 at 11:03 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-08-02 10:58:32 +1200, Thomas Munro wrote:
> >> When I shut down a cluster that isn't using logical replication, it
> >> always logs a line like the following.  So do the build farm members I
> >> looked at.  I didn't see anything about this in the open items list --
> >> isn't it a bug?
> >>
> >> 2017-08-02 10:39:25.007 NZST [34781] LOG:  worker process: logical
> >> replication launcher (PID 34788) exited with exit code 1
> >
> > Exit code 0 signals that a worker should be restarted. Therefore
> > graceful exit can't really use that.  I think a) we really need to
> > improve bgworker infrastructure around that b) shows the limit of using
> > bgworkers for this kinda thing - we should probably have a more bgworker
> > like infrastructure for internal workers.
> 
> I see.  In the meantime IMHO I think we should try to find a way to
> avoid printing out this message -- it looks like something is wrong to
> the uninitiated.

Well, that's how it is for all bgworkers - maybe a better solution is to
adjust that message in the postmaster rather than fiddle with the worker
exit code?  Seems like we could easily take pmStatus into account
inside LogChildExit() and set the log level to DEBUG1 even for
EXIT_STATUS_1 in that case?  Additionally we probably should always log
a better message for bgworkers exiting with exit 1, something about
unregistering the worker or such.
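
To make the idea concrete, here is a rough standalone sketch of the decision
described above. It is not postmaster code: the type and function names
(SketchLogLevel, SketchChildExit, sketch_exit_log_level) are made up for
illustration, the shutdown flag stands in for whatever the postmaster state
check would look like, and treating exit code 0 as DEBUG1 is an assumption
read off the "even for" wording above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real definitions. */
typedef enum
{
    SKETCH_DEBUG1,
    SKETCH_LOG
} SketchLogLevel;

typedef struct
{
    bool exited_normally;   /* child called exit() rather than dying on a signal */
    int  exit_code;         /* only meaningful when exited_normally is true */
} SketchChildExit;

/*
 * Pick the log level for a background worker's exit message.
 *
 * Assumption: exit code 0 is already reported quietly.  The proposed change
 * is the second branch: exit code 1 is also reported quietly, but only while
 * the postmaster is shutting down.
 */
static SketchLogLevel
sketch_exit_log_level(SketchChildExit st, bool postmaster_shutting_down)
{
    if (st.exited_normally && st.exit_code == 0)
        return SKETCH_DEBUG1;

    if (st.exited_normally && st.exit_code == 1 && postmaster_shutting_down)
        return SKETCH_DEBUG1;

    /* Everything else (other exit codes, signals) stays visible at LOG. */
    return SKETCH_LOG;
}

/* Map the sketch's level to a printable name. */
static const char *
sketch_level_name(SketchLogLevel level)
{
    return (level == SKETCH_DEBUG1) ? "DEBUG1" : "LOG";
}
```

With this, a launcher exiting with code 1 during shutdown would map to DEBUG1,
while the same exit code during normal running would still be reported at LOG.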


> Possibly stupid question: why do we restart workers when we know we're
> shutting down anyway?  Hmm, I suppose there might conceivably be
> workers that need to do something during shutdown and they might not
> have done it yet.

The launcher doesn't really know the reason for the shutdown.

- Andres


