Re: Race conditions with checkpointer and shutdown - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Race conditions with checkpointer and shutdown
Msg-id 20190429163511.7rbdb7gmlz634dc4@alap3.anarazel.de
In response to Re: Race conditions with checkpointer and shutdown  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2019-04-27 20:56:51 -0400, Tom Lane wrote:
> Even if that isn't the proximate cause of the current reports, it's
> clearly trouble waiting to happen, and we should get rid of it.
> Accordingly, see attached proposed patch.  This just flushes the
> "immediate interrupt" stuff in favor of making sure that
> libpqwalreceiver.c will take care of any signals received while
> waiting for input.

Good plan.


> The existing code does not use PQsetnonblocking, which means that it's
> theoretically at risk of blocking while pushing out data to the remote
> server.  In practice I think that risk is negligible because (IIUC) we
> don't send very large amounts of data at one time.  So I didn't bother to
> change that.  Note that for the most part, if that happened, the existing
> code was at risk of slow response to SIGTERM anyway since it didn't have
> Enable/DisableWalRcvImmediateExit around the places that send data.

Hm, I'm not convinced that's OK. What if there's a network hiccup? We'll
wait until there's an OS TCP timeout, no? It's bad enough that there
were cases of this before. Increasing the surface of cases where we
might want to shut down walreceiver, e.g. because we would rather switch
to recovery_command, or just shut down the server, but just get stuck
waiting for an hour for a TCP timeout, doesn't seem OK.

Greetings,

Andres Freund


