Re: BUG #16199: pg_restore stuck on interrupts - Mailing list pgsql-bugs

From Raúl Marín
Subject Re: BUG #16199: pg_restore stuck on interrupts
Msg-id 67ed46c3-da21-6397-f7b0-43f8ce2bd823@rmr.ninja
In response to Re: BUG #16199: pg_restore stuck on interrupts  (Raúl Marín <admin@rmr.ninja>)
Responses Re: BUG #16199: pg_restore stuck on interrupts
List pgsql-bugs
Hi,

After a little more than a week with half of the servers in the farm
patched, the issue has appeared only on the unpatched servers, and I
haven't seen any odd behaviour from the patched ones.


I'm attaching the patch as applied to PG11; it also applies cleanly to
master and PG12.
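
For anyone reading without the attachment, here is a minimal standalone
sketch of the idea discussed below (leaving a SIGTERM handler through
_exit() instead of exit()). It is not the attached patch and not
pg_restore code: cleanup_hook, term_handler_exit, term_handler_fast and
cleanup_ran_on_sigterm are hypothetical names, and the pipe write only
stands in for whatever a library might do from an atexit handler.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the report pipe; only meaningful in the forked child. */
static int report_fd = -1;

/* Hypothetical stand-in for a library cleanup hook registered with atexit(). */
static void cleanup_hook(void)
{
    ssize_t rc = write(report_fd, "x", 1);
    (void) rc;
}

/* Leaves via exit(): atexit hooks run from inside the signal handler. */
static void term_handler_exit(int signo)
{
    (void) signo;
    exit(1);
}

/* Leaves via _exit(): async-signal-safe, atexit hooks are skipped. */
static void term_handler_fast(int signo)
{
    (void) signo;
    _exit(1);
}

/*
 * Fork a child that registers cleanup_hook, installs one of the two
 * handlers and sends itself SIGTERM.  Returns 1 if the hook ran,
 * 0 if it did not, and -1 if pipe() or fork() failed.
 */
int cleanup_ran_on_sigterm(int use_fast_exit)
{
    int fds[2];
    char buf;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* Child: report through the pipe if the atexit hook gets called. */
        close(fds[0]);
        report_fd = fds[1];
        atexit(cleanup_hook);
        signal(SIGTERM, use_fast_exit ? term_handler_fast : term_handler_exit);
        raise(SIGTERM);
        _exit(2);               /* only reached if the signal was not delivered */
    }

    /* Parent: one byte means the hook ran, EOF means it was skipped. */
    close(fds[1]);
    n = read(fds[0], &buf, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

On a POSIX system I would expect cleanup_ran_on_sigterm(0) to report
that the hook ran and cleanup_ran_on_sigterm(1) to report that it was
skipped, which is the behaviour the _exit() approach relies on.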

Regards,
Raúl Marín.



On 8/1/20 18:53, Raúl Marín wrote:
> On 8/1/20 18:13, Tom Lane wrote:
> 
>> You didn't actually say, but you must be interrupting parallel restores
>> with SIGINT or the like?
> 
> Yes, CI (Jenkins) is interrupted automatically with new pushes and 
> that's supposed to send a SIGTERM to the process group, which includes 
> the pg_restore process.
> 
> 
>> sigTermHandler tries to be safe to run in a signal context, but I'm
>> afraid we didn't think hard about what exit() might call.  The way
>> I'd be inclined to fix this is to call _exit() instead of exit(),
>> and the heck with what any atexit handlers think.  Can you try that
>> and see if it improves matters for you?
> 
> 
> Initially I didn't like this idea, since it means not cleaning up
> gnutls stuff, and modifying anything related to crypto is always scary;
> but following the same reasoning, I trust that any good crypto library
> shouldn't leak anything important due to a fast exit.
> 
> I'll set up some of the servers to use _exit() for some days and see if 
> that fixes it.
> 
> Thanks!
> Raúl Marín.

