Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
Date
Msg-id 61373517-e5d5-a718-75a4-a4022cc48db4@iki.fi
Whole thread Raw
In response to Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On 08/08/18 19:19, Heikki Linnakangas wrote:
> So, with this commit, the various SIGQUIT quickdie handlers are in a
> better shape. The regular backend's quickdie() handler still calls
> ereport(), which is not safe, but it's a step in the right direction.
> And we haven't addressed the original complaint, which was actually
> about startup_die(), not quickdie().
> 
> What should we do about startup_die()? Ideas?

To recap: If a backend process receives a SIGTERM, or if the 
authentication timeout expires, while the backend is reading the startup 
packet, we call proc_exit(0) from the signal handler. proc_exit(0) is 
clearly not async-safe, so if the process was busy modifying backend 
state, for example doing a memory allocation, bad things can happen. In 
the beginning of this thread, Jimmy demonstrated a self-deadlock, when 
the backend was in the middle of proc_exit(), when the signal arrived, 
and a nested call to proc_exit was made.

I think the proper fix here is to stop calling proc_exit(0) directly 
from the startup_die() signal handler. Instead, let's install the 
regular die() signal handler earlier, before reading the startup 
process, and rely on the usual CHECK_FOR_INTERRUPTS() mechanism.

One annoying thing is the pg_getnameinfo_all() call, in 
BackendInitialize(). If 'log_hostname' is enabled, it can perform a DNS 
lookup, which can take a very long time. It would be nice to still be 
able to interrupt that, but if we stop calling proc_exit() from the 
signal handler, we won't. I don't think it's safe to interrupt it, like 
we do today, but I wonder if the cure is worse than the disease.

We revamped the signal handling in backend processes in PostgreSQL 9.5, 
so I'm inclined to only backpatch this to 9.5. In 9.3 and 9.4, let's 
just add a '!proc_exit_in_progress' check to the signal handler, to 
prevent the nested proc_eixt() call that Jimmy ran into from happening. 
It won't fix the general problem, but since we haven't heard any other 
reports about this, I think that's the right amount of effort for 9.3 
and 9.4.

- Heikki


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Temporary tables prevent autovacuum, leading to XID wraparound
Next
From: Robert Haas
Date:
Subject: Re: Get funcid when create function