Home > mailing lists

Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
Date	August 13, 2018 16:53:05
Msg-id	61373517-e5d5-a718-75a4-a4022cc48db4@iki.fi Whole thread Raw
In response to	Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket (Heikki Linnakangas <hlinnaka@iki.fi>)
List	pgsql-hackers

Tree view

On 08/08/18 19:19, Heikki Linnakangas wrote:
> So, with this commit, the various SIGQUIT quickdie handlers are in a
> better shape. The regular backend's quickdie() handler still calls
> ereport(), which is not safe, but it's a step in the right direction.
> And we haven't addressed the original complaint, which was actually
> about startup_die(), not quickdie().
> 
> What should we do about startup_die()? Ideas?

To recap: If a backend process receives a SIGTERM, or if the 
authentication timeout expires, while the backend is reading the startup 
packet, we call proc_exit(0) from the signal handler. proc_exit(0) is 
clearly not async-safe, so if the process was busy modifying backend 
state, for example doing a memory allocation, bad things can happen. In 
the beginning of this thread, Jimmy demonstrated a self-deadlock, when 
the backend was in the middle of proc_exit(), when the signal arrived, 
and a nested call to proc_exit was made.

I think the proper fix here is to stop calling proc_exit(0) directly 
from the startup_die() signal handler. Instead, let's install the 
regular die() signal handler earlier, before reading the startup 
process, and rely on the usual CHECK_FOR_INTERRUPTS() mechanism.

One annoying thing is the pg_getnameinfo_all() call, in 
BackendInitialize(). If 'log_hostname' is enabled, it can perform a DNS 
lookup, which can take a very long time. It would be nice to still be 
able to interrupt that, but if we stop calling proc_exit() from the 
signal handler, we won't. I don't think it's safe to interrupt it, like 
we do today, but I wonder if the cure is worse than the disease.

We revamped the signal handling in backend processes in PostgreSQL 9.5, 
so I'm inclined to only backpatch this to 9.5. In 9.3 and 9.4, let's 
just add a '!proc_exit_in_progress' check to the signal handler, to 
prevent the nested proc_eixt() call that Jimmy ran into from happening. 
It won't fix the general problem, but since we haven't heard any other 
reports about this, I think that's the right amount of effort for 9.3 
and 9.4.

- Heikki

pgsql-hackers by date:

From: Andres Freund
Date: 13 August 2018, 15:56:16
Subject: Re: Temporary tables prevent autovacuum, leading to XID wraparound

From: Robert Haas
Date: 13 August 2018, 16:57:05
Subject: Re: Get funcid when create function

Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket - Mailing list pgsql-hackers

Previous

Next