Re: Immediate shutdown and system(3) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Immediate shutdown and system(3)
Date
Msg-id 49AE5F8E.9010403@enterprisedb.com
In response to Re: Immediate shutdown and system(3)  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Immediate shutdown and system(3)  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Fujii Masao wrote:
> Hi,
> 
> On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>> solution.
>>> 4. Use the shared memory to tell the startup process about the shutdown
>>> state.
>>> When a shutdown signal arrives, postmaster sets the corresponding shutdown
>>> state in shared memory before signaling the child processes. The startup
>>> process checks the shutdown state whenever executing system(), and
>>> determines how to exit according to that state. This solution doesn't
>>> change any existing behavior of pg_standby. What is your opinion?
>> That would only solve the problem for pg_standby. Other programs you might
>> use as a restore_command or archive_command like "cp" or "rsync" would still
>> core dump on the SIGQUIT.
> 
> Right. I've just understood your intention. I also agree with option 3 if nobody
> complains about lack of backward compatibility of pg_standby. If not, how about
> using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
> and the startup process. SIGUSR2 by default terminates the process.
> The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
> reassign that function to another signal (SIGINT is suitable, I think).
> This solution doesn't need signal multiplexing. Thoughts?

Hmm, the startup/archiver process would then in turn need to kill the 
external command with SIGINT. I guess that would work.

There's a problem with my idea of just using SIGINT instead of SIGQUIT.
Some (arguably badly behaved) programs trap SIGINT and exit() with a
return code. The startup process won't recognize that as "killed by
signal", and we're back to the same problem we have with pg_standby: the
startup process doesn't die but continues with the startup. Notably,
rsync seems to behave like that.
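
To make the distinction concrete, here is a minimal sketch of how a caller of system(3) can tell "killed by signal" apart from "exited with a return code". The names CmdOutcome, classify_status and run_command are made up for illustration and are not PostgreSQL functions; only system() and the <sys/wait.h> macros are real.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical classification of a system(3) result (illustration only). */
typedef enum
{
    CMD_OK,                 /* exited with status 0 */
    CMD_FAILED,             /* exited with a nonzero status */
    CMD_KILLED_BY_SIGNAL,   /* terminated by an uncaught signal */
    CMD_NOT_RUN             /* system() itself failed */
} CmdOutcome;

/* Decode a wait status as returned by system(3) or waitpid(2). */
static CmdOutcome
classify_status(int status)
{
    if (status == -1)
        return CMD_NOT_RUN;
    if (WIFSIGNALED(status))
        return CMD_KILLED_BY_SIGNAL;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? CMD_OK : CMD_FAILED;
    return CMD_FAILED;
}

/* Run a shell command and classify how it ended. */
static CmdOutcome
run_command(const char *cmd)
{
    return classify_status(system(cmd));
}
```

A command that dies from an uncaught signal comes back as CMD_KILLED_BY_SIGNAL, but one that traps the signal and calls exit(1) comes back as CMD_FAILED, which looks the same as an ordinary failure such as "file not found". That is the case the caller cannot detect from the status alone.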

BTW, searching the archive, I found this long thread about this same issue:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php

The idea of SIGUSR2 was mentioned there too, as was the idea of
reimplementing system(3). The conclusion of that thread was to use
setsid() and process groups to ensure that the SIGQUIT is delivered to
the archive_command or restore_command.
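
For reference, the setsid() and process group idea could look roughly like the sketch below. This is not the code from that thread or from the PostgreSQL tree; spawn_in_new_session, signal_command_group and wait_for_command are hypothetical helpers, and it assumes a POSIX system with /bin/sh.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run cmd via the shell in its own session, so that
 * the child's process group ID equals its pid. Returns the pid, or -1.
 */
static pid_t
spawn_in_new_session(const char *cmd)
{
    int     ready[2];
    char    byte = 0;
    ssize_t n;
    pid_t   pid;

    if (pipe(ready) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: become session and process group leader. */
        close(ready[0]);
        if (setsid() < 0)
            _exit(127);
        /* Tell the parent that the new process group now exists. */
        if (write(ready[1], &byte, 1) != 1)
            _exit(127);
        close(ready[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);
    }

    /* Parent: wait until the child has called setsid(). */
    close(ready[1]);
    do
        n = read(ready[0], &byte, 1);
    while (n < 0 && errno == EINTR);
    close(ready[0]);
    if (n != 1)
    {
        waitpid(pid, NULL, 0);
        return -1;
    }
    return pid;
}

/* Send signo to every process in the command's process group. */
static int
signal_command_group(pid_t pid, int signo)
{
    return kill(-pid, signo);
}

/* Reap the command; returns its wait status, or -1 on error. */
static int
wait_for_command(pid_t pid)
{
    int status;

    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

With this shape, a call like signal_command_group(pid, SIGQUIT) reaches the shell and whatever it has spawned, rather than only the immediate child. The pipe is there only so that the parent does not try to signal the group before the child has created it.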

I'm starting to feel that this is getting too complicated. Maybe we 
should just fix pg_standby to not trap SIGQUIT, and live with the core 
dumps...

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

