Thread: Immediate shutdown and system(3)

Immediate shutdown and system(3)

From: Heikki Linnakangas

We're using SIGQUIT to signal an immediate shutdown request. Upon receiving 
SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT 
and exits.

This is a problem when child processes use system(3) to call other 
programs. We use system(3) in two places: to execute archive_command and 
restore_command. Fujii Masao identified this with pg_standby back in 
November:

http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d3cb@mail.gmail.com
and recently discussed here
http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa3a1@mail.gmail.com

I'm starting a new thread to bring this to the attention of those who 
haven't been following the hot standby stuff. pg_standby has a 
particular problem because it traps SIGQUIT to mean "end recovery, 
promote standby to master", which it shouldn't do IMHO. But ignoring 
that for a moment, the problem is generic.

SIGQUIT by default dumps core. That's not what we want to happen on 
immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit 
immediately instead, but external commands will dump core. system(3) 
ignores SIGQUIT in the calling process while the command runs, so we 
can't trap it in the parent process; the signal always reaches the child.
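
To make the "exited" versus "killed by signal" versus "dumped core" distinction concrete, here is a minimal sketch of how a caller can classify the value returned by system(3). The enum and function names are made up for illustration; this is not the code used by the startup process or the archiver. WCOREDUMP is not part of POSIX, hence the #ifdef.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical classification of a system(3) result. */
typedef enum
{
    CMD_FAILED_TO_RUN,          /* system() itself failed */
    CMD_EXITED,                 /* command called exit() */
    CMD_KILLED,                 /* command was terminated by a signal */
    CMD_DUMPED_CORE             /* terminated by a signal and dumped core */
} CommandOutcome;

/*
 * Run "command" through system(3) and classify how it ended.
 * For CMD_EXITED, *detail is the exit code; for CMD_KILLED and
 * CMD_DUMPED_CORE, it is the signal number.
 */
CommandOutcome
run_and_classify(const char *command, int *detail)
{
    int         status;

    assert(command != NULL && detail != NULL);

    status = system(command);
    *detail = 0;
    if (status == -1)
        return CMD_FAILED_TO_RUN;

    if (WIFEXITED(status))
    {
        *detail = WEXITSTATUS(status);
        return CMD_EXITED;
    }

    if (WIFSIGNALED(status))
    {
        *detail = WTERMSIG(status);
#ifdef WCOREDUMP
        if (WCOREDUMP(status))
            return CMD_DUMPED_CORE;
#endif
        return CMD_KILLED;
    }

    return CMD_FAILED_TO_RUN;
}
```

A command that falls over on SIGQUIT would come back as CMD_KILLED or, if a core file was actually written, CMD_DUMPED_CORE, with the signal number in *detail.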

There are a few options for fixing that:

1. Implement a custom version of system(3) using fork+exec that lets us 
trap SIGQUIT and send e.g. SIGTERM or SIGINT to the child instead. It 
might be a bit tricky to get this right in a portable way; Windows would 
certainly need a completely separate implementation.

2. Use a signal other than SIGQUIT for immediate shutdown of child 
processes. We can't change the signal sent to postmaster for 
backwards-compatibility reasons, but the signal sent by postmaster to 
child processes we could change. We've already used all signals in 
normal backends, but perhaps we could rearrange them.

3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child 
processes that use system(3): the archiver process and the startup 
process. Neither of them uses SIGINT currently. SIGINT is ignored by 
system(3), like SIGQUIT, but the default action is to terminate the 
process rather than core dump. Unfortunately pg_standby traps SIGINT too 
to mean "promote to master", but we could change it to use SIGUSR1 
instead for that purpose. If someone has a script that uses "killall 
-INT pg_standby" to promote a standby server to master, it would need to 
be changed. Looking at the manual page of pg_standby, however, it seems 
that the kill-method of triggering a promotion isn't documented, so with 
a notice in the release notes we could do that.
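
For reference, option 1 could look roughly like the sketch below: a fork+exec replacement for system(3) that traps SIGQUIT in the caller and passes it on to the command as SIGTERM. The function names are hypothetical, this is Unix-only, and it does nothing about a SIGQUIT that reaches the external command directly (for example through its process group), so treat it as an illustration of the mechanism rather than a finished design.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* PID of the running command, or 0 if none; read by the signal handler. */
static volatile sig_atomic_t command_pid = 0;

/* SIGQUIT handler: pass the request on to the command as SIGTERM. */
static void
forward_quit_as_term(int signo)
{
    (void) signo;
    if (command_pid > 0)
        kill((pid_t) command_pid, SIGTERM);
}

/*
 * Hypothetical replacement for system(3): runs "command" via /bin/sh and
 * returns a waitpid()-style status, or -1 on failure.
 */
int
system_forwarding_quit(const char *command)
{
    struct sigaction sa, old_quit;
    sigset_t    quit_set, old_mask;
    pid_t       pid;
    int         status = -1;

    assert(command != NULL);

    sa.sa_handler = forward_quit_as_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGQUIT, &sa, &old_quit) != 0)
        return -1;

    /* Hold SIGQUIT back until command_pid is set, so none is lost. */
    sigemptyset(&quit_set);
    sigaddset(&quit_set, SIGQUIT);
    sigprocmask(SIG_BLOCK, &quit_set, &old_mask);

    pid = fork();
    if (pid == 0)
    {
        /* Child: default SIGQUIT action, original mask, then exec. */
        signal(SIGQUIT, SIG_DFL);
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        execl("/bin/sh", "sh", "-c", command, (char *) NULL);
        _exit(127);
    }

    if (pid > 0)
        command_pid = pid;
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    if (pid > 0)
    {
        /* waitpid() is interrupted when the handler runs; just retry. */
        while (waitpid(pid, &status, 0) < 0)
        {
            if (errno != EINTR)
            {
                status = -1;
                break;
            }
        }
        command_pid = 0;
    }

    sigaction(SIGQUIT, &old_quit, NULL);
    return status;
}
```

The returned status can be inspected with WIFEXITED/WIFSIGNALED just like the result of system(3). A command that is stopped this way shows up as killed by SIGTERM, which terminates without a core dump by default.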

I'm leaning towards option 3, but I wonder if anyone sees a better solution.

This is all for CVS HEAD. In back-branches, I think we should just 
remove the signal handler for SIGQUIT from pg_standby and leave it at 
that. If you perform an immediate shutdown, you can get a core dump from 
archive_command or restore_command, but that's a minor inconvenience.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Immediate shutdown and system(3)

From: Greg Stark

On Fri, Feb 27, 2009 at 9:52 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>
> 2. Use a signal other than SIGQUIT for immediate shutdown of child
> processes. We can't change the signal sent to postmaster for
> backwards-compatibility reasons, but the signal sent by postmaster to child
> processes we could change. We've already used all signals in normal
> backends, but perhaps we could rearrange them.

This isn't the first time we've run into the problem that we've run
out of signals. I think we need to multiplex all our event signals
onto a single signal and use some other mechanism to indicate the type
of message.

Perhaps we do need two signals though, so subprocesses don't need to
connect to shared memory to distinguish "exit now" from other events.
SIGINT for "exit now" and SIGUSR1 for every postgres-internal signal,
using shared memory to determine the meaning, sounds like the most
logical arrangement to me.

Do we really need a "promote to master" message at all? Is pg_standby
responsible for this or could the master write out the configuration
changes necessary itself?

-- 
greg


Re: Immediate shutdown and system(3)

From: Heikki Linnakangas

Greg Stark wrote:
> This isn't the first time we've run into the problem that we've run
> out of signals. I think we need to multiplex all our event signals
> onto a single signal and use some other mechanism to indicate the type
> of message.

Yeah. A patch to do that was discussed a while ago, as Fujii's 
synchronous replication patch bumped into that as well. I don't feel 
like changing the signaling so dramatically right now, however.

> Do we really need a "promote to master" message at all? Is pg_standby
> responsible for this or could the master write out the configuration
> changes necessary itself?

The way pg_standby works is that it keeps waiting for new WAL files to 
arrive, until it's told to stop and return a non-zero exit code. 
A non-zero exit code from restore_command basically means "file not 
found", which makes the startup process end recovery and start up the 
database. There are two ways to tell pg_standby to stop: create a trigger 
file with a particular name, or signal it with SIGINT or SIGQUIT.
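
As a rough illustration of that loop (not the actual pg_standby source), the sketch below polls for the WAL file and gives up with a non-zero result once a trigger file appears or a signal has been caught. The function name, the one-second poll interval and the max_polls parameter are assumptions made for the example; max_polls exists only so the sketch can be bounded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/stat.h>
#include <unistd.h>

/* Set by the signal handler when a "stop waiting" signal arrives. */
static volatile sig_atomic_t signaled = 0;

static void
sighandler(int sig)
{
    (void) sig;
    signaled = 1;
}

static int
file_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}

/*
 * Wait until wal_path exists (return 0) or until we are told to stop
 * (return 1, which the caller hands back as a non-zero exit code).
 * max_polls < 0 means wait forever; running out of polls also returns 1.
 */
int
wait_for_wal_file(const char *wal_path, const char *trigger_path,
                  int max_polls)
{
    struct sigaction sa;
    int         polls;

    assert(wal_path != NULL && trigger_path != NULL);

    /* Trap the two signals that currently mean "stop waiting". */
    sa.sa_handler = sighandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);

    for (polls = 0; max_polls < 0 || polls < max_polls; polls++)
    {
        if (signaled || file_exists(trigger_path))
            return 1;           /* told to stop: end recovery */
        if (file_exists(wal_path))
            return 0;           /* file arrived: restore it */
        sleep(1);
    }
    return 1;
}
```

Because SIGQUIT lands in the same handler as SIGINT here, an immediate shutdown is indistinguishable from a request to end recovery.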

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Immediate shutdown and system(3)

From: Tom Lane

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Greg Stark wrote:
>> This isn't the first time we've run into the problem that we've run
>> out of signals. I think we need to multiplex all our event signals
>> onto a single signal and use some other mechanism to indicate the type
>> of message.

> Yeah. A patch to do that was discussed a while ago, as Fujii's 
> synchronous replication patch bumped into that as well. I don't feel 
> like changing the signaling so dramatically right now, however.

It's not really a feasible answer anyway for auxiliary processes that
have no need to be connected to shared memory.
        regards, tom lane


Re: Immediate shutdown and system(3)

From: Fujii Masao

Hi,

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> We're using SIGQUIT to signal immediate shutdown request. Upon receiving
> SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT and
> exits.
>
> This is a problem when child processes use system(3) to call other programs.
> We use system(3) in two places: to execute archive_command and
> restore_command. Fujii Masao identified this with pg_standby back in
> November:
>
> http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d3cb@mail.gmail.com
> and recently discussed here
> http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa3a1@mail.gmail.com
>
> I'm starting a new thread to bring this to attention of those who haven't
> been following the hot standby stuff. pg_standby has a particular problem
> because it traps SIGQUIT to mean "end recovery, promote standby to master",
> which it shouldn't do IMHO. But ignoring that for a moment, the problem is
> generic.
>
> SIGQUIT by default dumps core. That's not what we want to happen on
> immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit
> immediately instead, but external commands will dump core. system(3) ignores
> SIGQUIT, so we can't trap it in the parent process; it is always relayed to
> the child.
>
> There's a few options on how to fix that:
>
> 1. Implement a custom version of system(3) using fork+exec that let's us
> trap SIGQUIT and send e.g SIGTERM or SIGINT to the child instead. It might
> be a bit tricky to get this right in a portable way; Windows would certainly
> need a completely separate implementation.
>
> 2. Use a signal other than SIGQUIT for immediate shutdown of child
> processes. We can't change the signal sent to postmaster for
> backwards-compatibility reasons, but the signal sent by postmaster to child
> processes we could change. We've already used all signals in normal
> backends, but perhaps we could rearrange them.
>
> 3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child
> processes that use system(3): the archiver process and the startup process.
> Neither of them use SIGINT currently. SIGINT is ignored by system(3), like
> SIGQUIT, but the default action is to terminate the process rather than core
> dump. Unfortunately pg_standby traps SIGINT too to mean "promote to master",
> but we could change it to use SIGUSR1 instead for that purpose. If someone
> has a script that uses "killall -INT pg_standby" to promote a standby server
> to master, it would need to be changed. Looking at the manual page of
> pg_standby, however, it seems that the kill-method of triggering a promotion
> isn't documented, so with a notice in release notes we could do that.
>
> I'm leaning towards option 3, but I wonder if anyone sees a better solution.

4. Use shared memory to tell the startup process about the shutdown state.
When a shutdown signal arrives, postmaster sets the corresponding shutdown
state in shared memory before signaling the child processes. The startup
process checks the shutdown state whenever it executes system(), and determines
how to exit according to that state. This solution doesn't change any existing
behavior of pg_standby. What is your opinion?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Immediate shutdown and system(3)

From: Heikki Linnakangas

Fujii Masao wrote:
> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> I'm leaning towards option 3, but I wonder if anyone sees a better solution.
> 
> 4. Use the shared memory to tell the startup process about the shutdown state.
> When a shutdown signal arrives, postmaster sets the corresponding shutdown
> state to the shared memory before signaling to the child processes. The startup
> process check the shutdown state whenever executing system(), and determine
> how to exit according to that state. This solution doesn't change any existing
> behavior of pg_standby. What is your opinion?

That would only solve the problem for pg_standby. Other programs you 
might use as a restore_command or archive_command like "cp" or "rsync" 
would still core dump on the SIGQUIT.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Immediate shutdown and system(3)

From: ITAGAKI Takahiro

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> 1. Implement a custom version of system(3) using fork+exec that let's us 
> trap SIGQUIT and send e.g SIGTERM or SIGINT to the child instead. It 
> might be a bit tricky to get this right in a portable way; Windows would 
> certainly need a completely separate implementation.

I think the custom system() approach is the best plan for us because
it could open the door to faster recovery: if there were an asynchronous
version of system(), the startup process could restore archived WAL files
and replay the operations in them in parallel.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Immediate shutdown and system(3)

From: Zdenek Kotala

On 2.03.09 08:59, Heikki Linnakangas wrote:
> Fujii Masao wrote:
>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> I'm leaning towards option 3, but I wonder if anyone sees a better 
>>> solution.
>>
>> 4. Use the shared memory to tell the startup process about the 
>> shutdown state.
>> When a shutdown signal arrives, postmaster sets the corresponding 
>> shutdown
>> state to the shared memory before signaling to the child processes. 
>> The startup
>> process check the shutdown state whenever executing system(), and 
>> determine
>> how to exit according to that state. This solution doesn't change any 
>> existing
>> behavior of pg_standby. What is your opinion?
> 
> That would only solve the problem for pg_standby. Other programs you 
> might use as a restore_command or archive_command like "cp" or "rsync" 
> would still core dump on the SIGQUIT.
> 

I think that we could have two methods: an extended method that uses shared 
memory to tell the child what it should do, and a standard method that sends 
the appropriate signal to the child. For example, pg_ctl could use the extended 
communication to control the postmaster better.
Zdenek


Re: Immediate shutdown and system(3)

From: Heikki Linnakangas

Zdenek Kotala wrote:
> On 2.03.09 08:59, Heikki Linnakangas wrote:
>> Fujii Masao wrote:
>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> I'm leaning towards option 3, but I wonder if anyone sees a better 
>>>> solution.
>>>
>>> 4. Use the shared memory to tell the startup process about the 
>>> shutdown state.
>>> When a shutdown signal arrives, postmaster sets the corresponding 
>>> shutdown
>>> state to the shared memory before signaling to the child processes. 
>>> The startup
>>> process check the shutdown state whenever executing system(), and 
>>> determine
>>> how to exit according to that state. This solution doesn't change any 
>>> existing
>>> behavior of pg_standby. What is your opinion?
>>
>> That would only solve the problem for pg_standby. Other programs you 
>> might use as a restore_command or archive_command like "cp" or "rsync" 
>> would still core dump on the SIGQUIT.
>>
> 
> I think that we could have two methods. Extended method will use share 
> memory to say what child should do and standard which send appropriate 
> signal to child. For example pg_ctl could use extended communication to 
> better postmaster controlling.

The problem isn't in the signaling between external tools like pg_ctl 
and postmaster, but the signaling between postmaster and the child 
processes.

Signal multiplexing would help by releasing some signals, but to kill a 
child process that can be executing an external command with system(3), 
we'd still want to use a signal that does the right thing for external 
commands, per usual Unix semantics. Also, the archiver process currently 
detaches itself from shared memory at start, so using shared memory 
doesn't seem like an improvement.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Immediate shutdown and system(3)

From: Fujii Masao

Hi,

On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>
>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>
>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>> solution.
>>
>> 4. Use the shared memory to tell the startup process about the shutdown
>> state.
>> When a shutdown signal arrives, postmaster sets the corresponding shutdown
>> state to the shared memory before signaling to the child processes. The
>> startup
>> process check the shutdown state whenever executing system(), and
>> determine
>> how to exit according to that state. This solution doesn't change any
>> existing
>> behavior of pg_standby. What is your opinion?
>
> That would only solve the problem for pg_standby. Other programs you might
> use as a restore_command or archive_command like "cp" or "rsync" would still
> core dump on the SIGQUIT.

Right. I now understand your intention. I also agree with option 3 if nobody
complains about the loss of backward compatibility in pg_standby. Otherwise, how
about using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
and the startup process? SIGUSR2 by default terminates the process.
The archiver already uses SIGUSR2 for pgarch_waken_stop, so we would need to
reassign that function to another signal (SIGINT is suitable, I think).
This solution doesn't need signal multiplexing. Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Immediate shutdown and system(3)

From: Heikki Linnakangas

Per discussion, here's a patch for pg_standby in REL8_3_STABLE. The
signal handling is changed so that SIGQUIT no longer triggers failover,
but immediately kills pg_standby, triggering FATAL death of the startup
process too. That's what you want with immediate shutdown.
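
The effect of trapping SIGQUIT this way can be checked with a small stand-alone program. The sketch below forks a child that installs a handler which resets SIGINT to its default action and then sends SIGINT to itself; the parent sends the child a signal and returns the wait status. The helper name status_after_signal is invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Die from SIGINT's default action instead of SIGQUIT's core dump. */
static void
sigquit_handler(int sig)
{
    (void) sig;
    signal(SIGINT, SIG_DFL);
    kill(getpid(), SIGINT);
}

/*
 * Fork a child that traps SIGQUIT as above, send it "signo", and return
 * the child's waitpid() status (-1 on failure).
 */
int
status_after_signal(int signo)
{
    int         ready[2];
    char        byte = 0;
    int         status = -1;
    pid_t       pid;

    assert(signo > 0);

    if (pipe(ready) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        close(ready[0]);
        signal(SIGQUIT, sigquit_handler);
        /* Tell the parent the handler is installed, then just wait. */
        if (write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    close(ready[1]);
    if (read(ready[0], &byte, 1) != 1)
    {
        /* Child went away early; waitpid() below reports what happened. */
    }
    close(ready[0]);

    kill(pid, signo);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

For status_after_signal(SIGQUIT) the child is reported as terminated by SIGINT, so whoever waits for it still sees a death by signal, just without the core file.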

SIGUSR1 is now accepted as a signal to trigger failover. SIGINT is still
accepted too, but that should be considered deprecated since we're
likely to use SIGINT for immediate shutdown (for startup process) in 8.4.

We should document the use of signals to trigger failover in the
manual... Any volunteers?

This should be noted in the release notes:

If you are using pg_standby, and if you are using signals (e.g. "killall
-SIGINT pg_standby") to trigger failover, change your scripts to use
SIGUSR1 instead of SIGQUIT or SIGINT. SIGQUIT no longer triggers
failover, but aborts the recovery and shuts down the standby database.
SIGINT is still accepted as a failover trigger, but should be considered
deprecated and will also be changed to trigger immediate shutdown in
a future release.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: pg_standby.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/pg_standby/pg_standby.c,v
retrieving revision 1.10.2.3
diff -c -r1.10.2.3 pg_standby.c
*** pg_standby.c    6 Jan 2009 17:27:19 -0000    1.10.2.3
--- pg_standby.c    4 Mar 2009 09:13:34 -0000
***************
*** 451,464 ****
      signaled = true;
  }

  /*------------ MAIN ----------------------------------------*/
  int
  main(int argc, char **argv)
  {
      int            c;

!     (void) signal(SIGINT, sighandler);
!     (void) signal(SIGQUIT, sighandler);

      while ((c = getopt(argc, argv, "cdk:lr:s:t:w:")) != -1)
      {
--- 451,487 ----
      signaled = true;
  }

+ /* We don't want SIGQUIT to core dump */
+ static void
+ sigquit_handler(int sig)
+ {
+     signal(SIGINT, SIG_DFL);
+     kill(getpid(), SIGINT);
+ }
+
+
  /*------------ MAIN ----------------------------------------*/
  int
  main(int argc, char **argv)
  {
      int            c;

!     /*
!      * You can send SIGUSR1 to trigger failover.
!      *
!      * Postmaster uses SIGQUIT to request immediate shutdown. The default
!      * action is to core dump, but we don't want that, so trap it and
!      * commit suicide without core dump.
!      *
!      * We used to use SIGINT and SIGQUIT to trigger failover, but that
!      * turned out to be a bad idea because postmaster uses SIGQUIT to
!      * request immediate shutdown. We still trap SIGINT, but that is
!      * deprecated. We will likely switch to using SIGINT for immediate
!      * shutdown in future releases.
!      */
!     (void) signal(SIGUSR1, sighandler);
!     (void) signal(SIGINT, sighandler); /* deprecated, use SIGUSR1 */
!     (void) signal(SIGQUIT, sigquit_handler);

      while ((c = getopt(argc, argv, "cdk:lr:s:t:w:")) != -1)
      {

Re: Immediate shutdown and system(3)

From: Heikki Linnakangas

Fujii Masao wrote:
> Hi,
> 
> On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>> solution.
>>> 4. Use the shared memory to tell the startup process about the shutdown
>>> state.
>>> When a shutdown signal arrives, postmaster sets the corresponding shutdown
>>> state to the shared memory before signaling to the child processes. The
>>> startup
>>> process check the shutdown state whenever executing system(), and
>>> determine
>>> how to exit according to that state. This solution doesn't change any
>>> existing
>>> behavior of pg_standby. What is your opinion?
>> That would only solve the problem for pg_standby. Other programs you might
>> use as a restore_command or archive_command like "cp" or "rsync" would still
>> core dump on the SIGQUIT.
> 
> Right. I've just understood your intention. I also agree with option 3 if nobody
> complains about lack of backward compatibility of pg_standby. If no, how about
> using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
> and the startup process. SIGUSR2 by default terminates the process.
> The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
> reassign that function to another signal (SIGINT is suitable, I think).
> This solution doesn't need signal multiplexing. Thought?

Hmm, the startup/archiver process would then in turn need to kill the 
external command with SIGINT. I guess that would work.

There's a problem with my idea of just using SIGINT instead of SIGQUIT. 
Some (arguably badly behaved) programs trap SIGINT and exit() with a 
return code. The startup process won't recognize that as "killed by 
signal", and we're back to the same problem we have with pg_standby: the 
startup process doesn't die but continues with the startup. Notably 
rsync seems to behave like that.
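
To illustrate why a trapped signal is invisible to the caller, here is a sketch of a "was it killed by a signal?" check on a system(3) status. The function names are made up, the rule is a common heuristic rather than a quote of the startup process code, and the exit code 20 used in the note below is only a stand-in for whatever code a program chooses to exit with after trapping SIGINT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Heuristic: true if a system(3) status says that the shell, or the
 * command it ran, was terminated by a signal. Shells report an exit
 * status greater than 128 for a command that died from a signal.
 */
bool
looks_killed_by_signal(int status)
{
    if (status == -1)
        return false;           /* system() itself failed */
    if (WIFSIGNALED(status))
        return true;            /* the shell itself was killed */
    return WIFEXITED(status) && WEXITSTATUS(status) > 128;
}

/* Run a command and report whether it appears to have died from a signal. */
bool
command_killed_by_signal(const char *command)
{
    assert(command != NULL);

    return looks_killed_by_signal(system(command));
}
```

A command that ends with "exit 20" is classified as an ordinary failure, which is exactly what a program that traps SIGINT and then exits with a code looks like from the outside.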

BTW, searching the archive, I found this long thread about this same issue:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php

The idea of SIGUSR2 was mentioned there as well, as was the idea of 
reimplementing system(3). The conclusion of that thread was to use 
setsid() and process groups to ensure that the SIGQUIT is delivered to 
the archive_command or restore_command.

I'm starting to feel that this is getting too complicated. Maybe we 
should just fix pg_standby to not trap SIGQUIT, and live with the core 
dumps...

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Immediate shutdown and system(3)

From: Heikki Linnakangas

Ok, I've committed a minimal patch to pg_standby in CVS HEAD and 
REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I added 
a signal handler for SIGUSR1 to trigger failover; that should be 
considered the preferred signal for that, even though SIGINT still works 
too.

SIGQUIT is trapped to just die immediately, but without core dumping. As 
we still use SIGQUIT for immediate shutdown, any other archive_command 
or restore_command will still receive SIGQUIT on immediate shutdown, and 
by default dump core. Let's just live with that for now.

This should be mentioned in release notes, as any script that might be 
using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT 
instead. Where should I make a note of that so that we don't forget?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Immediate shutdown and system(3)

From: Andrew Dunstan

Heikki Linnakangas wrote:
> Ok, I've committed a minimal patch to pg_standby in CVS HEAD and 
> REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I 
> added a signal handler for SIGUSR1 to trigger failover; that should be 
> considered the preferred signal for that, even though SIGINT still 
> works too.
>
> SIGQUIT is trapped to just die immediately, but without core dumping. 
> As we still use SIGQUIT for immediate shutdown, any other 
> archive_command or restore_command will still receive SIGQUIT on 
> immediate shutdown, and by default dump core. Let's just live with 
> that for now..
>
> This should be mentioned in release notes, as any script that might be 
> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or 
> SIGINT instead. Where should I make a note of that so that we don't 
> forget?
>
>

Unless I'm missing it, the use of signals to trigger failover is not 
documented AT ALL. So why anyone would expect such behaviour is 
something of a mystery.

Perhaps doing that would be even more important than release notes.

cheers

andrew


Re: Immediate shutdown and system(3)

From: Heikki Linnakangas

Andrew Dunstan wrote:
> Heikki Linnakangas wrote:
>> This should be mentioned in release notes, as any script that might be 
>> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or 
>> SIGINT instead. Where should I make a note of that so that we don't 
>> forget?
> 
> Unless I'm missing it the use of signals to trigger failover is not 
> documented AT ALL. So why anyone would expect such behaviour is 
> something of a mystery.

Well, some people do read source code. If it was more widely known, I 
would hesitate more to change it, though.

> Perhaps doing that would be even more important than release notes.

Agreed it should be documented.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Immediate shutdown and system(3)

From: Bruce Momjian

Heikki Linnakangas wrote:
> This should be mentioned in release notes, as any script that might be 
> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT 
> instead. Where should I make a note of that so that we don't forget?

The CVS commit message.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Immediate shutdown and system(3)

From: Robert Haas

On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Heikki Linnakangas wrote:
>> This should be mentioned in release notes, as any script that might be
>> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT
>> instead. Where should I make a note of that so that we don't forget?
>
> The CVS commit message.

Is there some reason we don't just put it in the release notes as
*part* of the commit?  Someone can always go back and edit it later.
It seems like that would be easier and less error-prone than grepping
the CVS commit logs for "release notes"...

...Robert


Re: Immediate shutdown and system(3)

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> The CVS commit message.

> Is there some reason we don't just put it in the release notes as
> *part* of the commit?  Someone can always go back and edit it later.

That was suggested before, and I think we actually tried it for a few
months.  It didn't work.

Putting an item in the release notes *properly* is a whole lot more
work than putting a short bit of text in the CVS log (especially for
committers whose first language isn't English).  It would also
create a lot more merge-collision issues for unrelated patches.

It's less trouble overall to do the editing, organizing, and SGML-ifying
of all the release notes at once.  Also you end up with a better
product, assuming that whoever is doing the notes puts in reasonable
editorial effort.
        regards, tom lane


Re: Immediate shutdown and system(3)

From: Robert Haas

On Wed, Mar 18, 2009 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> The CVS commit message.
>
>> Is there some reason we don't just put it in the release notes as
>> *part* of the commit?  Someone can always go back and edit it later.
>
> That was suggested before, and I think we actually tried it for a few
> months.  It didn't work.
>
> Putting an item in the release notes *properly* is a whole lot more
> work than putting a short bit of text in the CVS log (especially for
> committers whose first language isn't English).  It would also
> create a lot more merge-collision issues for unrelated patches.

Yeah, I wouldn't ask people to include it in the patches they post.
That would be a pain, and people would probably tend (with the best of
intentions) to inflate the relative importance of their own work.  I
was thinking that the committer could make a quick entry at the time
they actually committed the patch, so that the step you describe below
could start with something other than an email box.

> It's less trouble overall to do the editing, organizing, and SGML-ifying
> of all the release notes at once.  Also you end up with a better
> product, assuming that whoever is doing the notes puts in reasonable
> editorial effort.

If it works for the people who are doing it, good enough.

...Robert