Thread: pg_receivewal and SIGTERM
There's a smallish backup tool called pg_backupcluster in Debian's postgresql-common which also ships a systemd service that runs pg_receivewal for wal archiving, and supplies a pg_getwal script for reading the files back on restore, including support for .partial files. So far the machinery was using plain files and relied on compressing the WALs from time to time, but now I wanted to compress the files directly from pg_receivewal --compress=5. Unfortunately this broke the regression tests that include a test for the .partial files where pg_receivewal.service is shut down before the segment is full. The problem was that systemd's default KillSignal is SIGTERM, while pg_receivewal flushes the output compression buffers on SIGINT only. The attached patch makes it do the same for SIGTERM as well. (Most places in PG that install a SIGINT handler also install a SIGTERM handler already.) Christoph
Attachment
> On 15 Aug 2022, at 14:45, Christoph Berg <myon@debian.org> wrote: > The problem was that systemd's default KillSignal is SIGTERM, while > pg_receivewal flushes the output compression buffers on SIGINT only. Supporting SIGTERM here makes sense, especially given how systemd works. > The attached patch makes it do the same for SIGTERM as well. (Most > places in PG that install a SIGINT handler also install a SIGTERM > handler already.) Not really when it comes to utilities though; initdb, pg_dump and pg_test_fsync seems to be the ones doing so. (That's probably mostly due to them not running in a daemon-like way as what's discussed here.) Do you think pg_recvlogical should support SIGTERM as well? (The signals which it does trap should be added to the documentation which just now says "until terminated by a signal" but that's a separate thing.) pqsignal(SIGINT, sigint_handler); + pqsignal(SIGTERM, sigint_handler); Tiny nitpick, I think we should rename sigint_handler to just sig_handler as it does handle more than sigint. In relation to this. Reading over this and looking around I realized that the documentation for pg_waldump lacks a closing parenthesis on Ctrl+C so I will be pushing the below to fix it: --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -263,7 +263,7 @@ PostgreSQL documentation <para> If <application>pg_waldump</application> is terminated by signal <systemitem>SIGINT</systemitem> - (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>, + (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>), the summary of the statistics computed is displayed up to the termination point. This operation is not supported on <productname>Windows</productname>. -- Daniel Gustafsson https://vmware.com/
Re: Daniel Gustafsson > Do you think pg_recvlogical should support SIGTERM as well? (The signals which > it does trap should be added to the documentation which just now says "until > terminated by a signal" but that's a separate thing.) Ack, that makes sense, added in the attached updated patch. > pqsignal(SIGINT, sigint_handler); > + pqsignal(SIGTERM, sigint_handler); > Tiny nitpick, I think we should rename sigint_handler to just sig_handler as it > does handle more than sigint. I went with sigexit_handler since pg_recvlogical has also a sighup_handler and "sig_handler" would be confusing there. Christoph
Attachment
On Tue, Aug 16, 2022 at 5:06 PM Christoph Berg <myon@debian.org> wrote: > > Re: Daniel Gustafsson > > Do you think pg_recvlogical should support SIGTERM as well? (The signals which > > it does trap should be added to the documentation which just now says "until > > terminated by a signal" but that's a separate thing.) > > Ack, that makes sense, added in the attached updated patch. > > > pqsignal(SIGINT, sigint_handler); > > + pqsignal(SIGTERM, sigint_handler); > > Tiny nitpick, I think we should rename sigint_handler to just sig_handler as it > > does handle more than sigint. > > I went with sigexit_handler since pg_recvlogical has also a > sighup_handler and "sig_handler" would be confusing there. Can we move these signal handlers to streamutil.h/.c so that both pg_receivewal and pg_recvlogical can make use of it avoiding duplicate code? -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
> On 16 Aug 2022, at 13:40, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Aug 16, 2022 at 5:06 PM Christoph Berg <myon@debian.org> wrote: >> I went with sigexit_handler since pg_recvlogical has also a >> sighup_handler and "sig_handler" would be confusing there. > > Can we move these signal handlers to streamutil.h/.c so that both > pg_receivewal and pg_recvlogical can make use of it avoiding duplicate > code? In general that's a good idea, but they are so trivial that I don't really see much point in doing that in this particular case. -- Daniel Gustafsson https://vmware.com/
Re: Daniel Gustafsson > In general that's a good idea, but they are so trivial that I don't really see > much point in doing that in this particular case. Plus the variable they set is called differently... Christoph
> On 16 Aug 2022, at 13:36, Christoph Berg <myon@debian.org> wrote: >> pqsignal(SIGINT, sigint_handler); >> + pqsignal(SIGTERM, sigint_handler); >> Tiny nitpick, I think we should rename sigint_handler to just sig_handler as it >> does handle more than sigint. > > I went with sigexit_handler since pg_recvlogical has also a > sighup_handler and "sig_handler" would be confusing there. Good point, sigexit_handler is a better name here. -- Daniel Gustafsson https://vmware.com/
On Tue, Aug 16, 2022 at 5:15 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 16 Aug 2022, at 13:36, Christoph Berg <myon@debian.org> wrote: > > >> pqsignal(SIGINT, sigint_handler); > >> + pqsignal(SIGTERM, sigint_handler); > >> Tiny nitpick, I think we should rename sigint_handler to just sig_handler as it > >> does handle more than sigint. > > > > I went with sigexit_handler since pg_recvlogical has also a > > sighup_handler and "sig_handler" would be confusing there. > > Good point, sigexit_handler is a better name here. +1. Don't we need a similar explanation [1] for pg_recvlogical docs? [1] <para> In the absence of fatal errors, <application>pg_receivewal</application> - will run until terminated by the <systemitem>SIGINT</systemitem> signal - (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>). + will run until terminated by the <systemitem>SIGINT</systemitem> + (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>) + or <systemitem>SIGTERM</systemitem> signal. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Bharath Rupireddy > Don't we need a similar explanation [1] for pg_recvlogical docs? > > [1] > <para> > In the absence of fatal errors, <application>pg_receivewal</application> > - will run until terminated by the <systemitem>SIGINT</systemitem> signal > - (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>). > + will run until terminated by the <systemitem>SIGINT</systemitem> > + (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>) > + or <systemitem>SIGTERM</systemitem> signal. Coped that from pg_receivewal(1) now. Christoph
Attachment
On Tue, Aug 16, 2022 at 7:26 PM Christoph Berg <myon@debian.org> wrote: > > Re: Bharath Rupireddy > > Don't we need a similar explanation [1] for pg_recvlogical docs? > > > > [1] > > <para> > > In the absence of fatal errors, <application>pg_receivewal</application> > > - will run until terminated by the <systemitem>SIGINT</systemitem> signal > > - (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>). > > + will run until terminated by the <systemitem>SIGINT</systemitem> > > + (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>) > > + or <systemitem>SIGTERM</systemitem> signal. > > Coped that from pg_receivewal(1) now. Thanks. <application>pg_receivewal</application> will exit with status 0 when - terminated by the <systemitem>SIGINT</systemitem> signal. (That is the + terminated by the <systemitem>SIGINT</systemitem> or + <systemitem>SIGTERM</systemitem> signal. (That is the normal way to end it. Hence it is not an error.) For fatal errors or other signals, the exit status will be nonzero. Can we specify the reason in the docs why a SIGTERM causes (which typically would cause a program to end with non-zero exit code) pg_receivewal and pg_recvlogical exit with zero exit code? Having this in the commit message would help developers but the documentation will help users out there. Thoughts? [1] pg_receivewal, pg_recvlogical: Exit cleanly on SIGTERM In pg_receivewal, compressed output is only flushed on clean exits. The reason to support SIGTERM here as well is that pg_receivewal might well be running as a daemon, and systemd's default KillSignal is SIGTERM. Since pg_recvlogical is also supposed to run as a daemon, teach it about SIGTERM as well. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Bharath Rupireddy > <application>pg_receivewal</application> will exit with status 0 when > - terminated by the <systemitem>SIGINT</systemitem> signal. (That is the > + terminated by the <systemitem>SIGINT</systemitem> or > + <systemitem>SIGTERM</systemitem> signal. (That is the > normal way to end it. Hence it is not an error.) For fatal errors or > other signals, the exit status will be nonzero. > > Can we specify the reason in the docs why a SIGTERM causes (which > typically would cause a program to end with non-zero exit code) > pg_receivewal and pg_recvlogical exit with zero exit code? Having this > in the commit message would help developers but the documentation will > help users out there. We could add "because you want that if it's running as a daemon", but TBH, I'd rather remove the parentheses part. It sounds too much like "it works that way because that way is the sane way". Christoph
On Fri, Aug 19, 2022 at 4:24 PM Christoph Berg <myon@debian.org> wrote: > > Re: Bharath Rupireddy > > <application>pg_receivewal</application> will exit with status 0 when > > - terminated by the <systemitem>SIGINT</systemitem> signal. (That is the > > + terminated by the <systemitem>SIGINT</systemitem> or > > + <systemitem>SIGTERM</systemitem> signal. (That is the > > normal way to end it. Hence it is not an error.) For fatal errors or > > other signals, the exit status will be nonzero. > > > > Can we specify the reason in the docs why a SIGTERM causes (which > > typically would cause a program to end with non-zero exit code) > > pg_receivewal and pg_recvlogical exit with zero exit code? Having this > > in the commit message would help developers but the documentation will > > help users out there. > > We could add "because you want that if it's running as a daemon", but +1 to add "some" info in the docs (I'm not sure about the better wording though), we can try to be more specific of the use case if required. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
On Fri, Aug 19, 2022 at 05:34:56PM +0530, Bharath Rupireddy wrote: > +1 to add "some" info in the docs (I'm not sure about the better > wording though), we can try to be more specific of the use case if > required. Yes, the amount of extra docs provided by the patch proposed by Christoph looks fine by me. FWIW, grouping the signal handlers into a common area like streamutil.c seems rather confusing to me, as they set different variable names that rely on their own assumptions in their local file, so I would leave that out, like the patch. While looking at the last patch proposed, it strikes me that time_to_stop should be sig_atomic_t in pg_receivewal.c, as the safe type of variable to set in a signal handler. We could change that, while on it.. Backpatching this stuff is not an issue here. -- Michael
Attachment
Re: Michael Paquier > While looking at the last patch proposed, it strikes me that > time_to_stop should be sig_atomic_t in pg_receivewal.c, as the safe > type of variable to set in a signal handler. We could change that, > while on it.. Done in the attached patch. > Backpatching this stuff is not an issue here. Do you mean it can, or can not be backpatched? (I'd argue for backpatching since the behaviour is slightly broken at the moment.) Christoph
Attachment
On Mon, Aug 22, 2022 at 04:05:16PM +0200, Christoph Berg wrote: > Do you mean it can, or can not be backpatched? (I'd argue for > backpatching since the behaviour is slightly broken at the moment.) I mean that it is fine to backpatch that, in my opinion. -- Michael
Attachment
> On 23 Aug 2022, at 02:15, Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Aug 22, 2022 at 04:05:16PM +0200, Christoph Berg wrote: >> Do you mean it can, or can not be backpatched? (I'd argue for >> backpatching since the behaviour is slightly broken at the moment.) > > I mean that it is fine to backpatch that, in my opinion. I think this can be argued both for and against backpatching. Catching SIGTERM makes a lot of sense, especially given systemd's behavior. On the other hand, This adds functionality to something arguably working as intended, regardless of what one thinks about the intent. The attached adds the Exit Status section to pg_recvlogical docs which is present in pg_receivewal to make them more aligned, and tweaks comments to pgindent standards. This is the version I think is ready to commit. -- Daniel Gustafsson https://vmware.com/
Attachment
On Thu, Aug 25, 2022 at 11:19:05AM +0200, Daniel Gustafsson wrote: > I think this can be argued both for and against backpatching. Catching SIGTERM > makes a lot of sense, especially given systemd's behavior. On the other hand, > This adds functionality to something arguably working as intended, regardless > of what one thinks about the intent. Sure. My view on this matter is that the behavior of the patch is more useful to users as, on HEAD, a SIGTERM is equivalent to a drop of the connection followed by a retry when not using -n. Or do you think that there could be cases where the behavior of HEAD (force a connection drop with the backend and handle the retry infinitely in pg_receivewal/recvlogical) is more useful? systemd can also do retries a certain given of times, so that's moving the ball one layer to the other, at the end. We could also say to just set KillSignal to SIGINT in the docs, but my guess is that few users would actually notice that until they see how pg_receiwal/recvlogical work with systemd's default. FWIW, I've worked on an archiver integration a few years ago and got annoyed that we use SIGINT while SIGTERM was the default (systemd was not directly used there but the signal problem was the same, so we had to go through some loops to make the stop signal configurable, like systemd). -- Michael
Attachment
Re: Michael Paquier > FWIW, I've worked on an archiver integration a few years ago and got > annoyed that we use SIGINT while SIGTERM was the default (systemd was > not directly used there but the signal problem was the same, so we had > to go through some loops to make the stop signal configurable, like > systemd). SIGTERM is really the default for any init system or run-a-daemon system. Christoph
On Thu, Aug 25, 2022 at 5:13 PM Christoph Berg <myon@debian.org> wrote: > > Re: Michael Paquier > > FWIW, I've worked on an archiver integration a few years ago and got > > annoyed that we use SIGINT while SIGTERM was the default (systemd was > > not directly used there but the signal problem was the same, so we had > > to go through some loops to make the stop signal configurable, like > > systemd). > > SIGTERM is really the default for any init system or run-a-daemon system. It is, but there is also precedent for not using it for graceful shutdown. Apache, for example, will do what we do today on SIGTERM and you use SIGWINCH to make it shut down gracefully (which would be the equivalent of us flushing the compression buffers, I'd say). I'm not saying we shouldn't change -- I fully approve of making the change. But the world is full of fairly prominent examples of the other way as well. I'm leaning towards considering it a feature-change and thus not something to backpatch (I'd be OK sneaking it into 15 though, as that one is not released yet and it feels like a perfectly *safe* change). Not enough to insist on it, but it seems "slightly more correct". -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Thu, Aug 25, 2022 at 08:45:05PM +0200, Magnus Hagander wrote: > I'm leaning towards considering it a feature-change and thus not > something to backpatch (I'd be OK sneaking it into 15 though, as that > one is not released yet and it feels like a perfectly *safe* change). > Not enough to insist on it, but it seems "slightly more correct". Fine by me if both you and Daniel want to be more careful with this change. We could always argue about a backpatch later if there is more ask for it, as well. -- Michael
Attachment
Re: Daniel Gustafsson > The attached adds the Exit Status section to pg_recvlogical docs which is > present in pg_receivewal to make them more aligned, and tweaks comments to > pgindent standards. This is the version I think is ready to commit. Looks good to me. Thanks, Christoph
On Fri, Aug 26, 2022 at 09:51:26AM +0900, Michael Paquier wrote: > Fine by me if both you and Daniel want to be more careful with this > change. We could always argue about a backpatch later if there is > more ask for it, as well. Daniel, are you planning to apply this one on HEAD? -- Michael
Attachment
> On 2 Sep 2022, at 10:00, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Aug 26, 2022 at 09:51:26AM +0900, Michael Paquier wrote: >> Fine by me if both you and Daniel want to be more careful with this >> change. We could always argue about a backpatch later if there is >> more ask for it, as well. > > Daniel, are you planning to apply this one on HEAD? Yes, it's on my TODO for this CF. -- Daniel Gustafsson https://vmware.com/
> On 2 Sep 2022, at 10:00, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Aug 26, 2022 at 09:51:26AM +0900, Michael Paquier wrote: >> Fine by me if both you and Daniel want to be more careful with this >> change. We could always argue about a backpatch later if there is >> more ask for it, as well. > > Daniel, are you planning to apply this one on HEAD? I had another look over this and pushed it. -- Daniel Gustafsson https://vmware.com/