Thread: archive_command failures report confusing exit status
Failures of archive_command calls report a confusing exit status such as: LOG: archive command "cp 'pg_xlog/000000010000000000000000' '/nonexistent/000000010000000000000000'" failed: return code256 The actual return code is 1; it neglects to apply WEXITSTATUS(). I figured it would make sense if pgarch.c used the same mechanism that postmaster.c uses to report the various variants of regular and signal exits. I have attached a patch in that direction. It obviously needs a bit of string struggling to get all the cases right, but the idea should be clear. Comments? -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > I figured it would make sense if pgarch.c used the same mechanism that > postmaster.c uses to report the various variants of regular and signal > exits. Hmm. Getting rid of the "(PID 0)" is going to be a mess enough for translations that I think it is worth pgarch.c having its own routine for this. Furthermore I think the detailed archive command should be reported in an errdetail() field, which makes it even farther off. (Hmm, there is nearly duplicate code in pclose_check already). -- Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J "Nunca confiaré en un traidor. Ni siquiera si el traidor lo he creado yo" (Barón Vladimir Harkonnen)
On Mon, 2007-12-10 at 18:27 +0100, Peter Eisentraut wrote: > Failures of archive_command calls report a confusing exit status such as: > > LOG: archive command "cp 'pg_xlog/000000010000000000000000' '/nonexistent/000000010000000000000000'" failed: return code256 > > The actual return code is 1; it neglects to apply WEXITSTATUS(). Yes, you're right. > I figured it would make sense if pgarch.c used the same mechanism that > postmaster.c uses to report the various variants of regular and signal > exits. OK, you could sell me on that. How will you get rid of (PID = 0)? Maybe it would be better to do that on two lines: LOG: archive command "............." failed FATAL: archive_command was terminated by signal ... -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Peter Eisentraut wrote: >> I figured it would make sense if pgarch.c used the same mechanism that >> postmaster.c uses to report the various variants of regular and signal >> exits. > Hmm. Getting rid of the "(PID 0)" is going to be a mess enough for > translations that I think it is worth pgarch.c having its own routine > for this. Furthermore I think the detailed archive command should be > reported in an errdetail() field, which makes it even farther off. I agree with Alvaro --- trying to make LogChildExit serve two masters will be uglier and less maintainable than having two copies of the not-really-so-complex logic involved. Leave postmaster.c alone and just make pgarch.c smarter. (But having said that, +1 for improving the message.) regards, tom lane
Am Montag, 10. Dezember 2007 schrieb Alvaro Herrera: > Peter Eisentraut wrote: > > I figured it would make sense if pgarch.c used the same mechanism that > > postmaster.c uses to report the various variants of regular and signal > > exits. > > Hmm. Getting rid of the "(PID 0)" is going to be a mess enough for > translations that I think it is worth pgarch.c having its own routine > for this. Furthermore I think the detailed archive command should be > reported in an errdetail() field, which makes it even farther off. Better patch. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Tue, 2007-12-11 at 17:27 +0100, Peter Eisentraut wrote: > Am Montag, 10. Dezember 2007 schrieb Alvaro Herrera: > > Peter Eisentraut wrote: > > > I figured it would make sense if pgarch.c used the same mechanism that > > > postmaster.c uses to report the various variants of regular and signal > > > exits. > > > > Hmm. Getting rid of the "(PID 0)" is going to be a mess enough for > > translations that I think it is worth pgarch.c having its own routine > > for this. Furthermore I think the detailed archive command should be > > reported in an errdetail() field, which makes it even farther off. > > Better patch. Looks much easier to understand. I prefer archive_command = ... for the errdetail though. The commands can be quite long so the extra words don't really add anything, plus they require translation. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Peter Eisentraut <peter_e@gmx.net> writes: > Better patch. Doesn't this patch break the behavior that is documented in the comment? Specifically, the case where the restore_command dies on a signal and this is reported to us by the controlling shell as exitcode > 128. We want the archiver to die, but this patch makes it not do so. regards, tom lane
Am Dienstag, 11. Dezember 2007 schrieb Tom Lane: > Peter Eisentraut <peter_e@gmx.net> writes: > > Better patch. > > Doesn't this patch break the behavior that is documented in the comment? > Specifically, the case where the restore_command dies on a signal and > this is reported to us by the controlling shell as exitcode > 128. > We want the archiver to die, but this patch makes it not do so. AFAICT, the coding WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128 is simply redundant, because a signal happened exactly when WIFSIGNALED(rc) is true. I have tested this: LOG: database system was shut down at 2007-12-11 17:15:43 CET LOG: autovacuum launcher started LOG: database system is ready to accept connections FATAL: archive command was terminated by signal 1: Hangup DETAIL: The archive command was "kill -1 $$". LOG: archiver process (PID 22572) exited with exit code 1 -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > Am Dienstag, 11. Dezember 2007 schrieb Tom Lane: >> Doesn't this patch break the behavior that is documented in the comment? >> Specifically, the case where the restore_command dies on a signal and >> this is reported to us by the controlling shell as exitcode > 128. >> We want the archiver to die, but this patch makes it not do so. > AFAICT, the coding > WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128 > is simply redundant, because a signal happened exactly when WIFSIGNALED(rc) is > true. No, you are confusing the cases "called shell was killed by a signal" and "called command was killed by a signal, which the shell then turned around and reported to us as exit > 128". > I have tested this: A single test case proves little. You need to consider race conditions and cases where the shell is ignoring the particular signal. I'm fairly certain that both of these exit statuses can happen in practice, at least with some shells/platforms. regards, tom lane
Simon Riggs wrote: > I prefer archive_command = ... for the errdetail though. The commands > can be quite long so the extra words don't really add anything, plus > they require translation. I did like this suggestion, but then I noticed, we don't actually report the setting of the archive_command setting but the actual command that was executed, with the placeholders filled out. So I think the way I posted it is more correct. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Tom Lane wrote: > No, you are confusing the cases "called shell was killed by a signal" > and "called command was killed by a signal, which the shell then turned > around and reported to us as exit > 128". Yes, I had missed that difference. Next try ... -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Tue, 2007-12-11 at 23:31 +0100, Peter Eisentraut wrote: > Simon Riggs wrote: > > I prefer archive_command = ... for the errdetail though. The commands > > can be quite long so the extra words don't really add anything, plus > > they require translation. > > I did like this suggestion, but then I noticed, we don't actually report the > setting of the archive_command setting but the actual command that was > executed, with the placeholders filled out. So I think the way I posted it > is more correct. I think you should lose the "The", if nothing else. Most Postgres messages I recall say "return code = xx" not "The return code ..." -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > On Tue, 2007-12-11 at 23:31 +0100, Peter Eisentraut wrote: > > Simon Riggs wrote: > > > I prefer archive_command = ... for the errdetail though. The commands > > > can be quite long so the extra words don't really add anything, plus > > > they require translation. > > > > I did like this suggestion, but then I noticed, we don't actually report the > > setting of the archive_command setting but the actual command that was > > executed, with the placeholders filled out. So I think the way I posted it > > is more correct. > > I think you should lose the "The", if nothing else. Most Postgres > messages I recall say "return code = xx" not "The return code ..." Right, that's because they are in errmessage() and not errdetail(). The guidelines are different. (The reason they are different really escapes me, but you have to note that (1) Peter invented them, and (2) this grammatical difference always make sense when you read the logs.) -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "There is evil in the world. There are dark, awful things. Occasionally, we get a glimpse of them. But there are dark corners; horrors almost impossible to imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)
Peter Eisentraut <peter_e@gmx.net> writes: > Yes, I had missed that difference. Next try ... Looks sane to me. regards, tom lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Simon Riggs wrote: >> I think you should lose the "The", if nothing else. Most Postgres >> messages I recall say "return code = xx" not "The return code ..." > Right, that's because they are in errmessage() and not errdetail(). The > guidelines are different. Right --- errdetail is supposed to be a complete sentence. (The rationale for that is explained in TFM.) I do find the current phrasing a bit awkward, though; what's bugging me is putting quotes around the command string. It seems highly likely that the command string will itself contain quotes and thus that the added quotes will be confusing. I wonder if it'd be OK to do The failed archive command was: %s which is stretching the style guidelines by omitting a trailing period, but I think that might be justifiable in this context. regards, tom lane