Hello Shveta,
12.12.2023 11:44, shveta malik wrote:
>
>> The postmaster process exits with exit code 1, but pg_ctl can't get the
>> code and just reports that stop was completed successfully.
>>
> For what it's worth, there is another thread which stated the similar problem:
> https://www.postgresql.org/message-id/flat/2366244.1651681550%40sss.pgh.pa.us
>
Thank you for the reference!
So I revisited the first part of the question Tom Lane raised in that thread...
I ran a quick experiment in which postmaster.pid is left intact in case of
an abnormal shutdown:
@@ -1113,6 +1113,7 @@ UnlinkLockFiles(int status, Datum arg)
     {
         char       *curfile = (char *) lfirst(l);
 
+        if (strcmp(curfile, DIRECTORY_LOCK_FILE) != 0 || status == 0)
         unlink(curfile);
         /* Should we complain if the unlink fails? */
     }
and `make check-world` passed for me without any failures.
(At the same time, the assertion failure forced as described earlier is now detected.)
There is a minor issue with a couple of tests, though. Namely,
003_recovery_targets.pl waits for the standby's postmaster.pid to disappear
and only then checks the standby log for the expected FATAL message,
"recovery ended before configured recovery target was reached".
With postmaster.pid left behind after an unclean shutdown, the test waits
for 300 seconds by default and only then completes successfully.
If the wait loop is rewritten as follows:
# wait for the error message in the standby log
foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default)
{
    $logfile = slurp_file($node_standby->logfile());
    $res = ($logfile =~
        qr/FATAL: .* recovery ended before configured recovery target was reached/);
    if ($res) {
        last;
    }
    usleep(100_000);
}
ok($res,
    'recovery end before target reached is a fatal error');
the test completes as quickly as before.
(The standby log is only about 2 kB, so rereading it on every iteration
isn't a big deal, IMO.)
So maybe that's the way to go?
Another approach I can think of is sending some signal to pg_ctl when
the postmaster terminates with status 0. I think that would complicate
things a little, though, as it allows for three different states:
postmaster.pid preserved (when the postmaster is killed with -9),
postmaster.pid removed and the signal received, and postmaster.pid
removed and the signal not received.
Best regards,
Alexander