On Mon, Jun 27, 2022 at 12:04:57AM -0700, Noah Misch wrote:
> For me, it reproduces consistently with a sleep just before the startup
> process exits:
Nice catch.
> One can adapt the test to the server behavior by having the test wait for the
> archiver to start, as attached. This is sufficient to make check-world pass
> with the above sleep in place. I think we should also modify the PostgresNode
> archive_command to log a message. That lack of logging was an obstacle
> upthread (as seen in commit 3279cef) and again here.
? qq{copy "%p" "$path\\\\%f"}
- : qq{cp "%p" "$path/%f"};
+ : qq{echo >&2 "ARCHIVE_COMMAND %p"; cp "%p" "$path/%f"};
This is a bit inelegant. Perhaps it would be better done through a Perl
wrapper, like cp_history_files?
> An alternative would be to declare that the test is right and the server is
> wrong. The postmaster knows how to start the checkpointer if the checkpointer
> is not running when the postmaster needs a shutdown checkpoint. It could
> start the archiver around that same area:
>
>     /* Start the checkpointer if not running */
>     if (CheckpointerPID == 0)
>         CheckpointerPID = StartCheckpointer();
>     /* And tell it to shut down */
>     if (CheckpointerPID != 0)
>     {
>         signal_child(CheckpointerPID, SIGUSR2);
>         pmState = PM_SHUTDOWN;
>     }
>
> Any opinions between the change-test and change-server approaches?
The startup sequence can sometimes be tricky. Though no specific
argument comes to mind, I would stick to a fix in the test.
--
Michael