Race conditions with checkpointer and shutdown - Mailing list pgsql-hackers

From Michael Paquier
Subject Race conditions with checkpointer and shutdown
Msg-id 20190416070119.GK2673@paquier.xyz
List pgsql-hackers
Hi all,

This is a continuation of the following thread, but I prefer spawning
a new thread for clarity:
https://www.postgresql.org/message-id/20190416064512.GJ2673@paquier.xyz

The buildfarm has reported two similar failures when shutting down a
node:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-03-23%2022%3A28%3A59
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2019-04-16%2006%3A14%3A01

In both cases, the instance fails to shut down: it times out waiting
for the shutdown checkpoint to finish, but I suspect that this
checkpoint actually never happens.

The first case involves piculet, which uses --disable-atomics and
gcc 6.  The failure is in the recovery test 016_min_consistency, where
we trigger a checkpoint, then issue a fast shutdown on a standby.  At
this point the test waits forever.
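
For reference, the sequence in this first case can be roughly sketched
outside the TAP framework as below.  This is only an illustration, not
the test's own code: the helper names, data directory, port and timeout
are hypothetical, and it assumes a standby is already running and that
psql and pg_ctl are in PATH.

```python
import subprocess


def checkpoint_cmd(port):
    # Hypothetical helper: psql invocation that issues CHECKPOINT.
    return ["psql", "-X", "-p", str(port), "-d", "postgres", "-c", "CHECKPOINT"]


def fast_stop_cmd(datadir, timeout_s):
    # Hypothetical helper: fast shutdown that waits at most timeout_s seconds.
    return ["pg_ctl", "-D", datadir, "-m", "fast", "-t", str(timeout_s), "stop"]


def checkpoint_then_fast_stop(datadir, port, timeout_s=60):
    # Trigger a checkpoint on the standby, then request a fast shutdown.
    subprocess.run(checkpoint_cmd(port), check=True)
    result = subprocess.run(fast_stop_cmd(datadir, timeout_s))
    # pg_ctl exits non-zero if the server has not stopped within the timeout.
    return result.returncode == 0
```

If the shutdown hangs as in the buildfarm logs, checkpoint_then_fast_stop()
would be expected to return False once the pg_ctl timeout expires.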

The second case involves dragonet, which uses clang and has JIT
enabled.  The failure is in test 009_twophase.pl, and happens after the
test preparing transaction xact_009_11, where a *standby* gets
restarted.  Again, the test waits forever for the instance to shut
down.

The most recent commits which have touched checkpoints are 0dfe3d0e
and c6c9474a, which map roughly to the point where the failures
began to happen, so it looks like something related to clean standby
shutdowns has been broken since then.

Thanks,
--
Michael
