Thread: pgsql: Fix race in test of pg_switch_wal().
Fix race in test of pg_switch_wal().

The test failed when something added WAL between pg_switch_wal() and
pg_current_wal_lsn(), seen on buildfarm members hornet and sungazer.
Fix v10, v9.6 and v9.5 by making this code mirror its v13+ counterpart.
v12 and v11 lack a counterpart.

Branch
------
REL9_5_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/6a7a5fce9a7421cc0e07341921787f55a814249b

Modified Files
--------------
src/test/recovery/t/020_archive_status.pl | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
On Mon, Sep 14, 2020 at 06:19:38AM +0000, Noah Misch wrote:
> Fix race in test of pg_switch_wal().
>
> The test failed when something added WAL between pg_switch_wal() and
> pg_current_wal_lsn(), seen on buildfarm members hornet and sungazer.
> Fix v10, v9.6 and v9.5 by making this code mirror its v13+ counterpart.
> v12 and v11 lack a counterpart.

Thanks Noah for that, I did not notice those buildfarm failures:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2020-07-24%2010%3A24%3A56

It is fine to ping me or just to begin a thread if there is a problem,
I would have taken care of it.

Thanks,
--
Michael
On Mon, Sep 14, 2020 at 05:27:07PM +0900, Michael Paquier wrote:
> On Mon, Sep 14, 2020 at 06:19:38AM +0000, Noah Misch wrote:
> > Fix race in test of pg_switch_wal().
> >
> > The test failed when something added WAL between pg_switch_wal() and
> > pg_current_wal_lsn(), seen on buildfarm members hornet and sungazer.
> > Fix v10, v9.6 and v9.5 by making this code mirror its v13+ counterpart.
> > v12 and v11 lack a counterpart.
>
> Thanks Noah for that, I did not notice those buildfarm failures:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2020-07-24%2010%3A24%3A56
>
> It is fine to ping me or just to begin a thread if there is a problem,
> I would have taken care of it.

There's a new 020_archive_status.pl failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-10-05%2023%3A02%3A17

Would you like to diagnose/fix that one?
On Tue, Oct 06, 2020 at 07:03:27PM -0700, Noah Misch wrote:
> There's a new 020_archive_status.pl failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-10-05%2023%3A02%3A17
>
> Would you like to diagnose/fix that one?

Wow, thanks. This does not look like an issue coming directly from
the test though:
2020-10-06 00:20:46.786 UTC [20906622:8] LOG: restored log file "000000010000000000000003" from archive
2020-10-06 00:20:46.803 UTC [10748670:1] ERROR: could not open file "pg_xlog/000000010000000000000003": No such file or directory
2020-10-06 00:20:46.880 UTC [21496712:4] psql ERROR: checkpoint request failed
2020-10-06 00:20:46.880 UTC [21496712:5] psql HINT: Consult recent messages in the server log for details.
[...]
error running SQL: 'psql:<stdin>:1: ERROR: checkpoint request failed
HINT: Consult recent messages in the server log for details.'

And it looks like a race condition between the checkpointer and the
startup process. This failure involves the first checkpoint
triggered in $standby2 after it gets created, with this standby
reaching a consistent point before triggering a manual restartpoint.

That's a bit strange though, the startup process considers that this
segment is restored, but the checkpointer complains that it does not
actually exist, so that's in contradiction with what the startup
process tells us. :/
--
Michael