Re: Race condition in recovery? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Race condition in recovery?
Msg-id 20210611.142644.1872001951622668861.horikyota.ntt@gmail.com
In response to Re: Race condition in recovery?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Race condition in recovery?
List pgsql-hackers
At Fri, 11 Jun 2021 14:07:45 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> > conchuela's failure is evidently not every time, but this test
> > definitely postdates the "fix":

conchuela failed recovery_check this time, and

> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-10%2014%3A09%3A08
> So the standby2 was stuck after selecting the new timeline and before
> updating control file and its postmaster couldn't even respond to
> SIGQUIT.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25

  This is before the "fix"

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-08%2014%3A07%3A46

  failed in pg_verifybackupCheck

> ==~_~===-=-===~_~== pgsql.build/src/bin/pg_verifybackup/tmp_check/log/regress_log_003_corruption ==~_~===-=-===~_~==
...
> #   Failed test 'base backup ok'
> #   at t/003_corruption.pl line 115.
> # Running: pg_verifybackup /home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_verifybackup/tmp_check/t_003_corruption_primary_data/backup/open_directory_fails
> pg_verifybackup: fatal: could not open file "/home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_verifybackup/tmp_check/t_003_corruption_primary_data/backup/open_directory_fails/backup_manifest": No such file or directory
> not ok 38 - intact backup verified

The manifest file is missing from the backup. In this case, too, the
servers failed to handle SIGQUIT.

> ==~_~===-=-===~_~== pgsql.build/src/bin/pg_verifybackup/tmp_check/log/003_corruption_primary.log ==~_~===-=-===~_~==
...
> 2021-06-08 16:17:41.706 CEST [51792:9] 003_corruption.pl LOG:  received replication command: START_REPLICATION SLOT "pg_basebackup_51792" 0/B000000 TIMELINE 1
> 2021-06-08 16:17:41.706 CEST [51792:10] 003_corruption.pl STATEMENT:  START_REPLICATION SLOT "pg_basebackup_51792" 0/B000000 TIMELINE 1
 
(log ends here)

Could this be some kind of hardware failure?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
