Re: Race condition in recovery? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Race condition in recovery?
Date
Msg-id 20210611.140745.1586374762292931816.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Race condition in recovery?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Race condition in recovery?
List pgsql-hackers
At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > At Thu, 10 Jun 2021 09:56:51 -0400, Robert Haas <robertmhaas@gmail.com> wrote in 
> >> Thanks for the analysis and the patches. I have committed them.
> 
> > Thanks for committing it.
> 
> Please note that conchuela and jacana are still failing ...
> 
> conchuela's failure is evidently not every time, but this test
> definitely postdates the "fix":
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-10%2014%3A09%3A08

A different test is failing there.  Maybe from different issue.


> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/regress_log_002_archiving ==~_~===-=-===~_~==
...
> # Postmaster PID for node "standby2" is 342349
> ### Promoting node "standby2"
> # Running: pg_ctl -D
/home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/t_002_archiving_standby2_data/pgdata-l
/home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/log/002_archiving_standby2.logpromote
 
> waiting for server to
promote................................................................................................................
stoppedwaiting
 
> pg_ctl: server did not promote in time
> Bail out!  system pg_ctl failed

> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/002_archiving_standby2.log ==~_~===-=-===~_~==
...
> 2021-06-10 16:21:21.870 CEST [342350:9] LOG:  received promote request
> 2021-06-10 16:21:21.870 CEST [342350:10] LOG:  redo done at 0/3030200 system usage: CPU: user: 0.00 s, system: 0.00
s,elapsed: 0.07 s
 
> 2021-06-10 16:21:21.870 CEST [342350:11] LOG:  last completed transaction was at log time 2021-06-10
16:21:21.010599+02
> 2021-06-10 16:21:21.893 CEST [342350:12] LOG:  restored log file "000000010000000000000003" from archive
> cp:
/home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives/00000003.history:
Nosuch file or directory
 
> 2021-06-10 16:21:21.896 CEST [342350:13] LOG:  selected new timeline ID: 3
(log ends here)


> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/002_archiving_primary.log ==~_~===-=-===~_~==
...
> 2021-06-10 16:21:21.107 CEST [342322:4] 002_archiving.pl LOG:  disconnection: session time: 0:00:00.022 user=pgbf
database=postgreshost=[local]
 
> 2021-06-10 16:23:21.965 CEST [342279:4] LOG:  received immediate shutdown request

So the standby2 was stuck after selecting the new timeline and before
updating control file and its postmaster couldn't even respond to
SIGQUIT.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: libpq debug log
Next
From: vignesh C
Date:
Subject: Re: locking [user] catalog tables vs 2pc vs logical rep