Re: Race condition in recovery? - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | CAFiTN-vcTODEFuyktzvURgSRkULzfkNP6k=zHv4w96-s-5AX5A@mail.gmail.com Whole thread Raw |
In response to | Re: Race condition in recovery? (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Race condition in recovery?
|
List | pgsql-hackers |
On Wed, May 26, 2021 at 9:40 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, May 26, 2021 at 2:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I think we need to create some content on promoted standby and check > > whether the cascade standby is able to get that or not, that will > > guarantee that it is actually following the promoted standby, I have > > added the test for that so that it matches the comments. > > OK, but I ran this test against an unpatched server and it passed. > That's not so great, because the test should fail without the bug fix. > It seems to be because there's only one actual test in this test case. > Looking at the log file, > src/test/recovery/tmp_check/log/regress_log_025_timeline_issue, the > only "ok" nor "not ok" line is: > > ok 1 - check streamed content on cascade standby > > So either that test is not right or some other test is needed. I think > there's something else going wrong here, because when I run my > original test.sh script, I see this: Thats strange, when I ran the test I can see below in log of cascade node (which shows that cascade get the history file but not the WAL file and then it select the old timeline and never go to the new timeline) ... 2021-05-26 21:46:54.412 IST [84080] LOG: restored log file "00000002.history" from archive cp: cannot stat ‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_timeline_issue_primary_data/archives/00000003.history’: No such file or directory 2021-05-26 21:46:54.415 IST [84080] LOG: entering standby mode 2021-05-26 21:46:54.419 IST [84080] LOG: restored log file "00000002.history" from archive ..... 2021-05-26 21:46:54.429 IST [84085] LOG: started streaming WAL from primary at 0/2000000 on timeline 1 -> stream using previous TL 2021-05-26 21:46:54.466 IST [84080] LOG: redo starts at 0/2000028 2021-05-26 21:46:54.466 IST [84080] LOG: consistent recovery state reached at 0/3000000 2021-05-26 21:46:54.467 IST [84079] LOG: database system is ready to accept read only connections 2021-05-26 21:46:54.483 IST [84085] LOG: replication terminated by primary server 2021-05-26 21:46:54.483 IST [84085] DETAIL: End of WAL reached on timeline 1 at 0/3000000. cp: cannot stat ‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_timeline_issue_primary_data/archives/00000003.history’: No such file or directory cp: cannot stat ‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_timeline_issue_primary_data/archives/000000010000000000000003’: No such file or directory 2021-05-26 21:46:54.498 IST [84085] LOG: primary server contains no more WAL on requested timeline 1 <failure continues as it never go to timeline 2> And finally the test case fails because the cascade can never get the changes. I will check if there is any timing dependency in the test case. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: