Re: Race condition in recovery? - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Race condition in recovery?
Date
Msg-id CAFiTN-vcTODEFuyktzvURgSRkULzfkNP6k=zHv4w96-s-5AX5A@mail.gmail.com
Whole thread Raw
In response to Re: Race condition in recovery?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Race condition in recovery?
List pgsql-hackers
On Wed, May 26, 2021 at 9:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, May 26, 2021 at 2:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I think we need to create some content on promoted standby and check
> > whether the cascade standby is able to get that or not, that will
> > guarantee that it is actually following the promoted standby,  I have
> > added the test for that so that it matches the comments.
>
> OK, but I ran this test against an unpatched server and it passed.
> That's not so great, because the test should fail without the bug fix.
> It seems to be because there's only one actual test in this test case.
> Looking at the log file,
> src/test/recovery/tmp_check/log/regress_log_025_timeline_issue, the
> only "ok" nor "not ok" line is:
>
> ok 1 - check streamed content on cascade standby
>
> So either that test is not right or some other test is needed. I think
> there's something else going wrong here, because when I run my
> original test.sh script, I see this:

Thats strange, when I ran the test I can see below in log of cascade
node (which shows that cascade get the history file but not the WAL
file and then it select the old timeline and never go to the new
timeline)

...
2021-05-26 21:46:54.412 IST [84080] LOG:  restored log file
"00000002.history" from archive
cp: cannot stat

‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_timeline_issue_primary_data/archives/00000003.history’:
No such file or directory
2021-05-26 21:46:54.415 IST [84080] LOG:  entering standby mode
2021-05-26 21:46:54.419 IST [84080] LOG:  restored log file
"00000002.history" from archive
.....
2021-05-26 21:46:54.429 IST [84085] LOG:  started streaming WAL from
primary at 0/2000000 on timeline 1   -> stream using previous TL
2021-05-26 21:46:54.466 IST [84080] LOG:  redo starts at 0/2000028
2021-05-26 21:46:54.466 IST [84080] LOG:  consistent recovery state
reached at 0/3000000
2021-05-26 21:46:54.467 IST [84079] LOG:  database system is ready to
accept read only connections
2021-05-26 21:46:54.483 IST [84085] LOG:  replication terminated by
primary server
2021-05-26 21:46:54.483 IST [84085] DETAIL:  End of WAL reached on
timeline 1 at 0/3000000.
cp: cannot stat

‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_timeline_issue_primary_data/archives/00000003.history’:
No such file or directory
cp: cannot stat

‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_timeline_issue_primary_data/archives/000000010000000000000003’:
No such file or directory
2021-05-26 21:46:54.498 IST [84085] LOG:  primary server contains no
more WAL on requested timeline 1

<failure continues as it never go to timeline 2>

And finally the test case fails because the cascade can never get the changes.

I will check if there is any timing dependency in the test case.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Race condition in recovery?
Next
From: Robert Haas
Date:
Subject: Re: Replacing pg_depend PIN entries with a fixed range check