Thread: TAP test to cover "EndOfLogTLI != replayTLI" case
Hi, Attached patch covers a case where TLI in the filename for a record being read is different from where it belongs to. In other words, it covers following case noted in StartupXLOG(): /* * EndOfLogTLI is the TLI in the filename of the XLOG segment containing * the end-of-log. It could be different from the timeline that EndOfLog * nominally belongs to, if there was a timeline switch in that segment, * and we were reading the old WAL from a segment belonging to a higher * timeline. */ EndOfLogTLI = xlogreader->seg.ws_tli; Test tried to set recovery target just before the end-of-recovery WAL record. Unfortunately, the test couldn't directly verify EndOfLogTLI != replayTLI case, I am not sure how to do that -- any suggestions will be greatly appreciated. Perhaps, having this test is good to improve xlog.c test coverage. Also, if you see in other angle test covers a case where recovery_target_lsn and recovery_target_inclusive=off which doesn't exist as of now and that is the reason I have added this test to 003_recovery_targets.pl file. Thoughts? Suggestions? -- Regards, Amul Sul EDB: http://www.enterprisedb.com
Attachment
Hi, On 2021-11-23 11:43:21 +0530, Amul Sul wrote: > Attached patch covers a case where TLI in the filename for a > record being read is different from where it belongs to. In other > words, it covers following case noted in StartupXLOG(): > Thoughts? Suggestions? It seems the test isn't quite reliable. It occasionally fails on freebsd, macos, linux and always on windows (starting with the new CI stuff, before the test wasn't run). See https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3427 Greetings, Andres Freund
On Mon, Jan 10, 2022 at 8:25 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-11-23 11:43:21 +0530, Amul Sul wrote: > > Attached patch covers a case where TLI in the filename for a > > record being read is different from where it belongs to. In other > > words, it covers following case noted in StartupXLOG(): > > > Thoughts? Suggestions? > > It seems the test isn't quite reliable. It occasionally fails on freebsd, > macos, linux and always on windows (starting with the new CI stuff, before the > test wasn't run). > > See https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3427 > Thanks for the note, I can see the same test is failing on my centos vm too with the latest master head(376ce3e404b). The failing reason is the "recovery_target_inclusive = off" setting which is unnecessary for this test, the attached patch removing the same. Regards, Amul
Attachment
Hi, On Mon, Jan 10, 2022 at 09:46:23AM +0530, Amul Sul wrote: > > Thanks for the note, I can see the same test is failing on my centos > vm too with the latest master head(376ce3e404b). The failing reason is > the "recovery_target_inclusive = off" setting which is unnecessary for > this test, the attached patch removing the same. This version still fails on windows according to the cfbot: https://cirrus-ci.com/task/5882621321281536 [18:14:02.639] c:\cirrus>call perl src/tools/msvc/vcregress.pl recoverycheck [18:14:56.114] [18:14:56.122] # Failed test 'check standby content before timeline switch 0/500FB30' [18:14:56.122] # at t/003_recovery_targets.pl line 234. [18:14:56.122] # got: '6000' [18:14:56.122] # expected: '7000' I'm switching the cf entry to Waiting on Author.
On Sat, Jan 15, 2022 at 11:35 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > On Mon, Jan 10, 2022 at 09:46:23AM +0530, Amul Sul wrote: > > > > Thanks for the note, I can see the same test is failing on my centos > > vm too with the latest master head(376ce3e404b). The failing reason is > > the "recovery_target_inclusive = off" setting which is unnecessary for > > this test, the attached patch removing the same. > > This version still fails on windows according to the cfbot: > > https://cirrus-ci.com/task/5882621321281536 > > [18:14:02.639] c:\cirrus>call perl src/tools/msvc/vcregress.pl recoverycheck > [18:14:56.114] > [18:14:56.122] # Failed test 'check standby content before timeline switch 0/500FB30' > [18:14:56.122] # at t/003_recovery_targets.pl line 234. > [18:14:56.122] # got: '6000' > [18:14:56.122] # expected: '7000' > > I'm switching the cf entry to Waiting on Author. Thanks for the note. I am not sure what really went wrong but I think the 'standby_9' server shutdown too early before it gets a chance to archive a required WAL file. The attached patch tries to shutdown that server after the required WAL are archived, unfortunately, I couldn't test that on the window, let see how cfbot reacts to this version. Regards, Amul
Attachment
Hi, On Mon, Jan 17, 2022 at 05:07:48PM +0530, Amul Sul wrote: > > I am not sure what really went wrong but I think the 'standby_9' > server shutdown too early before it gets a chance to archive a > required WAL file. The attached patch tries to shutdown that server > after the required WAL are archived, unfortunately, I couldn't test > that on the window, let see how cfbot reacts to this version. Thanks for the updated patch! Note that thanks to Andres and Thomas work, you can now easily rely on the exact same CI than the cfbot on your own github repository, if you need to debug something on a platform you don't have access to. You can check the documentation at [1] for more detail on how to setup the CI. [1] https://github.com/postgres/postgres/blob/master/src/tools/ci/README
Hi, On Mon, Jan 17, 2022 at 09:33:57PM +0800, Julien Rouhaud wrote: > > Thanks for the updated patch! Note that thanks to Andres and Thomas work, you > can now easily rely on the exact same CI than the cfbot on your own github > repository, if you need to debug something on a platform you don't have access > to. You can check the documentation at [1] for more detail on how to setup the > CI. The cfbot reports that the patch still fails on Windows but also failed on macos with the same error: https://cirrus-ci.com/task/5655777858813952: [14:20:43.950] # Failed test 'check standby content before timeline switch 0/500FAF8' [14:20:43.950] # at t/003_recovery_targets.pl line 239. [14:20:43.950] # got: '6000' [14:20:43.950] # expected: '7000' [14:20:43.950] # Looks like you failed 1 test of 10. I'm switching the CF entry to Waiting on Author.
At Tue, 18 Jan 2022 12:25:15 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > Hi, > > On Mon, Jan 17, 2022 at 09:33:57PM +0800, Julien Rouhaud wrote: > > > > Thanks for the updated patch! Note that thanks to Andres and Thomas work, you > > can now easily rely on the exact same CI than the cfbot on your own github > > repository, if you need to debug something on a platform you don't have access > > to. You can check the documentation at [1] for more detail on how to setup the > > CI. > > The cfbot reports that the patch still fails on Windows but also failed on > macos with the same error: https://cirrus-ci.com/task/5655777858813952: > > [14:20:43.950] # Failed test 'check standby content before timeline switch 0/500FAF8' > [14:20:43.950] # at t/003_recovery_targets.pl line 239. > [14:20:43.950] # got: '6000' > [14:20:43.950] # expected: '7000' > [14:20:43.950] # Looks like you failed 1 test of 10. > > I'm switching the CF entry to Waiting on Author. The most significant advantages of the local CI setup are - CI immediately responds to push. - You can dirty the code with additional logging aid as much as you like to see closely what is going on. It makes us hesitant to do the same on this ML:p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Jan 18, 2022 at 10:31 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Tue, 18 Jan 2022 12:25:15 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > > Hi, > > > > On Mon, Jan 17, 2022 at 09:33:57PM +0800, Julien Rouhaud wrote: > > > > > > Thanks for the updated patch! Note that thanks to Andres and Thomas work, you > > > can now easily rely on the exact same CI than the cfbot on your own github > > > repository, if you need to debug something on a platform you don't have access > > > to. You can check the documentation at [1] for more detail on how to setup the > > > CI. > > > > The cfbot reports that the patch still fails on Windows but also failed on > > macos with the same error: https://cirrus-ci.com/task/5655777858813952: > > > > [14:20:43.950] # Failed test 'check standby content before timeline switch 0/500FAF8' > > [14:20:43.950] # at t/003_recovery_targets.pl line 239. > > [14:20:43.950] # got: '6000' > > [14:20:43.950] # expected: '7000' > > [14:20:43.950] # Looks like you failed 1 test of 10. > > > > I'm switching the CF entry to Waiting on Author. > > The most significant advantages of the local CI setup are > > - CI immediately responds to push. > > - You can dirty the code with additional logging aid as much as you > like to see closely what is going on. It makes us hesitant to do > the same on this ML:p > Indeed :) I found the cause for the test failing on window -- is due to the custom archive command setting which wasn't setting the correct window archive directory path. There is no option to choose a custom wal archive and restore patch in the TAP test. The attach 0001 patch proposes the same, which enables you to choose a custom WAL archive and restore directory. The only concern I have is that $node->info() will print the wrong archive directory path in that case, do we need to fix that? We might need to store archive_dir like base_dir, I wasn't sure about that. Thoughts? Comments? Regards, Amul
Attachment
On Thu, Jan 20, 2022 at 12:13:08PM +0530, Amul Sul wrote: > I found the cause for the test failing on window -- is due to the > custom archive command setting which wasn't setting the correct window > archive directory path. After reading this patch and this thread, I have noticed that you are testing the same thing as Heikki here, patch 0001: https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi The patch sent on the other thread has a better description and shape, so perhaps we'd better drop what is posted here in favor of the other version. Thoughts? -- Michael
Attachment
At Thu, 27 Jan 2022 15:31:37 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Thu, Jan 20, 2022 at 12:13:08PM +0530, Amul Sul wrote: > > I found the cause for the test failing on window -- is due to the > > custom archive command setting which wasn't setting the correct window > > archive directory path. > > After reading this patch and this thread, I have noticed that you are > testing the same thing as Heikki here, patch 0001: > https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi > > The patch sent on the other thread has a better description and shape, > so perhaps we'd better drop what is posted here in favor of the other > version. Thoughts? pg_switch_wal() is more preferable than filling-in a large number of records as the means to advance segments because it is stable against changes of wal_segment_size. As you said, using has_restoring on initializing node_ptr is simpler than explicitly setting archive_dir from that of another node. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Jan 27, 2022 at 12:01 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jan 20, 2022 at 12:13:08PM +0530, Amul Sul wrote: > > I found the cause for the test failing on window -- is due to the > > custom archive command setting which wasn't setting the correct window > > archive directory path. > > After reading this patch and this thread, I have noticed that you are > testing the same thing as Heikki here, patch 0001: > https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi > > The patch sent on the other thread has a better description and shape, > so perhaps we'd better drop what is posted here in favor of the other > version. Thoughts? Yes, I do agree with you. Thanks for the comparison. Regards, Amul
On 28/01/2022 12:10, Amul Sul wrote: > On Thu, Jan 27, 2022 at 12:01 PM Michael Paquier <michael@paquier.xyz> wrote: >> >> After reading this patch and this thread, I have noticed that you are >> testing the same thing as Heikki here, patch 0001: >> https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi >> >> The patch sent on the other thread has a better description and shape, >> so perhaps we'd better drop what is posted here in favor of the other >> version. Thoughts? > > Yes, I do agree with you. Thanks for the comparison. Pushed the test from that other thread now. Thanks! - Heikki
On Mon, Feb 14, 2022 at 3:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 28/01/2022 12:10, Amul Sul wrote: > > On Thu, Jan 27, 2022 at 12:01 PM Michael Paquier <michael@paquier.xyz> wrote: > >> > >> After reading this patch and this thread, I have noticed that you are > >> testing the same thing as Heikki here, patch 0001: > >> https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi > >> > >> The patch sent on the other thread has a better description and shape, > >> so perhaps we'd better drop what is posted here in favor of the other > >> version. Thoughts? > > > > Yes, I do agree with you. Thanks for the comparison. > > Pushed the test from that other thread now. Thanks! > Thank you ! Regards, Amul