Thread: TAP test for recovery_end_command
Hi, The attached patch adds a small test for recovery_end_command execution. Currently, patch tests execution of recovery_end_command by creating dummy file, I am not wedded only to this approach, other suggestions also welcome. Also, we don't have a good test for archive_cleanup_command as well, I am not sure how we could test that which executes with every restart-point. Thanks to my colleague Neha Sharma for confirming the test execution on Windows. Regards, Amul
Attachment
On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
The attached patch adds a small test for recovery_end_command execution.
Additional coverage is always a good thing.
Currently, patch tests execution of recovery_end_command by creatingdummy file, I am not wedded only to this approach, other suggestionsalso welcome.
This test file is for archiving only. It seems 020_archive_status.pl is more
appropriate for testing this parameter.
Also, we don't have a good test for archive_cleanup_command as well, Iam not sure how we could test that which executes with everyrestart-point.
Setup a replica and stop it. It triggers a restartpoint during the shutdown.
On Sun, Sep 12, 2021 at 09:25:32PM -0300, Euler Taveira wrote: > On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote: >> Also, we don't have a good test for archive_cleanup_command as well, I >> am not sure how we could test that which executes with every >> restart-point. > > Setup a replica and stop it. It triggers a restartpoint during the shutdown. +$node_standby2->append_conf('postgresql.conf', + "recovery_end_command='echo recovery_ended > $recovery_end_command_file'"); This is not going to work on Windows. -- Michael
Attachment
On Mon, Sep 13, 2021 at 8:44 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sun, Sep 12, 2021 at 09:25:32PM -0300, Euler Taveira wrote: > > On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote: > >> Also, we don't have a good test for archive_cleanup_command as well, I > >> am not sure how we could test that which executes with every > >> restart-point. > > > > Setup a replica and stop it. It triggers a restartpoint during the shutdown. > > +$node_standby2->append_conf('postgresql.conf', > + "recovery_end_command='echo recovery_ended > $recovery_end_command_file'"); > This is not going to work on Windows. Unfortunately, I don't have Windows, but my colleague Neha Sharma has confirmed it works there. Regards, Amul
On Mon, Sep 13, 2021 at 5:56 AM Euler Taveira <euler@eulerto.com> wrote: > > On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote: > > The attached patch adds a small test for recovery_end_command execution. > > Additional coverage is always a good thing. > Thanks for the confirmation. > Currently, patch tests execution of recovery_end_command by creating > dummy file, I am not wedded only to this approach, other suggestions > also welcome. > > This test file is for archiving only. It seems 020_archive_status.pl is more > appropriate for testing this parameter. > Ok, moved to 020_archive_status.pl in the attached version. > Also, we don't have a good test for archive_cleanup_command as well, I > am not sure how we could test that which executes with every > restart-point. > > Setup a replica and stop it. It triggers a restartpoint during the shutdown. Yeah, added that test too. I triggered the restartpoint via a CHECKPOINT command in the attached version. Note that I haven't tested the current version on Windows, will cross-check that tomorrow. Regards, Amul
Attachment
On Mon, Sep 13, 2021, at 10:09 AM, Amul Sul wrote:
Yeah, added that test too. I triggered the restartpoint via aCHECKPOINT command in the attached version.
+# archive_cleanup_command executed with every restart points
+ok( !-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command not executed yet');
Why are you including a test whose result is known? Fresh cluster does
not contain archive_cleanup_command.done or recovery_end_command.done.
+# Checkpoint will trigger restart point on standby.
+$standby3->safe_psql('postgres', q{CHECKPOINT});
+ok(-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command executed on checkpoint');
Is this test reliable?
On Mon, Sep 13, 2021 at 8:39 PM Euler Taveira <euler@eulerto.com> wrote: > > On Mon, Sep 13, 2021, at 10:09 AM, Amul Sul wrote: > > Yeah, added that test too. I triggered the restartpoint via a > CHECKPOINT command in the attached version. > > +# archive_cleanup_command executed with every restart points > +ok( !-f "$archive_cleanup_command_file", > + 'archive_cleanup_command not executed yet'); > > Why are you including a test whose result is known? Fresh cluster does > not contain archive_cleanup_command.done or recovery_end_command.done. > Make sense, removed in the attached version. > +# Checkpoint will trigger restart point on standby. > +$standby3->safe_psql('postgres', q{CHECKPOINT}); > +ok(-f "$archive_cleanup_command_file", > + 'archive_cleanup_command executed on checkpoint'); > > Is this test reliable? > I think yes, it will be the same as you are suggesting a shutdown which eventually performs a checkpoint. That checkpoint on the recovery server performs the restart point instead. Still, there could be a case where archive_cleanup_command execution could be skipped, if there is no record replied after the previous restart point, which could happen in case of shutdown as well. Hope that makes sense. Regards, Amul
Attachment
Hi, On 2021-09-14 10:34:09 +0530, Amul Sul wrote: > +# recovery_end_command_file executed only on recovery end which can happen on > +# promotion. > +$standby3->promote; > +ok(-f "$recovery_end_command_file", > + 'recovery_end_command executed after promotion'); It'd be good to test what happens when recovery_end_command fails... Greetings, Andres Freund
On Wed, Oct 6, 2021 at 1:40 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-09-14 10:34:09 +0530, Amul Sul wrote: > > +# recovery_end_command_file executed only on recovery end which can happen on > > +# promotion. > > +$standby3->promote; > > +ok(-f "$recovery_end_command_file", > > + 'recovery_end_command executed after promotion'); > > It'd be good to test what happens when recovery_end_command fails... > Thanks for the suggestion, added the same in the attached version. Regards, Amul
Attachment
On Wed, Oct 06, 2021 at 06:49:10PM +0530, Amul Sul wrote: > Thanks for the suggestion, added the same in the attached version. Hmm. The run-time of 020_archive_status.p bumps from 4.7s to 5.8s on my laptop, so the change is noticeable. I agree that it would be good to have more coverage for those commands, but I also think that we should make things cheaper if we can, particularly knowing that those commands just touch a file. This patch creates two stanbys for its purpose, but do we need that much? On top of that, 020_archive_status.pl does not look like the correct place for this set of tests. 002_archiving.pl would be a better candidate, where we already have two standbys that get promoted, so you could have both the failure and success cases there. There should be no need for extra wait phases either. +$standby4->append_conf('postgresql.conf', + "recovery_end_command = 'echo recovery_ended > /non_existing_dir/file'"); I am wondering how this finishes on Windows. -- Michael
Attachment
On Wed, Oct 20, 2021 at 11:09 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Oct 06, 2021 at 06:49:10PM +0530, Amul Sul wrote: > > Thanks for the suggestion, added the same in the attached version. > > Hmm. The run-time of 020_archive_status.p bumps from 4.7s to 5.8s on > my laptop, so the change is noticeable. I agree that it would be good > to have more coverage for those commands, but I also think that we > should make things cheaper if we can, particularly knowing that those > commands just touch a file. This patch creates two stanbys for its > purpose, but do we need that much? > > On top of that, 020_archive_status.pl does not look like the correct > place for this set of tests. 002_archiving.pl would be a better > candidate, where we already have two standbys that get promoted, so > you could have both the failure and success cases there. There should > be no need for extra wait phases either. > Understood, moved tests to 002_archiving.pl in the attached version. > +$standby4->append_conf('postgresql.conf', > + "recovery_end_command = 'echo recovery_ended > /non_existing_dir/file'"); > I am wondering how this finishes on Windows. My colleague Neha Sharma has confirmed that the attached version is passing on the Windows. Regards, Amul
Attachment
On Mon, Oct 25, 2021 at 02:42:28PM +0530, Amul Sul wrote: > Understood, moved tests to 002_archiving.pl in the attached version. Thanks for the new patch. I have reviewed its contents, and there were a couple of things that caught my attention while putting my hands on it. +$node_standby->append_conf('postgresql.conf', + "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'"); +$node_standby->append_conf('postgresql.conf', + "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'"); This can be formatted with a single append_conf() call and qq() to have the correct set of quotes. +$node_primary->safe_psql('postgres', "CHECKPOINT"); my $current_lsn = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();"); This had better say that the checkpoint is necessary because we need one before switching to a new segment on the primary, as much as the checkpoint on the first standby is needed to trigger the command whose execution is checked. +$node_standby2->append_conf('postgresql.conf', + "archive_cleanup_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'"); +$node_standby2->append_conf('postgresql.conf', + "recovery_end_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'"); [...] +# Failing to execute archive_cleanup_command and/or recovery_end_command does +# not affect promotion. +is($node_standby2->safe_psql( 'postgres', q{SELECT pg_is_in_recovery()}), 'f', + "standby promoted successfully despite incorrect archive_cleanup_command and recovery_end_command"); This SQL test is mostly useless IMO, as the promote() call done above ensures that this state is reached properly, and the same thing could be with the removals of RECOVERYHISTORY and RECOVERYXLOG. I think that it would be better to check directly if the commands are run or not. This is simple to test: look at the logs from a position just before the promotion, slurp the log file of $standby2 from this position, and finally compare its contents with a regex of your choice. I have chosen a simple "qr/WARNING:.*recovery_end_command/s" for the purpose of this test. Having a test for archive_cleanup_command here would be nice, but that would be much more costly than the end-of-recovery command, so I have left that out. Perhaps we could just append it in the conf as a dummy, as you did, though, but its execution is not deterministic in this test so we are better without for now IMO. perltidy was also complaining a bit, this is fixed as of the attached. I have checked things on my own Windows dev box, while on it. -- Michael
Attachment
On Wed, Oct 27, 2021 at 9:37 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Oct 25, 2021 at 02:42:28PM +0530, Amul Sul wrote: > > Understood, moved tests to 002_archiving.pl in the attached version. > > Thanks for the new patch. I have reviewed its contents, and there > were a couple of things that caught my attention while putting my > hands on it. > > +$node_standby->append_conf('postgresql.conf', > + "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'"); > +$node_standby->append_conf('postgresql.conf', > + "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'"); > This can be formatted with a single append_conf() call and qq() to > have the correct set of quotes. > > +$node_primary->safe_psql('postgres', "CHECKPOINT"); > my $current_lsn = > $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();"); > This had better say that the checkpoint is necessary because we need > one before switching to a new segment on the primary, as much as the > checkpoint on the first standby is needed to trigger the command whose > execution is checked. > > +$node_standby2->append_conf('postgresql.conf', > + "archive_cleanup_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'"); > +$node_standby2->append_conf('postgresql.conf', > + "recovery_end_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'"); > [...] > +# Failing to execute archive_cleanup_command and/or recovery_end_command does > +# not affect promotion. > +is($node_standby2->safe_psql( 'postgres', q{SELECT pg_is_in_recovery()}), 'f', > + "standby promoted successfully despite incorrect archive_cleanup_command and recovery_end_command"); > > This SQL test is mostly useless IMO, as the promote() call done above > ensures that this state is reached properly, and the same thing could > be with the removals of RECOVERYHISTORY and RECOVERYXLOG. I think > that it would be better to check directly if the commands are run or > not. This is simple to test: look at the logs from a position just > before the promotion, slurp the log file of $standby2 from this > position, and finally compare its contents with a regex of your > choice. I have chosen a simple "qr/WARNING:.*recovery_end_command/s" > for the purpose of this test. Having a test for > archive_cleanup_command here would be nice, but that would be much > more costly than the end-of-recovery command, so I have left that > out. Perhaps we could just append it in the conf as a dummy, as you > did, though, but its execution is not deterministic in this test so we > are better without for now IMO. > > perltidy was also complaining a bit, this is fixed as of the attached. > I have checked things on my own Windows dev box, while on it. Thanks for the updated version. The patch is much better than before except needing minor changes to the test description that testing recovery_end_command_file before promotion, I did the same in the attached version. Regards, Amul
Attachment
On Wed, Oct 27, 2021 at 10:32:22AM +0530, Amul Sul wrote: > Thanks for the updated version. The patch is much better than before > except needing minor changes to the test description that testing > recovery_end_command_file before promotion, I did the same in the > attached version. ok(!-f $recovery_end_command_file, - 'recovery_end_command executed after promotion'); + 'recovery_end_command not executed yet'); Indeed :p -- Michael
Attachment
On Wed, Oct 27, 2021 at 02:20:50PM +0900, Michael Paquier wrote: > ok(!-f $recovery_end_command_file, > - 'recovery_end_command executed after promotion'); > + 'recovery_end_command not executed yet'); > Indeed :p While looking at that this morning, I have noticed an extra bug. If the path of the data folder included a space, the command would have failed. I have to wonder if we should do something like cp_history_files, but that did not seem necessary to me once we rely on each command being executed from the root of the data folder. Anyway, I am curious to see what the buildfarm thinks about all that, particularly with Msys, so I have applied the patch. I am keeping an eye on things, though. -- Michael
Attachment
On Thu, Oct 28, 2021 at 7:24 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Oct 27, 2021 at 02:20:50PM +0900, Michael Paquier wrote: > > ok(!-f $recovery_end_command_file, > > - 'recovery_end_command executed after promotion'); > > + 'recovery_end_command not executed yet'); > > Indeed :p > > While looking at that this morning, I have noticed an extra bug. If > the path of the data folder included a space, the command would have > failed. I have to wonder if we should do something like > cp_history_files, but that did not seem necessary to me once we rely > on each command being executed from the root of the data folder. > > Anyway, I am curious to see what the buildfarm thinks about all that, > particularly with Msys, so I have applied the patch. I am keeping an > eye on things, though. Thanks a lot, Michael. Regards, Amul
On Thu, Oct 28, 2021 at 09:25:43AM +0530, Amul Sul wrote: > Thanks a lot, Michael. So.. The buildfarm members running on Windows and running the recovery tests are jacana (MinGW) and fairywren (Msys), both reporting green. drongo (MSVC) skips those tests, but I have tested MSVC by myself so I think that we are good here. -- Michael