Thread: TAP test for recovery_end_command

TAP test for recovery_end_command

From
Amul Sul
Date:
Hi,

The attached patch adds a small test for recovery_end_command execution.

Currently, patch tests execution of recovery_end_command by creating
dummy file, I am not wedded only to this approach, other suggestions
also welcome.

Also, we don't have a good test for archive_cleanup_command as well, I
am not sure how we could test that which executes with every
restart-point.

Thanks to my colleague Neha Sharma for confirming the test execution on Windows.

Regards,
Amul

Attachment

Re: TAP test for recovery_end_command

From
"Euler Taveira"
Date:
On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
The attached patch adds a small test for recovery_end_command execution.
Additional coverage is always a good thing.

Currently, patch tests execution of recovery_end_command by creating
dummy file, I am not wedded only to this approach, other suggestions
also welcome.
This test file is for archiving only. It seems 020_archive_status.pl is more
appropriate for testing this parameter.

Also, we don't have a good test for archive_cleanup_command as well, I
am not sure how we could test that which executes with every
restart-point.
Setup a replica and stop it. It triggers a restartpoint during the shutdown.


--
Euler Taveira

Re: TAP test for recovery_end_command

From
Michael Paquier
Date:
On Sun, Sep 12, 2021 at 09:25:32PM -0300, Euler Taveira wrote:
> On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
>> Also, we don't have a good test for archive_cleanup_command as well, I
>> am not sure how we could test that which executes with every
>> restart-point.
>
> Setup a replica and stop it. It triggers a restartpoint during the shutdown.

+$node_standby2->append_conf('postgresql.conf',
+       "recovery_end_command='echo recovery_ended > $recovery_end_command_file'");
This is not going to work on Windows.
--
Michael

Attachment

Re: TAP test for recovery_end_command

From
Amul Sul
Date:
On Mon, Sep 13, 2021 at 8:44 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sun, Sep 12, 2021 at 09:25:32PM -0300, Euler Taveira wrote:
> > On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
> >> Also, we don't have a good test for archive_cleanup_command as well, I
> >> am not sure how we could test that which executes with every
> >> restart-point.
> >
> > Setup a replica and stop it. It triggers a restartpoint during the shutdown.
>
> +$node_standby2->append_conf('postgresql.conf',
> +       "recovery_end_command='echo recovery_ended > $recovery_end_command_file'");
> This is not going to work on Windows.

Unfortunately, I don't have Windows, but my colleague Neha Sharma has
confirmed it works there.

Regards,
Amul



Re: TAP test for recovery_end_command

From
Amul Sul
Date:
On Mon, Sep 13, 2021 at 5:56 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
>
> The attached patch adds a small test for recovery_end_command execution.
>
> Additional coverage is always a good thing.
>

Thanks for the confirmation.

> Currently, patch tests execution of recovery_end_command by creating
> dummy file, I am not wedded only to this approach, other suggestions
> also welcome.
>
> This test file is for archiving only. It seems 020_archive_status.pl is more
> appropriate for testing this parameter.
>

Ok, moved to 020_archive_status.pl in the attached version.

> Also, we don't have a good test for archive_cleanup_command as well, I
> am not sure how we could test that which executes with every
> restart-point.
>
> Setup a replica and stop it. It triggers a restartpoint during the shutdown.

Yeah, added that test too. I triggered the restartpoint via a
CHECKPOINT command in the attached version.

Note that I haven't tested the current version on Windows, will
cross-check that tomorrow.

Regards,
Amul

Attachment

Re: TAP test for recovery_end_command

From
"Euler Taveira"
Date:
On Mon, Sep 13, 2021, at 10:09 AM, Amul Sul wrote:
Yeah, added that test too. I triggered the restartpoint via a
CHECKPOINT command in the attached version.
+# archive_cleanup_command executed with every restart points
+ok( !-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command not executed yet');

Why are you including a test whose result is known? Fresh cluster does
not contain archive_cleanup_command.done or recovery_end_command.done.

+# Checkpoint will trigger restart point on standby.
+$standby3->safe_psql('postgres', q{CHECKPOINT});
+ok(-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command executed on checkpoint');

Is this test reliable?


--
Euler Taveira

Re: TAP test for recovery_end_command

From
Amul Sul
Date:
On Mon, Sep 13, 2021 at 8:39 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Sep 13, 2021, at 10:09 AM, Amul Sul wrote:
>
> Yeah, added that test too. I triggered the restartpoint via a
> CHECKPOINT command in the attached version.
>
> +# archive_cleanup_command executed with every restart points
> +ok( !-f "$archive_cleanup_command_file",
> + 'archive_cleanup_command not executed yet');
>
> Why are you including a test whose result is known? Fresh cluster does
> not contain archive_cleanup_command.done or recovery_end_command.done.
>

Make sense, removed in the attached version.

> +# Checkpoint will trigger restart point on standby.
> +$standby3->safe_psql('postgres', q{CHECKPOINT});
> +ok(-f "$archive_cleanup_command_file",
> + 'archive_cleanup_command executed on checkpoint');
>
> Is this test reliable?
>

I think yes, it will be the same as you are suggesting a shutdown
which eventually performs a checkpoint. That checkpoint on the
recovery server performs the restart point instead. Still, there could
be a case where archive_cleanup_command execution could be skipped,
if there is no record replied after the previous restart point, which
could happen in case of shutdown as well. Hope that makes sense.

Regards,
Amul

Attachment

Re: TAP test for recovery_end_command

From
Andres Freund
Date:
Hi,

On 2021-09-14 10:34:09 +0530, Amul Sul wrote:
> +# recovery_end_command_file executed only on recovery end which can happen on
> +# promotion.
> +$standby3->promote;
> +ok(-f "$recovery_end_command_file",
> +    'recovery_end_command executed after promotion');

It'd be good to test what happens when recovery_end_command fails...

Greetings,

Andres Freund



Re: TAP test for recovery_end_command

From
Amul Sul
Date:
On Wed, Oct 6, 2021 at 1:40 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-09-14 10:34:09 +0530, Amul Sul wrote:
> > +# recovery_end_command_file executed only on recovery end which can happen on
> > +# promotion.
> > +$standby3->promote;
> > +ok(-f "$recovery_end_command_file",
> > +     'recovery_end_command executed after promotion');
>
> It'd be good to test what happens when recovery_end_command fails...
>

Thanks for the suggestion, added the same in the attached version.

Regards,
Amul

Attachment

Re: TAP test for recovery_end_command

From
Michael Paquier
Date:
On Wed, Oct 06, 2021 at 06:49:10PM +0530, Amul Sul wrote:
> Thanks for the suggestion, added the same in the attached version.

Hmm.  The run-time of 020_archive_status.p bumps from 4.7s to 5.8s on
my laptop, so the change is noticeable.  I agree that it would be good
to have more coverage for those commands, but I also think that we
should make things cheaper if we can, particularly knowing that those
commands just touch a file.  This patch creates two stanbys for its
purpose, but do we need that much?

On top of that, 020_archive_status.pl does not look like the correct
place for this set of tests.  002_archiving.pl would be a better
candidate, where we already have two standbys that get promoted, so
you could have both the failure and success cases there.  There should
be no need for extra wait phases either.

+$standby4->append_conf('postgresql.conf',
+   "recovery_end_command = 'echo recovery_ended > /non_existing_dir/file'");
I am wondering how this finishes on Windows.
--
Michael

Attachment

Re: TAP test for recovery_end_command

From
Amul Sul
Date:
On Wed, Oct 20, 2021 at 11:09 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 06, 2021 at 06:49:10PM +0530, Amul Sul wrote:
> > Thanks for the suggestion, added the same in the attached version.
>
> Hmm.  The run-time of 020_archive_status.p bumps from 4.7s to 5.8s on
> my laptop, so the change is noticeable.  I agree that it would be good
> to have more coverage for those commands, but I also think that we
> should make things cheaper if we can, particularly knowing that those
> commands just touch a file.  This patch creates two stanbys for its
> purpose, but do we need that much?
>
> On top of that, 020_archive_status.pl does not look like the correct
> place for this set of tests.  002_archiving.pl would be a better
> candidate, where we already have two standbys that get promoted, so
> you could have both the failure and success cases there.  There should
> be no need for extra wait phases either.
>

Understood, moved tests to 002_archiving.pl in the attached version.

> +$standby4->append_conf('postgresql.conf',
> +   "recovery_end_command = 'echo recovery_ended > /non_existing_dir/file'");
> I am wondering how this finishes on Windows.

My colleague Neha Sharma has confirmed that the attached version is
passing on the Windows.

Regards,
Amul

Attachment

Re: TAP test for recovery_end_command

From
Michael Paquier
Date:
On Mon, Oct 25, 2021 at 02:42:28PM +0530, Amul Sul wrote:
> Understood, moved tests to 002_archiving.pl in the attached version.

Thanks for the new patch.  I have reviewed its contents, and there
were a couple of things that caught my attention while putting my
hands on it.

+$node_standby->append_conf('postgresql.conf',
+    "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'");
+$node_standby->append_conf('postgresql.conf',
+    "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'");
This can be formatted with a single append_conf() call and qq() to
have the correct set of quotes.

+$node_primary->safe_psql('postgres', "CHECKPOINT");
 my $current_lsn =
   $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
This had better say that the checkpoint is necessary because we need
one before switching to a new segment on the primary, as much as the
checkpoint on the first standby is needed to trigger the command whose
execution is checked.

+$node_standby2->append_conf('postgresql.conf',
+    "archive_cleanup_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
+$node_standby2->append_conf('postgresql.conf',
+    "recovery_end_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
[...]
+# Failing to execute archive_cleanup_command and/or recovery_end_command does
+# not affect promotion.
+is($node_standby2->safe_psql( 'postgres', q{SELECT pg_is_in_recovery()}), 'f',
+    "standby promoted successfully despite incorrect archive_cleanup_command and recovery_end_command");

This SQL test is mostly useless IMO, as the promote() call done above
ensures that this state is reached properly, and the same thing could
be with the removals of RECOVERYHISTORY and RECOVERYXLOG.  I think
that it would be better to check directly if the commands are run or
not.  This is simple to test: look at the logs from a position just
before the promotion, slurp the log file of $standby2 from this
position, and finally compare its contents with a regex of your
choice.  I have chosen a simple "qr/WARNING:.*recovery_end_command/s"
for the purpose of this test.  Having a test for
archive_cleanup_command here would be nice, but that would be much
more costly than the end-of-recovery command, so I have left that
out.  Perhaps we could just append it in the conf as a dummy, as you
did, though, but its execution is not deterministic in this test so we
are better without for now IMO.

perltidy was also complaining a bit, this is fixed as of the attached.
I have checked things on my own Windows dev box, while on it.
--
Michael

Attachment

Re: TAP test for recovery_end_command

From
Amul Sul
Date:
On Wed, Oct 27, 2021 at 9:37 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Oct 25, 2021 at 02:42:28PM +0530, Amul Sul wrote:
> > Understood, moved tests to 002_archiving.pl in the attached version.
>
> Thanks for the new patch.  I have reviewed its contents, and there
> were a couple of things that caught my attention while putting my
> hands on it.
>
> +$node_standby->append_conf('postgresql.conf',
> +       "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'");
> +$node_standby->append_conf('postgresql.conf',
> +       "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'");
> This can be formatted with a single append_conf() call and qq() to
> have the correct set of quotes.
>
> +$node_primary->safe_psql('postgres', "CHECKPOINT");
>  my $current_lsn =
>    $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
> This had better say that the checkpoint is necessary because we need
> one before switching to a new segment on the primary, as much as the
> checkpoint on the first standby is needed to trigger the command whose
> execution is checked.
>
> +$node_standby2->append_conf('postgresql.conf',
> +       "archive_cleanup_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
> +$node_standby2->append_conf('postgresql.conf',
> +       "recovery_end_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
> [...]
> +# Failing to execute archive_cleanup_command and/or recovery_end_command does
> +# not affect promotion.
> +is($node_standby2->safe_psql( 'postgres', q{SELECT pg_is_in_recovery()}), 'f',
> +       "standby promoted successfully despite incorrect archive_cleanup_command and recovery_end_command");
>
> This SQL test is mostly useless IMO, as the promote() call done above
> ensures that this state is reached properly, and the same thing could
> be with the removals of RECOVERYHISTORY and RECOVERYXLOG.  I think
> that it would be better to check directly if the commands are run or
> not.  This is simple to test: look at the logs from a position just
> before the promotion, slurp the log file of $standby2 from this
> position, and finally compare its contents with a regex of your
> choice.  I have chosen a simple "qr/WARNING:.*recovery_end_command/s"
> for the purpose of this test.  Having a test for
> archive_cleanup_command here would be nice, but that would be much
> more costly than the end-of-recovery command, so I have left that
> out.  Perhaps we could just append it in the conf as a dummy, as you
> did, though, but its execution is not deterministic in this test so we
> are better without for now IMO.
>
> perltidy was also complaining a bit, this is fixed as of the attached.
> I have checked things on my own Windows dev box, while on it.

Thanks for the updated version. The patch is much better than before
except needing minor changes to the test description that testing
recovery_end_command_file before promotion, I did the same in the
attached version.

Regards,
Amul

Attachment

Re: TAP test for recovery_end_command

From
Michael Paquier
Date:
On Wed, Oct 27, 2021 at 10:32:22AM +0530, Amul Sul wrote:
> Thanks for the updated version. The patch is much better than before
> except needing minor changes to the test description that testing
> recovery_end_command_file before promotion, I did the same in the
> attached version.

 ok(!-f $recovery_end_command_file,
-   'recovery_end_command executed after promotion');
+   'recovery_end_command not executed yet');
Indeed :p
--
Michael

Attachment

Re: TAP test for recovery_end_command

From
Michael Paquier
Date:
On Wed, Oct 27, 2021 at 02:20:50PM +0900, Michael Paquier wrote:
>  ok(!-f $recovery_end_command_file,
> -   'recovery_end_command executed after promotion');
> +   'recovery_end_command not executed yet');
> Indeed :p

While looking at that this morning, I have noticed an extra bug.  If
the path of the data folder included a space, the command would have
failed.  I have to wonder if we should do something like
cp_history_files, but that did not seem necessary to me once we rely
on each command being executed from the root of the data folder.

Anyway, I am curious to see what the buildfarm thinks about all that,
particularly with Msys, so I have applied the patch.  I am keeping an
eye on things, though.
--
Michael

Attachment

Re: TAP test for recovery_end_command

From
Amul Sul
Date:
On Thu, Oct 28, 2021 at 7:24 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 27, 2021 at 02:20:50PM +0900, Michael Paquier wrote:
> >  ok(!-f $recovery_end_command_file,
> > -   'recovery_end_command executed after promotion');
> > +   'recovery_end_command not executed yet');
> > Indeed :p
>
> While looking at that this morning, I have noticed an extra bug.  If
> the path of the data folder included a space, the command would have
> failed.  I have to wonder if we should do something like
> cp_history_files, but that did not seem necessary to me once we rely
> on each command being executed from the root of the data folder.
>
> Anyway, I am curious to see what the buildfarm thinks about all that,
> particularly with Msys, so I have applied the patch.  I am keeping an
> eye on things, though.

Thanks a lot, Michael.

Regards,
Amul



Re: TAP test for recovery_end_command

From
Michael Paquier
Date:
On Thu, Oct 28, 2021 at 09:25:43AM +0530, Amul Sul wrote:
> Thanks a lot, Michael.

So..  The buildfarm members running on Windows and running the
recovery tests are jacana (MinGW) and fairywren (Msys), both reporting
green.  drongo (MSVC) skips those tests, but I have tested MSVC by
myself so I think that we are good here.
--
Michael

Attachment