Thread: TAP / recovery-test fs-level backups, psql enhancements etc
Hi all
--
I've been working with the new TAP tests for recovery and have a number of enhancements I'd like to make to the tooling to make writing tests easier and nicer. I've also included two improvements proposed by Kyotaro HORIGUCHI in the prior thread about the now-committed TAP recovery tests.
I developed these changes as part of testing failover slots and logical decoding timeline following, where I found a need for better control over psql, the ability to make filesystem level backups, etc. It doesn't make sense to try to jam all that into my test script when it belongs in the infrastructure.
Patches attached, each explains what it does and what for.
Attachment
- 0001-Allow-multiple-temp-config-arguments-to-pg_regress.patch
- 0002-TAP-Add-filtering-to-RecursiveCopy.patch
- 0003-TAP-Add-easier-more-flexible-ways-to-invoke-psql.patch
- 0004-TAP-Add-support-for-taking-filesystem-level-backups.patch
- 0005-TAP-Suffix-temporary-directories-with-node-name.patch
- 0006-TAP-Retain-tempdirs-for-failed-tests.patch
- 0007-TAP-Perltidy-PostgresNode.pm.patch
On Tue, Mar 1, 2016 at 2:48 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
You are using both "die_on_error" and "on_error_die" in your code. That looks like a mismatch!
Hi allI've been working with the new TAP tests for recovery and have a number of enhancements I'd like to make to the tooling to make writing tests easier and nicer. I've also included two improvements proposed by Kyotaro HORIGUCHI in the prior thread about the now-committed TAP recovery tests.I developed these changes as part of testing failover slots and logical decoding timeline following, where I found a need for better control over psql, the ability to make filesystem level backups, etc. It doesn't make sense to try to jam all that into my test script when it belongs in the infrastructure.Patches attached, each explains what it does and what for.
You are using both "die_on_error" and "on_error_die" in your code. That looks like a mismatch!
On 1 March 2016 at 22:08, salvador fandino <sfandino@gmail.com> wrote:
--
On Tue, Mar 1, 2016 at 2:48 PM, Craig Ringer <craig@2ndquadrant.com> wrote:Hi allI've been working with the new TAP tests for recovery and have a number of enhancements I'd like to make to the tooling to make writing tests easier and nicer. I've also included two improvements proposed by Kyotaro HORIGUCHI in the prior thread about the now-committed TAP recovery tests.I developed these changes as part of testing failover slots and logical decoding timeline following, where I found a need for better control over psql, the ability to make filesystem level backups, etc. It doesn't make sense to try to jam all that into my test script when it belongs in the infrastructure.Patches attached, each explains what it does and what for.
You are using both "die_on_error" and "on_error_die" in your code. That looks like a mismatch!
Thanks for taking a look.
In this case it has no effect since it's just specifying the default explicitly, but it's certainly wrong. Fixed in git.
I'll wait to see what else comes up before posting another series.
Craig Ringer wrote: > Hi all > > I've been working with the new TAP tests for recovery and have a number of > enhancements I'd like to make to the tooling to make writing tests easier > and nicer. I think we should change the existing psql method to be what you propose as psql_expert. I don't see any advantage in keeping the old one. Many of the existing uses of psql should become what you call psql_check; but we should probably call that psql_ok() instead, and also have a psql_fails() for src/test/recovery/t/001_stream_rep.pl (and others to come). > +=item $node->backup_fs_hot(backup_name) > + > +Create a backup with a filesystem level copy in $node->backup_dir, > +including transaction logs. Archiving must be enabled as pg_start_backup > +and pg_stop_backup are used. This is not checked or enforced. Perhaps "WAL archiving or streaming must be enabled ..." I would rather have the patches be submitted with a reasonable approximation at indentation rather than submit a separate indent patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Just pushed 0006. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Craig Ringer wrote: > diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm > index 3d11cbb..8c13655 100644 > --- a/src/test/perl/TestLib.pm > +++ b/src/test/perl/TestLib.pm > @@ -112,9 +112,11 @@ INIT > # > sub tempdir > { > + my ($prefix) = @_; > + $prefix = "tmp_test" if (!$prefix); This should be "unless defined $prefix". Otherwise if you pass the string literal "0" as prefix it will be ignored. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2 March 2016 at 07:07, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Craig Ringer wrote:
> diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
> index 3d11cbb..8c13655 100644
> --- a/src/test/perl/TestLib.pm
> +++ b/src/test/perl/TestLib.pm
> @@ -112,9 +112,11 @@ INIT
> #
> sub tempdir
> {
> + my ($prefix) = @_;
> + $prefix = "tmp_test" if (!$prefix);
This should be "unless defined $prefix". Otherwise if you pass the
string literal "0" as prefix it will be ignored.
Ha. I thought something was funny with !$prefix when splitting that out of Kyotaro Horiguchi's patch but couldn't put my finger on what.
Will amend in the next push. I'll keep the committed 006 Retain tempdirs for failed tests in that series but amend it to show it's committed upstream.
On 2 March 2016 at 05:46, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I think we should change the existing psql method to be what you propose
as psql_expert. I don't see any advantage in keeping the old one. Many
of the existing uses of psql should become what you call psql_check; but
we should probably call that psql_ok() instead, and also have a
psql_fails() for src/test/recovery/t/001_stream_rep.pl (and others to
come).
I agree and that's what I really wanted to do. I just didn't want to produce a massive diff that renames the method across all of src/bin etc too, since I thought that'd be harder to commit and might have backporting consequences.
If you think that's the way to go I'm very happy with that and will proceed.
> +=item $node->backup_fs_hot(backup_name)
> +
> +Create a backup with a filesystem level copy in $node->backup_dir,
> +including transaction logs. Archiving must be enabled as pg_start_backup
> +and pg_stop_backup are used. This is not checked or enforced.
Perhaps "WAL archiving or streaming must be enabled ..."
Good point, will do.
I would rather have the patches be submitted with a reasonable
approximation at indentation rather than submit a separate indent patch.
The reason I didn't do that is that the indenting in PostgresNode.pm is already well out of whack and, TBH, I didn't want to rebase on top of a perltidy'd version. I can bite the bullet and move the perltidy to the start of the patch series then make sure each subsequent patch is tidy'd but I'd want to be very sure you'd be OK to commit the perltidy of PostgresNode.pm otherwise I'd have to rebase messily all over again...
2016-03-02 6:57 GMT+08:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Just pushed 0006.
This upset buildfarm members running prehistoric Perl versions because is_passing was added after 5.8.8.
Fix attached.
I think I'm going to have to do an archaeology-grade Perl install, there's just too much to keep an eye on manually. Ugh.
Why do we requite a 10yo Perl anyway?
--
On 2 March 2016 at 11:22, Craig Ringer <craig@2ndquadrant.com> wrote:
2016-03-02 6:57 GMT+08:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Just pushed 0006.This upset buildfarm members running prehistoric Perl versions because is_passing was added after 5.8.8.Fix attached.
Really, this time.
Attachment
Attachment
On 2 March 2016 at 10:07, Craig Ringer <craig@2ndquadrant.com> wrote:
-- On 2 March 2016 at 05:46, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I think we should change the existing psql method to be what you propose
as psql_expert. I don't see any advantage in keeping the old one. Many
of the existing uses of psql should become what you call psql_check; but
we should probably call that psql_ok() instead, and also have a
psql_fails() for src/test/recovery/t/001_stream_rep.pl (and others to
come).I agree and that's what I really wanted to do. I just didn't want to produce a massive diff that renames the method across all of src/bin etc too, since I thought that'd be harder to commit and might have backporting consequences.If you think that's the way to go I'm very happy with that and will proceed.
I'll make the change you suggested to make 'psql_expert' into 'psql' and change call sites to use it or psql_check as appropriate. I'll probably make it an immediately following patch in the series so it's easier to separate the bulk-rename from the functional changes, but it can be trivially squashed for commit.
On reflection I want to keep the name as psql_check, rather than psql_ok. I'll insert another patch that changes src/bin to use psql_check where appropriate.
The reason I used _check rather than psql_ok is partly that psql_check isn't a test. It doesn't run any Test::More checks, it die()s on failure because failure isn't expected but is incidental to the test that's using psql. I did it that way because I don't think the psql invocation should be a test in its self - then you'd have to pass a test name to every psql_ok invocation and you'd get a flood of meaningless micro-tests showing up that obscure the real thing being tested. It'd also be a PITA maintaining the number of tests in the tests => 'n' argument to Test::More.
So I'm inclined to keep it as psql_check, to avoid confusion with the names 'ok' and 'fails' that Test::More uses. It's not actually a test. I don't think we need or should have a psql_ok wrapper, since with this change you can now just write:
is($node->psql('db', 'SELECT syntaxerror;'), 3, 'psql exits with code 3 on syntax error');
which is clearer and more specific than:
$node->psql_ok('db', 'SELECT syntaxerror;', test => 'psql exits on syntax error');
The reason I didn't do that is that the indenting in PostgresNode.pm is already well out of whack and, TBH, I didn't want to rebase on top of a perltidy'd version. I can bite the bullet and move the perltidy to the start of the patch series then make sure each subsequent patch is tidy'd but I'd want to be very sure you'd be OK to commit the perltidy of PostgresNode.pm otherwise I'd have to rebase messily all over again...
This wasn't as bad as I thought. I pulled the tidy changes to the $self->psql stuff into that patch and rebased the rest to the start of the series so it only touches what's currently committed. I agree that's better.
Updated tree pushed. I'll send a new patch series once I've done the psql_ok part.
It's funny that as part of implementing timeline following in logical decoding and implementing failover slots I'm writing perl test framework improvements....
Craig Ringer <craig@2ndquadrant.com> writes: > This upset buildfarm members running prehistoric Perl versions because > is_passing was added after 5.8.8. Sir, RHEL6 is not prehistoric ... and this is failing on my server too. I'm not sure when "is_passing" was added, but it was later than 5.10.1. > Fix attached. Will check and apply. > I think I'm going to have to do an archaeology-grade Perl install, there's > just too much to keep an eye on manually. Ugh. > Why do we requite a 10yo Perl anyway? The point of running ancient versions in the buildfarm is so that we don't move the goalposts on software compatibility without knowing it. When we come across a good reason to desupport an old Perl or Tcl or whatever version, we'll do it; but having to add another ten lines or so of code doesn't seem like a good reason. regards, tom lane
On Wed, Mar 2, 2016 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig Ringer <craig@2ndquadrant.com> writes: >> This upset buildfarm members running prehistoric Perl versions because >> is_passing was added after 5.8.8. > > Sir, RHEL6 is not prehistoric ... and this is failing on my server too. > I'm not sure when "is_passing" was added, but it was later than 5.10.1. Test::Builder is managing this variable, as documented here for people wondering: http://perldoc.perl.org/Test/Builder.html I am tracking the first version as being 5.12.0. {details} would prove to work, it is available in 5.8.8. -- Michael
Craig Ringer <craig@2ndquadrant.com> writes: > Really, really this time, the version in git that actually works, not a > format-patch'd version before I made a last fix. Sigh. I can't even blame > lack of coffee... Hmm, still doesn't work for me: make check-world dies with Can't use string ("Test::Builder") as a HASH ref while "strict refs" in use at / usr/share/perl5/Test/Builder.pm line 1798. END failed--call queue aborted. # Looks like your test exited with 22 just after 14. The referenced line number is the end of the file, which is right in keeping with the TAP tests' infrastructure's habit of being utterly impenetrable when it's not working. My small amount of Perl-fu is not sufficient to debug this. perl-5.10.1-141.el6_7.1.x86_64 perl-Test-Harness-3.17-141.el6_7.1.x86_64 perl-Test-Simple-0.92-141.el6_7.1.x86_64 regards, tom lane
I wrote: > Can't use string ("Test::Builder") as a HASH ref while "strict refs" in use at /usr/share/perl5/Test/Builder.pm line 1798. > The referenced line number is the end of the file, Oh, scratch that; I was looking at the wrong file. Actually, /usr/share/perl5/Test/Builder.pm has sub details { my $self = shift; return @{ $self->{Test_Results} }; } and line 1798 is the "return" statement in that. I still lack enough Perl-fu to decipher this, though. regards, tom lane
Craig Ringer wrote: > On 2 March 2016 at 07:07, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > Craig Ringer wrote: > > > > > diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm > > > index 3d11cbb..8c13655 100644 > > > --- a/src/test/perl/TestLib.pm > > > +++ b/src/test/perl/TestLib.pm > > > @@ -112,9 +112,11 @@ INIT > > > # > > > sub tempdir > > > { > > > + my ($prefix) = @_; > > > + $prefix = "tmp_test" if (!$prefix); > > > > This should be "unless defined $prefix". Otherwise if you pass the > > string literal "0" as prefix it will be ignored. > > > Ha. I thought something was funny with !$prefix when splitting that out of > Kyotaro Horiguchi's patch but couldn't put my finger on what. > > Will amend in the next push. Pushed it with that fix. I also added a further "data_" prefix, so it's "data_${name}_XXXX" now. Hopefully this is less problematic than yesterday's ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3 March 2016 at 05:26, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Pushed it with that fix. I also added a further "data_" prefix, so it's
"data_${name}_XXXX" now. Hopefully this is less problematic than
yesterday's ...
On the Perl 5.8.8 test env I've set up now, per
master currently fails with
t/004_timeline_switch....."remove_tree" is not exported by the File::Path module
remove_tree is only in File::Path 2.07 but Perl 5.8.8 has 1.08. No buildfarm member has complained, so clearly we don't actually test with the stated supported Perl version.
The attached patch fixes it to use the legacy File::Path interface 'rmtree' so master runs on 588 again.
Attachment
On Thu, Mar 3, 2016 at 2:20 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On the Perl 5.8.8 test env I've set up now, per > > http://www.postgresql.org/message-id/CAMsr+YGR6pU-gUyp-FT98XwXAsc9n6j-awZAqxvW_+P3RTC7cg@mail.gmail.com > > master currently fails with > > t/004_timeline_switch....."remove_tree" is not exported by the File::Path > module > > remove_tree is only in File::Path 2.07 but Perl 5.8.8 has 1.08. No buildfarm > member has complained, so clearly we don't actually test with the stated > supported Perl version. > > The attached patch fixes it to use the legacy File::Path interface 'rmtree' > so master runs on 588 again. Crap. Thanks for spotting that, I was careless. You could still use qw() and not use File::Path::rmtree, but that's a matter of taste. -- Michael
On 3 March 2016 at 13:23, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Mar 3, 2016 at 2:20 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On the Perl 5.8.8 test env I've set up now, per
>
> http://www.postgresql.org/message-id/CAMsr+YGR6pU-gUyp-FT98XwXAsc9n6j-awZAqxvW_+P3RTC7cg@mail.gmail.com
>
> master currently fails with
>
> t/004_timeline_switch....."remove_tree" is not exported by the File::Path
> module
>
> remove_tree is only in File::Path 2.07 but Perl 5.8.8 has 1.08. No buildfarm
> member has complained, so clearly we don't actually test with the stated
> supported Perl version.
>
> The attached patch fixes it to use the legacy File::Path interface 'rmtree'
> so master runs on 588 again.
Crap. Thanks for spotting that, I was careless. You could still use
qw() and not use File::Path::rmtree, but that's a matter of taste.
New patch series.
The first three are simple fixes that should go in without fuss:
001 fixes the above 5.8.8 compat issue.
002 fixes another minor whoopsie, a syntax error in src/test/recovery/t/003_recovery_targets.pl that never got noticed because exit codes are ignored.
003 runs perltidy on PostgresNode.pm to bring it back into conformance after the recovery tests commit.
The rest are feature patches:
004 allows filtering on RecursiveCopy by a predicate function. Needed for filesystem level backups (007). It could probably be squashed with 007 if desired.
005 adds the new psql functions psql_expert and psql_check. Then 006 renames psql_expert to psql and fixes up all call sites across the tree to use the new interface. I found the bug in 002 as part of that process. I anticipate that 005 and 006 would be squashed into one commit to master, but I've kept them separate in my tree for easier review.
007 adds PostgresNode support for hot and cold filesystem-level backups using pg_start_backup and pg_stop_backup, which will be required for some coming tests and are useful by themselves.
Each patch is pretty simple except for the psql patch, and even that's not exactly hairy stuff. The potential risks are low as this is code that's not part of the server and isn't even run on 'make check' by default. I've tested it all under a real Perl 5.8.8 on CentOS 5.
I don't expect to be doing much more on the framework at this point as I want to be able to get back to the code I had to enhance the framework in order to test....
Attachment
- 0001-TAP-File-Path-remove_tree-is-new-in-2.x-use-1.x-rmtr.patch
- 0002-TAP-Fix-syntax-error-in-recovery-tests.patch
- 0003-TAP-Perltidy-PostgresNode.pm.patch
- 0004-TAP-Add-filtering-to-RecursiveCopy.patch
- 0005-TAP-Add-easier-more-flexible-ways-to-invoke-psql.patch
- 0006-TAP-Rename-psql_expert-to-psql-and-use-it-in-all-tes.patch
- 0007-TAP-Add-support-for-taking-filesystem-level-backups.patch
On Thu, Mar 3, 2016 at 4:11 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > The first three are simple fixes that should go in without fuss: > > 001 fixes the above 5.8.8 compat issue. > 002 fixes another minor whoopsie, a syntax error in > src/test/recovery/t/003_recovery_targets.pl that never got noticed because > exit codes are ignored. Those two are embarrassing things... However: -$node_master->psql('postgres', - "SELECT pg_create_restore_point('$recovery_name'"); +$node_master->psql_check('postgres', + "SELECT pg_create_restore_point('$recovery_name');"); In 0002 this is not correct, psql_check is part of a routine you introduce later on. > 003 runs perltidy on PostgresNode.pm to bring it back into conformance after > the recovery tests commit. No objections to that. > The rest are feature patches: > 004 allows filtering on RecursiveCopy by a predicate function. Needed for > filesystem level backups (007). It could probably be squashed with 007 if > desired. Adding perldoc to this module should be done separately, let's not mix things. Applying a filter is useful as well to remove for example the contents of pg_xlog, so no objections to it. > 005 adds the new psql functions psql_expert and psql_check. Then 006 renames > psql_expert to psql and fixes up all call sites across the tree to use the > new interface. I found the bug in 002 as part of that process. I anticipate > that 005 and 006 would be squashed into one commit to master, but I've kept > them separate in my tree for easier review. psql_check sounds wrong to me. I thought first that this triggers a test. Why not psql_simple or psql_basic, or just keep psql. > 007 adds PostgresNode support for hot and cold filesystem-level backups > using pg_start_backup and pg_stop_backup, which will be required for some > coming tests and are useful by themselves. It would be nice as well to have a flag in _backup_fs to filter the contents of pg_xlog at will. log_collector is not enabled in the nodes deployed by PostgresNode, so why filtering it? -- Michael
On 3 March 2016 at 21:16, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Mar 3, 2016 at 4:11 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> The first three are simple fixes that should go in without fuss:
>
> 001 fixes the above 5.8.8 compat issue.
> 002 fixes another minor whoopsie, a syntax error in
> src/test/recovery/t/003_recovery_targets.pl that never got noticed because
> exit codes are ignored.
Those two are embarrassing things... However:
-$node_master->psql('postgres',
- "SELECT pg_create_restore_point('$recovery_name'");
+$node_master->psql_check('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
In 0002 this is not correct, psql_check is part of a routine you
introduce later on.
Damn, I meant to fix that when I rebased it back in the history but forgot. Thankyou.
> The rest are feature patches:
> 004 allows filtering on RecursiveCopy by a predicate function. Needed for
> filesystem level backups (007). It could probably be squashed with 007 if
> desired.
Adding perldoc to this module should be done separately, let's not mix
things. Applying a filter is useful as well to remove for example the
contents of pg_xlog, so no objections to it.
Eh, ok. I figured it was so trivial it didn't matter, but will split.
> 005 adds the new psql functions psql_expert and psql_check. Then 006 renames
> psql_expert to psql and fixes up all call sites across the tree to use the
> new interface. I found the bug in 002 as part of that process. I anticipate
> that 005 and 006 would be squashed into one commit to master, but I've kept
> them separate in my tree for easier review.
psql_check sounds wrong to me. I thought first that this triggers a
test. Why not psql_simple or psql_basic, or just keep psql.
I guess I'm used to Python's subprocess.check_call so to me it's natural.
I want something that makes it clear that failure is a fatal error condition, i.e. "do this in psql and if it produces an error, treat it like you would any other error in Perl and die appropriately".
> 007 adds PostgresNode support for hot and cold filesystem-level backups
> using pg_start_backup and pg_stop_backup, which will be required for some
> coming tests and are useful by themselves.
It would be nice as well to have a flag in _backup_fs to filter the
contents of pg_xlog at will.
i.e. copy without any pg_xlog at all? without_xlog => 1 ?
log_collector is not enabled in the nodes
deployed by PostgresNode, so why filtering it?
Because at the time I wrote that the test dirs were deleted unconditionally and I didn't bother checking if we actually wrote anything to pg_log. Also, because if it _does_ get enabled by someone I can't imagine why we'd want to copy the logs.
Craig Ringer wrote: > On 3 March 2016 at 21:16, Michael Paquier <michael.paquier@gmail.com> wrote: > > > The rest are feature patches: > > > 004 allows filtering on RecursiveCopy by a predicate function. Needed for > > > filesystem level backups (007). It could probably be squashed with 007 if > > > desired. > > > > Adding perldoc to this module should be done separately, let's not mix > > things. Applying a filter is useful as well to remove for example the > > contents of pg_xlog, so no objections to it. > > Eh, ok. I figured it was so trivial it didn't matter, but will split. Please don't mess with this one. > > psql_check sounds wrong to me. I thought first that this triggers a > > test. Why not psql_simple or psql_basic, or just keep psql. > > I guess I'm used to Python's subprocess.check_call so to me it's natural. > > I want something that makes it clear that failure is a fatal error > condition, i.e. "do this in psql and if it produces an error, treat it like > you would any other error in Perl and die appropriately". Shrug. psql_check seemed reasonable to me for that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Craig Ringer wrote: > The first three are simple fixes that should go in without fuss: > > 001 fixes the above 5.8.8 compat issue. > > 002 fixes another minor whoopsie, a syntax error in src/test/recovery/t/ > 003_recovery_targets.pl that never got noticed because exit codes are > ignored. > > 003 runs perltidy on PostgresNode.pm to bring it back into conformance > after the recovery tests commit. > > The rest are feature patches: > > 004 allows filtering on RecursiveCopy by a predicate function. Needed for > filesystem level backups (007). It could probably be squashed with 007 if > desired. > > 005 adds the new psql functions psql_expert and psql_check. Then 006 > renames psql_expert to psql and fixes up all call sites across the tree to > use the new interface. I found the bug in 002 as part of that process. I > anticipate that 005 and 006 would be squashed into one commit to master, > but I've kept them separate in my tree for easier review. > > 007 adds PostgresNode support for hot and cold filesystem-level backups > using pg_start_backup and pg_stop_backup, which will be required for some > coming tests and are useful by themselves. Okay, so far I have pushed 0001 and 0002 squashed (commit 5bec1ad4648), 0003 (commit 7d9a4301c08), 0005 and 0006 squashed (commit 2c83f435a3de). In the last one I chose to rename your psql_check to safe_psql and tweaked a few other things, not worthy of individual mention. I think the result should still work on Perl 5.8 though I didn't actually verify that -- I don't think I made any changes that would affect portability. I will be downloading your Dockerized stuff shortly while I still have a convenient network connection, just in case. Patches 0004 and 0007 remain. I think 0007 is uncontroversial; I reworked 0004 a bit and gave it to someone else to finish a couple of didn't I didn't quite like -- hopefully she'll be submitting a new version soonish. Once we have that I'm happy to push them too. > I don't expect to be doing much more on the framework at this point as I > want to be able to get back to the code I had to enhance the framework in > order to test.... How come!?!? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4 March 2016 at 05:08, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Okay, so far I have pushed 0001 and 0002 squashed (commit 5bec1ad4648),
0003 (commit 7d9a4301c08), 0005 and 0006 squashed (commit 2c83f435a3de).
In the last one I chose to rename your psql_check to safe_psql and
tweaked a few other things, not worthy of individual mention. I think
the result should still work on Perl 5.8 though I didn't actually verify
that -- I don't think I made any changes that would affect portability.
I will be downloading your Dockerized stuff shortly while I still have a
convenient network connection, just in case.
Thanks very much. Your commit messages are much better too.
Patches 0004 and 0007 remain.
For readers who're not following closely that's the filtering support for RecursiveCopy and the support for taking filesystem-level backups in PostgresNode that uses it.
I think 0007 is uncontroversial; I
reworked 0004 a bit and gave it to someone else to finish a couple of
didn't I didn't quite like -- hopefully she'll be submitting a new
version soonish. Once we have that I'm happy to push them too.
Great, thanks.
> I don't expect to be doing much more on the framework at this point as I
> want to be able to get back to the code I had to enhance the framework in
> order to test....
How come!?!?
This yak grows hair faster than I can shave it ;)
On Fri, Mar 4, 2016 at 8:22 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 4 March 2016 at 05:08, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Patches 0004 and 0007 remain. > > For readers who're not following closely that's the filtering support for > RecursiveCopy and the support for taking filesystem-level backups in > PostgresNode that uses it. I am still not sure that the filter for pg_log by default is that useful but... >> I think 0007 is uncontroversial; I >> reworked 0004 a bit and gave it to someone else to finish a couple of >> didn't I didn't quite like -- hopefully she'll be submitting a new >> version soonish. Once we have that I'm happy to push them too. > > Great, thanks. No objections from here as well for the feature itself. I am glad to see interest in extending the current infrastructure, and just wondering what kind of tests are going to show up. -- Michael
On 4 March 2016 at 12:54, Michael Paquier <michael.paquier@gmail.com> wrote:
No objections from here as well for the feature itself. I am glad to
see interest in extending the current infrastructure, and just
wondering what kind of tests are going to show up.
The tests themselves really aren't that exciting. Create a slot, online base-backup the node, start it in archive recovery, immediately fail over to it or do some amount of writes first, do some more writes on it, then read from the logical slot on the replica and prove that we can successfully read the logical change change stream across the timeline switch.
I'm now adding a module that goes a step further by exposing some functions to SQL (test only, if you use it in production you're insane) to permit creation of a logical slot on a replica and to advance that slot's persistent data. It basically lets you advance the slot state on the replica while it's in recovery so you can retain slot states copied with a filesystem-level base backup and have the slots ready to use when you promote the server.
I'm doing it to prove the concept of doing logical failover with an extension and without full WAL-based failover slots in order to get the timeline following for logical decoding patch committed separately to, and prior to, failover slots. It's complicated enough that it needs a variety of tests, but self-contained and non-intrusive enough to be a fairly easy commit if it's shown to work well.
On 4 March 2016 at 05:08, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Okay, so far I have pushed 0001 and 0002 squashed (commit 5bec1ad4648),
0003 (commit 7d9a4301c08), 0005 and 0006 squashed (commit 2c83f435a3de).
In the last one I chose to rename your psql_check to safe_psql and
tweaked a few other things, not worthy of individual mention.
I've just noticed that I failed to initialized $$timed_out to zero in PostgresNode::psql if a ref is passed for it.
Fix attached.
The tests are proving useful already; I've shown that timeline following works great with a filesystem-level copy, but that there are WAL retention issues (unsurprisingly) when the backup is created without slots then the slot state is synced over by an external client.
On 4 March 2016 at 20:35, Craig Ringer <craig@2ndquadrant.com> wrote:
Fix attached.
Apparently I need a "remind me if you see the word attach in an email" plugin. Sigh.
Attachment
Craig Ringer wrote: > 004 allows filtering on RecursiveCopy by a predicate function. Needed for > filesystem level backups (007). It could probably be squashed with 007 if > desired. I pushed this after some tinkering: * filtering applies to all directory entries, not just files. So you can filter a subdirectory, for example. If you have symlinks (which you know this module will bomb out on) you can skip them too. * The filter sub receives the path relative to the initial source dir, rather than the absolute path. That part was just making it difficult to use; in your POD example you were testing /pg_log/ as a regex, which skipped the *files* but not the directory itself. Now you can do simply "$var ne 'pg_log'". (Pallavi Sontakke implemented most of this.) * Your test for "ref $filterfn" allowed any reference to be passed, not just a code reference. I didn't test passing some other type of reference; I assume you'd just get a very ugly error message. Thanks, -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Craig Ringer wrote: > 007 adds PostgresNode support for hot and cold filesystem-level backups > using pg_start_backup and pg_stop_backup, which will be required for some > coming tests and are useful by themselves. Finally, pushed this one after rebasing on top of the changes in the others. I think we're done here, at last ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10 March 2016 at 05:30, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Craig Ringer wrote:
> 004 allows filtering on RecursiveCopy by a predicate function. Needed for
> filesystem level backups (007). It could probably be squashed with 007 if
> desired.
I pushed this after some tinkering:
* filtering applies to all directory entries, not just files. So you
can filter a subdirectory, for example. If you have symlinks (which you
know this module will bomb out on) you can skip them too.
* The filter sub receives the path relative to the initial source dir,
rather than the absolute path. That part was just making it difficult
to use; in your POD example you were testing /pg_log/ as a regex, which
skipped the *files* but not the directory itself.
That was actually intentional, but I agree that your change is better.
Thanks for pushing.
I found a minor issue with the new psql method while writing tests for failover slots. Patch attached.
Attachment
Craig Ringer wrote: > I found a minor issue with the new psql method while writing tests for > failover slots. Patch attached. Thanks, pushed. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services