Thread: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
"Amir Rohan"
Date:
On 10/02/2015 03:33 PM, Michael Paquier wrote: > Any server instances created during the tests should never use a > user-defined port for portability. Hence using those ports as keys > just made sense. We could have for example custom names, that have > port values assigned to them, but that's actually an overkill and > complicates the whole facility. > Something like: global nPortsAssigned = 0; AssignPort() -> return is_ok(nPortsAssigned++) was what I used. >> 2) Behaviour (paths in particular) is hardwired rather then overridable >> defaults. > > This is the case of all the TAP tests. We could always use the same > base directory for all the nodes and then embed a sub-directory whose > name is decided using the port number. But I am not really sure if > that's a win. > I understand, but it eliminates the kind of scenarios this convenience package lets you express... conveniently. >> This is exactly what I needed to test, problems: >> 3) Can't stop server without clearing its testing data (the maps holding >> paths and things). But that data might be specifically >> needed, in particular the backup shouldn't disappear when the >> server melts down or we have a very low-grade DBA on our hands. > > OK, you have a point here. You may indeed want routines for to enable > and disable a node completely decoupled from start and stop, with > something like enable_node and disable_node that basically registers > or unregisters it from the list of active nodes. I have updated the > patch this way. > Excellent. >> 4) Port assignment relies on liveness checks on running servers. >> If a server is shut down and a new instantiated, the port will get >> reused, data will get trashed, and various confusing things can happen. > > Right. The safest way to do that is to check in get_free_port if a > port number is used by a registered node, and continue to loop in if > that's the case. So done. > That eliminates the "sweet and gentle" variant of the scenario, but it's susceptible to the "ABA problem": https://en.wikipedia.org/wiki/ABA_problem https://youtu.be/CmxkPChOcvw?t=786 Granted, you have to try fairly hard to shoot yourself in the leg, but since the solution is so simple, why not? If we never reuse ports within a single test, this goes away. >> 5) Servers are shutdown with -m 'immediate', which can lead to races >> in the script when archiving is turned on. That may be good for some >> tests, but there's no control over it. > > I hesitated with fast here actually. So changed this way. We would > want as wall a teardown command to stop the node with immediate and > unregister the node from the active list. > In particular, I was shutting down an archiving node and the archiving was truncated. I *think* smart doesn't do this. But again, it's really that the test writer can't easily override, not that the default is wrong. >> Other issues: >> 6. Directory structure, used one directory per thing but more logical >> to place all things related to an instance under a single directory, >> and name them according to role (57333_backup, and so on). > > Er, well. The first version of the patch did so, and then I switched > to an approach closer to what the existing TAP facility is doing. But > well let's simplify things a bit. > I know, I know, but: 1) an instance is a "thing" in your script, so having its associated paraphernalia in one place makes more sense (maybe only to me). 2) That's often how folks (well, how I) arrange things in deployment, at least with archive/backup as symlinks to the nas. Alternatively, naming the dirs with a prefix (srv_foo_HASH, backup_foo_backupname_HASH, etc') would work as well. >> 7. enable_restoring() uses "cp -i" 'archive_command', not a good fit >> for an automated test. > > This seems like a good default to me, and actually that's portable on > Windows easily. One could always append a custom archive_command in a > test when for example testing conflicting archiving when archive_mode > = always. > Ok, I wasn't sure about this, but specifically activating a switch that asks for input from the user during a test? hmm. >> 8. No canned way to output a pprinted overview of the running system >> (paths, ports, for manual checking). > > Hm. Why not... Are you suggesting something like print_current_conf > that goes through all the registered nodes and outputs that? How would > you use it? > For one thin, I could open a few terminals and `$(print_env_for_server 5437), so psql just worked. I wish PEG had that as well. >> 10. If a test passes/fails or dies due to a bug, everything is cleaned. >> Great for testing, bad for postmortem. > > That's something directly related to TestLib.pm where > File:Temp:tempdir creates a temporary path with CLEANUP = 1. We had > discussions regarding that actually... > >> 11. a canned "server is responding to queries" helper would be convenient. > > Do you mean a wrapper on pg_isready? Do you have use cases in mind for it? > Block until recovery is finished, before testing. eliminate races, and avoid the stupid sleep(3) I used. >> It might be a good idea to: >> 1) Never reuse ports during a test. Liveness checking is used >> to avoid collisions, but should not determine order of assignment. > > Agreed. As far as I can see the problem here is related to the fact > that the port of non-running server may be fetched by another one. > That's a bug of my patch. > >> 2) Decouple cleanup from server shutdown. Do the cleanup as the end of >> test only, and allow the user to keep things around. > > Agreed here. > >> 3) Adjust the directory structure to one top directory per server with >> (PGDATA, backup, archive) subdirs. > > Hm. OK. The first version of the patch actually did so. > Well, why does "consistency with TAP test" trump the advantages I mentioned? does TAP actually care? >> 4) Instead of passing ports around as keys, have _explicit functions >> which can be called directly by the user (I'd like the backup *HERE* >> please), with the current functions refactored to merely invoke them >> by interpolating in the values associated with the port they were given. > > I don't really see in what this would be a win. We definitely should > have all the data depending on temporary paths during the tests to > facilitate the cleanup wrapper work. > The trouble was that I reused paths between servers. If shutdown/cleanup are decoupled, this is probably not needed. >> 4b) server shutdown should perhaps be "smart" by default, or segmented >> into calmly_bring_to_a_close(), pull_electric_plug() and >> drop_down_the_stairs_into_swimming_pool(). > > Nope, not agreeing here. "immediate" is rather violent to stop a node, > hence I have switched it to use "fast" and there is now a > teardown_node routine that uses immediate, that's more aimed at > cleanup up existing things fiercely. > Ok, not as the default, but possible to request a specific kind of shutdown. I needed smart in my case. Plus, in a scenario, you might expressly be testing behavior for a specific mode, it needs to be controllable. > I have as well moved RecoveryTest.pm to src/test/perl so as all the > consumers of prove_check can use it by default, and decoupled > start_node from make_master and make_*_standby so as it is possible to > add for example new parameters to their postgresql.conf and > recovery.conf files before starting them. > > Thanks a lot for the feedback! Attached is an updated patch with all > the things mentioned above done. Are included as well the typo fixes > you sent upthread. > Regards, > Great! I'll look at it and post again if there's more. If any of the above extra explanations make it clearer why I suggested some changes you didn't like... Thanks, Amir
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Oct 2, 2015 at 11:10 PM, Amir Rohan wrote: > On 10/02/2015 03:33 PM, Michael Paquier wrote: > > Any server instances created during the tests should never use a > > user-defined port for portability. Hence using those ports as keys > > just made sense. We could have for example custom names, that have > > port values assigned to them, but that's actually an overkill and > > complicates the whole facility. > > > > Something like: > > global nPortsAssigned = 0; > AssignPort() -> return is_ok(nPortsAssigned++) > > was what I used. Why do you need that. Creating a node is in the most basic way a matter of calling make_master, make_*_standby where a port number, or identifier gets uniquely assigned to a node. The main point of the approach taken by this patch is to make port assignment transparent for the caller. > >> 4) Port assignment relies on liveness checks on running servers. > >> If a server is shut down and a new instantiated, the port will get > >> reused, data will get trashed, and various confusing things can happen. > > > > Right. The safest way to do that is to check in get_free_port if a > > port number is used by a registered node, and continue to loop in if > > that's the case. So done. > > > > That eliminates the "sweet and gentle" variant of the scenario, but it's > susceptible to the "ABA problem": > https://en.wikipedia.org/wiki/ABA_problem > https://youtu.be/CmxkPChOcvw?t=786 I learnt a new thing here. That's basically an existing problem even with the existing perl test infrastructure relying on TestLib.pm when tests are run in parallel. What we would need here is a global mapping file storing all the port numbers used by all the nodes currently in the tests. > > Granted, you have to try fairly hard to shoot yourself in the leg, > but since the solution is so simple, why not? If we never reuse ports > within a single test, this goes away. Well, you can reuse the same port number in a test. Simply teardown the existing node and then recreate a new one. I think that port number assignment to a node should be transparent to the caller, in our case the perl test script holding a scenario. > >> 5) Servers are shutdown with -m 'immediate', which can lead to races > >> in the script when archiving is turned on. That may be good for some > >> tests, but there's no control over it. > > > > I hesitated with fast here actually. So changed this way. We would > > want as wall a teardown command to stop the node with immediate and > > unregister the node from the active list. > > > > In particular, I was shutting down an archiving node and the archiving > was truncated. I *think* smart doesn't do this. But again, it's really > that the test writer can't easily override, not that the default is wrong. Ah, OK. Then fast is just fine. It shuts down the node correctly. "smart" would wait for all the current connections to finish but I am wondering if it currently matters here: I don't see yet a clear use case yet where it would make sense to have multi-threaded script... If somebody comes back with a clear idea here perhaps we could revisit that but it does not seem worth it now. > >> Other issues: > >> 6. Directory structure, used one directory per thing but more logical > >> to place all things related to an instance under a single directory, > >> and name them according to role (57333_backup, and so on). > > > > Er, well. The first version of the patch did so, and then I switched > > to an approach closer to what the existing TAP facility is doing. But > > well let's simplify things a bit. > > > > I know, I know, but: > 1) an instance is a "thing" in your script, so having its associated > paraphernalia in one place makes more sense (maybe only to me). > 2) That's often how folks (well, how I) arrange things in deployment, > at least with archive/backup as symlinks to the nas. > > Alternatively, naming the dirs with a prefix (srv_foo_HASH, > backup_foo_backupname_HASH, etc') would work as well. The useful portion about tempdir is that it cleans up itself automatically should an error happen. It does not seem to me we want use that. > >> 7. enable_restoring() uses "cp -i" 'archive_command', not a good fit > >> for an automated test. > > > > This seems like a good default to me, and actually that's portable on > > Windows easily. One could always append a custom archive_command in a > > test when for example testing conflicting archiving when archive_mode > > = always. > > > > Ok, I wasn't sure about this, but specifically activating a switch that > asks for input from the user during a test? hmm. Er... The -i switch is a bad idea. I removed it. Honestly I don't recall why it was here to begin with... > >> 8. No canned way to output a pprinted overview of the running system > >> (paths, ports, for manual checking). > > > > Hm. Why not... Are you suggesting something like print_current_conf > > that goes through all the registered nodes and outputs that? How would > > you use it? > > > > For one thin, I could open a few terminals and `$(print_env_for_server > 5437), so psql just worked. > I wish PEG had that as well. Hm. Isn't that coupled with the case where a failure happens then but tempdirs are cleaned up then? I would expect hackers to run those runs until the end. If a failure happens, it would then be useful to get a dump of what happens. However, it seems to me that we can get the same information by logging all the information when creating a node in the log file. I have added a routine in this sense, which is called each time a node is initialized. It seems helpful either way. > >> 11. a canned "server is responding to queries" helper would be convenient. > > > > Do you mean a wrapper on pg_isready? Do you have use cases in mind for it? > > > > Block until recovery is finished, before testing. eliminate races, and > avoid the stupid sleep(3) I used. TODO > >> 4b) server shutdown should perhaps be "smart" by default, or segmented > >> into calmly_bring_to_a_close(), pull_electric_plug() and > >> drop_down_the_stairs_into_swimming_pool(). > > > > Nope, not agreeing here. "immediate" is rather violent to stop a node, > > hence I have switched it to use "fast" and there is now a > > teardown_node routine that uses immediate, that's more aimed at > > cleanup up existing things fiercely. > > > > Ok, not as the default, but possible to request a specific kind of > shutdown. I needed smart in my case. Plus, in a scenario, you might > expressly be testing behavior for a specific mode, it needs to be > controllable. If your test script is running with a single thread, "fast" or "smart" would not really make a difference, no? > > I have as well moved RecoveryTest.pm to src/test/perl so as all the > > consumers of prove_check can use it by default, and decoupled > > start_node from make_master and make_*_standby so as it is possible to > > add for example new parameters to their postgresql.conf and > > recovery.conf files before starting them. > > > > Thanks a lot for the feedback! Attached is an updated patch with all > > the things mentioned above done. Are included as well the typo fixes > > you sent upthread. > > Regards, > > > > Great! I'll look at it and post again if there's more. If any of the > above extra explanations make it clearer why I suggested some changes > you didn't like... I am attaching a new version. I found a small bug in test case 001 when checking if standby 2 has caught up. There is also this dump function that is helpful. The -i switch in cp command has been removed as well. -- Michael
Attachment
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/03/2015 02:38 PM, Michael Paquier wrote: > On Fri, Oct 2, 2015 at 11:10 PM, Amir Rohan wrote: >> On 10/02/2015 03:33 PM, Michael Paquier wrote: >>> Any server instances created during the tests should never use a >>> user-defined port for portability. Hence using those ports as keys >>> just made sense. We could have for example custom names, that have >>> port values assigned to them, but that's actually an overkill and >>> complicates the whole facility. >>> >> >> Something like: >> >> global nPortsAssigned = 0; >> AssignPort() -> return is_ok(nPortsAssigned++) >> >> was what I used. > > Why do you need that. Creating a node is in the most basic way a > matter of calling make_master, make_*_standby where a port number, or > identifier gets uniquely assigned to a node. The main point of the > approach taken by this patch is to make port assignment transparent > for the caller. > See next. >> Granted, you have to try fairly hard to shoot yourself in the leg, >> but since the solution is so simple, why not? If we never reuse ports >> within a single test, this goes away. > > Well, you can reuse the same port number in a test. Simply teardown > the existing node and then recreate a new one. I think that port > number assignment to a node should be transparent to the caller, in > our case the perl test script holding a scenario. > What part of "Never assign the same port twice during one test" makes this "not transparent to the user"? If you're thinking about parallel test, I don't think you need to worry. Availability checks take care of one part, and the portnum-as-map-key-is-test-local takes care of the other. But, see next. > >>>> 4) Port assignment relies on liveness checks on running servers. >>>> If a server is shut down and a new instantiated, the port will get >>>> reused, data will get trashed, and various confusing things can happen. >>> >>> Right. The safest way to do that is to check in get_free_port if a >>> port number is used by a registered node, and continue to loop in if >>> that's the case. So done. >>> >> >> That eliminates the "sweet and gentle" variant of the scenario, but it's >> susceptible to the "ABA problem": >> https://en.wikipedia.org/wiki/ABA_problem >> https://youtu.be/CmxkPChOcvw?t=786 > > I learnt a new thing here. That's basically an existing problem even > with the existing perl test infrastructure relying on TestLib.pm when > tests are run in parallel. What we would need here is a global mapping > file storing all the port numbers used by all the nodes currently in > the tests. > Yeah, a poorman's way to ensure ports aren't reused (I wasn't very clear at top of post ) is something like: global nPortsAssigned = 0; AssignPort(): basePort = BASE_PORT; # the lowest port we use while(!available(basePort+nPortsAssigned)): basePort++ nPortsAssigned++ return basePort; It has its glaring faults, but would probably work ok. In any case, I'm sure you can do better. >> >> Granted, you have to try fairly hard to shoot yourself in the leg, >> but since the solution is so simple, why not? If we never reuse ports >> within a single test, this goes away. > > Well, you can reuse the same port number in a test. Simply teardown > the existing node and then recreate a new one. I think that port > number assignment to a node should be transparent to the caller, in > our case the perl test script holding a scenario. > I was using you *never* want to reuse port numbers. That is, as long as the lib ensures we never reuse ports within one test, all kinds of corner cases are eliminated. >>>> 5) Servers are shutdown with -m 'immediate', which can lead to races >>>> in the script when archiving is turned on. That may be good for some >>>> tests, but there's no control over it. >>> >>> I hesitated with fast here actually. So changed this way. We would >>> want as wall a teardown command to stop the node with immediate and >>> unregister the node from the active list. >>> >> >> In particular, I was shutting down an archiving node and the archiving >> was truncated. I *think* smart doesn't do this. But again, it's really >> that the test writer can't easily override, not that the default is wrong. > > Ah, OK. Then fast is just fine. It shuts down the node correctly. > "smart" would wait for all the current connections to finish but I am > wondering if it currently matters here: I don't see yet a clear use > case yet where it would make sense to have multi-threaded script... If > somebody comes back with a clear idea here perhaps we could revisit > that but it does not seem worth it now. > My mistake. Perhaps what would be useful though is a way to force "messy" shutdown. a kill -9, basically. It's a recovery test suite, right?. >>>> Other issues: >>>> 6. Directory structure, used one directory per thing but more logical >>>> to place all things related to an instance under a single directory, >>>> and name them according to role (57333_backup, and so on). >>> >>> Er, well. The first version of the patch did so, and then I switched >>> to an approach closer to what the existing TAP facility is doing. But >>> well let's simplify things a bit. >>> >> >> I know, I know, but: >> 1) an instance is a "thing" in your script, so having its associated >> paraphernalia in one place makes more sense (maybe only to me). >> 2) That's often how folks (well, how I) arrange things in deployment, >> at least with archive/backup as symlinks to the nas. >> >> Alternatively, naming the dirs with a prefix (srv_foo_HASH, >> backup_foo_backupname_HASH, etc') would work as well. > > The useful portion about tempdir is that it cleans up itself > automatically should an error happen. It does not seem to me we want > use that. > Ensuring cleanup and directory structure aren't inherently related. Testlib makes cleanup easy if you're willing to accept its flat structure. But writing something that does cleanup and lets yo control directory structure is perfectly doable. The question is only if you agree or not that having per-server directories could be convenient. Tying into the next, if you don't think anyone ever need to look into these directories (which I disagree with), then dir structure indeed doesn't matter. >> >> 8. No canned way to output a pprinted overview of the running system >>>> (paths, ports, for manual checking). >>> >>> Hm. Why not... Are you suggesting something like print_current_conf >>> that goes through all the registered nodes and outputs that? How would >>> you use it? >>> >> >> For one thin, I could open a few terminals and `$(print_env_for_server >> 5437), so psql just worked. >> I wish PEG had that as well. > > Hm. Isn't that coupled with the case where a failure happens then but > tempdirs are cleaned up then? But I've mentioned that's inconvenient as well. If you don't think /that/ should be fixed, then yes, there's no point adding this. Still, perhaps doing both (+previous) would make writing tests easier. At least for me, writing tests isn't simply a typing exercise. I always need to inspect the system at stages during the test. It's only after the test is ready that cleanup is useful, before that it actually hampers work. > I would expect hackers to run those runs > until the end. I agree -- when you're running them , but what about when you're /writing/ them? If a failure happens, it would then be useful to get a > dump of what happens. However, it seems to me that we can get the same > information by logging all the information when creating a node in the > log file. I have added a routine in this sense, which is called each > time a node is initialized. It seems helpful either way. > >>>> 11. a canned "server is responding to queries" helper would be convenient. >>> >>> Do you mean a wrapper on pg_isready? Do you have use cases in mind for it? >>> >> >> Block until recovery is finished, before testing. eliminate races, and >> avoid the stupid sleep(3) I used. > > TODO > >>>> 4b) server shutdown should perhaps be "smart" by default, or segmented >>>> into calmly_bring_to_a_close(), pull_electric_plug() and >>>> drop_down_the_stairs_into_swimming_pool(). >>> >>> Nope, not agreeing here. "immediate" is rather violent to stop a node, >>> hence I have switched it to use "fast" and there is now a >>> teardown_node routine that uses immediate, that's more aimed at >>> cleanup up existing things fiercely. >>> >> >> Ok, not as the default, but possible to request a specific kind of >> shutdown. I needed smart in my case. Plus, in a scenario, you might >> expressly be testing behavior for a specific mode, it needs to be >> controllable. > > If your test script is running with a single thread, "fast" or "smart" > would not really make a difference, no? > It would If there's a bug in one of them and I'm trying to write a regression test for it. Recall, this was part of broader view of "provide defaults, allow override" I was suggesting. >>> I have as well moved RecoveryTest.pm to src/test/perl so as all the >>> consumers of prove_check can use it by default, and decoupled >>> start_node from make_master and make_*_standby so as it is possible to >>> add for example new parameters to their postgresql.conf and >>> recovery.conf files before starting them. >>> >>> Thanks a lot for the feedback! Attached is an updated patch with all >>> the things mentioned above done. Are included as well the typo fixes >>> you sent upthread. >>> Regards, >>> >> >> Great! I'll look at it and post again if there's more. If any of the >> above extra explanations make it clearer why I suggested some changes >> you didn't like... > > I am attaching a new version. I found a small bug in test case 001 > when checking if standby 2 has caught up. There is also this dump > function that is helpful. The -i switch in cp command has been removed > as well. > I'm sorry I didn't review the code, but honestly my perl is so rusty I'm afraid I'll embarrass myself :) Thanks! Amir
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/03/2015 03:50 PM, Amir Rohan wrote: > On 10/03/2015 02:38 PM, Michael Paquier wrote: >> On Fri, Oct 2, 2015 at 11:10 PM, Amir Rohan wrote: >>> On 10/02/2015 03:33 PM, Michael Paquier wrote: >>> >>> Granted, you have to try fairly hard to shoot yourself in the leg, >>> but since the solution is so simple, why not? If we never reuse ports >>> within a single test, this goes away. >> >> Well, you can reuse the same port number in a test. Simply teardown >> the existing node and then recreate a new one. I think that port >> number assignment to a node should be transparent to the caller, in >> our case the perl test script holding a scenario. >> > > What part of "Never assign the same port twice during one test" > makes this "not transparent to the user"? > > If you're thinking about parallel tests, I don't think you > need to worry. Availability checks take care of one part, Except now that I think of it, that's definitely a race: Thread1: is_available(5432) -> True Thread2: is_available(5432) -> True Thread1: listen(5432) -> True Thread2: listen(5432) -> #$@#$&@#$^&$#@& I don't know if parallel tests are actually supported, though. If theye are, you're right that this is a shared global resource wrt concurrency. Amir
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Oct 3, 2015 at 9:50 PM, Amir Rohan <amir.rohan@zoho.com> wrote: > On 10/03/2015 02:38 PM, Michael Paquier wrote: >> On Fri, Oct 2, 2015 at 11:10 PM, Amir Rohan wrote: >>> On 10/02/2015 03:33 PM, Michael Paquier wrote: >>>>> 4) Port assignment relies on liveness checks on running servers. >>>>> If a server is shut down and a new instantiated, the port will get >>>>> reused, data will get trashed, and various confusing things can happen. >>>> >>>> Right. The safest way to do that is to check in get_free_port if a >>>> port number is used by a registered node, and continue to loop in if >>>> that's the case. So done. >>>> >>> >>> That eliminates the "sweet and gentle" variant of the scenario, but it's >>> susceptible to the "ABA problem": >>> https://en.wikipedia.org/wiki/ABA_problem >>> https://youtu.be/CmxkPChOcvw?t=786 >> >> I learnt a new thing here. That's basically an existing problem even >> with the existing perl test infrastructure relying on TestLib.pm when >> tests are run in parallel. What we would need here is a global mapping >> file storing all the port numbers used by all the nodes currently in >> the tests. >> > > Yeah, a poorman's way to ensure ports aren't reused (I wasn't very > clear at top of post ) is something like: > > global nPortsAssigned = 0; > > AssignPort(): > basePort = BASE_PORT; # the lowest port we use > while(!available(basePort+nPortsAssigned)): > basePort++ > > nPortsAssigned++ > > return basePort; > > It has its glaring faults, but would probably work ok. > In any case, I'm sure you can do better. Yeah, this would improve the exiting port lookup. I don't mind adding a global variable in get_free_port for this purpose. This would accelerate finding a free port in may cases for sure. >>> >>> Granted, you have to try fairly hard to shoot yourself in the leg, >>> but since the solution is so simple, why not? If we never reuse ports >>> within a single test, this goes away. >> >> Well, you can reuse the same port number in a test. Simply teardown >> the existing node and then recreate a new one. I think that port >> number assignment to a node should be transparent to the caller, in >> our case the perl test script holding a scenario. >> > > I was using you *never* want to reuse port numbers. That is, as long > as the lib ensures we never reuse ports within one test, all kinds > of corner cases are eliminated. Hm, sure. Though I don't really why that would be mandatory to enforce this condition as long as the list of ports occupied is in a single place (as long as tests are not run in parallel...). >>>>> 5) Servers are shutdown with -m 'immediate', which can lead to races >>>>> in the script when archiving is turned on. That may be good for some >>>>> tests, but there's no control over it. >>>> >>>> I hesitated with fast here actually. So changed this way. We would >>>> want as wall a teardown command to stop the node with immediate and >>>> unregister the node from the active list. >>>> >>> >>> In particular, I was shutting down an archiving node and the archiving >>> was truncated. I *think* smart doesn't do this. But again, it's really >>> that the test writer can't easily override, not that the default is wrong. >> >> Ah, OK. Then fast is just fine. It shuts down the node correctly. >> "smart" would wait for all the current connections to finish but I am >> wondering if it currently matters here: I don't see yet a clear use >> case yet where it would make sense to have multi-threaded script... If >> somebody comes back with a clear idea here perhaps we could revisit >> that but it does not seem worth it now. >> > > My mistake. Perhaps what would be useful though is a way > to force "messy" shutdown. a kill -9, basically. It's a recovery > test suite, right?. That's what the teardown is aimed at having, the immediate stop mode would play that fairly good enough. There has been a patch from Tom Lane around to stop a server should its postmaster.pid be missing as well... >>>>> Other issues: >>>>> 6. Directory structure, used one directory per thing but more logical >>>>> to place all things related to an instance under a single directory, >>>>> and name them according to role (57333_backup, and so on). >>>> >>>> Er, well. The first version of the patch did so, and then I switched >>>> to an approach closer to what the existing TAP facility is doing. But >>>> well let's simplify things a bit. >>>> >>> >>> I know, I know, but: >>> 1) an instance is a "thing" in your script, so having its associated >>> paraphernalia in one place makes more sense (maybe only to me). >>> 2) That's often how folks (well, how I) arrange things in deployment, >>> at least with archive/backup as symlinks to the nas. >>> >>> Alternatively, naming the dirs with a prefix (srv_foo_HASH, >>> backup_foo_backupname_HASH, etc') would work as well. >> >> The useful portion about tempdir is that it cleans up itself >> automatically should an error happen. It does not seem to me we want >> use that. >> > > Ensuring cleanup and directory structure aren't inherently related. > Testlib makes cleanup easy if you're willing to accept its flat > structure. But writing something that does cleanup and lets yo > control directory structure is perfectly doable. > > The question is only if you agree or not that having per-server > directories could be convenient. Tying into the next, if you > don't think anyone ever need to look into these directories > (which I disagree with), then dir structure indeed doesn't matter. So your point is having one temp dir for the whole, right? I don't disagree with that. >> I would expect hackers to run those runs >> until the end. > > I agree -- when you're running them , but what about when you're > /writing/ them? Well, I enforce CLEANUP=0 manually in TestLib.pm for now. >>>>> 11. a canned "server is responding to queries" helper would be convenient. >>>> >>>> Do you mean a wrapper on pg_isready? Do you have use cases in mind for it? >>>> >>> >>> Block until recovery is finished, before testing. eliminate races, and >>> avoid the stupid sleep(3) I used. >> >> TODO Well. I just recalled this item in the list of things you mentioned. I marked it but forgot to address it. It sounds right that we may want something using pg_isready in a loop as a node in recovery would reject connections. >> >>>>> 4b) server shutdown should perhaps be "smart" by default, or segmented >>>>> into calmly_bring_to_a_close(), pull_electric_plug() and >>>>> drop_down_the_stairs_into_swimming_pool(). >>>> >>>> Nope, not agreeing here. "immediate" is rather violent to stop a node, >>>> hence I have switched it to use "fast" and there is now a >>>> teardown_node routine that uses immediate, that's more aimed at >>>> cleanup up existing things fiercely. >>>> >>> >>> Ok, not as the default, but possible to request a specific kind of >>> shutdown. I needed smart in my case. Plus, in a scenario, you might >>> expressly be testing behavior for a specific mode, it needs to be >>> controllable. >> >> If your test script is running with a single thread, "fast" or "smart" >> would not really make a difference, no? > > It would If there's a bug in one of them and I'm trying to write > a regression test for it. Recall, this was part of broader view > of "provide defaults, allow override" I was suggesting. We could then extend stop_node with an optional argument containing a mode, with fast being the default. Sounds right? >>>> I have as well moved RecoveryTest.pm to src/test/perl so as all the >>>> consumers of prove_check can use it by default, and decoupled >>>> start_node from make_master and make_*_standby so as it is possible to >>>> add for example new parameters to their postgresql.conf and >>>> recovery.conf files before starting them. >>>> >>>> Thanks a lot for the feedback! Attached is an updated patch with all >>>> the things mentioned above done. Are included as well the typo fixes >>>> you sent upthread. >>>> Regards, >>>> >>> >>> Great! I'll look at it and post again if there's more. If any of the >>> above extra explanations make it clearer why I suggested some changes >>> you didn't like... >> >> I am attaching a new version. I found a small bug in test case 001 >> when checking if standby 2 has caught up. There is also this dump >> function that is helpful. The -i switch in cp command has been removed >> as well. >> > > I'm sorry I didn't review the code, but honestly my perl is so rusty I'm > afraid I'll embarrass myself :) I don't pretend mine are good :) So we are two. -- Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Oct 3, 2015 at 10:47 PM, Michael Paquier wrote: > On Sat, Oct 3, 2015 at 9:50 PM, Amir Rohan wrote: >>>> Block until recovery is finished, before testing. eliminate races, and >>>> avoid the stupid sleep(3) I used. >>> >>> TODO > > Well. I just recalled this item in the list of things you mentioned. I > marked it but forgot to address it. It sounds right that we may want > something using pg_isready in a loop as a node in recovery would > reject connections. I just hacked up an updated version with the following things: - Optional argument for stop_node to define the stop mode of pg_ctl - Addition of wait_for_node where pg_isready is used to wait until a node is ready to accept queries - Addition of a local lookup variable to track the last port assigned. This accelerates get_free_port. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Robert Haas
Date:
On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier <michael.paquier@gmail.com> wrote: >> Granted, you have to try fairly hard to shoot yourself in the leg, >> but since the solution is so simple, why not? If we never reuse ports >> within a single test, this goes away. > > Well, you can reuse the same port number in a test. Simply teardown > the existing node and then recreate a new one. I think that port > number assignment to a node should be transparent to the caller, in > our case the perl test script holding a scenario. It seems that these days 'make check' creates a directory in /tmp called /tmp/pg_regress-RANDOMSTUFF. Listening on TCP ports is disabled, and the socket goes inside this directory with a name like .s.PGSQL.PORT. You can connect with psql -h /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP. This basically removes the risk of TCP port number collisions, as well as the risk of your temporary instance being hijacked by a malicious user on the same machine. I'm not sure what we do on Windows, though. >> In particular, I was shutting down an archiving node and the archiving >> was truncated. I *think* smart doesn't do this. But again, it's really >> that the test writer can't easily override, not that the default is wrong. > > Ah, OK. Then fast is just fine. It shuts down the node correctly. > "smart" would wait for all the current connections to finish but I am > wondering if it currently matters here: I don't see yet a clear use > case yet where it would make sense to have multi-threaded script... If > somebody comes back with a clear idea here perhaps we could revisit > that but it does not seem worth it now. I don't have anything brilliant to say about this point, but here's a perhaps-not-brilliant comment: If there's a bug in one of smart and fast shutdown and the other works great, it would be nice to catch that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Oct 7, 2015 at 5:58 AM, Robert Haas wrote: > On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier wrote: > It seems that these days 'make check' creates a directory in /tmp > called /tmp/pg_regress-RANDOMSTUFF. Listening on TCP ports is > disabled, and the socket goes inside this directory with a name like > .s.PGSQL.PORT. You can connect with psql -h > /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP. This basically > removes the risk of TCP port number collisions, as well as the risk of > your temporary instance being hijacked by a malicious user on the same > machine. Right, that's for example /var/folders/ on OSX, and this is defined once per test run via $tempdir_short. PGHOST is set to that as well. > I'm not sure what we do on Windows, though. sspi with include_realm through 127.0.0.1. >>> In particular, I was shutting down an archiving node and the archiving >>> was truncated. I *think* smart doesn't do this. But again, it's really >>> that the test writer can't easily override, not that the default is wrong. >> >> Ah, OK. Then fast is just fine. It shuts down the node correctly. >> "smart" would wait for all the current connections to finish but I am >> wondering if it currently matters here: I don't see yet a clear use >> case yet where it would make sense to have multi-threaded script... If >> somebody comes back with a clear idea here perhaps we could revisit >> that but it does not seem worth it now. > > I don't have anything brilliant to say about this point, but here's a > perhaps-not-brilliant comment: > > If there's a bug in one of smart and fast shutdown and the other works > great, it would be nice to catch that. Yes, sure. I extended the patch to support other stop modes than fast, the default being kept to fast if none is defined. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Oct 7, 2015 at 7:43 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Oct 7, 2015 at 5:58 AM, Robert Haas wrote: >> On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier wrote: >> It seems that these days 'make check' creates a directory in /tmp >> called /tmp/pg_regress-RANDOMSTUFF. Listening on TCP ports is >> disabled, and the socket goes inside this directory with a name like >> .s.PGSQL.PORT. You can connect with psql -h >> /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP. This basically >> removes the risk of TCP port number collisions, as well as the risk of >> your temporary instance being hijacked by a malicious user on the same >> machine. > > Right, that's for example /var/folders/ on OSX, and this is defined > once per test run via $tempdir_short. PGHOST is set to that as well. Er, mistake here. That's actually once per standard_initdb, except that all the tests I have included in my patch run it just once to create a master node. It seems that it would be wiser to set one socket dir per node then, remove the port assignment stuff, and use tempdir_short as a key to define a node as well as in the connection string to this node. I'll update the patch later today... -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Oct 7, 2015 at 7:51 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Oct 7, 2015 at 7:43 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Wed, Oct 7, 2015 at 5:58 AM, Robert Haas wrote: >>> On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier wrote: >>> It seems that these days 'make check' creates a directory in /tmp >>> called /tmp/pg_regress-RANDOMSTUFF. Listening on TCP ports is >>> disabled, and the socket goes inside this directory with a name like >>> .s.PGSQL.PORT. You can connect with psql -h >>> /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP. This basically >>> removes the risk of TCP port number collisions, as well as the risk of >>> your temporary instance being hijacked by a malicious user on the same >>> machine. >> >> Right, that's for example /var/folders/ on OSX, and this is defined >> once per test run via $tempdir_short. PGHOST is set to that as well. > > Er, mistake here. That's actually once per standard_initdb, except > that all the tests I have included in my patch run it just once to > create a master node. It seems that it would be wiser to set one > socket dir per node then, remove the port assignment stuff, and use > tempdir_short as a key to define a node as well as in the connection > string to this node. I'll update the patch later today... So, my conclusion regarding multiple calls of make_master is that we should not allow to do it. On Unix/Linux we could have a separate unix socket directory for each node, but not on Windows where listen_addresses is set to look after 127.0.0.1. On Unix/Linux, PGHOST is set by the master node to a tempdir once and for all. Hence, to make the code more consistent, I think that we should keep the port lookup machinery here. An updated patch is attached. -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/07/2015 09:27 AM, Michael Paquier wrote: > On Wed, Oct 7, 2015 at 7:51 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Wed, Oct 7, 2015 at 7:43 AM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Wed, Oct 7, 2015 at 5:58 AM, Robert Haas wrote: >>>> On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier wrote: >>>> It seems that these days 'make check' creates a directory in /tmp >>>> called /tmp/pg_regress-RANDOMSTUFF. Listening on TCP ports is >>>> disabled, and the socket goes inside this directory with a name like >>>> .s.PGSQL.PORT. You can connect with psql -h >>>> /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP. This basically >>>> removes the risk of TCP port number collisions, as well as the risk of >>>> your temporary instance being hijacked by a malicious user on the same >>>> machine. >>> >>> Right, that's for example /var/folders/ on OSX, and this is defined >>> once per test run via $tempdir_short. PGHOST is set to that as well. >> >> Er, mistake here. That's actually once per standard_initdb, except >> that all the tests I have included in my patch run it just once to >> create a master node. It seems that it would be wiser to set one >> socket dir per node then, remove the port assignment stuff, and use >> tempdir_short as a key to define a node as well as in the connection >> string to this node. I'll update the patch later today... > > So, my conclusion regarding multiple calls of make_master is that we > should not allow to do it. On Unix/Linux we could have a separate unix > socket directory for each node, but not on Windows where > listen_addresses is set to look after 127.0.0.1. On Unix/Linux, PGHOST > is set by the master node to a tempdir once and for all. Hence, to > make the code more consistent, I think that we should keep the port > lookup machinery here. An updated patch is attached. > If parallel tests are supported, get_free_port is still racy even with last_port_found because it's: 1) process-local. 2) even if it were shared, there's the race window between the available-check and the listen() I mentioned upthread. If parallel tests are explicitly disallowed, a comment to that effect (and a note on things known to break) might help someone down the road. Also, the removal of poll_query_until from pg_rewind looks suspiciously like a copy-paste gone bad. Pardon if I'm missing something. Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan <amir.rohan@zoho.com> wrote: > On 10/07/2015 09:27 AM, Michael Paquier wrote: >> On Wed, Oct 7, 2015 at 7:51 AM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Wed, Oct 7, 2015 at 7:43 AM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> On Wed, Oct 7, 2015 at 5:58 AM, Robert Haas wrote: >>>>> On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier wrote: >>>>> It seems that these days 'make check' creates a directory in /tmp >>>>> called /tmp/pg_regress-RANDOMSTUFF. Listening on TCP ports is >>>>> disabled, and the socket goes inside this directory with a name like >>>>> .s.PGSQL.PORT. You can connect with psql -h >>>>> /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP. This basically >>>>> removes the risk of TCP port number collisions, as well as the risk of >>>>> your temporary instance being hijacked by a malicious user on the same >>>>> machine. >>>> >>>> Right, that's for example /var/folders/ on OSX, and this is defined >>>> once per test run via $tempdir_short. PGHOST is set to that as well. >>> >>> Er, mistake here. That's actually once per standard_initdb, except >>> that all the tests I have included in my patch run it just once to >>> create a master node. It seems that it would be wiser to set one >>> socket dir per node then, remove the port assignment stuff, and use >>> tempdir_short as a key to define a node as well as in the connection >>> string to this node. I'll update the patch later today... >> >> So, my conclusion regarding multiple calls of make_master is that we >> should not allow to do it. On Unix/Linux we could have a separate unix >> socket directory for each node, but not on Windows where >> listen_addresses is set to look after 127.0.0.1. On Unix/Linux, PGHOST >> is set by the master node to a tempdir once and for all. Hence, to >> make the code more consistent, I think that we should keep the port >> lookup machinery here. An updated patch is attached. >> > If parallel tests are supported, get_free_port is still racy even > with last_port_found because it's: > 1) process-local. > 2) even if it were shared, there's the race window between the > available-check and the listen() I mentioned upthread. > > If parallel tests are explicitly disallowed, a comment to that > effect (and a note on things known to break) might help someone > down the road. Actually, no, port lookup will not map and parallel tests would work fine thinking more about it, each set of tests uses its own PGHOST to a private unix socket directory so even if multiple tests use the same port number they won't interact with each other because they connect to different socket paths. MinGW is a problem though, and an existing one in the perl test scripts, I recall that it can use make -j and that's on Windows where PGHOST is mapping to 127.0.0.1 only. > Also, the removal of poll_query_until from pg_rewind looks suspiciously > like a copy-paste gone bad. Pardon if I'm missing something. Perhaps. Do you have a suggestion regarding that? It seems to me that this is more useful in TestLib.pm as-is. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/07/2015 10:29 AM, Michael Paquier wrote: > On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan <amir.rohan@zoho.com> wrote: >> On 10/07/2015 09:27 AM, Michael Paquier wrote: >>> On Wed, Oct 7, 2015 at 7:51 AM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> On Wed, Oct 7, 2015 at 7:43 AM, Michael Paquier >>>> <michael.paquier@gmail.com> wrote: >>>>> On Wed, Oct 7, 2015 at 5:58 AM, Robert Haas wrote: >>>>>> On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier wrote: >>>>>> It seems that these days 'make check' creates a directory in /tmp >>>>>> called /tmp/pg_regress-RANDOMSTUFF. Listening on TCP ports is >>>>>> disabled, and the socket goes inside this directory with a name like >>>>>> .s.PGSQL.PORT. You can connect with psql -h >>>>>> /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP. This basically >>>>>> removes the risk of TCP port number collisions, as well as the risk of >>>>>> your temporary instance being hijacked by a malicious user on the same >>>>>> machine. >>>>> >>>>> Right, that's for example /var/folders/ on OSX, and this is defined >>>>> once per test run via $tempdir_short. PGHOST is set to that as well. >>>> >>>> Er, mistake here. That's actually once per standard_initdb, except >>>> that all the tests I have included in my patch run it just once to >>>> create a master node. It seems that it would be wiser to set one >>>> socket dir per node then, remove the port assignment stuff, and use >>>> tempdir_short as a key to define a node as well as in the connection >>>> string to this node. I'll update the patch later today... >>> >>> So, my conclusion regarding multiple calls of make_master is that we >>> should not allow to do it. On Unix/Linux we could have a separate unix >>> socket directory for each node, but not on Windows where >>> listen_addresses is set to look after 127.0.0.1. On Unix/Linux, PGHOST >>> is set by the master node to a tempdir once and for all. Hence, to >>> make the code more consistent, I think that we should keep the port >>> lookup machinery here. An updated patch is attached. >>> >> If parallel tests are supported, get_free_port is still racy even >> with last_port_found because it's: >> 1) process-local. >> 2) even if it were shared, there's the race window between the >> available-check and the listen() I mentioned upthread. >> >> If parallel tests are explicitly disallowed, a comment to that >> effect (and a note on things known to break) might help someone >> down the road. > > Actually, no, port lookup will not map and parallel tests would work > fine thinking more about it, each set of tests uses its own PGHOST to > a private unix socket directory so even if multiple tests use the same > port number they won't interact with each other because they connect > to different socket paths. MinGW is a problem though, and an existing > one in the perl test scripts, I recall that it can use make -j and > that's on Windows where PGHOST is mapping to 127.0.0.1 only. > ah, the portnum is actually a real tcp port only on windows, and the race is limited to that case as you say. Note that in the tcp case, using psql to check is wrong: $ nc -l 8001 # listen on 8001 $ psql -X -h lo -p 8001 postgres < /dev/null psql: could not connect to server: Connection refusedIs the server running on host "lo" (127.0.0.1) and acceptingTCP/IP connections on port 8001? The port isn't free, but psql is really only checking if pg is there and reports that the port is available. That's a fairly mild issue, though. >> Also, the removal of poll_query_until from pg_rewind looks suspiciously >> like a copy-paste gone bad. Pardon if I'm missing something. > > Perhaps. Do you have a suggestion regarding that? It seems to me that > this is more useful in TestLib.pm as-is. > My mistake, the patch only shows some internal function being deleted but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is a better place for it.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote: > On 10/07/2015 10:29 AM, Michael Paquier wrote: >> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote: >>> Also, the removal of poll_query_until from pg_rewind looks suspiciously >>> like a copy-paste gone bad. Pardon if I'm missing something. >> >> Perhaps. Do you have a suggestion regarding that? It seems to me that >> this is more useful in TestLib.pm as-is. >> > > My mistake, the patch only shows some internal function being deleted > but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is > a better place for it. OK. Here is a new patch version. I have removed the restriction preventing to call make_master multiple times in the same script (one may actually want to test some stuff related to logical decoding or FDW for example, who knows...), forcing PGHOST to always use the same value after it has been initialized. I have added a sanity check though, it is not possible to create a node based on a base backup if no master are defined. This looks like a cheap insurance... I also refactored a bit the code, using the new init_node_info to fill in the fields of a newly-initialized node, and I removed get_free_port, init_node, init_node_from_backup, enable_restoring and enable_streaming from the list of routines exposed to the users, those can be used directly with make_master, make_warm_standby and make_hot_standby. We could add them again if need be, somebody may want to be able to get a free port, set up a node without those generic routines, just that it does not seem necessary now. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/08/2015 08:19 AM, Michael Paquier wrote: > On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote: >> On 10/07/2015 10:29 AM, Michael Paquier wrote: >>> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote: >>>> Also, the removal of poll_query_until from pg_rewind looks suspiciously >>>> like a copy-paste gone bad. Pardon if I'm missing something. >>> >>> Perhaps. Do you have a suggestion regarding that? It seems to me that >>> this is more useful in TestLib.pm as-is. >>> >> >> My mistake, the patch only shows some internal function being deleted >> but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is >> a better place for it. > > OK. Here is a new patch version. I have removed the restriction > preventing to call make_master multiple times in the same script (one > may actually want to test some stuff related to logical decoding or > FDW for example, who knows...), forcing PGHOST to always use the same > value after it has been initialized. I have added a sanity check > though, it is not possible to create a node based on a base backup if > no master are defined. This looks like a cheap insurance... I also > refactored a bit the code, using the new init_node_info to fill in the > fields of a newly-initialized node, and I removed get_free_port, > init_node, init_node_from_backup, enable_restoring and > enable_streaming from the list of routines exposed to the users, those > can be used directly with make_master, make_warm_standby and > make_hot_standby. We could add them again if need be, somebody may > want to be able to get a free port, set up a node without those > generic routines, just that it does not seem necessary now. > Regards, > If you'd like, I can write up some tests for cascading replication which are currently missing. Someone mentioned a daisy chain setup which sounds fun. Anything else in particular? Also, it would be nice to have some canned way to measure end-to-end replication latency for variable number of nodes. What about going back through the commit log and writing some regression tests for the real stinkers, if someone care to volunteer some candidate bugs Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Oct 8, 2015 at 3:59 PM, Amir Rohan <amir.rohan@zoho.com> wrote: > On 10/08/2015 08:19 AM, Michael Paquier wrote: >> On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote: >>> On 10/07/2015 10:29 AM, Michael Paquier wrote: >>>> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote: >>>>> Also, the removal of poll_query_until from pg_rewind looks suspiciously >>>>> like a copy-paste gone bad. Pardon if I'm missing something. >>>> >>>> Perhaps. Do you have a suggestion regarding that? It seems to me that >>>> this is more useful in TestLib.pm as-is. >>>> >>> >>> My mistake, the patch only shows some internal function being deleted >>> but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is >>> a better place for it. >> >> OK. Here is a new patch version. I have removed the restriction >> preventing to call make_master multiple times in the same script (one >> may actually want to test some stuff related to logical decoding or >> FDW for example, who knows...), forcing PGHOST to always use the same >> value after it has been initialized. I have added a sanity check >> though, it is not possible to create a node based on a base backup if >> no master are defined. This looks like a cheap insurance... I also >> refactored a bit the code, using the new init_node_info to fill in the >> fields of a newly-initialized node, and I removed get_free_port, >> init_node, init_node_from_backup, enable_restoring and >> enable_streaming from the list of routines exposed to the users, those >> can be used directly with make_master, make_warm_standby and >> make_hot_standby. We could add them again if need be, somebody may >> want to be able to get a free port, set up a node without those >> generic routines, just that it does not seem necessary now. >> Regards, >> > > If you'd like, I can write up some tests for cascading replication which > are currently missing. 001 is testing cascading, like that node1 -> node2 -> node3. > Someone mentioned a daisy chain setup which sounds fun. Anything else in > particular? Also, it would be nice to have some canned way to measure > end-to-end replication latency for variable number of nodes. Hm. Do you mean comparing the LSN position between two nodes even if both nodes are not connected to each other? What would you use it for? > What about going back through the commit log and writing some regression > tests for the real stinkers, if someone care to volunteer some candidate > bugs I have drafted a list with a couple of items upthread: http://www.postgresql.org/message-id/CAB7nPqSgffSPhOcrhFoAsDAnipvn6WsH2nYkf1KayRm+9_MTGw@mail.gmail.com So based on the existing patch the list becomes as follows: - wal_retrieve_retry_interval with a high value, say setting to for example 2/3s and loop until it is applied by checking it is it has been received by the standby every second. - recovery_target_action - archive_cleanup_command - recovery_end_command - pg_xlog_replay_pause and pg_xlog_replay_resume In the list of things that could have a test, I recall that we should test as well 2PC with the recovery delay, look at a1105c3d. This could be included in 005. The advantage of implementing that now is that we could see if the existing routines are solid enough or not. Still, looking at what the patch has now I think that we had better get a committer look at it, and if the core portion gets integrated we could already use it for the patch implementing quorum synchronous replication and in doing more advanced tests with pg_rewind regarding the timeline handling (both patches of this CF). I don't mind adding more now, though I think that the set of sample tests included in this version is enough as a base implementation of the facility and shows what it can do. Regards, -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/08/2015 10:39 AM, Michael Paquier wrote: > On Thu, Oct 8, 2015 at 3:59 PM, Amir Rohan <amir.rohan@zoho.com> wrote: >> On 10/08/2015 08:19 AM, Michael Paquier wrote: >>> On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote: >>>> On 10/07/2015 10:29 AM, Michael Paquier wrote: >>>>> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote: >>>>>> Also, the removal of poll_query_until from pg_rewind looks suspiciously >>>>>> like a copy-paste gone bad. Pardon if I'm missing something. >>>>> >>>>> Perhaps. Do you have a suggestion regarding that? It seems to me that >>>>> this is more useful in TestLib.pm as-is. >>>>> >>>> >>>> My mistake, the patch only shows some internal function being deleted >>>> but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is >>>> a better place for it. >>> >>> OK. Here is a new patch version. I have removed the restriction >>> preventing to call make_master multiple times in the same script (one >>> may actually want to test some stuff related to logical decoding or >>> FDW for example, who knows...), forcing PGHOST to always use the same >>> value after it has been initialized. I have added a sanity check >>> though, it is not possible to create a node based on a base backup if >>> no master are defined. This looks like a cheap insurance... I also >>> refactored a bit the code, using the new init_node_info to fill in the >>> fields of a newly-initialized node, and I removed get_free_port, >>> init_node, init_node_from_backup, enable_restoring and >>> enable_streaming from the list of routines exposed to the users, those >>> can be used directly with make_master, make_warm_standby and >>> make_hot_standby. We could add them again if need be, somebody may >>> want to be able to get a free port, set up a node without those >>> generic routines, just that it does not seem necessary now. >>> Regards, >>> >> >> If you'd like, I can write up some tests for cascading replication which >> are currently missing. > > 001 is testing cascading, like that node1 -> node2 -> node3. > >> Someone mentioned a daisy chain setup which sounds fun. Anything else in >> particular? Also, it would be nice to have some canned way to measure >> end-to-end replication latency for variable number of nodes. > > Hm. Do you mean comparing the LSN position between two nodes even if > both nodes are not connected to each other? What would you use it for? > In a cascading replication setup, the typical _time_ it takes for a COMMIT on master to reach the slave (assuming constant WAL generation rate) is an important operational metric. It would be useful to catch future regressions for that metric, which may happen even when a patch doesn't outright break cascading replication. Just automating the measurement could be useful if there's no pg facility that tracks performance over time in a regimented fashion. I've seen multiple projects which consider a "benchmark suite" to be part of its testing strategy. As for the "daisy chain" thing, it was (IIRC) mentioned in a josh berkus talk I caught on youtube. It's possible to setup cascading replication, take down the master, and then reinsert it as replicating slave, so that you end up with *all* servers replicating from the ancestor in the chain, and no master. I think it was more a fun hack then anything, but also an interesting corner case to investigate. >> What about going back through the commit log and writing some regression >> tests for the real stinkers, if someone care to volunteer some candidate >> bugs > > I have drafted a list with a couple of items upthread: > http://www.postgresql.org/message-id/CAB7nPqSgffSPhOcrhFoAsDAnipvn6WsH2nYkf1KayRm+9_MTGw@mail.gmail.com > So based on the existing patch the list becomes as follows: > - wal_retrieve_retry_interval with a high value, say setting to for > example 2/3s and loop until it is applied by checking it is it has > been received by the standby every second. > - recovery_target_action > - archive_cleanup_command > - recovery_end_command > - pg_xlog_replay_pause and pg_xlog_replay_resume > In the list of things that could have a test, I recall that we should > test as well 2PC with the recovery delay, look at a1105c3d. This could > be included in 005. a1105c3 Mar 23 Fix copy & paste error in 4f1b890b137. Andres Freund 4f1b890 Mar 15 Merge the various forms of transaction commit & abort records. Andres Freund Is that the right commit? > The advantage of implementing that now is that we could see if the > existing routines are solid enough or not. I can do this if you point me at a self-contained thread/#issue. > Still, looking at what the > patch has now I think that we had better get a committer look at it, > and if the core portion gets integrated we could already use it for > the patch implementing quorum synchronous replication and in doing > more advanced tests with pg_rewind regarding the timeline handling > (both patches of this CF). > > I don't mind adding more now, though I > think that the set of sample tests included in this version is enough > as a base implementation of the facility and shows what it can do. Sure.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Oct 8, 2015 at 6:03 PM, Amir Rohan wrote: > On 10/08/2015 10:39 AM, Michael Paquier wrote: >>> Someone mentioned a daisy chain setup which sounds fun. Anything else in >>> particular? Also, it would be nice to have some canned way to measure >>> end-to-end replication latency for variable number of nodes. >> >> Hm. Do you mean comparing the LSN position between two nodes even if >> both nodes are not connected to each other? What would you use it for? >> > > In a cascading replication setup, the typical _time_ it takes for a > COMMIT on master to reach the slave (assuming constant WAL generation > rate) is an important operational metric. Hm. You mean the exact amount of time it gets to be sure that a given WAL position has been flushed on a cascading standby, be it across multiple layers. Er, that's a bit tough without patching the backend where I guess we would need to keep a track of when a LSN position has been flushed. And calls of gettimeofday are expensive, so that does not sound like a plausible alternative here to me... > It would be useful to catch future regressions for that metric, > which may happen even when a patch doesn't outright break cascading > replication. Just automating the measurement could be useful if > there's no pg facility that tracks performance over time in > a regimented fashion. I've seen multiple projects which consider > a "benchmark suite" to be part of its testing strategy. Ah, OK. I see. That's a bit out of scope of this patch, and that's really OS-dependent, but as long as the comparisons can be done on the same OS it would make sense. > As for the "daisy chain" thing, it was (IIRC) mentioned in a josh berkus > talk I caught on youtube. It's possible to setup cascading replication, > take down the master, and then reinsert it as replicating slave, so that > you end up with *all* servers replicating from the > ancestor in the chain, and no master. I think it was more > a fun hack then anything, but also an interesting corner case to > investigate. Ah, yes. I recall this one. I am sure it made the audience smile. All the nodes link to each other in closed circle. >>> What about going back through the commit log and writing some regression >>> tests for the real stinkers, if someone care to volunteer some candidate >>> bugs >> >> I have drafted a list with a couple of items upthread: >> http://www.postgresql.org/message-id/CAB7nPqSgffSPhOcrhFoAsDAnipvn6WsH2nYkf1KayRm+9_MTGw@mail.gmail.com >> So based on the existing patch the list becomes as follows: >> - wal_retrieve_retry_interval with a high value, say setting to for >> example 2/3s and loop until it is applied by checking it is it has >> been received by the standby every second. >> - recovery_target_action >> - archive_cleanup_command >> - recovery_end_command >> - pg_xlog_replay_pause and pg_xlog_replay_resume >> In the list of things that could have a test, I recall that we should >> test as well 2PC with the recovery delay, look at a1105c3d. This could >> be included in 005. > > a1105c3 Mar 23 Fix copy & paste error in 4f1b890b137. Andres Freund > 4f1b890 Mar 15 Merge the various forms of transaction commit & abort > records. Andres Freund > > Is that the right commit? That's this one. a1105c3 was actually rather tricky... The idea is to simply check the WAL replay delay with COMMIT PREPARED. >> The advantage of implementing that now is that we could see if the >> existing routines are solid enough or not. > > I can do this if you point me at a self-contained thread/#issue. Hm. This patch is already 900 lines, perhaps it would be wiser not to make it more complicated for now.. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/08/2015 04:47 PM, Michael Paquier wrote: > On Thu, Oct 8, 2015 at 6:03 PM, Amir Rohan wrote: >> On 10/08/2015 10:39 AM, Michael Paquier wrote: >>>> Someone mentioned a daisy chain setup which sounds fun. Anything else in >>>> particular? Also, it would be nice to have some canned way to measure >>>> end-to-end replication latency for variable number of nodes. >>> >>> Hm. Do you mean comparing the LSN position between two nodes even if >>> both nodes are not connected to each other? What would you use it for? >>> >> >> In a cascading replication setup, the typical _time_ it takes for a >> COMMIT on master to reach the slave (assuming constant WAL generation >> rate) is an important operational metric. > > Hm. You mean the exact amount of time it gets to be sure that a given > WAL position has been flushed on a cascading standby, be it across > multiple layers. Er, that's a bit tough without patching the backend > where I guess we would need to keep a track of when a LSN position has > been flushed. And calls of gettimeofday are expensive, so that does > not sound like a plausible alternative here to me... > Wouldn't this work? 1) start timer 2) Grab pg_stat_replication.sent_location from master 3) pg_switch_xlog() # I /think/ we want this, could be wrong 4) Poll slave's pg_last_xlog_replay_location() until LSN shows up 5) stop timer Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Oct 8, 2015 at 11:28 PM, Amir Rohan wrote: > Wouldn't this work? > 1) start timer > 2) Grab pg_stat_replication.sent_location from master > 3) pg_switch_xlog() # I /think/ we want this, could be wrong For a warm standby, you would want that, but this depends on two factors: - The moment master completes archiving of this segment - The moment standby restores it. On slow machines, those two things become by far the bottleneck, imagine a PI restricted on I/O with a low-class SD card in the worst case (I maintain one, with a good card, still the I/O is a bottleneck). > 4) Poll slave's pg_last_xlog_replay_location() until LSN shows up > 5) stop timer That's not really solid, there is an interval of time between the moment the LSN position is taken from the master and the standby. An accurate method is to log/store on master when a given WAL position has been flushed to disk, and do the same on slave at replay for this LSN position. In any case this is doing to flood badly the logs of both nodes, and as the backend cares about the performance of operations in this code path we won't want to do that anyway. To make it short, it seems to me that simply waiting until the LSN a test is waiting for has been replayed is just but fine for this set of tests to ensure their run consistency, let's not forget that this is the goal here. Regards, -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Oct 9, 2015 at 8:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Oct 8, 2015 at 11:28 PM, Amir Rohan wrote: >> Wouldn't this work? >> 1) start timer >> 2) Grab pg_stat_replication.sent_location from master >> 3) pg_switch_xlog() # I /think/ we want this, could be wrong > > For a warm standby, you would want that, but this depends on two factors: > - The moment master completes archiving of this segment > - The moment standby restores it. > On slow machines, those two things become by far the bottleneck, > imagine a PI restricted on I/O with a low-class SD card in the worst > case (I maintain one, with a good card, still the I/O is a > bottleneck). > >> 4) Poll slave's pg_last_xlog_replay_location() until LSN shows up >> 5) stop timer > > That's not really solid, there is an interval of time between the > moment the LSN position is taken from the master and the standby. An > accurate method is to log/store on master when a given WAL position > has been flushed to disk, and do the same on slave at replay for this > LSN position. In any case this is doing to flood badly the logs of > both nodes, and as the backend cares about the performance of > operations in this code path we won't want to do that anyway. > > To make it short, it seems to me that simply waiting until the LSN a > test is waiting for has been replayed is just but fine for this set of > tests to ensure their run consistency, let's not forget that this is > the goal here. In terms of features, it seems that this patch has everything it needs to allow one to design tests to work on both Linux and Windows, and it is careful regarding CVE-2014-0067. Thoughts about moving that as "Ready for committer"? -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/09/2015 02:12 PM, Michael Paquier wrote: > On Fri, Oct 9, 2015 at 8:25 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Thu, Oct 8, 2015 at 11:28 PM, Amir Rohan wrote: >>> Wouldn't this work? >>> 1) start timer >>> 2) Grab pg_stat_replication.sent_location from master >>> 3) pg_switch_xlog() # I /think/ we want this, could be wrong >> >> For a warm standby, you would want that, but this depends on two factors: >> - The moment master completes archiving of this segment >> - The moment standby restores it. >> On slow machines, those two things become by far the bottleneck, >> imagine a PI restricted on I/O with a low-class SD card in the worst >> case (I maintain one, with a good card, still the I/O is a >> bottleneck). >> >>> 4) Poll slave's pg_last_xlog_replay_location() until LSN shows up >>> 5) stop timer >> >> That's not really solid, there is an interval of time between the >> moment the LSN position is taken from the master and the standby. An >> accurate method is to log/store on master when a given WAL position >> has been flushed to disk, and do the same on slave at replay for this >> LSN position. In any case this is doing to flood badly the logs of >> both nodes, and as the backend cares about the performance of >> operations in this code path we won't want to do that anyway. >> >> To make it short, it seems to me that simply waiting until the LSN a >> test is waiting for has been replayed is just but fine for this set of >> tests to ensure their run consistency, let's not forget that this is >> the goal here. > > In terms of features, it seems that this patch has everything it needs > to allow one to design tests to work on both Linux and Windows, and it > is careful regarding CVE-2014-0067. Thoughts about moving that as > "Ready for committer"? > Ok, I've put myself down as reviewer in cfapp. I don't think I can provide any more useful feedback that would actually result in changes at this point, but I'll read through the entire discussion once last time and write down final comments/notes. After that I have no problem marking this for a committer to look at. Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Oct 9, 2015 at 8:47 PM, Amir Rohan wrote: > Ok, I've put myself down as reviewer in cfapp. I don't think I can > provide any more useful feedback that would actually result in changes > at this point, but I'll read through the entire discussion once last > time and write down final comments/notes. After that I have no problem > marking this for a committer to look at. OK. If you have any comments or remarks, please do not hesitate at all! -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Oct 9, 2015 at 8:53 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Oct 9, 2015 at 8:47 PM, Amir Rohan wrote: >> Ok, I've put myself down as reviewer in cfapp. I don't think I can >> provide any more useful feedback that would actually result in changes >> at this point, but I'll read through the entire discussion once last >> time and write down final comments/notes. After that I have no problem >> marking this for a committer to look at. > > OK. If you have any comments or remarks, please do not hesitate at all! So, to let everybody know the issue, Amir has reported me offlist a bug in one of the tests that can be reproduced more easily on a slow machine: > Amir wrote: > Before posting the summary, I ran the latest v8 patch on today's git > master (9c42727) and got some errors: > t/004_timeline_switch.pl ... > 1..1 > # ERROR: invalid input syntax for type pg_lsn: "" > # LINE 1: SELECT ''::pg_lsn <= pg_last_xlog_replay_location() > # ^ > # No tests run! And here is my reply: This is a timing issue and can happen when standby1, the promoted standby which standby2 reconnects to to check that recovery works with a timeline jump, is still in recovery after being restarted. There is a small windows where this is possible, and this gets easier to reproduce on slow machines (did so on a VM). So the issue was in test 004. I have updated the script to check pg_is_in_recovery() to be sure that the node exits recovery before querying it with pg_current_xlog_location. It is worth noticing that the following change has saved me a lot of pain: --- a/src/test/perl/TestLib.pm +++ b/src/test/perl/TestLib.pm @@ -259,6 +259,7 @@ sub psql my ($stdout, $stderr); print("# Running SQL command: $sql\n"); run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-'], '<', \$sql, '>', \$stdout, '2>', \$stderr or die; + print "# Error output: $stderr\n" if $stderr ne ""; Perhaps we should consider backpatching it, it helped me find out the issue I faced. Attached is an updated patch fixing 004. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/10/2015 02:43 PM, Michael Paquier wrote: > On Fri, Oct 9, 2015 at 8:53 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Fri, Oct 9, 2015 at 8:47 PM, Amir Rohan wrote: >>> Ok, I've put myself down as reviewer in cfapp. I don't think I can >>> provide any more useful feedback that would actually result in changes >>> at this point, but I'll read through the entire discussion once last >>> time and write down final comments/notes. After that I have no problem >>> marking this for a committer to look at. >> >> OK. If you have any comments or remarks, please do not hesitate at all! > > So, to let everybody know the issue, Amir has reported me offlist a > bug in one of the tests that can be reproduced more easily on a slow > machine: > Yeah, I usually stick to the list for discussion, but I ran an earlier version without issues and thought this might be a problem with my system as I've changed things a bit this week. Now that v9 fixes the probkem, here's a summary from going over the entire thread one last time: # Windows and TAP sets Noah (2015-03) mentioned TAP doesn't work on windows, and hoped this would include some work on that. IIUC, the facilities and tests do run on windows, but focus was there and not the preexisting TAP suite. # Test coverage (in the future) Andres wanted a test for xid/multixid wraparound which also raises the question of the tests that will need to be written in the future. The patch focuses on providing facilities, while providing new coverage for several features. There should be a TODO list on the wiki (bug tracker, actually), where the list of tests to be written can be managed. Some were mentioned in the thread (multi/xid wraparound hot_standby_feedback, max_standby_archive_delay and max_standby_streaming_delay? recovery_target_action? some in your original list?), but threads are precisely where these things get lost in the cracks. # Interactive use vs. TAP tests Early on the goal was also to provide something for interactive use in order to test scenarios. The shift has focused to the TAP tests and some of the choices in the API reflect that. Interactive use is possible, but wasn't a central requirement. # Directory structure I suggested keeping backup/log/PGDATA per instance, rejected. # Parallel tests and port collisions Lots about this. Final result is no port races are possible because dedicated dirs are used per test, per instance. And because tcp isn't used for connections on any platform (can you confirm that's true on windows as well? I'm not familiar with sspi and what OSI layer it lives on) # Allow test to specify shutdown mode Added # decouple cleanup from node shutdown Added (in latest patches?) # Conveniences for test writing vs. running My suggestions weren't picked up, but for one thing setting CLEANUP=0 in the lib (which means editing it...) can be useful for writers. # blocking until server ready pg_isready wrapper added. # Multiple masters back and forth, but supported in latest version. That's it. I've ran the latest (v9) tests works and passed on my system (fedora 64bit) and also under docker with --cpu-quota=10000, which simulates a slow machine. Michael, is there anything else to do here or shall I mark this for committer review? Regards, Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Oct 10, 2015 at 9:04 PM, Amir Rohan wrote: > Now that v9 fixes the probkem, here's a summary from going over the > entire thread one last time: Thanks a lot for the summary of the events. > # Windows and TAP sets > Noah (2015-03) mentioned TAP doesn't work on windows, and hoped > this would include some work on that. > IIUC, the facilities and tests do run on windows, but focus was there > and not the preexisting TAP suite. They do work on Windows, see 13d856e. > # Test coverage (in the future) > Andres wanted a test for xid/multixid wraparound which also raises > the question of the tests that will need to be written in the future. I recall that this would have needed extra functions on the backend... > The patch focuses on providing facilities, while providing new coverage > for several features. There should be a TODO list on the wiki (bug > tracker, actually), where the list of tests to be written can be managed. > Some were mentioned in the thread (multi/xid wraparound > hot_standby_feedback, max_standby_archive_delay and > max_standby_streaming_delay? recovery_target_action? some in your > original list?), but threads > are precisely where these things get lost in the cracks. Sure, that's an on-going task. > # Directory structure > I suggested keeping backup/log/PGDATA per instance, rejected. I guess that I am still flexible on this one, the node information (own PGDATA, connection string, port, etc.) is logged as well so this is not a big deal to me... > # Parallel tests and port collisions > Lots about this. Final result is no port races are possible because > dedicated dirs are used per test, per instance. And because tcp > isn't used for connections on any platform (can you confirm that's > true on windows as well? I'm not familiar with sspi and what OSI > layer it lives on) On Windows you remain with the problem that all nodes initialized using TestLib.pm will listen to 127.0.0.1, sspi being used to ensure that the connection at user level is secure (additional entries in pg_hba.conf are added). > # decouple cleanup from node shutdown > Added (in latest patches?) Yes this was added. > Michael, is there anything else to do here or shall I mark this for > committer review? I have nothing else. Thanks a lot! -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/10/2015 04:32 PM, Michael Paquier wrote: > On Sat, Oct 10, 2015 at 9:04 PM, Amir Rohan wrote: >> Now that v9 fixes the problem, here's a summary from going over the >> entire thread one last time: > > Thanks a lot for the summary of the events. > >> # Windows and TAP sets >> Noah (2015-03) mentioned TAP doesn't work on windows, and hoped >> this would include some work on that. >> IIUC, the facilities and tests do run on windows, but focus was there >> and not the preexisting TAP suite. > > They do work on Windows, see 13d856e. > Thanks, I did not know that. >> # Test coverage (in the future) >> Andres wanted a test for xid/multixid wraparound which also raises >> the question of the tests that will need to be written in the future. > > I recall that this would have needed extra functions on the backend... > >> The patch focuses on providing facilities, while providing new coverage >> for several features. There should be a TODO list on the wiki (bug >> tracker, actually), where the list of tests to be written can be managed. >> Some were mentioned in the thread (multi/xid wraparound >> hot_standby_feedback, max_standby_archive_delay and >> max_standby_streaming_delay? recovery_target_action? some in your >> original list?), but threads >> are precisely where these things get lost in the cracks. > > Sure, that's an on-going task. > >> # Directory structure >> I suggested keeping backup/log/PGDATA per instance, rejected. > > I guess that I am still flexible on this one, the node information > (own PGDATA, connection string, port, etc.) is logged as well so this > is not a big deal to me... > >> # Parallel tests and port collisions >> Lots about this. Final result is no port races are possible because >> dedicated dirs are used per test, per instance. And because tcp >> isn't used for connections on any platform (can you confirm that's >> true on windows as well? I'm not familiar with sspi and what OSI >> layer it lives on) > > On Windows you remain with the problem that all nodes initialized > using TestLib.pm will listen to 127.0.0.1, sspi being used to ensure > that the connection at user level is secure (additional entries in > pg_hba.conf are added). > >> # decouple cleanup from node shutdown >> Added (in latest patches?) > > Yes this was added. > >> Michael, is there anything else to do here or shall I mark this for >> committer review? > > I have nothing else. Thanks a lot! > Ok, marked for committer, I hope I'm following "correct" cf procedure. Regards, Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/10/2015 04:32 PM, Michael Paquier wrote: > On Sat, Oct 10, 2015 at 9:04 PM, Amir Rohan wrote: >> The patch focuses on providing facilities, while providing new coverage >> for several features. There should be a TODO list on the wiki (bug >> tracker, actually), where the list of tests to be written can be managed. >> Some were mentioned in the thread (multi/xid wraparound >> hot_standby_feedback, max_standby_archive_delay and >> max_standby_streaming_delay? recovery_target_action? some in your >> original list?), but threads >> are precisely where these things get lost in the cracks. > > Sure, that's an on-going task. > I was arguing that it's an on-going task that would do better if it had a TODO list, instead of "ideas for tests" being scattered across 50-100 messages spanning a year or more in one thread or another. You may disagree. Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Oct 10, 2015 at 10:52 PM, Amir Rohan <amir.rohan@zoho.com> wrote: > On 10/10/2015 04:32 PM, Michael Paquier wrote: > I was arguing that it's an on-going task that would do > better if it had a TODO list, instead of "ideas for tests" > being scattered across 50-100 messages spanning a year or > more in one thread or another. You may disagree. Let's be clear. I am fully in line with your point. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/11/2015 02:47 AM, Michael Paquier wrote: > On Sat, Oct 10, 2015 at 10:52 PM, Amir Rohan <amir.rohan@zoho.com> wrote: >> On 10/10/2015 04:32 PM, Michael Paquier wrote: >> I was arguing that it's an on-going task that would do >> better if it had a TODO list, instead of "ideas for tests" >> being scattered across 50-100 messages spanning a year or >> more in one thread or another. You may disagree. > > Let's be clear. I am fully in line with your point. > I apologize -- that didn't came out right. What I meant to suggest was "open an issue" to track any works that needs to be done. But I guess that's not the PG way. Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sun, Oct 11, 2015 at 4:44 PM, Amir Rohan wrote: > On 10/11/2015 02:47 AM, Michael Paquier wrote: > I apologize -- that didn't came out right. > What I meant to suggest was "open an issue" to track > any works that needs to be done. But I guess that's > not the PG way. No problem. I was not clear either. We could create a new item in the TODO list (https://wiki.postgresql.org/wiki/Todo) and link it to dedicated page on the wiki where all the potential tests would be listed. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Amir Rohan
Date:
On 10/11/2015 01:19 PM, Michael Paquier wrote: > On Sun, Oct 11, 2015 at 4:44 PM, Amir Rohan wrote: >> On 10/11/2015 02:47 AM, Michael Paquier wrote: >> I apologize -- that didn't came out right. >> What I meant to suggest was "open an issue" to track >> any works that needs to be done. But I guess that's >> not the PG way. > > No problem. I was not clear either. We could create a new item in the > TODO list (https://wiki.postgresql.org/wiki/Todo) and link it to > dedicated page on the wiki where all the potential tests would be > listed. > It couldn't hurt but also may be just a waste of your time. I'm just realizing how central an issue tracker is to how I work and how much not having one irritates me. Tough luck for me I guess. Regards, Amir
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Hi, I just started looking this over a bit. The first thing I noticed is that it adds a dependency on Archive::Tar which isn't already used anywhere else. Did anybody check whether this exists back in 5.8 installations? Why is "recovery" added to ALWAYS_SUBDIRS in src/test/Makefile instead of to SUBDIRS? Seems a strange choice. Instead of adding print "# Error output: $stderr\n" if $stderr ne ""; to sub psql, I think it would be better to add line separators, which would be clearer if the error output ever turns into a multiline error messages. It would still show as empty if no stderr is produced; so I think something like if ($stderr ne '') {print "#### Begin standard error\n"print $stderr;print "#### End standard error\n"; } or something like that. In my days of Perl, it was starting to become frowned upon to call subroutines without parenthesizing arguments. Is that no longer the case? Because I notice there are many places in this patch and pre- existing that call psql with an argument list without parens. And it's a bit odd because I couldn't find any other subroutine that we're using in that way. In 005_replay_delay there's a 2s delay configured; then we test whether something is replayed in 1s. I hate tests that run for a long time, but is 2s good enough considering that some of our test animals in buildfarm are really slow? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Erik Rijkers
Date:
On 2015-11-18 16:21, Alvaro Herrera wrote: > Hi, I just started looking this over a bit. The first thing I noticed > is that it adds a dependency on Archive::Tar which isn't already used > anywhere else. Did anybody check whether this exists back in 5.8 > installations? Apparently it did not yet exist in core then, Module::CoreList says 5.9.3: $ perl -MModule::CoreList -e ' print Module::CoreList->first_release('Archive::Tar'), "\n";' 5.009003
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Wed, Nov 18, 2015 at 01:21:45PM -0200, Alvaro Herrera wrote: > In my days of Perl, it was starting to become frowned upon to call > subroutines without parenthesizing arguments. Is that no longer the > case? I've not witnessed those frowns. > Because I notice there are many places in this patch and pre- > existing that call psql with an argument list without parens. And it's > a bit odd because I couldn't find any other subroutine that we're using > in that way. TestLib.pm has unparenthesized calls to "standard_initdb", "start" and "run". 070_dropuser.pl has such calls to "start_test_server" and "psql". > In 005_replay_delay there's a 2s delay configured; then we test whether > something is replayed in 1s. I hate tests that run for a long time, but > is 2s good enough considering that some of our test animals in buildfarm > are really slow? That test will be unreliable, agreed.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Nov 19, 2015 at 12:21 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Hi, I just started looking this over a bit. The first thing I noticed > is that it adds a dependency on Archive::Tar which isn't already used > anywhere else. Did anybody check whether this exists back in 5.8 > installations? Actually I didn't and that's a good point, we have decided to support TAP down to 5.8.9. The only reason why I introduced this dependency is that there is no easy native way to copy an entire folder in perl, and that's for handling base backups. There are things like File::NCopy of File::Copy::Recursive however it does not seem like a good idea to depend on other modules that IPC::Run. Would it be better to have an in-core module dedicated to that similar to SimpleTee.pm? Or are you guys fine to accept a dependency with another module? > Why is "recovery" added to ALWAYS_SUBDIRS in src/test/Makefile instead > of to SUBDIRS? Seems a strange choice. Because I thought that it should not be part of the main regression suite, like ssl/. Feel free to correct me if my feeling is wrong. > Instead of adding > print "# Error output: $stderr\n" if $stderr ne ""; > to sub psql, I think it would be better to add line separators, which > would be clearer if the error output ever turns into a multiline error > messages. It would still show as empty if no stderr is produced; so I > think something like > if ($stderr ne '') > { > print "#### Begin standard error\n" > print $stderr; > print "#### End standard error\n"; > } > or something like that. Yes, that would be better. > In my days of Perl, it was starting to become frowned upon to call > subroutines without parenthesizing arguments. Is that no longer the > case? Because I notice there are many places in this patch and pre- > existing that call psql with an argument list without parens. And it's > a bit odd because I couldn't find any other subroutine that we're using > in that way. Hm, yeah. If we decide about a perl coding policy I would be happy to follow it. Personally I prefer usually using parenthesis however if we decide to make the calls consistent we had better address that as a separate patch. > In 005_replay_delay there's a 2s delay configured; then we test whether > something is replayed in 1s. I hate tests that run for a long time, but > is 2s good enough considering that some of our test animals in buildfarm > are really slow? A call to poll_query_until ensures that we wait for the standby to replay once the minimum replay threshold is reached. Even with a slow machine the first query would still see only 10 rows at the first try, and then wait for the standby to replay before checking if 20 rows are visible. Or I am not following your point. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Robert Haas
Date:
On Wed, Nov 18, 2015 at 10:21 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > In my days of Perl, it was starting to become frowned upon to call > subroutines without parenthesizing arguments. Is that no longer the > case? Because I notice there are many places in this patch and pre- > existing that call psql with an argument list without parens. And it's > a bit odd because I couldn't find any other subroutine that we're using > in that way. I've been coding in Perl for more than 20 years and have never heard of such a rule. Maybe I am not part of the "in" crowd. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Mike Blackwell
Date:
On Thu, Nov 19, 2015 at 10:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 18, 2015 at 10:21 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> In my days of Perl, it was starting to become frowned upon to call
> subroutines without parenthesizing arguments. Is that no longer the
> case?
As I understand it, there are several reasons not to make function calls in Perl without parenthesis. Whether they are good reasons is a question for the user. Modern Perl chapter 5 covers most of them.
__________________________________________________________________________________
Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
Mike.Blackwell@rrd.com
http://www.rrdonnelley.com
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
David Steele
Date:
On 11/19/15 11:05 AM, Robert Haas wrote: > On Wed, Nov 18, 2015 at 10:21 AM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> In my days of Perl, it was starting to become frowned upon to call >> subroutines without parenthesizing arguments. Is that no longer the >> case? Because I notice there are many places in this patch and pre- >> existing that call psql with an argument list without parens. And it's >> a bit odd because I couldn't find any other subroutine that we're using >> in that way. > > I've been coding in Perl for more than 20 years and have never heard > of such a rule. I follow the convention of using parentheses for all function calls in Perl, though this stems more from my greater familiarity with languages that require them than any adherence to vague Perl conventions. I do think it makes the code [more] readable. -- -David david@pgmasters.net
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
I just noticed that RecoveryTest.pm is lacking "use strict; use warnings;". With those added, there's a number of problems reported: Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line66. Global symbol "%backup_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 67. Global symbol "%archive_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line68. Global symbol "%connstr_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line69. Global symbol "%applname_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line70. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line92. Global symbol "%connstr_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line93. Global symbol "%applname_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line93. Global symbol "%archive_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line104. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line111. Global symbol "%archive_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line121. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line130. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line185. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line197. Global symbol "@array" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 220. Global symbol "%backup_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 243. Global symbol "%archive_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line244. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line246. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line257. Global symbol "%backup_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 258. Global symbol "%archive_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line259. Global symbol "%connstr_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line260. Global symbol "%applname_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line261. Global symbol "%backup_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 272. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line287. Global symbol "%backup_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 288. Global symbol "%archive_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line289. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line292. Global symbol "$current_dir" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 294. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line302. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line313. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line320. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line367. Global symbol "%backup_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 377. Global symbol "%datadir_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line390. Global symbol "%backup_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line 391. Global symbol "%archive_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line392. Global symbol "%connstr_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line393. Global symbol "%applname_nodes" requires explicit package name at /pgsql/source/master/src/test/perl/RecoveryTest.pm line394. Most of them are easily fixable by adding the correct "my" lines; but at least @array and $current_dir require more code to be written. TBH all that business with arrays that are kept in sync looks too contrived to me. Could we have a Perl object representing each node instead? That would require a "PostgresNode" package (or similar). The RecoveryTest.pm would have a single %nodes hash. Also, you don't need @active_nodes, just a flag in PostgresNode, and have the stop routine do nothing if node is not marked active. Also: if you pass the "root node" when creating a node that will become a standby, you don't need to pass it when calling, say, enable_streaming; the root node becomes an instance variable. (Hmm, actually, if we do that, I wonder what if in the future we want to test node promotion and a standby is repointed to a new master. Maybe we don't want to have this knowledge in the Perl code at all.) In get_free_port, isn't it easier to use pg_isready rather than psql? I've been messing with 003 because I think it's a bit too repetitive. Will finish it after you post a fixed version of RecoveryTest.pm. Thanks! -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > On Thu, Nov 19, 2015 at 12:21 AM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > Hi, I just started looking this over a bit. The first thing I noticed > > is that it adds a dependency on Archive::Tar which isn't already used > > anywhere else. Did anybody check whether this exists back in 5.8 > > installations? > > Actually I didn't and that's a good point, we have decided to support > TAP down to 5.8.9. The only reason why I introduced this dependency is > that there is no easy native way to copy an entire folder in perl, and > that's for handling base backups. There are things like File::NCopy of > File::Copy::Recursive however it does not seem like a good idea to > depend on other modules that IPC::Run. Would it be better to have an > in-core module dedicated to that similar to SimpleTee.pm? Or are you > guys fine to accept a dependency with another module? It would be a lot better to not have to rely on another module existing everywhere. I'd rather have another simple module, following SimpleTee's example. Since this doesn't have to be terribly generic, it should be reasonably short, I hope. > > Why is "recovery" added to ALWAYS_SUBDIRS in src/test/Makefile instead > > of to SUBDIRS? Seems a strange choice. > > Because I thought that it should not be part of the main regression > suite, like ssl/. Feel free to correct me if my feeling is wrong. As I understand, the problem with "ssl" is that it messes with system-wide settings, which is not the case here. I'm inclined to move it to SUBDIRS. As an example, "modules" is not part of the main regression suite either. > > In my days of Perl, it was starting to become frowned upon to call > > subroutines without parenthesizing arguments. Is that no longer the > > case? Because I notice there are many places in this patch and pre- > > existing that call psql with an argument list without parens. And it's > > a bit odd because I couldn't find any other subroutine that we're using > > in that way. > > Hm, yeah. If we decide about a perl coding policy I would be happy to > follow it. Personally I prefer usually using parenthesis however if we > decide to make the calls consistent we had better address that as a > separate patch. Some votes against, some votes for. Ultimately, it seems that this depends on the committer. I don't really care all that much about this TBH. > > In 005_replay_delay there's a 2s delay configured; then we test whether > > something is replayed in 1s. I hate tests that run for a long time, but > > is 2s good enough considering that some of our test animals in buildfarm > > are really slow? > > A call to poll_query_until ensures that we wait for the standby to > replay once the minimum replay threshold is reached. Even with a slow > machine the first query would still see only 10 rows at the first try, > and then wait for the standby to replay before checking if 20 rows are > visible. Or I am not following your point. Ah, I see. Maybe it's fine then, or else I'm not following your point ;-) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Tue, Nov 24, 2015 at 6:27 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: >> On Thu, Nov 19, 2015 at 12:21 AM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: > >> > Hi, I just started looking this over a bit. The first thing I noticed >> > is that it adds a dependency on Archive::Tar which isn't already used >> > anywhere else. Did anybody check whether this exists back in 5.8 >> > installations? >> >> Actually I didn't and that's a good point, we have decided to support >> TAP down to 5.8.9. The only reason why I introduced this dependency is >> that there is no easy native way to copy an entire folder in perl, and >> that's for handling base backups. There are things like File::NCopy of >> File::Copy::Recursive however it does not seem like a good idea to >> depend on other modules that IPC::Run. Would it be better to have an >> in-core module dedicated to that similar to SimpleTee.pm? Or are you >> guys fine to accept a dependency with another module? > > It would be a lot better to not have to rely on another module existing > everywhere. I'd rather have another simple module, following > SimpleTee's example. Since this doesn't have to be terribly generic, it > should be reasonably short, I hope. Sure, that would be a simple function that does directory lookup and recursive calls. I'll move ahead with that then and reuse it in the recovery logic. >> > Why is "recovery" added to ALWAYS_SUBDIRS in src/test/Makefile instead >> > of to SUBDIRS? Seems a strange choice. >> >> Because I thought that it should not be part of the main regression >> suite, like ssl/. Feel free to correct me if my feeling is wrong. > > As I understand, the problem with "ssl" is that it messes with > system-wide settings, which is not the case here. I'm inclined to move > it to SUBDIRS. As an example, "modules" is not part of the main > regression suite either. OK, I'll move it back to it then. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
Thanks for the review.
On Tue, Nov 24, 2015 at 6:15 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Oops.
-- I just noticed that RecoveryTest.pm is lacking "use strict; use
warnings;". With those added, there's a number of problems reported:
Most of them are easily fixable by adding the correct "my" lines; but at
least @array and $current_dir require more code to be written.
Oops.
TBH all that business with arrays that are kept in sync looks too
contrived to me. Could we have a Perl object representing each node
instead?
Not really to be honest.
That would require a "PostgresNode" package (or similar). The
RecoveryTest.pm would have a single %nodes hash. Also, you don't need
@active_nodes, just a flag in PostgresNode, and have the stop routine do
nothing if node is not marked active. Also: if you pass the "root node"
when creating a node that will become a standby, you don't need to pass
it when calling, say, enable_streaming; the root node becomes an
instance variable. (Hmm, actually, if we do that, I wonder what if in
the future we want to test node promotion and a standby is repointed to
a new master. Maybe we don't want to have this knowledge in the Perl
code at all.)
I think I'll get the idea. In short all the parametrization will just happen at object level, as well as basic actions on the nodes like start, stop, restart etc.
In get_free_port, isn't it easier to use pg_isready rather than psql?
Will switch.
I've been messing with 003 because I think it's a bit too repetitive.
Will finish it after you post a fixed version of RecoveryTest.pm.
Sure, thanks.
I'll rework this patch and will update a new version soon.
Thanks again for the review.
Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Tue, Nov 24, 2015 at 2:14 PM, Michael Paquier wrote:
- Add strict and warnings to what is used in the new modules of this patchI'll rework this patch and will update a new version soon.
So, attached is a new patch addressing all the comments received. The new version has the following changes:
- Print more verbosely stderr output in case of error in psql
- Add recovery test suite to SUBDIRS in src/test/Makefile
- Add new module RecursiveCopy to be used for base backups. This removes the dependency with Archive::Tar. PostgresNode makes use of that when initializing a node from a backup.
- Tests have been updated to use the PostgresNode objects instead of the port number as identifier. That's more portable.
Hopefully I have missed nothing.
Regards,
--
Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > - Manage node information using package/class PostgresNode.pm and have > RecoveryTest use it. I have actually made PostgresNode bare-bone and simple > on purpose: one can initialize the node, append configuration parameters to > it and manage it through start/stop/restart (we may want to add reload and > promote actually if needed). This looks great as a starting point. I think we should make TestLib depend on PostgresNode instead of the other way around. I will have a look at that (I realize this means messing with the existing tests). > I have also arrived at the conclusion that it is not really worth > adding a node status flag in PostgresNode because the port number > saved there is sufficient when doing free port lookup, and the list of > nodes used in a recovery test are saved in an array. I don't disagree with this in principle, but I think the design that you get a new PostgresNode object by calling get_free_port is strange. I think the port lookup code should be part of either TestLib or PostgresNode, not RecoveryTest. > - Add new module RecursiveCopy to be used for base backups. This removes > the dependency with Archive::Tar. PostgresNode makes use of that when > initializing a node from a backup. Great. > - Tests have been updated to use the PostgresNode objects instead of the > port number as identifier. That's more portable. Makes sense. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Nov 25, 2015 at 6:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Michael Paquier wrote:
> - Manage node information using package/class PostgresNode.pm and have
> RecoveryTest use it. I have actually made PostgresNode bare-bone and simple
> on purpose: one can initialize the node, append configuration parameters to
> it and manage it through start/stop/restart (we may want to add reload and
> promote actually if needed).
This looks great as a starting point. I think we should make TestLib
depend on PostgresNode instead of the other way around. I will have a
look at that (I realize this means messing with the existing tests).
Makes sense. My thoughts following that is that we should keep a track of the nodes started as an array which is part of TestLib, with PGHOST set once at startup using tempdir_short. That's surely an refactoring patch somewhat independent of the recovery test suite. I would not mind writing something among those lines if needed.
> I have also arrived at the conclusion that it is not really worth
> adding a node status flag in PostgresNode because the port number
> saved there is sufficient when doing free port lookup, and the list of
> nodes used in a recovery test are saved in an array.
I don't disagree with this in principle, but I think the design that you
get a new PostgresNode object by calling get_free_port is strange. I
think the port lookup code should be part of either TestLib or
PostgresNode, not RecoveryTest.
I'd vote for TestLib. I have written PostgresNode this way to allow users to set up arbitrary port numbers if they'd like to do so. That's more flexible.
-- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > On Wed, Nov 25, 2015 at 6:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: > > > Michael Paquier wrote: > > This looks great as a starting point. I think we should make TestLib > > depend on PostgresNode instead of the other way around. I will have a > > look at that (I realize this means messing with the existing tests). > > Makes sense. My thoughts following that is that we should keep a track of > the nodes started as an array which is part of TestLib, with PGHOST set > once at startup using tempdir_short. That's surely an refactoring patch > somewhat independent of the recovery test suite. I would not mind writing > something among those lines if needed. OK, please do. We can split this up in two patches: one introducing PostgresNode (+ RecursiveCopy) together with the refactoring of existing test code, and a subsequent one introducing RecoveryTest and the corresponding subdir. Sounds good? > > > I have also arrived at the conclusion that it is not really worth > > > adding a node status flag in PostgresNode because the port number > > > saved there is sufficient when doing free port lookup, and the list of > > > nodes used in a recovery test are saved in an array. > > > > I don't disagree with this in principle, but I think the design that you > > get a new PostgresNode object by calling get_free_port is strange. I > > think the port lookup code should be part of either TestLib or > > PostgresNode, not RecoveryTest. > > I'd vote for TestLib. I have written PostgresNode this way to allow users > to set up arbitrary port numbers if they'd like to do so. That's more > flexible. That works for me. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Nov 25, 2015 at 10:55 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
-- Michael Paquier wrote:
> On Wed, Nov 25, 2015 at 6:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>
> > Michael Paquier wrote:
> > This looks great as a starting point. I think we should make TestLib
> > depend on PostgresNode instead of the other way around. I will have a
> > look at that (I realize this means messing with the existing tests).
>
> Makes sense. My thoughts following that is that we should keep a track of
> the nodes started as an array which is part of TestLib, with PGHOST set
> once at startup using tempdir_short. That's surely an refactoring patch
> somewhat independent of the recovery test suite. I would not mind writing
> something among those lines if needed.
OK, please do.
We can split this up in two patches: one introducing PostgresNode
(+ RecursiveCopy) together with the refactoring of existing test code,
and a subsequent one introducing RecoveryTest and the corresponding
subdir. Sounds good?
Yeah, that matches my line of thoughts. Will do so.
Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Nov 25, 2015 at 11:00 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Nov 25, 2015 at 10:55 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:Michael Paquier wrote:
> On Wed, Nov 25, 2015 at 6:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>
> > Michael Paquier wrote:
> > This looks great as a starting point. I think we should make TestLib
> > depend on PostgresNode instead of the other way around. I will have a
> > look at that (I realize this means messing with the existing tests).
>
> Makes sense. My thoughts following that is that we should keep a track of
> the nodes started as an array which is part of TestLib, with PGHOST set
> once at startup using tempdir_short. That's surely an refactoring patch
> somewhat independent of the recovery test suite. I would not mind writing
> something among those lines if needed.
OK, please do.
We can split this up in two patches: one introducing PostgresNode
(+ RecursiveCopy) together with the refactoring of existing test code,
and a subsequent one introducing RecoveryTest and the corresponding
subdir. Sounds good?Yeah, that matches my line of thoughts. Will do so.
The result of a couple of hours of hacking is attached:
- 0001 is the refactoring adding PostgresNode and RecursiveCopy. I have also found that it is quite advantageous to move some of the routines that are synonyms of system() and the stuff used for logging into another low-level library that PostgresNode depends on, that I called TestBase in this patch. This way, all the infrastructure depends on the same logging management. Existing tests have been refactored to fit into the new code, and this leads to a couple of simplifications particularly in pg_rewind tests because there is no more need to have there routines for environment cleanup and logging. I have done tests on OSX and Windows using it and tests are passing. I have as well tested that ssl tests were working.
- 0002 adds the recovery tests with RecoveryTest.pm now located in src/test/recovery.
Regards,
--
Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > The result of a couple of hours of hacking is attached: > - 0001 is the refactoring adding PostgresNode and RecursiveCopy. I have > also found that it is quite advantageous to move some of the routines that > are synonyms of system() and the stuff used for logging into another > low-level library that PostgresNode depends on, that I called TestBase in > this patch. This way, all the infrastructure depends on the same logging > management. Existing tests have been refactored to fit into the new code, > and this leads to a couple of simplifications particularly in pg_rewind > tests because there is no more need to have there routines for environment > cleanup and logging. I have done tests on OSX and Windows using it and > tests are passing. I have as well tested that ssl tests were working. Here's another version of this. I changed the packages a bit more. For starters, I moved the routines around a bit; some of your choices seemed more about keeping stuff where it was originally rather than moving it to where it made sense. These are the routines in each module: TestBase: system_or_bail system_log run_log slurp_dir slurp_file append_to_file TestLib: get_new_node teardown_node psql poll_query_until command_ok command_fails command_exit_is program_help_ok program_version_ok program_options_handling_ok command_like issues_sql_like I tried to get rid of teardown_node by having a DESTROY method for PostgresNode; that method would call "pg_ctl stop -m immediate". That would have been much cleaner. However, when a test fails this doesn't work sanely because the END block for File::Temp runs earlier than that DESTROY block, which means the datadir is already gone by the time pg_ctl stop runs, so the node stop doesn't work at all. (Perhaps we could fix this by noting postmaster's PID at start time, and then sending a signal directly instead of relying on pg_ctl). I moved all the initialization code (deleting stuff from environment, detecting Windows, opening SimpleTie filedescs etc) into BEGIN blocks, which run earlier than any other code. I perltidy'ed PostgresNode (and all the other files actually), to have the style match the rest of our code. I also updated some code to be more Perlish. I added a lot of error checking in RecursiveCopy. You had a "cat" call somewhere, which I replaced with slurp_file. I considered updating RewindTest so that it didn't have to export the node global variables, but decided not to, not because of the huge code churn for the t/*.pl files but because of the problem with the DESTROY method above: it didn't actually buy anything. Hmm. I just noticed RewindTest sets $ENV{PGDATABASE} outside BEGIN. Not sure what to think of that. Could instead pass the database name in $node->getConnStr() calls, like run_pg_rewind() is already doing. I tried all the t/ tests we have and all of them pass for me. If I'm able, I will push this on my Sunday late evening, so that I can fix whatever gets red on Monday first thing ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Nov 28, 2015 at 7:53 AM, Alvaro Herrera wrote: > I moved all the initialization code (deleting stuff from environment, > detecting Windows, opening SimpleTie filedescs etc) into BEGIN blocks, > which run earlier than any other code. Ah, OK. Thanks. That makes visibly the whole set of modules more consistent. > Hmm. I just noticed RewindTest sets $ENV{PGDATABASE} outside BEGIN. Not > sure what to think of that. Could instead pass the database name in > $node->getConnStr() calls, like run_pg_rewind() is already doing. Yes, let's remove that and pass the database name to getConnStr(). > I tried all the t/ tests we have and all of them pass for me. If I'm > able, I will push this on my Sunday late evening, so that I can fix > whatever gets red on Monday first thing ... I have done as well additional tests on Windows and this patch is showing a green status. A separate issue, but as long as we are working on this set of tests: I have noticed that config_default.pl is missing the flag tap_tests in its list. See the patch attached. Could you apply that as well and backpatch? I have as well noticed that RewindTest.pm is missing "1;" on its last line. When this is loaded this would lead to compilation errors. -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > On Sat, Nov 28, 2015 at 7:53 AM, Alvaro Herrera wrote: > > Hmm. I just noticed RewindTest sets $ENV{PGDATABASE} outside BEGIN. Not > > sure what to think of that. Could instead pass the database name in > > $node->getConnStr() calls, like run_pg_rewind() is already doing. > > Yes, let's remove that and pass the database name to getConnStr(). Ok. > A separate issue, but as long as we are working on this set of tests: > I have noticed that config_default.pl is missing the flag tap_tests in > its list. See the patch attached. Could you apply that as well and > backpatch? > I have as well noticed that RewindTest.pm is missing "1;" on its last > line. When this is loaded this would lead to compilation errors. Sure. > > I tried all the t/ tests we have and all of them pass for me. If I'm > > able, I will push this on my Sunday late evening, so that I can fix > > whatever gets red on Monday first thing ... > > I have done as well additional tests on Windows and this patch is > showing a green status. Great. Here's your recovery test patch rebased, for your (and others'!) perusal. It passes for me. (Test 003 is unchanged.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sun, Nov 29, 2015 at 12:13 AM, Alvaro Herrera wrote: > Here's your recovery test patch rebased, for your (and others'!) > perusal. It passes for me. (Test 003 is unchanged.) Are you planning to push that as well? It does not have much coverage but I guess that's quite good for a first shot, and that can serve as example for future tests. Still, the first patch adds enough infrastructure to allow any other module to have more complex regression test scenarios, the first two targets coming immediately to my mind being the quorum syncrep patch and pg_rewind and its timeline switch manipulation. So that's more than welcome! -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Fri, Nov 27, 2015 at 07:53:10PM -0300, Alvaro Herrera wrote: > Michael Paquier wrote: > > The result of a couple of hours of hacking is attached: > > - 0001 is the refactoring adding PostgresNode and RecursiveCopy. I have > > also found that it is quite advantageous to move some of the routines that > > are synonyms of system() and the stuff used for logging into another > > low-level library that PostgresNode depends on, that I called TestBase in > > this patch. > Here's another version of this. I changed the packages a bit more. For > starters, I moved the routines around a bit; some of your choices seemed > more about keeping stuff where it was originally rather than moving it > to where it made sense. These are the routines in each module: > > TestBase: system_or_bail system_log run_log slurp_dir slurp_file > append_to_file > > TestLib: get_new_node teardown_node psql poll_query_until command_ok > command_fails command_exit_is program_help_ok program_version_ok > program_options_handling_ok command_like issues_sql_like The proposed code is short on guidance about when to put a function in TestLib versus TestBase. TestLib has no header comment. The TestBase header comment would permit, for example, command_ok() in that module. I would try instead keeping TestLib as the base module and moving into PostgresNode the functions that deal with PostgreSQL clusters (get_new_node teardown_node psql poll_query_until issues_sql_like). > I tried to get rid of teardown_node by having a DESTROY method for > PostgresNode; that method would call "pg_ctl stop -m immediate". That > would have been much cleaner. However, when a test fails this doesn't > work sanely because the END block for File::Temp runs earlier than that > DESTROY block, which means the datadir is already gone by the time > pg_ctl stop runs, so the node stop doesn't work at all. (Perhaps we > could fix this by noting postmaster's PID at start time, and then > sending a signal directly instead of relying on pg_ctl). You could disable File::Temp cleanup and handle cleanup yourself at the desired time. (I haven't reviewed whether the goal of removing teardown_node is otherwise good.) > +my $node = get_new_node(); > +# Initialize node without replication settings > +$node->initNode(0); > +$node->startNode(); > +my $pgdata = $node->getDataDir(); > + > +$ENV{PGPORT} = $node->getPort(); Starting a value retrieval method name with "get" is not Perlish. The TAP suites currently follow "man perlstyle" in using underscored_lower_case method names. No PostgreSQL Perl code uses lowerFirstCamelCase, though some uses CamelCase. The word "Node" is redundant. Use this style: $node->init(0); $node->start; my $pgdata = $node->data_dir; $ENV{PGPORT} = $node->port; As a matter of opinion, I recommend giving "init" key/value arguments instead of the single Boolean argument. The method could easily need more options in the future, and this makes the call site self-documenting: $node->init(hba_permit_replication => 0); > - 'pg_controldata with nonexistent directory fails'); > + 'pg_controldata with nonexistent directory fails'); perltidy will undo this whitespace-only change. > --- a/src/bin/pg_rewind/t/001_basic.pl > +++ b/src/bin/pg_rewind/t/001_basic.pl > @@ -1,9 +1,11 @@ > +# Basic pg_rewind test. > + > use strict; > use warnings; > -use TestLib; > -use Test::More tests => 8; > > use RewindTest; > +use TestLib; > +use Test::More tests => 8; Revert all changes to this file. Audit the rest of the patch for whitespace change unrelated to the subject. > - 'fails with nonexistent table'); > + 'fails with nonexistent table'); > -'CREATE TABLE test1 (a int); CREATE INDEX test1x ON test1 (a); CLUSTER test1 USING test1x'; > + 'CREATE TABLE test1 (a int); CREATE INDEX test1x ON test1 (a); CLUSTER test1 USING test1x'; perltidy will undo these whitespace-only changes. > +# cluster -a is not compatible with -d, hence enforce environment variables s/cluster -a/clusterdb -a/ > -issues_sql_like( > +$ENV{PGPORT} = $node->getPort(); > + > +issues_sql_like($node, perltidy will move $node to its own line. > -command_fails([ 'createuser', 'user1' ], 'fails if role already exists'); > +command_fails([ 'createuser', 'user1' ], > + 'fails if role already exists'); perltidy will undo this whitespace-only change. > @@ -0,0 +1,252 @@ > +# PostgresNode, simple node representation for regression tests. > +# > +# Regression tests should use this basic class infrastructure to define nodes > +# that need used in the test modules/scripts. > +package PostgresNode; Consider just saying, "Class representing a data directory and postmaster." > + my $self = { > + _port => undef, > + _host => undef, > + _basedir => undef, > + _applname => undef, > + _logfile => undef }; > + > + # Set up each field > + $self->{_port} = $pgport; > + $self->{_host} = $pghost; > + $self->{_basedir} = TestBase::tempdir; > + $self->{_applname} = "node_$pgport"; > + $self->{_logfile} = "$TestBase::log_path/node_$pgport.log"; Why set fields to undef immediately before filling them? > @@ -0,0 +1,143 @@ > +# Set of low-level routines dedicated to base tasks for regression tests, like > +# command execution and logging. > +# > +# This module should not depend on any other PostgreSQL regression test > +# modules. > +package TestBase; This is no mere set of routines. Just "use"-ing this module creates some directories and alters stdin/stdout/stderr. > +BEGIN > +{ > + $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'; > + > + # Determine output directories, and create them. The base path is the > + # TESTDIR environment variable, which is normally set by the invoking > + # Makefile. > + $tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check"; > + $log_path = "$tmp_check/log"; > + > + mkdir $tmp_check; > + mkdir $log_path; Never mutate the filesystem in a BEGIN block, because "perl -c" runs BEGIN blocks. (Likewise for the BEGIN block this patch adds to TestLib.) nm
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Andrew Dunstan
Date:
On 11/29/2015 04:28 PM, Noah Misch wrote: > +BEGIN > +{ > + $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'; > + > + # Determine output directories, and create them. The base path is the > + # TESTDIR environment variable, which is normally set by the invoking > + # Makefile. > + $tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check"; > + $log_path = "$tmp_check/log"; > + > + mkdir $tmp_check; > + mkdir $log_path; > Never mutate the filesystem in a BEGIN block, because "perl -c" runs BEGIN > blocks. (Likewise for the BEGIN block this patch adds to TestLib.) Yeah, those two lines might belong in an INIT block. "perldoc perlmod" for details. cheers andrew
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Mon, Nov 30, 2015 at 6:28 AM, Noah Misch <noah@leadboat.com> wrote: > > On Fri, Nov 27, 2015 at 07:53:10PM -0300, Alvaro Herrera wrote: > > Michael Paquier wrote: > > > The result of a couple of hours of hacking is attached: > > > - 0001 is the refactoring adding PostgresNode and RecursiveCopy. I have > > > also found that it is quite advantageous to move some of the routines that > > > are synonyms of system() and the stuff used for logging into another > > > low-level library that PostgresNode depends on, that I called TestBase in > > > this patch. > > > Here's another version of this. I changed the packages a bit more. For > > starters, I moved the routines around a bit; some of your choices seemed > > more about keeping stuff where it was originally rather than moving it > > to where it made sense. These are the routines in each module: > > > > TestBase: system_or_bail system_log run_log slurp_dir slurp_file > > append_to_file > > > > TestLib: get_new_node teardown_node psql poll_query_until command_ok > > command_fails command_exit_is program_help_ok program_version_ok > > program_options_handling_ok command_like issues_sql_like > > The proposed code is short on guidance about when to put a function in TestLib > versus TestBase. TestLib has no header comment. The TestBase header comment > would permit, for example, command_ok() in that module. I would try instead > keeping TestLib as the base module and moving into PostgresNode the functions > that deal with PostgreSQL clusters (get_new_node teardown_node psql > poll_query_until issues_sql_like). PostgresNode is wanted to be a base representation of how of node is, not of how to operate on it. The ways to perform the tests, which works on a node, is wanted as a higher-level operation. Logging and base configuration of a test set is a lower level of operations than PostgresNode, because cluster nodes need actually to perform system calls, some of those system calls like run_log allowing to log in the centralized log file. I have tried to make the headers of those modules more verbose, please see attached. > > > +my $node = get_new_node(); > > +# Initialize node without replication settings > > +$node->initNode(0); > > +$node->startNode(); > > +my $pgdata = $node->getDataDir(); > > + > > +$ENV{PGPORT} = $node->getPort(); > > Starting a value retrieval method name with "get" is not Perlish. The TAP > suites currently follow "man perlstyle" in using underscored_lower_case method > names. No PostgreSQL Perl code uses lowerFirstCamelCase, though some uses > CamelCase. The word "Node" is redundant. Use this style: > > $node->init(0); > $node->start; > my $pgdata = $node->data_dir; > $ENV{PGPORT} = $node->port; I have switched the style this way. > As a matter of opinion, I recommend giving "init" key/value arguments instead > of the single Boolean argument. The method could easily need more options in > the future, and this makes the call site self-documenting: > > $node->init(hba_permit_replication => 0); Done. > > > - 'pg_controldata with nonexistent directory fails'); > > + 'pg_controldata with nonexistent directory fails'); > > perltidy will undo this whitespace-only change. Cleaned up. > > > --- a/src/bin/pg_rewind/t/001_basic.pl > > +++ b/src/bin/pg_rewind/t/001_basic.pl > > @@ -1,9 +1,11 @@ > > +# Basic pg_rewind test. > > + > > use strict; > > use warnings; > > -use TestLib; > > -use Test::More tests => 8; > > > > use RewindTest; > > +use TestLib; > > +use Test::More tests => 8; > > Revert all changes to this file. Audit the rest of the patch for whitespace > change unrelated to the subject. Done. > > > > - 'fails with nonexistent table'); > > + 'fails with nonexistent table'); > > > -'CREATE TABLE test1 (a int); CREATE INDEX test1x ON test1 (a); CLUSTER test1 USING test1x'; > > + 'CREATE TABLE test1 (a int); CREATE INDEX test1x ON test1 (a); CLUSTER test1 USING test1x'; > > perltidy will undo these whitespace-only changes. Cleaned up. > > > +# cluster -a is not compatible with -d, hence enforce environment variables > > s/cluster -a/clusterdb -a/ Fixed. > > > -command_fails([ 'createuser', 'user1' ], 'fails if role already exists'); > > +command_fails([ 'createuser', 'user1' ], > > + 'fails if role already exists'); > > perltidy will undo this whitespace-only change. > > > @@ -0,0 +1,252 @@ > > +# PostgresNode, simple node representation for regression tests. > > +# > > +# Regression tests should use this basic class infrastructure to define nodes > > +# that need used in the test modules/scripts. > > +package PostgresNode; > > Consider just saying, "Class representing a data directory and postmaster." OK, I have changed this description: +# PostgresNode, class representing a data directory and postmaster. +# +# This contains a basic set of routines able to work on a PostgreSQL node, +# allowing to start, stop, backup and initialize it with various options. > > > + my $self = { > > + _port => undef, > > + _host => undef, > > + _basedir => undef, > > + _applname => undef, > > + _logfile => undef }; > > + > > + # Set up each field > > + $self->{_port} = $pgport; > > + $self->{_host} = $pghost; > > + $self->{_basedir} = TestBase::tempdir; > > + $self->{_applname} = "node_$pgport"; > > + $self->{_logfile} = "$TestBase::log_path/node_$pgport.log"; > > Why set fields to undef immediately before filling them? Fixed. > > > @@ -0,0 +1,143 @@ > > +# Set of low-level routines dedicated to base tasks for regression tests, like > > +# command execution and logging. > > +# > > +# This module should not depend on any other PostgreSQL regression test > > +# modules. > > +package TestBase; > > This is no mere set of routines. Just "use"-ing this module creates some > directories and alters stdin/stdout/stderr. I have updated the description of this file. > > > +BEGIN > > +{ > > + $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'; > > + > > + # Determine output directories, and create them. The base path is the > > + # TESTDIR environment variable, which is normally set by the invoking > > + # Makefile. > > + $tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check"; > > + $log_path = "$tmp_check/log"; > > + > > + mkdir $tmp_check; > > + mkdir $log_path; > > Never mutate the filesystem in a BEGIN block, because "perl -c" runs BEGIN > blocks. (Likewise for the BEGIN block this patch adds to TestLib.) Hm. It seems to me that the whole block should be part of INIT then, because the log file where STDERR and STDOUT is recaptured depends on those to be created as well. By doing this change, please note that compilation errors are not recaptured into the log file (thanks Andrew for the pointers to perlmod). I have as well updated pg_rewind tests to remove PGDATABASE. Patch to address those issues is attached. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > On Mon, Nov 30, 2015 at 6:28 AM, Noah Misch <noah@leadboat.com> wrote: > > The proposed code is short on guidance about when to put a function in TestLib > > versus TestBase. TestLib has no header comment. The TestBase header comment > > would permit, for example, command_ok() in that module. I would try instead > > keeping TestLib as the base module and moving into PostgresNode the functions > > that deal with PostgreSQL clusters (get_new_node teardown_node psql > > poll_query_until issues_sql_like). > > PostgresNode is wanted to be a base representation of how of node is, > not of how to operate on it. The ways to perform the tests, which > works on a node, is wanted as a higher-level operation. > > Logging and base configuration of a test set is a lower level of > operations than PostgresNode, because cluster nodes need actually to > perform system calls, some of those system calls like run_log allowing > to log in the centralized log file. I have tried to make the headers > of those modules more verbose, please see attached. I'm not terribly convinced by this argument TBH. Perhaps we can have PostgresNode be one package, and the logging routines be another package, and we create a higher-level package whose @ISA=(PostgresNode, LoggingWhatever) and then we move the routines suggested by Noah into that new package. Then the tests use that instead of PostgresNode directly. > > > --- a/src/bin/pg_rewind/t/001_basic.pl > > > +++ b/src/bin/pg_rewind/t/001_basic.pl > > > @@ -1,9 +1,11 @@ > > > +# Basic pg_rewind test. > > > + > > > use strict; > > > use warnings; > > > -use TestLib; > > > -use Test::More tests => 8; > > > > > > use RewindTest; > > > +use TestLib; > > > +use Test::More tests => 8; > > > > Revert all changes to this file. Audit the rest of the patch for whitespace > > change unrelated to the subject. > > Done. I perltidied several files, though not consistently. Regarding this particular hunk, what is going on here is that I moved "use strict;use warnings" as one stanza, followed by all the other "use" lines as another stanza, alphabetically. It was previously a bit messy, with @EXPORTS and other stuff in between "use" lines. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Mon, Nov 30, 2015 at 11:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: >> On Mon, Nov 30, 2015 at 6:28 AM, Noah Misch <noah@leadboat.com> wrote: > >> > The proposed code is short on guidance about when to put a function in TestLib >> > versus TestBase. TestLib has no header comment. The TestBase header comment >> > would permit, for example, command_ok() in that module. I would try instead >> > keeping TestLib as the base module and moving into PostgresNode the functions >> > that deal with PostgreSQL clusters (get_new_node teardown_node psql >> > poll_query_until issues_sql_like). >> >> PostgresNode is wanted to be a base representation of how of node is, >> not of how to operate on it. The ways to perform the tests, which >> works on a node, is wanted as a higher-level operation. >> >> Logging and base configuration of a test set is a lower level of >> operations than PostgresNode, because cluster nodes need actually to >> perform system calls, some of those system calls like run_log allowing >> to log in the centralized log file. I have tried to make the headers >> of those modules more verbose, please see attached. > > I'm not terribly convinced by this argument TBH. Perhaps we can have > PostgresNode be one package, and the logging routines be another > package, and we create a higher-level package whose @ISA=(PostgresNode, > LoggingWhatever) and then we move the routines suggested by Noah into > that new package. Then the tests use that instead of PostgresNode > directly. OK... I have merged TestLib and PostgresNode of the previous patch into PostgresNode into the way suggested by Noah. TestBase has been renamed back to TestLib, and includes as well the base test functions like command_ok. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > OK... I have merged TestLib and PostgresNode of the previous patch > into PostgresNode into the way suggested by Noah. TestBase has been > renamed back to TestLib, and includes as well the base test functions > like command_ok. Great, thanks. Here's one more version, hopefully the last one. - I discovered that not setting PGPORT was causing some of the tests that fail (using command_fails) to fail to test what was being tested. The problem is that the command was failing with "could not connect to server" instead of failing because of trying to cluster a nonexistant table, etc. Unfortunately the only way to verify this is by looking at the regress_log_xxx_foobar file. Not ideal, but not this patch's fault. - I changed the routines moved to PostgresNode so that they are instance methods, i.e. $node->poll_until_query; also psql and issues_query_like. The latter also sets "local $PGPORT" so that the command is run with the correct port. - It would be nice to have command_ok and command_fails in PostgresNode too; that would remove the need for setting $ENV{PGPORT} but it's possible to run commands outside a node too, so we'd need duplicates, which would be worse. - I changed start/stop/restart so that they keep track of the postmaster PID; also added a DESTROY sub to PostgresNode that sends SIGQUIT. This means that when the test finishes, the server gets an immediate stop signal. We were getting a lot of errors in the server log about failing to write to the stats file otherwise, until the node noticed that the datadir was gone. - I removed the @active_nodes array, which is now unnecessary (per above). - I moved the "delete $ENV{PGxxx}" back to a BEGIN block in TestLib. I did it because it's possible to write test scripts without PostgresNode, and it's not nice to have those pick up the environment. This still affects anything using PostgresNode because that one uses TestLib. Finally, I ran perltidy on all the files, which strangely changed stuff that I didn't expect it to change. I wonder if this is related to the perltidy version. Do you see further changes if you run perltidy on the patched tree? This is my version: $ perltidy --version This is perltidy, v20140328 Copyright 2000-2014, Steve Hancock Perltidy is free software and may be copied under the terms of the GNU General Public License, which is included in the distribution files. Complete documentation for perltidy can be found using 'man perltidy' or on the internet at http://perltidy.sourceforge.net. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Dec 2, 2015 at 8:11 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > - I discovered that not setting PGPORT was causing some of the tests > that fail (using command_fails) to fail to test what was being tested. > The problem is that the command was failing with "could not connect to > server" instead of failing because of trying to cluster a nonexistant > table, etc. Unfortunately the only way to verify this is by looking > at the regress_log_xxx_foobar file. Not ideal, but not this patch's > fault. Nice catch. > - I changed the routines moved to PostgresNode so that they are instance > methods, i.e. $node->poll_until_query; also psql and > issues_query_like. The latter also sets "local $PGPORT" so that the > command is run with the correct port. OK. > - It would be nice to have command_ok and command_fails in PostgresNode > too; that would remove the need for setting $ENV{PGPORT} but it's > possible to run commands outside a node too, so we'd need duplicates, > which would be worse. I am fine to let it the way your patch does it. There are already many changes. > - I removed the @active_nodes array, which is now unnecessary (per > above). So that's basically replaced by @all_nodes. > - I moved the "delete $ENV{PGxxx}" back to a BEGIN block in TestLib. > I did it because it's possible to write test scripts without > PostgresNode, and it's not nice to have those pick up the environment. > This still affects anything using PostgresNode because that one uses > TestLib. OK. > Finally, I ran perltidy on all the files, which strangely changed stuff > that I didn't expect it to change. I wonder if this is related to the > perltidy version. Do you see further changes if you run perltidy on the > patched tree? SimpleTee.pm shows some diffs, though it doesn't seem that this patch should care about that. The rest is showing no diffs. And I have used perltidy v20140711. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > On Wed, Dec 2, 2015 at 8:11 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > - It would be nice to have command_ok and command_fails in PostgresNode > > too; that would remove the need for setting $ENV{PGPORT} but it's > > possible to run commands outside a node too, so we'd need duplicates, > > which would be worse. > > I am fine to let it the way your patch does it. There are already many changes. Idea: we can have a bare command_ok exported by TestLib just as currently, and instance method PostgresNode->command_ok that first sets local $ENV{PGPORT} and then calls the other one. > > - I removed the @active_nodes array, which is now unnecessary (per > > above). > > So that's basically replaced by @all_nodes. @all_nodes is only used to look for unused port numbers. > > Finally, I ran perltidy on all the files, which strangely changed stuff > > that I didn't expect it to change. I wonder if this is related to the > > perltidy version. Do you see further changes if you run perltidy on the > > patched tree? > > SimpleTee.pm shows some diffs, though it doesn't seem that this patch > should care about that. The rest is showing no diffs. And I have used > perltidy v20140711. Yes, the patch doesn't modify SimpleTee -- I just used "find" to indent perl files. What I don't understand is why doesn't my perltidy run match what was in master currently. It should be a no-op ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Dec 2, 2015 at 12:01 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: >> On Wed, Dec 2, 2015 at 8:11 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> > - It would be nice to have command_ok and command_fails in PostgresNode >> > too; that would remove the need for setting $ENV{PGPORT} but it's >> > possible to run commands outside a node too, so we'd need duplicates, >> > which would be worse. >> >> I am fine to let it the way your patch does it. There are already many changes. > > Idea: we can have a bare command_ok exported by TestLib just as > currently, and instance method PostgresNode->command_ok that first sets > local $ENV{PGPORT} and then calls the other one. Hm. That would be cleaner and make the code more consistent. Now as TestLib exports command_ok, command_like and command_fails, we would get redefinition errors when compiling the code if those routines are not named differently in PostgresNode. If you want to have the names consistent, then I guess that the only way would be to remove those routines from the export list of TestLib and call them directly as for example TestLib::command_ok(). See for example the patch attached that applies on top on your patch 2 that adds a set of routines in PostgresNode with a slightly different name. >> > Finally, I ran perltidy on all the files, which strangely changed stuff >> > that I didn't expect it to change. I wonder if this is related to the >> > perltidy version. Do you see further changes if you run perltidy on the >> > patched tree? >> >> SimpleTee.pm shows some diffs, though it doesn't seem that this patch >> should care about that. The rest is showing no diffs. And I have used >> perltidy v20140711. > > Yes, the patch doesn't modify SimpleTee -- I just used "find" to indent > perl files. What I don't understand is why doesn't my perltidy run > match what was in master currently. It should be a no-op ... Well I don't get it either :) I did a run on top of your two patches and saw no differences. -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Dec 2, 2015 at 1:04 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Dec 2, 2015 at 12:01 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> Michael Paquier wrote: >>> On Wed, Dec 2, 2015 at 8:11 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> >>> > - It would be nice to have command_ok and command_fails in PostgresNode >>> > too; that would remove the need for setting $ENV{PGPORT} but it's >>> > possible to run commands outside a node too, so we'd need duplicates, >>> > which would be worse. >>> >>> I am fine to let it the way your patch does it. There are already many changes. >> >> Idea: we can have a bare command_ok exported by TestLib just as >> currently, and instance method PostgresNode->command_ok that first sets >> local $ENV{PGPORT} and then calls the other one. > > Hm. That would be cleaner and make the code more consistent. Now as > TestLib exports command_ok, command_like and command_fails, we would > get redefinition errors when compiling the code if those routines are > not named differently in PostgresNode. If you want to have the names > consistent, then I guess that the only way would be to remove those > routines from the export list of TestLib and call them directly as for > example TestLib::command_ok(). See for example the patch attached that > applies on top on your patch 2 that adds a set of routines in > PostgresNode with a slightly different name. Well, Alvaro has whispered me a more elegant method by using TestLib() to only import a portion of the routines and avoid the redefinition errors. Hence, patch 0001 attached creates equivalents of command_* for PostgresNode and tests use it without setting PGPORT. Patch 0002 is a run of perltidy on the whole. -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Tue, Dec 01, 2015 at 08:11:21PM -0300, Alvaro Herrera wrote: > Finally, I ran perltidy on all the files, which strangely changed stuff > that I didn't expect it to change. I wonder if this is related to the > perltidy version. The last pgindent run (commit 807b9e0) used perltidy v20090616, and perltidy behavior has changed slightly over time. Install that version to do your own perltidy runs.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Noah Misch wrote: > On Tue, Dec 01, 2015 at 08:11:21PM -0300, Alvaro Herrera wrote: > > Finally, I ran perltidy on all the files, which strangely changed stuff > > that I didn't expect it to change. I wonder if this is related to the > > perltidy version. > > The last pgindent run (commit 807b9e0) used perltidy v20090616, and perltidy > behavior has changed slightly over time. Install that version to do your own > perltidy runs. I tried that version, but it seems to emit the same. How did you figure that that was the version used, anyway? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > Well, Alvaro has whispered me a more elegant method by using TestLib() > to only import a portion of the routines and avoid the redefinition > errors. Hence, patch 0001 attached creates equivalents of command_* > for PostgresNode and tests use it without setting PGPORT. Patch 0002 > is a run of perltidy on the whole. It seemed better to me to have the import list be empty, i.e. "use TestLib ()" and then qualify the routine names inside PostgresNode, instead of having to list the names of the routines to import, so I pushed it that way after running the tests a few more times. (Another option would be to have those routines be in EXPORT_OK instead of EXPORT, but then every other user of TestLib would have to explicitely declare that it wants those routines imported. Maybe this is a good change; if anyone wants to push for that, patches welcome.) I didn't push the changed for config_default you requested a few messages upthread; it's not clear to me how setting it to undef affects the whole thing. If setting it to undef makes the MSVC toolchain run the tap tests in the default config, then I can do it; let's be clear about what branch to backpatch this to. Also the "1;" at the end of RewindTest. Thanks for the patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Dec 3, 2015 at 6:59 AM, Alvaro Herrera wrote: > I didn't push the changed for config_default you requested a few > messages upthread; it's not clear to me how setting it to undef affects > the whole thing. If setting it to undef makes the MSVC toolchain run > the tap tests in the default config, then I can do it; let's be clear > about what branch to backpatch this to. Also the "1;" at the end of > RewindTest. Setting it to undef will prevent the tests to run, per vcregress.pl: die "Tap tests not enabled in configuration" unless $config->{tap_tests}; Also, setting it to undef will match the existing behavior on platforms where ./configure is used because the switch --enable-tap-tests needs to be used there. And I would believe that in most cases Windows environments are not going to have IPC::Run deployed. I have also rebased the recovery test suite as the attached, using the infrastructure that has been committed lately. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Dec 3, 2015 at 6:59 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I didn't push the changes for config_default you requested a few > messages upthread; it's not clear to me how setting it to undef affects > the whole thing. If setting it to undef makes the MSVC toolchain run > the tap tests in the default config, then I can do it; let's be clear > about what branch to backpatch this to. Also the "1;" at the end of > RewindTest. Setting it to undef will prevent the tests to run, per vcregress.pl: sub tap_check { die "Tap tests not enabled in configuration" unless $config->{tap_tests}; Also, setting it to undef will match the existing behavior on platforms where ./configure is used because the switch --enable-tap-tests needs to be used there. And I would believe that in most cases Windows environments are not going to have IPC::Run deployed. I have also rebased the recovery test suite as the attached, using the infrastructure that has been committed lately. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Dec 3, 2015 at 3:44 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Dec 3, 2015 at 6:59 AM, Alvaro Herrera wrote: >> I didn't push the changed for config_default you requested a few >> messages upthread; it's not clear to me how setting it to undef affects >> the whole thing. If setting it to undef makes the MSVC toolchain run >> the tap tests in the default config, then I can do it; let's be clear >> about what branch to backpatch this to. Also the "1;" at the end of >> RewindTest. > > > Setting it to undef will prevent the tests to run, per vcregress.pl: > die "Tap tests not enabled in configuration" > unless $config->{tap_tests}; > Also, setting it to undef will match the existing behavior on > platforms where ./configure is used because the switch > --enable-tap-tests needs to be used there. And I would believe that in > most cases Windows environments are not going to have IPC::Run > deployed. > > I have also rebased the recovery test suite as the attached, using the > infrastructure that has been committed lately. By the way, if there are no objections, I am going to mark this patch as committed in the CF app. Putting in the infrastructure is already a good step forward, and I will add an entry in next CF to track the patch to add tests for recovery itself. Alvaro, what do you think? -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Wed, Dec 02, 2015 at 04:33:50PM -0300, Alvaro Herrera wrote: > Noah Misch wrote: > > On Tue, Dec 01, 2015 at 08:11:21PM -0300, Alvaro Herrera wrote: > > > Finally, I ran perltidy on all the files, which strangely changed stuff > > > that I didn't expect it to change. I wonder if this is related to the > > > perltidy version. > > > > The last pgindent run (commit 807b9e0) used perltidy v20090616, and perltidy > > behavior has changed slightly over time. Install that version to do your own > > perltidy runs. > > I tried that version, but it seems to emit the same. git checkout 807b9e0 (find src -name \*.pl -o -name \*.pm ) | sort -u | xargs perltidy --profile=src/tools/pgindent/perltidyrc perltidy v20090616 leaves the working directory clean, but perltidy v20150815 introduces diffs: src/backend/catalog/genbki.pl | 15 ++++++++-------src/bin/pg_basebackup/t/010_pg_basebackup.pl | 2 +-src/tools/msvc/Install.pm | 6 +++---src/tools/msvc/Mkvcbuild.pm | 2 +-src/tools/msvc/Project.pm | 2 +-src/tools/msvc/Solution.pm | 5 ++---src/tools/msvc/gendef.pl | 4 ++--7 files changed, 18 insertions(+), 18 deletions(-) You see a different result? > How did you figure > that that was the version used, anyway? I asked Bruce at one point.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Dec 4, 2015 at 2:43 PM, Noah Misch <noah@leadboat.com> wrote: > On Wed, Dec 02, 2015 at 04:33:50PM -0300, Alvaro Herrera wrote: >> How did you figure >> that that was the version used, anyway? > > I asked Bruce at one point. So we are trying to use the same version over the years to keep code consistent across back-branches? Do you think we should try to use a newer version instead with each pgindent run? That would induce a rebasing cost when back-patching, but we cannot stay with the same version of perltidy forever either... -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Fri, Dec 04, 2015 at 02:47:36PM +0900, Michael Paquier wrote: > On Fri, Dec 4, 2015 at 2:43 PM, Noah Misch <noah@leadboat.com> wrote: > > On Wed, Dec 02, 2015 at 04:33:50PM -0300, Alvaro Herrera wrote: > >> How did you figure > >> that that was the version used, anyway? > > > > I asked Bruce at one point. > > So we are trying to use the same version over the years to keep code > consistent across back-branches? No, I recall no policy discussion concerning this. > Do you think we should try to use a > newer version instead with each pgindent run? That would induce a > rebasing cost when back-patching, but we cannot stay with the same > version of perltidy forever either... No. I expect we'll implicitly change perltidy versions when Bruce upgrades his OS. Assuming future perltidy changes affect us not much more than past changes (six years of perltidy development changed eighteen PostgreSQL source lines), that's just fine.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > By the way, if there are no objections, I am going to mark this patch > as committed in the CF app. Putting in the infrastructure is already a > good step forward, and I will add an entry in next CF to track the > patch to add tests for recovery itself. Alvaro, what do you think? Feel free to do that, but I'm likely to look into it before the next CF anyway. The thread about this has been open since 2013, so that doesn't seem unfair. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Noah Misch wrote: > git checkout 807b9e0 > (find src -name \*.pl -o -name \*.pm ) | sort -u | xargs perltidy --profile=src/tools/pgindent/perltidyrc > > perltidy v20090616 leaves the working directory clean, but perltidy v20150815 > introduces diffs: > > src/backend/catalog/genbki.pl | 15 ++++++++------- > src/bin/pg_basebackup/t/010_pg_basebackup.pl | 2 +- > src/tools/msvc/Install.pm | 6 +++--- > src/tools/msvc/Mkvcbuild.pm | 2 +- > src/tools/msvc/Project.pm | 2 +- > src/tools/msvc/Solution.pm | 5 ++--- > src/tools/msvc/gendef.pl | 4 ++-- > 7 files changed, 18 insertions(+), 18 deletions(-) > > You see a different result? Oh, you're right -- on that commit, I get the same results as you with v20090616 (no changes). I don't have v20150815; the version packaged by Debian is v20140328, and Install.pm is not changed by that one (so I only get 15 lines changed, not 18). I think my confusion stems from code that was introduced after the last pgindent run. I guess I could have tidied the original files prior to patching. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Dec 5, 2015 at 1:11 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: > >> By the way, if there are no objections, I am going to mark this patch >> as committed in the CF app. Putting in the infrastructure is already a >> good step forward, and I will add an entry in next CF to track the >> patch to add tests for recovery itself. Alvaro, what do you think? > > Feel free to do that, but I'm likely to look into it before the next CF > anyway. The thread about this has been open since 2013, so that doesn't > seem unfair. Let's see then. For the time being I have marked this patch as committed, and created a new entry for the set of tests regarding standbys: https://commitfest.postgresql.org/8/438/ -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes: > On 11/29/2015 04:28 PM, Noah Misch wrote: >> Never mutate the filesystem in a BEGIN block, because "perl -c" runs BEGIN >> blocks. (Likewise for the BEGIN block this patch adds to TestLib.) > Yeah, those two lines might belong in an INIT block. "perldoc perlmod" > for details. BTW, if we consider that gospel, why has PostgresNode.pm got this in its BEGIN block? # PGHOST is set once and for all through a single series of tests when# this module is loaded.$test_pghost = $TestLib::windows_os? "127.0.0.1" : TestLib::tempdir_short;$ENV{PGHOST} = $test_pghost; On non-Windows machines, the call of tempdir_short will result in filesystem activity, ie creation of a directory. Offhand it looks like all of the activity in this BEGIN block could go to an INIT block instead. I'm also bemused by why there was any question about this being wrong: # XXX: Should this use PG_VERSION_NUM?$last_port_assigned = 90600 % 16384 + 49152; If that's not a hard-coded PG version number then I don't know what it is. Maybe it would be better to use random() instead, but surely this isn't good as-is. regards, tom lane
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Wed, Dec 02, 2015 at 06:59:09PM -0300, Alvaro Herrera wrote: > It seemed better to me to have the import list be empty, i.e. "use > TestLib ()" and then qualify the routine names inside PostgresNode, > instead of having to list the names of the routines to import, so I > pushed it that way after running the tests a few more times. I inspected commit 1caef31: > --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl > +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl > -like($recovery_conf, qr/^standby_mode = 'on[']$/m, 'recovery.conf sets standby_mode'); > -like($recovery_conf, qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, 'recovery.conf sets primary_conninfo'); > - > -command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' ], > +like( > + $recovery_conf, > + qr/^standby_mode = 'on[']$/m, > + 'recovery.conf sets standby_mode'); > +like( > + $recovery_conf, > + qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, > + 'recovery.conf sets primary_conninfo'); This now elicits a diagnostic: Use of uninitialized value $ENV{"PGPORT"} in regexp compilation at t/010_pg_basebackup.pl line 175, <> line 1. > --- a/src/bin/pg_controldata/t/001_pg_controldata.pl > +++ b/src/bin/pg_controldata/t/001_pg_controldata.pl > -standard_initdb "$tempdir/data"; > -command_like([ 'pg_controldata', "$tempdir/data" ], > + > +my $node = get_new_node(); > +$node->init; > +$node->start; Before the commit, this test did not start a server. > --- /dev/null > +++ b/src/test/perl/PostgresNode.pm > +sub _update_pid > +{ > + my $self = shift; > + > + # If we can open the PID file, read its first line and that's the PID we > + # want. If the file cannot be opened, presumably the server is not > + # running; don't be noisy in that case. > + open my $pidfile, $self->data_dir . "/postmaster.pid"; > + if (not defined $pidfile) $ grep -h 'at /.*line' src/bin/*/tmp_check/log/* |sort|uniq -c 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ:Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pmline 784 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata/base/13264:Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pmline 784 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata/base:Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pmline 784 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata:Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pmline 784 28 readline() on closed filehandle $pidfile at /data/nmisch/src/pg/postgresql/src/bin/pg_rewind/../../../src/test/perl/PostgresNode.pmline 308. 28 Use of uninitializedvalue in concatenation (.) or string at /data/nmisch/src/pg/postgresql/src/bin/pg_rewind/../../../src/test/perl/PostgresNode.pmline 309. $pidfile is always defined. Test the open() return value. > + { > + $self->{_pid} = undef; > + print "# No postmaster PID\n"; > + return; > + } > + > + $self->{_pid} = <$pidfile>; chomp() that value to remove its trailing newline. > + print "# Postmaster PID is $self->{_pid}\n"; > + close $pidfile; > +} > + my $devnull = $TestLib::windows_os ? "nul" : "/dev/null"; Unused variable. > +sub DESTROY > +{ > + my $self = shift; > + return if not defined $self->{_pid}; > + print "# signalling QUIT to $self->{_pid}\n"; > + kill 'QUIT', $self->{_pid}; On Windows, that kill() is the wrong thing. I suspect "pg_ctl kill" will be the right thing, but that warrants specific testing. Postmaster log file names became less informative. Before the commit: -rw-------. 1 nmisch nmisch 211265 2015-12-06 22:35:59.931114215 +0000 regress_log_001_basic -rw-------. 1 nmisch nmisch 268887 2015-12-06 22:36:21.871165951 +0000 regress_log_002_databases -rw-------. 1 nmisch nmisch 206808 2015-12-06 22:36:41.861213736 +0000 regress_log_003_extrafiles -rw-------. 1 nmisch nmisch 7464 2015-12-06 22:37:00.371256795 +0000 master.log -rw-------. 1 nmisch nmisch 6648 2015-12-06 22:37:01.381259211 +0000 standby.log -rw-------. 1 nmisch nmisch 208561 2015-12-06 22:37:02.381261374 +0000 regress_log_004_pg_xlog_symlink After: -rw-------. 1 nmisch nmisch 219581 2015-12-06 23:00:50.504643175 +0000 regress_log_001_basic -rw-------. 1 nmisch nmisch 276315 2015-12-06 23:01:12.924697085 +0000 regress_log_002_databases -rw-------. 1 nmisch nmisch 213940 2015-12-06 23:01:33.574747195 +0000 regress_log_003_extrafiles -rw-------. 1 nmisch nmisch 4114 2015-12-06 23:01:40.914764850 +0000 node_57834.log -rw-------. 1 nmisch nmisch 6166 2015-12-06 23:01:43.184770615 +0000 node_57833.log -rw-------. 1 nmisch nmisch 5550 2015-12-06 23:01:52.504792997 +0000 node_57835.log -rw-------. 1 nmisch nmisch 9494 2015-12-06 23:01:53.514794802 +0000 node_57836.log -rw-------. 1 nmisch nmisch 216212 2015-12-06 23:01:54.544797739 +0000 regress_log_004_pg_xlog_symlink Should nodes get a name, so we instead see master_57834.log? See also the things Tom Lane identified in <27550.1449342569@sss.pgh.pa.us>. nm
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Mon, Dec 7, 2015 at 12:06 PM, Noah Misch <noah@leadboat.com> wrote: > On Wed, Dec 02, 2015 at 06:59:09PM -0300, Alvaro Herrera wrote: >> --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl >> +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl > >> -like($recovery_conf, qr/^standby_mode = 'on[']$/m, 'recovery.conf sets standby_mode'); >> -like($recovery_conf, qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, 'recovery.conf sets primary_conninfo'); >> - >> -command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' ], >> +like( >> + $recovery_conf, >> + qr/^standby_mode = 'on[']$/m, >> + 'recovery.conf sets standby_mode'); >> +like( >> + $recovery_conf, >> + qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, >> + 'recovery.conf sets primary_conninfo'); > > This now elicits a diagnostic: > > Use of uninitialized value $ENV{"PGPORT"} in regexp compilation at t/010_pg_basebackup.pl line 175, <> line 1. Fixed. >> --- a/src/bin/pg_controldata/t/001_pg_controldata.pl >> +++ b/src/bin/pg_controldata/t/001_pg_controldata.pl > >> -standard_initdb "$tempdir/data"; >> -command_like([ 'pg_controldata', "$tempdir/data" ], >> + >> +my $node = get_new_node(); >> +$node->init; >> +$node->start; > > Before the commit, this test did not start a server. Fixed. >> --- /dev/null >> +++ b/src/test/perl/PostgresNode.pm > >> +sub _update_pid >> +{ >> + my $self = shift; >> + >> + # If we can open the PID file, read its first line and that's the PID we >> + # want. If the file cannot be opened, presumably the server is not >> + # running; don't be noisy in that case. >> + open my $pidfile, $self->data_dir . "/postmaster.pid"; >> + if (not defined $pidfile) > > $ grep -h 'at /.*line' src/bin/*/tmp_check/log/* |sort|uniq -c > 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ: Directory not empty at/home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 > 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata/base/13264: Directorynot empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 > 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata/base: Directorynot empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 > 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata: Directory not emptyat /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 > 28 readline() on closed filehandle $pidfile at /data/nmisch/src/pg/postgresql/src/bin/pg_rewind/../../../src/test/perl/PostgresNode.pmline 308. > 28 Use of uninitialized value in concatenation (.) or string at /data/nmisch/src/pg/postgresql/src/bin/pg_rewind/../../../src/test/perl/PostgresNode.pmline 309. > > $pidfile is always defined. Test the open() return value. That's something I have addressed here: http://www.postgresql.org/message-id/CAB7nPqTOP28Zxv_SXFo2axGJoesfvLLMO6syddAfV0DUvsFMDw@mail.gmail.com I am including the fix as well here to do a group shot. >> + { >> + $self->{_pid} = undef; >> + print "# No postmaster PID\n"; >> + return; >> + } >> + >> + $self->{_pid} = <$pidfile>; > > chomp() that value to remove its trailing newline. Right. >> + print "# Postmaster PID is $self->{_pid}\n"; >> + close $pidfile; >> +} > >> + my $devnull = $TestLib::windows_os ? "nul" : "/dev/null"; > > Unused variable. Removed. >> +sub DESTROY >> +{ >> + my $self = shift; >> + return if not defined $self->{_pid}; >> + print "# signalling QUIT to $self->{_pid}\n"; >> + kill 'QUIT', $self->{_pid}; > > On Windows, that kill() is the wrong thing. I suspect "pg_ctl kill" will be > the right thing, but that warrants specific testing. I don't directly see any limitation with the use of kill on Windows.. http://perldoc.perl.org/functions/kill.html But indeed using directly pg_ctl kill seems like a better fit for the PG infrastructure. > Postmaster log file names became less informative. Before the commit: > Should nodes get a name, so we instead see master_57834.log? I am not sure that this is necessary. There is definitely a limitation regarding the fact that log files of nodes get overwritten after each test, hence I would tend with just appending the test name in front of the node_* files similarly to the other files. > See also the things Tom Lane identified in <27550.1449342569@sss.pgh.pa.us>. Yep, I marked this email as something to address when it was sent. On Sun, Dec 6, 2015 at 4:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > BTW, if we consider that gospel, why has PostgresNode.pm got this in its > BEGIN block? > > # PGHOST is set once and for all through a single series of tests when > # this module is loaded. > $test_pghost = > $TestLib::windows_os ? "127.0.0.1" : TestLib::tempdir_short; > $ENV{PGHOST} = $test_pghost; > > On non-Windows machines, the call of tempdir_short will result in > filesystem activity, ie creation of a directory. Offhand it looks like > all of the activity in this BEGIN block could go to an INIT block instead. OK, the whole block is switched to INIT instead. > I'm also bemused by why there was any question about this being wrong: > > # XXX: Should this use PG_VERSION_NUM? > $last_port_assigned = 90600 % 16384 + 49152; > If that's not a hard-coded PG version number then I don't know > what it is. Maybe it would be better to use random() instead, > but surely this isn't good as-is. We would definitely want something within the ephemeral port range, so we are up to that: rand() * 16384 + 49152; All those issues are addressed as per the patch attached. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > > If that's not a hard-coded PG version number then I don't know > > what it is. Maybe it would be better to use random() instead, > > but surely this isn't good as-is. > > We would definitely want something within the ephemeral port range, so > we are up to that: > rand() * 16384 + 49152; Yes, this seems to produce the correct range. Thanks Noah and Tom for the review, and thanks Michael for the patch. I pushed it. A slight fix was to change the chomp() call; it was always returning 1 (number of elements chomped) so it tried to kill init. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > On Thu, Dec 3, 2015 at 6:59 AM, Alvaro Herrera wrote: > > I didn't push the changed for config_default you requested a few > > messages upthread; it's not clear to me how setting it to undef affects > > the whole thing. If setting it to undef makes the MSVC toolchain run > > the tap tests in the default config, then I can do it; let's be clear > > about what branch to backpatch this to. Also the "1;" at the end of > > RewindTest. > > Setting it to undef will prevent the tests to run, per vcregress.pl: > die "Tap tests not enabled in configuration" > unless $config->{tap_tests}; > Also, setting it to undef will match the existing behavior on > platforms where ./configure is used because the switch > --enable-tap-tests needs to be used there. And I would believe that in > most cases Windows environments are not going to have IPC::Run > deployed. But if I don't set it to anything, then it will be "initialized" as undef, so it has the same effect. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Michael Paquier wrote: >> We would definitely want something within the ephemeral port range, so >> we are up to that: >> rand() * 16384 + 49152; > Yes, this seems to produce the correct range. Speaking of ephemeral port range ... if get_new_node() were to run past port 65535, which is certainly possible with this new code, it would fail altogether. Seems it needs to wrap around properly within that port range. regards, tom lane
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Michael Paquier wrote: > >> We would definitely want something within the ephemeral port range, so > >> we are up to that: > >> rand() * 16384 + 49152; > > > Yes, this seems to produce the correct range. > > Speaking of ephemeral port range ... if get_new_node() were to run > past port 65535, which is certainly possible with this new code, > it would fail altogether. Seems it needs to wrap around properly > within that port range. Oh, of course. Pushed fix. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Tue, Dec 8, 2015 at 7:46 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Michael Paquier wrote:
> On Thu, Dec 3, 2015 at 6:59 AM, Alvaro Herrera wrote:
> > I didn't push the changed for config_default you requested a few
> > messages upthread; it's not clear to me how setting it to undef affects
> > the whole thing. If setting it to undef makes the MSVC toolchain run
> > the tap tests in the default config, then I can do it; let's be clear
> > about what branch to backpatch this to. Also the "1;" at the end of
> > RewindTest.
>
> Setting it to undef will prevent the tests to run, per vcregress.pl:
> die "Tap tests not enabled in configuration"
> unless $config->{tap_tests};
> Also, setting it to undef will match the existing behavior on
> platforms where ./configure is used because the switch
> --enable-tap-tests needs to be used there. And I would believe that in
> most cases Windows environments are not going to have IPC::Run
> deployed.
But if I don't set it to anything, then it will be "initialized" as
undef, so it has the same effect.
Yes, it will have the same effect. Still it is a problem to not list it in default_config.pl as the other options, no? And that's as well the case with GetFakeConfigure, which should list it I think for consistency with the rest. See the attached for example.
--
Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
I've been giving RecoveryTest.pm a look. I wonder if we really need that as a separate package. My first thought was that we could have another class that inherits from PostgresNode (say RecoveryNode). But later it occured to me that we could have the new functions just be part of PostgresNode itself directly; so we would have some new PostgresNode methods:$node->enable_streaming$node->enable_restoring$node->enable_archiving$node->wait (your RecoveryTest::wait_for_node;better name for this?) and some additional constructors:make_mastermake_stream_standbymake_archive_standby -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Mon, Dec 07, 2015 at 02:34:39PM +0900, Michael Paquier wrote: > On Mon, Dec 7, 2015 at 12:06 PM, Noah Misch <noah@leadboat.com> wrote: > > On Wed, Dec 02, 2015 at 06:59:09PM -0300, Alvaro Herrera wrote: > >> +sub DESTROY > >> +{ > >> + my $self = shift; > >> + return if not defined $self->{_pid}; > >> + print "# signalling QUIT to $self->{_pid}\n"; > >> + kill 'QUIT', $self->{_pid}; > > > > On Windows, that kill() is the wrong thing. I suspect "pg_ctl kill" will be > > the right thing, but that warrants specific testing. > > I don't directly see any limitation with the use of kill on Windows.. > http://perldoc.perl.org/functions/kill.html > But indeed using directly pg_ctl kill seems like a better fit for the > PG infrastructure. From http://perldoc.perl.org/perlwin32.html, "Using signals under this port should currently be considered unsupported." Windows applications cannot handle SIGQUIT: https://msdn.microsoft.com/en-us/library/xdkz3x12.aspx. The PostgreSQL backend does not generate or expect Windows signals; see its signal.c emulation facility. > > Postmaster log file names became less informative. Before the commit: > > Should nodes get a name, so we instead see master_57834.log? > > I am not sure that this is necessary. In general, you'd need to cross-reference the main log file to determine which postmaster log corresponds to which action within the test. I did plenty of "grep $PATTERN src/bin/pg_rewind/tmp_check/log/master.log" while debugging that test. I'd like to be able to use /*master*.log, not rely on timestamps or on scraping regress_log_002_databases to determine which logs are master logs. Feel free to skip this point if I'm the only one minding, though. > There is definitely a limitation > regarding the fact that log files of nodes get overwritten after each > test, hence I would tend with just appending the test name in front of > the node_* files similarly to the other files. They got appended, not overwritten. I like how you changed that to not happen, but it doesn't address what I was reporting. nm
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Dec 10, 2015 at 6:46 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I've been giving RecoveryTest.pm a look. I wonder if we really need that > as a separate package. My first thought was that we could have another > class that inherits from PostgresNode (say RecoveryNode). But later it > occured to me that we could have the new functions just be part of > PostgresNode itself directly; so we would have some new PostgresNode > methods: > $node->enable_streaming > $node->enable_restoring > $node->enable_archiving Sure. > $node->wait (your RecoveryTest::wait_for_node; better name for this?) wait_for_access? > and some additional constructors: > make_master > make_stream_standby > make_archive_standby I have done that a little bit differently. Those are completely remove, then init() and init_from_backup() are extended with a new set of parameters to enable archive, streaming or restore on a node. Which gives the patch attached. -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Noah Misch wrote: > On Mon, Dec 07, 2015 at 02:34:39PM +0900, Michael Paquier wrote: > > I don't directly see any limitation with the use of kill on Windows.. > > http://perldoc.perl.org/functions/kill.html > > But indeed using directly pg_ctl kill seems like a better fit for the > > PG infrastructure. > > From http://perldoc.perl.org/perlwin32.html, "Using signals under this port > should currently be considered unsupported." Windows applications cannot > handle SIGQUIT: https://msdn.microsoft.com/en-us/library/xdkz3x12.aspx. The > PostgreSQL backend does not generate or expect Windows signals; see its > signal.c emulation facility. Makes sense. What we're doing now is what you suggested, so we should be fine. > > > Postmaster log file names became less informative. Before the commit: > > > Should nodes get a name, so we instead see master_57834.log? > > > > I am not sure that this is necessary. > > In general, you'd need to cross-reference the main log file to determine which > postmaster log corresponds to which action within the test. I did plenty of > "grep $PATTERN src/bin/pg_rewind/tmp_check/log/master.log" while debugging > that test. I'd like to be able to use /*master*.log, not rely on timestamps > or on scraping regress_log_002_databases to determine which logs are master > logs. Feel free to skip this point if I'm the only one minding, though. Since we now have the node name in the log file name, perhaps we no longer need the port number in there. In fact, I find having the file name change on every run (based on the port number) is slightly annoying. I vote we change it back to using the node name without the port number. (Also, some PostgresNode messages refer to the instance by datadir and port number, which is unnecessary: it would be clearer to use the name instead.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Dec 11, 2015 at 5:35 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Noah Misch wrote: >> On Mon, Dec 07, 2015 at 02:34:39PM +0900, Michael Paquier wrote: >> > > Postmaster log file names became less informative. Before the commit: >> > > Should nodes get a name, so we instead see master_57834.log? >> > >> > I am not sure that this is necessary. >> >> In general, you'd need to cross-reference the main log file to determine which >> postmaster log corresponds to which action within the test. I did plenty of >> "grep $PATTERN src/bin/pg_rewind/tmp_check/log/master.log" while debugging >> that test. I'd like to be able to use /*master*.log, not rely on timestamps >> or on scraping regress_log_002_databases to determine which logs are master >> logs. Feel free to skip this point if I'm the only one minding, though. > > Since we now have the node name in the log file name, perhaps we no > longer need the port number in there There is no node name in the log file name as of now, they are built using the port number, and the information of a node is dumped into the central log file when created (see dump_info). > In fact, I find having the file > name change on every run (based on the port number) is slightly > annoying. I vote we change it back to using the node name without the > port number. (Also, some PostgresNode messages refer to the instance by > datadir and port number, which is unnecessary: it would be clearer to > use the name instead.) OK, so... What we have now as log file for a specific node is that: ${testname}_node_${port}.log which is equivalent to that: ${testname}_${applname}.log I guess that to complete your idea we could allow PostgresNode to get a custom name for its log file through an optional parameter like logfile => 'myname' or similar. And if nothing is defined, process falls back to applname. So this would give the following: ${testname}_${logfile}.log It seems that we had better keep the test name as a prefix of the log file name though, to avoid an overlap with any other test in the same series. Thoughts? -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > On Fri, Dec 11, 2015 at 5:35 AM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > Since we now have the node name in the log file name, perhaps we no > > longer need the port number in there > > There is no node name in the log file name as of now, they are built > using the port number, and the information of a node is dumped into > the central log file when created (see dump_info). Yeah, I realized this after posting. What I thought was the node name, based on some of the files I had laying around, was actually the test name. > I guess that to complete your idea we could allow PostgresNode to get > a custom name for its log file through an optional parameter like > logfile => 'myname' or similar. And if nothing is defined, process > falls back to applname. So this would give the following: > ${testname}_${logfile}.log Sure. I don't think we should the name only for the log file, though, but also for things like the "## " informative messages we print here and there. That would make the log file simpler to follow. Also, I'm not sure about having it be optional. (TBH I'm not sure about applname either; why do we keep that one?) > It seems that we had better keep the test name as a prefix of the log > file name though, to avoid an overlap with any other test in the same > series. Thoughts? Yes, agreed on that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Dec 11, 2015 at 8:48 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: >> On Fri, Dec 11, 2015 at 5:35 AM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >> I guess that to complete your idea we could allow PostgresNode to get >> a custom name for its log file through an optional parameter like >> logfile => 'myname' or similar. And if nothing is defined, process >> falls back to applname. So this would give the following: >> ${testname}_${logfile}.log > > Sure. I don't think we should the name only for the log file, though, > but also for things like the "## " informative messages we print here > and there. That would make the log file simpler to follow. Also, I'm > not sure about having it be optional. (TBH I'm not sure about applname > either; why do we keep that one?) OK, so let's do this: the node name is a mandatory argument of get_new_node, which is passed to "new PostgresNode" like the port and the host, and it is then used in the log file name as well as in the information messages you are mentioning. That's a patch simple enough. Are you fine with this approach? Regarding the application name, I still think it is useful to have it though. pg_rewind should actually use it, and the other patch adding the recovery routines will use it. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Fri, Dec 11, 2015 at 09:34:34PM +0900, Michael Paquier wrote: > On Fri, Dec 11, 2015 at 8:48 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > Michael Paquier wrote: > >> On Fri, Dec 11, 2015 at 5:35 AM, Alvaro Herrera > >> <alvherre@2ndquadrant.com> wrote: > >> I guess that to complete your idea we could allow PostgresNode to get > >> a custom name for its log file through an optional parameter like > >> logfile => 'myname' or similar. And if nothing is defined, process > >> falls back to applname. So this would give the following: > >> ${testname}_${logfile}.log > > > > Sure. I don't think we should the name only for the log file, though, > > but also for things like the "## " informative messages we print here > > and there. That would make the log file simpler to follow. Also, I'm > > not sure about having it be optional. (TBH I'm not sure about applname > > either; why do we keep that one?) > > OK, so let's do this: the node name is a mandatory argument of > get_new_node, which is passed to "new PostgresNode" like the port and > the host, and it is then used in the log file name as well as in the > information messages you are mentioning. That's a patch simple enough. > Are you fine with this approach? Sounds reasonable so far. > Regarding the application name, I still think it is useful to have it > though. pg_rewind should actually use it, and the other patch adding > the recovery routines will use it. Using the application_name connection parameter is fine, but I can't think of a reason to set it to "node_".$node->port instead of $node->name. And I can't think of a use for the $node->applname field once you have $node->name. What use case would benefit?
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Dec 12, 2015 at 11:37 AM, Noah Misch <noah@leadboat.com> wrote: > On Fri, Dec 11, 2015 at 09:34:34PM +0900, Michael Paquier wrote: >> On Fri, Dec 11, 2015 at 8:48 PM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >> > Michael Paquier wrote: >> >> On Fri, Dec 11, 2015 at 5:35 AM, Alvaro Herrera >> >> <alvherre@2ndquadrant.com> wrote: >> >> I guess that to complete your idea we could allow PostgresNode to get >> >> a custom name for its log file through an optional parameter like >> >> logfile => 'myname' or similar. And if nothing is defined, process >> >> falls back to applname. So this would give the following: >> >> ${testname}_${logfile}.log >> > >> > Sure. I don't think we should the name only for the log file, though, >> > but also for things like the "## " informative messages we print here >> > and there. That would make the log file simpler to follow. Also, I'm >> > not sure about having it be optional. (TBH I'm not sure about applname >> > either; why do we keep that one?) >> >> OK, so let's do this: the node name is a mandatory argument of >> get_new_node, which is passed to "new PostgresNode" like the port and >> the host, and it is then used in the log file name as well as in the >> information messages you are mentioning. That's a patch simple enough. >> Are you fine with this approach? > > Sounds reasonable so far. OK, done so. >> Regarding the application name, I still think it is useful to have it >> though. pg_rewind should actually use it, and the other patch adding >> the recovery routines will use it. > > Using the application_name connection parameter is fine, but I can't think of > a reason to set it to "node_".$node->port instead of $node->name. And I can't > think of a use for the $node->applname field once you have $node->name. What > use case would benefit? I have the applname stuff, and updated the log messages to use the node name for clarity. The patch to address those points is attached. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Dec 12, 2015 at 8:29 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sat, Dec 12, 2015 at 11:37 AM, Noah Misch <noah@leadboat.com> wrote: >> On Fri, Dec 11, 2015 at 09:34:34PM +0900, Michael Paquier wrote: >>> On Fri, Dec 11, 2015 at 8:48 PM, Alvaro Herrera >>> <alvherre@2ndquadrant.com> wrote: >>> > Michael Paquier wrote: >>> >> On Fri, Dec 11, 2015 at 5:35 AM, Alvaro Herrera >>> >> <alvherre@2ndquadrant.com> wrote: >>> >> I guess that to complete your idea we could allow PostgresNode to get >>> >> a custom name for its log file through an optional parameter like >>> >> logfile => 'myname' or similar. And if nothing is defined, process >>> >> falls back to applname. So this would give the following: >>> >> ${testname}_${logfile}.log >>> > >>> > Sure. I don't think we should the name only for the log file, though, >>> > but also for things like the "## " informative messages we print here >>> > and there. That would make the log file simpler to follow. Also, I'm >>> > not sure about having it be optional. (TBH I'm not sure about applname >>> > either; why do we keep that one?) >>> >>> OK, so let's do this: the node name is a mandatory argument of >>> get_new_node, which is passed to "new PostgresNode" like the port and >>> the host, and it is then used in the log file name as well as in the >>> information messages you are mentioning. That's a patch simple enough. >>> Are you fine with this approach? >> >> Sounds reasonable so far. > > OK, done so. > >>> Regarding the application name, I still think it is useful to have it >>> though. pg_rewind should actually use it, and the other patch adding >>> the recovery routines will use it. >> >> Using the application_name connection parameter is fine, but I can't think of >> a reason to set it to "node_".$node->port instead of $node->name. And I can't >> think of a use for the $node->applname field once you have $node->name. What >> use case would benefit? > > I have the applname stuff, and updated the log messages to use the > node name for clarity. > > The patch to address those points is attached. As this thread is stalling a bit, please find attached a series of patch gathering all the pending issues for this thread: - 0001, fix config_default.pl for MSVC builds to take into account TAP tests - 0002, append a node name in get_new_node (per Noah's request) - 0003, the actual recovery test suite Hopefully this facilitates future reviews. Regards, -- Michael
Attachment
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Mon, Dec 21, 2015 at 4:45 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > As this thread is stalling a bit, please find attached a series of > patch gathering all the pending issues for this thread: > - 0001, fix config_default.pl for MSVC builds to take into account TAP tests > - 0002, append a node name in get_new_node (per Noah's request) > - 0003, the actual recovery test suite > Hopefully this facilitates future reviews. Patch 2 has been pushed as c8642d9 (thanks Alvaro). The remaining two patches still apply and pass cleanly. -- Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Stas Kelvich
Date:
Hi. I’ve looked over proposed patch and migrated my shell tests scripts that i’ve used for testing twophase commits on master/slaveto this test framework. Everything looks mature, and I didn’t encountered any problems with writing new testsusing this infrastructure. From my point of view I don’t see any problems to commit this patches in their current state. Also some things that came into mind about test suite: 0) There are several routines that does actual checking, like is/command_ok/command_fails. I think it will be very handyto have wrappers psql_ok/psql_fails that calls psql through the command_ok/fails. 1) Better to raise more meaningful error when IPC::Run is absend. 2) --enable-tap-tests deserves mention in test/recovery/README and more obvious error message when one trying to run makecheck in test/recovery without --enable-tap-tests. 3) Is it hard to give ability to run TAP tests in extensions? 4) It will be handy if make check will write path to log files in case of failed test. 5) psql() accepts database name as a first argument, but everywhere in tests it is ‘postgres’. Isn’t it simpler to storedbname in connstr, and have separate function to change database? 6) Clean logs on prove restart? Clean up tmp installations? 7) Make check sets PGPORT PG_REGRESS for prove. Is it necessary? > On 22 Jan 2016, at 09:17, Michael Paquier <michael.paquier@gmail.com> wrote: > > On Mon, Dec 21, 2015 at 4:45 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> As this thread is stalling a bit, please find attached a series of >> patch gathering all the pending issues for this thread: >> - 0001, fix config_default.pl for MSVC builds to take into account TAP tests >> - 0002, append a node name in get_new_node (per Noah's request) >> - 0003, the actual recovery test suite >> Hopefully this facilitates future reviews. > > Patch 2 has been pushed as c8642d9 (thanks Alvaro). The remaining two > patches still apply and pass cleanly. > -- > Michael > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Kyotaro HORIGUCHI
Date:
Hello, I'm studying this. Two hunks in 0003 needed a fix but the other part applied cleanly on master. At Fri, 22 Jan 2016 15:17:51 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqTTAtVCEXAoyMtF4Xu9g=mXY4cjnP=+hy7jgYfnFzM=JA@mail.gmail.com> > On Mon, Dec 21, 2015 at 4:45 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > As this thread is stalling a bit, please find attached a series of > > patch gathering all the pending issues for this thread: > > - 0001, fix config_default.pl for MSVC builds to take into account TAP tests > > - 0002, append a node name in get_new_node (per Noah's request) > > - 0003, the actual recovery test suite > > Hopefully this facilitates future reviews. > > Patch 2 has been pushed as c8642d9 (thanks Alvaro). The remaining two > patches still apply and pass cleanly. The TAP test framework doesn't remove existing temporary directories when a test script suite (or a prove?) starts, and it in turn removes all temprorary directories it has made even if ended with fairures. It would be sometimes inconvenient to find the cause of the failures and inconsistent with the behavior of the ordinary(?) make check, as far as my understanding goes. tmp_check is left remained but it would be ok to preserve logs, which is located in tmp_check differently than the ordinary regressions. One annoyance is the name of data directories is totally meaningless. We cannot investigate them even if it is left behind. Addition to them, maybe it is useful that a test script can get stderr content from PostgresNode->psql(). Setting client_min_messages lower can give a plenty of useful information about how server is working. So, I'd like to propose four (or five) changes to this harness. - prove_check to remove all in tmp_check - TestLib to preserve temporary directories/files if the current test fails. - PostgresNode::get_new_node to create data directory with meaningful basenames. - PostgresNode::psql to return a list of ($stdout, $stderr) if requested. (The previous behavior is not changed) - (recovery/t/00x_* gives test number to node name) As a POC, the attached diff will appliy on the 0001 and (fixed) 0003 patches. It might be good to give test number to the name of temp dirs by any automated way, but it is not included in it. Opinions? Thoughts? regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 4602e3e..75ddcaf 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -51,7 +51,7 @@ sub new my $self = { _port => $pgport, _host => $pghost, - _basedir => TestLib::tempdir, + _basedir => TestLib::tempdir($name), _name => $name, _logfile => "$TestLib::log_path/${testname}_${name}.log"}; @@ -513,10 +513,16 @@ sub psql print "#### Begin standard error\n"; print $stderr; print "#### End standarderror\n"; + if (wantarray) + { + chomp $stderr; + $stderr =~ s/\r//g if $Config{osname} eq 'msys'; + } } chomp $stdout; $stdout =~ s/\r//g if $Config{osname} eq 'msys'; - return $stdout; + + return wantarray ? ($stdout, $stderr) : $stdout;}# Run a query once a second, until it returns 't' (i.e. SQL booleantrue). diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm index 3d11cbb..841dc06 100644 --- a/src/test/perl/TestLib.pm +++ b/src/test/perl/TestLib.pm @@ -107,13 +107,24 @@ INIT autoflush TESTLOG 1;} +END +{ + # Preserve temporary directory for this test if failure + if (!Test::More->builder->is_passing) + { + $File::Temp::KEEP_ALL = 1; + } +} +## Helper functions#sub tempdir{ + my ($prefix) = @_; + $prefix = "tmp_test" if (!$prefix); return File::Temp::tempdir( - 'tmp_testXXXX', + $prefix.'_XXXX', DIR => $tmp_check, CLEANUP => 1);} diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index 3ed9be3..8a00d00 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -6,16 +6,18 @@ use TestLib;use Test::More tests => 4;# Initialize master node -my $node_master = get_new_node('master'); +my $node_master = get_new_node('001_master');$node_master->init(allows_streaming => 1);$node_master->start;my $backup_name= 'my_backup'; +$File::Temp::KEEP_ALL = 0; +# Take backup$node_master->backup($backup_name);# Create streaming standby linking to master -my $node_standby_1 = get_new_node('standby_1'); +my $node_standby_1 = get_new_node('001_standby_1');$node_standby_1->init_from_backup($node_master, $backup_name, has_streaming => 1);$node_standby_1->start; @@ -25,7 +27,7 @@ $node_standby_1->start;$node_standby_1->backup($backup_name);# Create second standby node linking to standby1 -my $node_standby_2 = get_new_node('standby_2'); +my $node_standby_2 = get_new_node('001_standby_2');$node_standby_2->init_from_backup($node_standby_1, $backup_name, has_streaming => 1);$node_standby_2->start; @@ -43,11 +45,11 @@ $caughtup_query = "SELECT pg_last_xlog_replay_location() = write_location FROM p$node_standby_1->poll_query_until('postgres',$caughtup_query) or die "Timed out while waiting for standby 2 to catchup"; -my $result = $node_standby_1->psql('postgres', "SELECT count(*) FROM tab_int"); +my $result = $node_standby_1->psql('postgres', "set client_min_messages to debug5;SELECT count(*) FROM tab_int");print "standby1: $result\n"; -is($result, qq(1002), 'check streamed content on standby 1'); +isnt($result[0], qq(1002), 'check streamed content on standby 1'); -$result = $node_standby_2->psql('postgres', "SELECT count(*) FROM tab_int"); +my $result = $node_standby_2->psql('postgres', "SELECT count(*) FROM tab_int");print "standby 2: $result\n";is($result,qq(1002), 'check streamed content on standby 2'); diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl index 930125c..bb620c6 100644 --- a/src/test/recovery/t/002_archiving.pl +++ b/src/test/recovery/t/002_archiving.pl @@ -7,7 +7,7 @@ use Test::More tests => 1;use File::Copy;# Initialize master node, doing archives -my $node_master = get_new_node('master'); +my $node_master = get_new_node('002_master');$node_master->init(has_archiving => 1, allows_streaming =>1);my $backup_name = 'my_backup'; @@ -19,7 +19,7 @@ $node_master->start;$node_master->backup($backup_name);# Initialize standby node from backup, fetchingWAL from archives -my $node_standby = get_new_node('standby'); +my $node_standby = get_new_node('002_standby');$node_standby->init_from_backup($node_master, $backup_name, has_restoring => 1);$node_standby->append_conf('postgresql.conf', qq( diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl index 293603a..4d59277 100644 --- a/src/test/recovery/t/003_recovery_targets.pl +++ b/src/test/recovery/t/003_recovery_targets.pl @@ -16,7 +16,7 @@ sub test_recovery_standby my $num_rows = shift; my $until_lsn = shift; - my $node_standby = get_new_node($node_name); + my $node_standby = get_new_node("003_".$node_name); $node_standby->init_from_backup($node_master, 'my_backup', has_restoring => 1); @@ -43,7 +43,7 @@ sub test_recovery_standby}# Initialize master node -my $node_master = get_new_node('master'); +my $node_master = get_new_node('003_master');$node_master->init(has_archiving => 1, allows_streaming => 1);# Start it diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl index c58c602..0a2362b 100644 --- a/src/test/recovery/t/004_timeline_switch.pl +++ b/src/test/recovery/t/004_timeline_switch.pl @@ -11,7 +11,7 @@ use Test::More tests => 1;$ENV{PGDATABASE} = 'postgres';# Initialize master node -my $node_master = get_new_node('master'); +my $node_master = get_new_node('004_master');$node_master->init(allows_streaming => 1);$node_master->start; @@ -20,11 +20,11 @@ my $backup_name = 'my_backup';$node_master->backup($backup_name);# Create two standbys linking to it -my $node_standby_1 = get_new_node('standby_1'); +my $node_standby_1 = get_new_node('004_standby_1');$node_standby_1->init_from_backup($node_master, $backup_name, has_streaming => 1);$node_standby_1->start; -my $node_standby_2 = get_new_node('standby_2'); +my $node_standby_2 = get_new_node('004_standby_2');$node_standby_2->init_from_backup($node_master, $backup_name, has_streaming => 1);$node_standby_2->start; diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl index 14d9b29..ae209e4 100644 --- a/src/test/recovery/t/005_replay_delay.pl +++ b/src/test/recovery/t/005_replay_delay.pl @@ -6,7 +6,7 @@ use TestLib;use Test::More tests => 2;# Initialize master node -my $node_master = get_new_node(); +my $node_master = get_new_node('005_master');$node_master->init(allows_streaming => 1);$node_master->start; @@ -18,7 +18,7 @@ my $backup_name = 'my_backup';$node_master->backup($backup_name);# Create streaming standby from backup -my $node_standby = get_new_node(); +my $node_standby = get_new_node('005_standby');$node_standby->init_from_backup($node_master, $backup_name, has_streaming => 1);$node_standby->append_conf('recovery.conf', qq(
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Feb 4, 2016 at 5:23 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote: (Please do not top-post, this breaks the thread flow.) > I’ve looked over proposed patch and migrated my shell tests scripts that i’ve used for testing twophase commits on master/slaveto this test framework. Everything looks mature, and I didn’t encountered any problems with writing new testsusing this infrastructure. > From my point of view I don’t see any problems to commit this patches in their current state. Thanks for the review! > 0) There are several routines that does actual checking, like is/command_ok/command_fails. I think it will be very handyto have wrappers psql_ok/psql_fails that calls psql through the command_ok/fails. Do you have a test case in mind for it? > 1) Better to raise more meaningful error when IPC::Run is absent. This has been discussed before, and as far as I recall the current behavior has been concluded as being fine. That's where --enable-tap-tests becomes meaningful. > 2) --enable-tap-tests deserves mention in test/recovery/README and more obvious error message when one trying to run makecheck in test/recovery without --enable-tap-tests. When running without --enable-tap-tests from src/test/recovery you would get the following error per how prove_check is defined: "TAP tests not enabled" > 3) Is it hard to give ability to run TAP tests in extensions? Not really. You would need to enforce a check rule or similar. For the recovery test suite I have mapped the check rule with prove_check. > 4) It will be handy if make check will write path to log files in case of failed test. Hm, perhaps. The log files are hardcoded in log/, so it is not like we don't know it. That's an argument for the main TAP suite though, not really this series of patch. > 5) psql() accepts database name as a first argument, but everywhere in tests it is ‘postgres’. Isn’t it simpler to storedbname in connstr, and have separate function to change database? > 6) Clean logs on prove restart? Clean up tmp installations? Those are issues proper to the main TAP infrastructure, though I agree that we could improve things here, particularly for temporary installations that get automatically... Hm... Cleaned up should a test failure happen? > 7) Make check sets PGPORT PG_REGRESS for prove. Is it necessary? No, that's not needed (I think I noticed that at some point) and that's a bug. We could live without setting it. -- Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Victor Wagner
Date:
On Thu, 4 Feb 2016 12:59:03 +0300 Michael Paquier <michael.paquier@gmail.com> wrote: for it? > > > 1) Better to raise more meaningful error when IPC::Run is absent. > > This has been discussed before, and as far as I recall the current > behavior has been concluded as being fine. That's where > --enable-tap-tests becomes meaningful. Really, it is not so hard to add configure checks for perl modules. And we need to test not only for IPC::Run, but for Test::More too, because some Linux distributions put modules which come with perl into separate package. The only problem that most m4 files with tests for perl modules, which can be found in the Internet, have GPL license, so someone have either to write his own and publish under PostgreSQL license or contact author of one of them and ask to publish it under PostgreSQL license. First seems to be much easier.
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Feb 4, 2016 at 4:43 PM, Victor Wagner <vitus@wagner.pp.ru> wrote: > On Thu, 4 Feb 2016 12:59:03 +0300 > Michael Paquier <michael.paquier@gmail.com> wrote: >> > 1) Better to raise more meaningful error when IPC::Run is absent. >> >> This has been discussed before, and as far as I recall the current >> behavior has been concluded as being fine. That's where >> --enable-tap-tests becomes meaningful. > > Really, it is not so hard to add configure checks for perl modules. > And we need to test not only for IPC::Run, but for Test::More too, > because some Linux distributions put modules which come with perl into > separate package. The last time we discussed about that on this list we concluded that it was not really necessary to have such checks, for one it makes the code more simple, and because this is leveraged by the presence of --enable-tap-tests, tests which can get actually costly with check-world. But this is digressing the subject of this thread, which deals with the fact of having recovery tests integrated in core... -- Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Victor Wagner
Date:
On Thu, 4 Feb 2016 18:33:27 +0300 Michael Paquier <michael.paquier@gmail.com> wrote: > > Really, it is not so hard to add configure checks for perl modules. > > And we need to test not only for IPC::Run, but for Test::More too, > > because some Linux distributions put modules which come with perl > > into separate package. > > The last time we discussed about that on this list we concluded that > it was not really necessary to have such checks, for one it makes the > code more simple, and because this is leveraged by the presence of > --enable-tap-tests, tests which can get actually costly with > check-world. But this is digressing the subject of this thread, which > deals with the fact of having recovery tests integrated in core... Of course, such configure tests should be run only if --enable-tap-tests is passed to the configure script It would look like if test "$enable_tap_tests" = "yes"; then AX_PROG_PERL_MODULES( Test::More, , AC_MSG_ERROR([Test::More is necessary to runTAP Tests]) AX_PROG_PERL_MODULES( IPC::Run, , AC_MSG_ERROR([IPC::Run is necessary to run TAP Tests]) fi in the configure.in May be it is not strictly necessary, but it is really useful to see such problems as clear error message during configure stage, rather than successfully configure, compile, run tests and only then find out, that something is forgotten. I don't see why having such tests in the configure.in, makes code more complex. It just prevents configure to finish successfully if --enable-tap-tests is specified and required modules are not available. -- Victor Wagner <vitus@wagner.pp.ru>
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Victor Wagner
Date:
This patch adds a long-awaited functionality to the PostgreSQL test suite - testing of cluster configuration. It contains bare minimum of replication and recovery test, but it should be a good starting point for other people. Really, adding a much more tests for replication and recovery is problematic, because these tests are resource-hungry, and take big enough time even on relatively powerful machines, but it seems to be necessary, because they need to create several temporary installation. So, set of tests, included into this patch is reasonably good choice. I think that readability of tests can be improved a bit, because these tests would serve as an example for all tap test writers. It's quite good that patch sets standard of using 'use strict; use warnings;' in the test script. It is bad, that Postgres-specific perl modules do not have embedded documentation. It would be nice to see POD documentation in the TestLib.pm and PostgresNode.pm instead of just comments. It would be much easier to test writers to read documentation using perldoc utility, rather than browse through the code. I'll second Stas' suggestion about psql_ok/psql_fail functions. 1. psql_ok instead of just psql would provide visual feedback for the reader of code. One would see 'here condition is tested, here is something ended with _ok/_fail'. It would be nice that seeing say "use Test::More tests => 4" one can immediately see "Yes, there is three _ok's and one _fail in the script' 2. I have use case for psql_fail code. In my libpq failover patch there is number of cases, where it should be tested that connection is not established, But this is rather about further evolution of the tap test library, not about this set of tests. I think that this patch should be commited as soon as possible in its current form (short of already reported reject in the PostgresNode.pm init function). -- Victor Wagner <vitus@wagner.pp.ru>
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Feb 4, 2016 at 9:18 PM, Victor Wagner wrote: > It's quite good that patch sets standard of using 'use strict; use > warnings;' in the test script. FWIW, this is decided as being a standard rule for any modules/script added in the main tree. > It is bad, that Postgres-specific perl modules do not have embedded > documentation. It would be nice to see POD documentation in the > TestLib.pm and PostgresNode.pm instead of just comments. It would be > much easier to test writers to read documentation using perldoc utility, > rather than browse through the code. Why not. I am no perlist but those prove to be helpful, however those Postgres modules are not dedicated to a large audience, so we could live without for now. > I think that this patch should be commited as soon as possible in its > current form (short of already reported reject in the PostgresNode.pm > init function). Thanks for your enthusiasm. Now, to do an auto-critic of my patch: + if ($params{allows_streaming}) + { + print $conf "wal_level = hot_standby\n"; + print $conf "max_wal_senders = 5\n"; + print $conf "wal_keep_segments = 20\n"; + print $conf "max_wal_size = 128MB\n"; + print $conf "shared_buffers = 1MB\n"; + print $conf "wal_log_hints = on\n"; + print $conf "hot_standby = on\n"; + } This could have more thoughts, particularly for wal_log_hints which is not used all the time, I think that we'd actually want to complete that with an optional hash of parameter/values that get appended at the end of the configuration file, then pass wal_log_hints in the tests where it is needed. The default set of parameter is maybe fine if done this way, still wal_keep_segments could be removed. +# Tets for timeline switch +# Encure that a standby is able to follow a newly-promoted standby Two typos in two lines. -- Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Noah Misch
Date:
On Thu, Feb 04, 2016 at 09:13:46PM +0300, Victor Wagner wrote: > On Thu, 4 Feb 2016 18:33:27 +0300 Michael Paquier <michael.paquier@gmail.com> wrote: > > > Really, it is not so hard to add configure checks for perl modules. > > > And we need to test not only for IPC::Run, but for Test::More too, > > > because some Linux distributions put modules which come with perl > > > into separate package. > > > > The last time we discussed about that on this list we concluded that > > it was not really necessary to have such checks, for one it makes the > > code more simple, and because this is leveraged by the presence of > > --enable-tap-tests, tests which can get actually costly with > > check-world. But this is digressing the subject of this thread, which > > deals with the fact of having recovery tests integrated in core... > > Of course, such configure tests should be run only if > --enable-tap-tests is passed to the configure script > > It would look like > > if test "$enable_tap_tests" = "yes"; then > AX_PROG_PERL_MODULES( Test::More, , AC_MSG_ERROR([Test::More is > necessary to run TAP Tests]) > AX_PROG_PERL_MODULES( IPC::Run, , AC_MSG_ERROR([IPC::Run is > necessary to run TAP Tests]) > fi > > in the configure.in > > May be it is not strictly necessary, but it is really useful to see > such problems as clear error message during configure stage, rather > than successfully configure, compile, run tests and only then find out, > that something is forgotten. > > I don't see why having such tests in the configure.in, makes code more > complex. It just prevents configure to finish successfully if > --enable-tap-tests is specified and required modules are not available. Even if detecting missing modules at "configure" time is the right thing, it belongs in a distinct patch, discussed on a distinct thread. The absence of IPC::Run affects the proposed replication tests in the same way it affects current TAP suites, so this thread has no business revisiting it.
Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Thu, Feb 4, 2016 at 11:58 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote: >> On 04 Feb 2016, at 12:59, Michael Paquier <michael.paquier@gmail.com> wrote: >>> 0) There are several routines that does actual checking, like is/command_ok/command_fails. I think it will be very handyto have wrappers psql_ok/psql_fails that calls psql through the command_ok/fails. >> >> Do you have a test case in mind for it? > > Yes, I’ve used that to test prepare/commit while recovery (script attached, it’s in WIP state, i’ll submit that later alongwith other twophase stuff). Oh, OK, now I see. Well it seems to make sense for your case, though it does not seem to be directly linked to the patch here. We could incrementally add something on top of the existing infrastructure that gets into the code tree once the 2PC patch gets in a more advanced shape. >>> 2) --enable-tap-tests deserves mention in test/recovery/README and more obvious error message when one trying to runmake check in test/recovery without --enable-tap-tests. >> >> When running without --enable-tap-tests from src/test/recovery you >> would get the following error per how prove_check is defined: >> "TAP tests not enabled" > > Yes, but that message doesn’t mention --enable-tap-tests and README also silent about that too. I didn’t know about thatflag and had to search in makefiles for this error message to see what conditions leads to it. I think we can save planetfrom one more stackoverflow question if the error message will mention that flag. Well, that works for the whole TAP test infrastructure and not really this patch only. Let's not forget that the goal of this thread is to provide a basic set of tests and routines to help people building test cases for more advanced clustering scenarios, so I'd rather not complicate the code with side things and remain focused on the core problem. -- Michael
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Feb 5, 2016 at 4:17 AM, Michael Paquier wrote: > Thanks for your enthusiasm. Now, to do an auto-critic of my patch: > + if ($params{allows_streaming}) > + { > + print $conf "wal_level = hot_standby\n"; > + print $conf "max_wal_senders = 5\n"; > + print $conf "wal_keep_segments = 20\n"; > + print $conf "max_wal_size = 128MB\n"; > + print $conf "shared_buffers = 1MB\n"; > + print $conf "wal_log_hints = on\n"; > + print $conf "hot_standby = on\n"; > + } > This could have more thoughts, particularly for wal_log_hints which is > not used all the time, I think that we'd actually want to complete > that with an optional hash of parameter/values that get appended at > the end of the configuration file, then pass wal_log_hints in the > tests where it is needed. The default set of parameter is maybe fine > if done this way, still wal_keep_segments could be removed. At the end I have refrained from doing that, and refactoring setup_cluster@RewindTest.pm to use the new option allows_streaming, simplifying a bit the code. The introduction of allows_streaming could be done in a separate patch, though it did not seem worth the complication when hacking at that. The split is simple, though. > +# Tets for timeline switch > +# Encure that a standby is able to follow a newly-promoted standby > Two typos in two lines. Fixed. I also found an issue with the use of application_name causing test 001 to fail, bug squashed on the way. The generation of paths for archive_command and restore_command was incorrect on Windows. Those need to use two backslashes (to detect correctly files) and need to be double-quoted (to avoid errors with command copy should a space be included in the path). I have added as well a new subcommand in vcregress.pl called recoverycheck where one can run the recovery test suite on Windows using MSVC. Attached are rebased patches, split into 3 parts doing the following: - 0001, fix default configuration of MSVC builds ignoring TAP tests - 0002, add a promote routine in PostgresNode.pm. pg_rewind's tests can make immediate use of that. - 0003, the actual test suite. This is registered in CF 2016-03 as well for further consideration. -- Michael
Attachment
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Wed, Feb 17, 2016 at 9:52 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Feb 5, 2016 at 4:17 AM, Michael Paquier wrote: >> Thanks for your enthusiasm. Now, to do an auto-critic of my patch: >> + if ($params{allows_streaming}) >> + { >> + print $conf "wal_level = hot_standby\n"; >> + print $conf "max_wal_senders = 5\n"; >> + print $conf "wal_keep_segments = 20\n"; >> + print $conf "max_wal_size = 128MB\n"; >> + print $conf "shared_buffers = 1MB\n"; >> + print $conf "wal_log_hints = on\n"; >> + print $conf "hot_standby = on\n"; >> + } >> This could have more thoughts, particularly for wal_log_hints which is >> not used all the time, I think that we'd actually want to complete >> that with an optional hash of parameter/values that get appended at >> the end of the configuration file, then pass wal_log_hints in the >> tests where it is needed. The default set of parameter is maybe fine >> if done this way, still wal_keep_segments could be removed. > > At the end I have refrained from doing that, and refactoring > setup_cluster@RewindTest.pm to use the new option allows_streaming, > simplifying a bit the code. The introduction of allows_streaming could > be done in a separate patch, though it did not seem worth the > complication when hacking at that. The split is simple, though. > >> +# Tets for timeline switch >> +# Encure that a standby is able to follow a newly-promoted standby >> Two typos in two lines. > > Fixed. > > I also found an issue with the use of application_name causing test > 001 to fail, bug squashed on the way. The generation of paths for > archive_command and restore_command was incorrect on Windows. Those > need to use two backslashes (to detect correctly files) and need to be > double-quoted (to avoid errors with command copy should a space be > included in the path). I have added as well a new subcommand in > vcregress.pl called recoverycheck where one can run the recovery test > suite on Windows using MSVC. > > Attached are rebased patches, split into 3 parts doing the following: > - 0001, fix default configuration of MSVC builds ignoring TAP tests > - 0002, add a promote routine in PostgresNode.pm. pg_rewind's tests > can make immediate use of that. > - 0003, the actual test suite. > This is registered in CF 2016-03 as well for further consideration. Here is a rebased set after the conflicts created by e640093, with the following changes: - In 0002, added perldoc for new promote routine - In 0003, added perldoc documentation for the new options introduced in init and init_from_backup, and fixed some log entries not using the node name to identify the node involved when enabling archive, streaming or recovery. - Craig has pinged me regarding tap_tests being incorrectly updated in config_default.pl in 0003. I just re-ran the tests on OSX and Windows (MSVC 2010 with Win7) to be sure that nothing broke, and nothing has been reported as broken. -- Michael
Attachment
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Craig Ringer
Date:
On 26 February 2016 at 10:58, Michael Paquier <michael.paquier@gmail.com> wrote:
Here is a rebased set after the conflicts created by e640093, with the
following changes:
Thanks for rebasing on top of that. Not totally fair when your patch came first, but I guess it was simpler to merge the other one first.
- In 0002, added perldoc for new promote routine
- In 0003, added perldoc documentation for the new options introduced
in init and init_from_backup, and fixed some log entries not using the
node name to identify the node involved when enabling archive,
streaming or recovery.
Very much appreciated.
- Craig has pinged me regarding tap_tests being incorrectly updated in
config_default.pl in 0003.
I just re-ran the tests on OSX and Windows (MSVC 2010 with Win7) to be
sure that nothing broke, and nothing has been reported as broken.
I've looked over the tests. I see that you've updated the docs for the Windows tests to reflect the changes, which is good, thanks.
I like the patch and would love to see it committed soon.
I do have one major disagreement, which is that you turn autovacuum off if streaming is enabled. This is IMO completely wrong and must be removed. It's making the tests ignore a major and important part of real-world use.
If you did it to make it easier to test replay catchup etc, just use pg_xlog_location_diff instead of an equality test. Instead of:
my $caughtup_query = "SELECT '$current_lsn'::pg_lsn <= pg_last_xlog_replay_location()";
use
my $caughtup_query = "SELECT pg_xlog_location_diff('$current_lsn', pg_last_xlog_replay_location()) <= 0";
so it doesn't care if we replay past the expected LSN on the master due to autovacuum activity. That's what's done in the real world and what should be covered by the tests IMO.
The patch sets
tap_tests => 1,
in config_default.pl. Was that on purpose? I'd have no problem with running the TAP tests by default if they worked by default, but the docs say that at least with ActiveState's Perl you have to jump through some hoops to get IPC::Run.
Typo in PostgresNode.pm: passiong should be 'passing' .
Otherwise looks _really_ good and I'd love to see this committed very soon.
I'd like a way to append parameters in a way that won't clobber settings made implicitly by the module through things like enable_streaming but I can add that in a followup patch. It doesn't need to complicate this one. I'm thinking of having the tests append an include_dir directive when they create a node, maintain a hash of all parameters and rewrite a postgresql.conf.taptests file in the include_dir when params are updated. Also exposing a 'reload' call.
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Fri, Feb 26, 2016 at 1:47 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 26 February 2016 at 10:58, Michael Paquier <michael.paquier@gmail.com> > wrote: >> Here is a rebased set after the conflicts created by e640093, with the >> following changes: > > Thanks for rebasing on top of that. Not totally fair when your patch came > first, but I guess it was simpler to merge the other one first. At this point the final result is the same. It does not matter what gets in first. > I do have one major disagreement, which is that you turn autovacuum off if > streaming is enabled. This is IMO completely wrong and must be removed. It's > making the tests ignore a major and important part of real-world use. This has been chosen for consistency with what is in pg_rewind tests, the idea being to keep the runs more stable with a WAL output under control to allow predictable results. Though I do not see any direct reason to not remove it actually now that I think about it. > If you did it to make it easier to test replay catchup etc, just use > pg_xlog_location_diff instead of an equality test. Instead of: > my $caughtup_query = "SELECT '$current_lsn'::pg_lsn <= > pg_last_xlog_replay_location()"; > use > my $caughtup_query = "SELECT pg_xlog_location_diff('$current_lsn', > pg_last_xlog_replay_location()) <= 0"; > so it doesn't care if we replay past the expected LSN on the master due to > autovacuum activity. That's what's done in the real world and what should be > covered by the tests IMO. Those two statements have the same meaning. pg_xlog_location_diff does exactly the same thing as the pg_lsn data type in terms of LSN comparisons. Choosing one or the other is really a matter of taste. Though I see that 001 is the only test that uses an equality, this should not be the case I agree. > The patch sets > > tap_tests => 1, > > in config_default.pl. Was that on purpose? I'd have no problem with running > the TAP tests by default if they worked by default, but the docs say that at > least with ActiveState's Perl you have to jump through some hoops to get > IPC::Run. No, this was an error in the previous version of the patch 0003. Those tests should be disabled by default, to match what ./configure does, and also because installing IPC::Run requires some extra operations, but that's easily doable with a bit of black magic. > Typo in PostgresNode.pm: passiong should be 'passing'. Oops. > I'd like a way to append parameters in a way that won't clobber settings > made implicitly by the module through things like enable_streaming but I can > add that in a followup patch. It doesn't need to complicate this one. This is something that I have been thinking about for some time while hacking this thing, but I finished with the current version to not complicate the patch more than it needs to be, and because the current version is enough for the needs of all the tests present. Surely this can be extended further more. One idea that I had was for example to pass as parameter to init() and init_from_backup() a set of key/values that would be appended to postgresql.conf. > I'm thinking of having the tests append an include_dir directive when they > create a node, maintain a hash of all parameters and rewrite a > postgresql.conf.taptests file in the include_dir when params are updated. > Also exposing a 'reload' call. The reload wrapper would make sense to have. That has not proved to be necessary yet. -- Michael
Attachment
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Craig Ringer
Date:
On 26 February 2016 at 13:43, Michael Paquier <michael.paquier@gmail.com> wrote:
> my $caughtup_query = "SELECT '$current_lsn'::pg_lsn <=
> pg_last_xlog_replay_location()";
> use
> my $caughtup_query = "SELECT pg_xlog_location_diff('$current_lsn',
> pg_last_xlog_replay_location()) <= 0";
> so it doesn't care if we replay past the expected LSN on the master due to
> autovacuum activity. That's what's done in the real world and what should be
> covered by the tests IMO.
Those two statements have the same meaning. pg_xlog_location_diff does
exactly the same thing as the pg_lsn data type in terms of LSN
comparisons.
Ah. Whoops. I meant to write '=' in the first, to reflect what the code does.
You're quite right that casting to pg_lsn has the same effect and looks cleaner.
I think this looks good as of the last version. I'm not keen on disabling autovacuum but that can be addressed in a followup that makes it easier to configure params, as discussed. I definitely don't want to complicate this patch with it.
Should be committed ASAP IMO.
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Victor Wagner wrote: > I'll second Stas' suggestion about psql_ok/psql_fail functions. > > 1. psql_ok instead of just psql would provide visual feedback for the > reader of code. One would see 'here condition is tested, here is > something ended with _ok/_fail'. > > It would be nice that seeing say "use Test::More tests => 4" > one can immediately see "Yes, there is three _ok's and one _fail in the > script' > > 2. I have use case for psql_fail code. In my libpq failover patch there > is number of cases, where it should be tested that connection is not > established, > > But this is rather about further evolution of the tap test library, not > about this set of tests. This makes sense to me. Please submit a patch for this. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > Attached are rebased patches, split into 3 parts doing the following: > - 0001, fix default configuration of MSVC builds ignoring TAP tests BTW you keep submitting this one and I keep ignoring it. I think you should start a separate thread for this one, so that some Windows-enabled committer can look at it and maybe push it. I still don't understand why this matters. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Kyotaro HORIGUCHI wrote: > So, I'd like to propose four (or five) changes to this harness. > > - prove_check to remove all in tmp_check > > - TestLib to preserve temporary directories/files if the current > test fails. > > - PostgresNode::get_new_node to create data directory with > meaningful basenames. > > - PostgresNode::psql to return a list of ($stdout, $stderr) if > requested. (The previous behavior is not changed) > > - (recovery/t/00x_* gives test number to node name) > > As a POC, the attached diff will appliy on the 0001 and (fixed) > 0003 patches. These changes all seem very reasonable to me. I'm not so sure about the last one. Perhaps the framework ought to generate an appropriate subdir name using the test file name plus the node name, so that instead of tmp_XXXX it becomes tmp_001_master_XXXX or something like that? Having be a coding convention doesn't look real nice to me. I didn't try to apply your patch but I'm fairly certain it would conflict with what's here now; can you please rebase and resend? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Craig Ringer wrote: > Should be committed ASAP IMO. Finally pushed it. Let's see how it does in the buildfarm. Now let's get going and add more tests, I know there's no shortage of people with test scripts waiting for this. Thanks, Michael, for the persistency, and thanks to all reviewers. (I'm sorry we seem to have lost Amir Rohan in the process. He was doing a great job.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Feb 27, 2016 at 4:30 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Craig Ringer wrote: >> Should be committed ASAP IMO. > > Finally pushed it. Let's see how it does in the buildfarm. Now let's > get going and add more tests, I know there's no shortage of people with > test scripts waiting for this. > > Thanks, Michael, for the persistency, and thanks to all reviewers. (I'm > sorry we seem to have lost Amir Rohan in the process. He was doing > a great job.) Date of first message of this thread: Mon, 2 Dec 2013 15:40:41 +0900 Date of this message: Fri, 26 Feb 2016 16:30:08 -0300 This has been a long trip. Thanks a lot to all involved. Many people have reviewed and helped out with this patch. -- Michael
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Feb 27, 2016 at 7:37 AM, Michael Paquier wrote: > Date of first message of this thread: Mon, 2 Dec 2013 15:40:41 +0900 > Date of this message: Fri, 26 Feb 2016 16:30:08 -0300 > This has been a long trip. Thanks a lot to all involved. Many people > have reviewed and helped out with this patch. I just had a closer look at what has been committed, and I found a couple of minor issues, addressed via the patch attached: 1) install-windows.sgml should use the markup command when mentioning bincheck and recoverycheck 2) src/test/recovery/.gitignore is listing /regress_log/ but that's not needed (this is a remnant of a previous version of the patch posted on this thread). 3) Header of 002_archiving.pl mentions that the tests are running on a warm standby, but that's a hot standby (found by Craig and reported on github on my account) 4) src/test/recovery/Makefile is missing a clean target, to remove tmp_check/. 5) src/tools/msvc/clean.bat is missing the same cleanup command for the same thing after running the tests. 6) Header of 004_timeline_switch.pl should perhaps mention that a cascading standby is used (idea of Craig, a good addition IMO) Regards, -- Michael
Attachment
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sun, Feb 28, 2016 at 10:41 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sat, Feb 27, 2016 at 7:37 AM, Michael Paquier wrote: >> Date of first message of this thread: Mon, 2 Dec 2013 15:40:41 +0900 >> Date of this message: Fri, 26 Feb 2016 16:30:08 -0300 >> This has been a long trip. Thanks a lot to all involved. Many people >> have reviewed and helped out with this patch. > > I just had a closer look at what has been committed, and I found a > couple of minor issues, addressed via the patch attached: > 1) install-windows.sgml should use the markup command when mentioning > bincheck and recoverycheck > 2) src/test/recovery/.gitignore is listing /regress_log/ but that's > not needed (this is a remnant of a previous version of the patch > posted on this thread). > 3) Header of 002_archiving.pl mentions that the tests are running on a > warm standby, but that's a hot standby (found by Craig and reported on > github on my account) > 4) src/test/recovery/Makefile is missing a clean target, to remove tmp_check/. > 5) src/tools/msvc/clean.bat is missing the same cleanup command for > the same thing after running the tests. > 6) Header of 004_timeline_switch.pl should perhaps mention that a > cascading standby is used (idea of Craig, a good addition IMO) 7) src/test/README is not describing recovery/ 8) This description in src/test/recovery/README is not exact, it mentions a set of routines that are now part of PostgresNode.pm: +This directory contains a test suite for recovery and replication, +testing mainly the interactions of recovery.conf with cluster +instances by providing a simple set of routines that can be used +to define a custom cluster for a test, including backup, archiving, +and streaming configuration. I would suggest removing the last 4 lines and simplify the paragraph. 9) I have no logical explanation to explain why I am seeing all those things now. v2 is attached. -- Michael
Attachment
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Craig Ringer
Date:
On 27 February 2016 at 06:37, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sat, Feb 27, 2016 at 4:30 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Craig Ringer wrote:
>> Should be committed ASAP IMO.
>
> Finally pushed it. Let's see how it does in the buildfarm. Now let's
> get going and add more tests, I know there's no shortage of people with
> test scripts waiting for this.
>
> Thanks, Michael, for the persistency, and thanks to all reviewers. (I'm
> sorry we seem to have lost Amir Rohan in the process. He was doing
> a great job.)
Date of first message of this thread: Mon, 2 Dec 2013 15:40:41 +0900
Date of this message: Fri, 26 Feb 2016 16:30:08 -0300
This has been a long trip. Thanks a lot to all involved. Many people
have reviewed and helped out with this patch.
Congratulations and thanks.
I don't see any new buildfarm failures. The BinInstallCheck failure on Windows predates this and the isolationtest failure on OpenBSD is unrelated.
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sun, Feb 28, 2016 at 11:22 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 27 February 2016 at 06:37, Michael Paquier <michael.paquier@gmail.com> > wrote: >> >> On Sat, Feb 27, 2016 at 4:30 AM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >> > Craig Ringer wrote: >> >> Should be committed ASAP IMO. >> > >> > Finally pushed it. Let's see how it does in the buildfarm. Now let's >> > get going and add more tests, I know there's no shortage of people with >> > test scripts waiting for this. >> > >> > Thanks, Michael, for the persistency, and thanks to all reviewers. (I'm >> > sorry we seem to have lost Amir Rohan in the process. He was doing >> > a great job.) >> >> Date of first message of this thread: Mon, 2 Dec 2013 15:40:41 +0900 >> Date of this message: Fri, 26 Feb 2016 16:30:08 -0300 >> This has been a long trip. Thanks a lot to all involved. Many people >> have reviewed and helped out with this patch. > > > Congratulations and thanks. > > I don't see any new buildfarm failures. The BinInstallCheck failure on > Windows predates this and the isolationtest failure on OpenBSD is unrelated. The buildfarm does not have infrastructure to test that yet.. I need to craft a patch for the client-side code and send it to Andrew. Will try to do so today. -- Michael
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Mon, Feb 29, 2016 at 7:40 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > The buildfarm does not have infrastructure to test that yet.. I need > to craft a patch for the client-side code and send it to Andrew. Will > try to do so today. For those interested, here is where things are going to happen: https://github.com/PGBuildFarm/client-code/pull/7 (This patch could be more refactored, I am not sure how much Andrew would like things to be less duplicated, so I went to the most simple solution). -- Michael
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > 9) I have no logical explanation to explain why I am seeing all those > things now. Happens all the time ... Pushed, thanks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Tue, Mar 1, 2016 at 6:25 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: > >> 9) I have no logical explanation to explain why I am seeing all those >> things now. > > Happens all the time ... > > Pushed, thanks. Thanks. I am going to patch by buildfarm scripts to run those tests on hamster. Let's see what happens for this machine. -- Michael
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Tue, Mar 1, 2016 at 11:28 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Mar 1, 2016 at 6:25 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Michael Paquier wrote: >> >>> 9) I have no logical explanation to explain why I am seeing all those >>> things now. >> >> Happens all the time ... >> >> Pushed, thanks. > > Thanks. I am going to patch by buildfarm scripts to run those tests on > hamster. Let's see what happens for this machine. It looks like it works fine (see step recovery-check): http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-03-01%2002%3A34%3A26 http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hamster&dt=2016-03-01%2002%3A34%3A26&stg=recovery-check Let's that run for a couple of days.. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Kyotaro HORIGUCHI
Date:
Hello, At Fri, 26 Feb 2016 15:43:14 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20160226184314.GA205945@alvherre.pgsql> > Kyotaro HORIGUCHI wrote: > > > So, I'd like to propose four (or five) changes to this harness. > > > > - prove_check to remove all in tmp_check > > > > - TestLib to preserve temporary directories/files if the current > > test fails. > > > > - PostgresNode::get_new_node to create data directory with > > meaningful basenames. > > > > - PostgresNode::psql to return a list of ($stdout, $stderr) if > > requested. (The previous behavior is not changed) > > > > - (recovery/t/00x_* gives test number to node name) > > > > As a POC, the attached diff will appliy on the 0001 and (fixed) > > 0003 patches. > > These changes all seem very reasonable to me. I'm not so sure about the > last one. Perhaps the framework ought to generate an appropriate subdir > name using the test file name plus the node name, so that instead of > tmp_XXXX it becomes tmp_001_master_XXXX or something like that? Having > be a coding convention doesn't look real nice to me. Thank you for mentioning this. Sorry, the last one accidentally contained garbage code to intentionally raise an error. The last one names the nodes as such like '001_master_24f8'. Maybe prefixing "tmp_" would be better. > I didn't try to apply your patch but I'm fairly certain it would > conflict with what's here now; can you please rebase and resend? This was a very small patch made on the old, uncommited patches. I remade the patch and split it into two parts. 0001-Change-behavior-... Changes of PostmasterNode.pm and TestLib.pm to add some features and change a behavior. 0002-Prefix-test-numbers-to-node- This is rather a example usage of 0001- patch (except for stderr stuff). 00n_xxx test leaves temporary directories withthe names of 00n_(master|standby)_XXXX on failure. If this is considered reasonable, I'll make same patches for the other/t/nnn_*.pl tests. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Tue, Mar 1, 2016 at 5:13 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > 0001-Change-behavior-... > > Changes of PostmasterNode.pm and TestLib.pm to add some > features and change a behavior. + # Preserve temporary directory for this test if failure + $File::Temp::KEEP_ALL = 1 unless Test::More->builder->is_passing; +1. Having the data folders being removed even in case of a failure is really annoying. + # Preserve temporary directory for this test if failure + $File::Temp::KEEP_ALL = 1 unless Test::More->builder->is_passing; s/if failure/in the event of a failure/? + return wantarray ? ($stdout, $stderr) : $stdout; So you are willing to extend that so as you could perform conparison tests on the error strings returned. Why no, it looks useful, though now there is no test in need of it I think. So without a proper need I think that we could live without. > 0002-Prefix-test-numbers-to-node- > > This is rather a example usage of 0001- patch (except for > stderr stuff). 00n_xxx test leaves temporary directories with > the names of 00n_(master|standby)_XXXX on failure. If this is > considered reasonable, I'll make same patches for the other > /t/nnn_*.pl tests. -my $node_master = get_new_node('master'); +my $node_master = get_new_node('001_master'); I am not a fan of appending the test number in the node name. For one, this complicates the log file name associated with a node by duplicating the test number in its name. Also, it is possible to easily get the name of the data folder for a node by looking at the logs. -- Michael
Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Michael Paquier
Date:
On Sat, Feb 27, 2016 at 1:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: > >> Attached are rebased patches, split into 3 parts doing the following: >> - 0001, fix default configuration of MSVC builds ignoring TAP tests > > BTW you keep submitting this one and I keep ignoring it. I think you > should start a separate thread for this one, so that some > Windows-enabled committer can look at it and maybe push it. I still > don't understand why this matters. OK. I will create a separate thread on hackers. I think that this is still a bug. -- Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Craig Ringer
Date:
On 1 March 2016 at 20:45, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Mar 1, 2016 at 5:13 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 0001-Change-behavior-...
>
> Changes of PostmasterNode.pm and TestLib.pm to add some
> features and change a behavior.
+ # Preserve temporary directory for this test if failure
+ $File::Temp::KEEP_ALL = 1 unless Test::More->builder->is_passing;
+1. Having the data folders being removed even in case of a failure is
really annoying.
I agree on all points re your review. Keeping tempdirs is really needed, the tempdir name change is good, the the rest I'm not keen on.
I've addressed the need to get stderr from psql in a patch I'll submit separately, which provides a thinner wrapper around IPC::Run for more complex needs, then uses that for the existing 'psql' function. It also provides a 'psql_check' that dies on error.
I'll incorporate the wanted changes into that patch.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Craig Ringer
Date:
On 1 March 2016 at 21:05, Craig Ringer <craig@2ndquadrant.com> wrote:
On 1 March 2016 at 20:45, Michael Paquier <michael.paquier@gmail.com> wrote:On Tue, Mar 1, 2016 at 5:13 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 0001-Change-behavior-...
>
> Changes of PostmasterNode.pm and TestLib.pm to add some
> features and change a behavior.
+ # Preserve temporary directory for this test if failure
+ $File::Temp::KEEP_ALL = 1 unless Test::More->builder->is_passing;
+1. Having the data folders being removed even in case of a failure is
really annoying.I agree on all points re your review. Keeping tempdirs is really needed, the tempdir name change is good, the the rest I'm not keen on.I've addressed the need to get stderr from psql in a patch I'll submit separately, which provides a thinner wrapper around IPC::Run for more complex needs, then uses that for the existing 'psql' function. It also provides a 'psql_check' that dies on error.I'll incorporate the wanted changes into that patch.
OK, done.
Michael, I'd value your thoughts on the patches. I think the psql one is important, I found the existing 'psql' function too limiting and I think defaulting to ignoring errors is sub-optimal.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
From
Alvaro Herrera
Date:
Michael Paquier wrote: > On Tue, Mar 1, 2016 at 5:13 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > + return wantarray ? ($stdout, $stderr) : $stdout; > So you are willing to extend that so as you could perform conparison > tests on the error strings returned. Why no, it looks useful, though > now there is no test in need of it I think. So without a proper need I > think that we could live without. Does this change let us implement psql_ok and psql_fail? I think I've seen a few places already, both in committed code and in submitted patches, that test for some kind of failure from psql. > > 0002-Prefix-test-numbers-to-node- > > > > This is rather a example usage of 0001- patch (except for > > stderr stuff). 00n_xxx test leaves temporary directories with > > the names of 00n_(master|standby)_XXXX on failure. If this is > > considered reasonable, I'll make same patches for the other > > /t/nnn_*.pl tests. > > -my $node_master = get_new_node('master'); > +my $node_master = get_new_node('001_master'); > I am not a fan of appending the test number in the node name. For one, > this complicates the log file name associated with a node by > duplicating the test number in its name. Also, it is possible to > easily get the name of the data folder for a node by looking at the > logs. Why don't we use something similar to what's in $test_logfile in TestLib? > Also, it is possible to easily get the name of the data folder for a > node by looking at the logs. No disagreement on it being possible, but "easily" seems a bad description for that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services