Thread: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Mark Dilger
Date:
I am building a regression test system for replication and came across
this email thread. I have gotten pretty far into my implementation, but
would be happy to make modifications if folks have improvements to
suggest. If the community likes my design, or a modified version based
on your feedback, I'd be happy to submit a patch.
Currently I am cannibalizing src/test/pg_regress.c, but that could instead
be copied to src/test/pg_regress_replication.c or whatever. The regression
test creates and configures multiple database clusters, sets up the
replication configuration for them, runs them each in nonprivileged mode
and bound to different ports, feeds all the existing 141 regression tests
into the master database with the usual checking that all the right results
are obtained, and then checks that the standbys have the expected
data. This is possible all on one system because the database clusters
are chroot'ed to see their own /data directory and not the /data directory
of the other chroot'ed clusters, although the rest of the system, like /bin
and /etc and /dev are all bind mounted and visible to each cluster.
There of course is room to add as many replication tests as you like,
and the main 141 tests fed into the master could be extended to feed
more data and such.
The main drawbacks that I don't care for are:
1) 'make check' becomes 'sudo make check' because it needs permission
to run chroot.
2) I have no win32 version of the logic
3) Bind mounts either have to be created by the privileged pg_regress
process or have to be pre-existing on the system
#1 would not be as bad if pg_regress became pg_regress_replication, as
we could make the mantra into 'sudo make replicationcheck' or similar.
Splitting it from 'make check' also means IMHO that it could have heavier
tests that take longer to run, since people merely interested in building
and installing postgres would not be impacted by this.
#2 might be fixed by someone more familiar with win32 programming
than I am.
#3 cannot be avoided as far as I can tell, but we could choose between
the two options. So far, I have chosen to set up the directory structure
and add the bind mount logic to my /etc/fstab only once, rather than
having this get recreated every time I invoke 'sudo make check'. The
community might prefer to go the other way, and have the directories
and bind mounts get set up each invocation; I have avoided that thus
far as I don't want 'sudo make check' (or 'sudo make replicationcheck')
to abuse its raised privileges and muck with the filesystem in a way
that could cause the user unexpected problems.
The main advantages that I like about this design are:
1) Only one system is required. The developer does not need network
access to a second replication system. Moreover, multiple database
clusters can be established with interesting replication hierarchies between
them, and the cost of each additional cluster is just another chroot
environment
2) Checking out the sources from git and then running
./configure && make && sudo make replicationtest
is not particularly difficult, assuming the directories and mounts are
in place, or alternatively assuming that 'sudo make regressioncheck'
creates them for you if they don't already exist.
Comments and advice sincerely solicited,
mark
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Greg Stark
Date:
--
greg

On 5 Jan 2014 14:54, "Mark Dilger" <markdilger@yahoo.com> wrote:
>
> I am building a regression test system for replication and came across
> this email thread. I have gotten pretty far into my implementation, but
> would be happy to make modifications if folks have improvements to
> suggest. If the community likes my design, or a modified version based
> on your feedback, I'd be happy to submit a patch.

This sounds pretty cool. The real trick will be in testing concurrent
behaviour -- i.e. queries on the slave when it's replaying logs at a
certain point. But right now we have nothing so anything would be an
improvement.

> This is possible all on one system because the database clusters
> are chroot'ed to see their own /data directory and not the /data directory
> of the other chroot'ed clusters, although the rest of the system, like /bin
> and /etc and /dev are all bind mounted and visible to each cluster.

This isn't necessary. You can use the same binaries and run initdb with a
different location just fine. Then start up the database with -D to specify
the directory.
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Michael Paquier
Date:
On Mon, Jan 6, 2014 at 4:51 AM, Mark Dilger <markdilger@yahoo.com> wrote:
> I am building a regression test system for replication and came across
> this email thread. I have gotten pretty far into my implementation, but
> would be happy to make modifications if folks have improvements to
> suggest. If the community likes my design, or a modified version based
> on your feedback, I'd be happy to submit a patch.
Yeah, this would be nice to look at, core code definitely needs to have some more infrastructure for such a test suite. I didn't get the time to go back to it since I began this thread though :)
> Currently I am cannibalizing src/test/pg_regress.c, but that could instead
> be copied to src/test/pg_regress_replication.c or whatever. The regression
> test creates and configures multiple database clusters, sets up the
> replication configuration for them, runs them each in nonprivileged mode
> and bound to different ports, feeds all the existing 141 regression tests
> into the master database with the usual checking that all the right results
> are obtained, and then checks that the standbys have the expected
> data. This is possible all on one system because the database clusters
> are chroot'ed to see their own /data directory and not the /data directory
> of the other chroot'ed clusters, although the rest of the system, like /bin
> and /etc and /dev are all bind mounted and visible to each cluster.
Having vanilla regressions run in a cluster with multiple nodes and check the results on a standby is the tip of the iceberg though. What I had in mind when I began this thread was to have more than a copy/paste of pg_regress, but an infrastructure that people could use to create and customize tests by having an additional control layer on the cluster itself. For example, testing replication is not only a matter of creating and setting up the nodes, but you might want to be able to initialize, add, remove nodes during the tests. Node addition would be either a new fresh master (this would be damn useful for a test suite for logical replication I think), or a slave node with custom recovery parameters to test replication, as well as PITR, archiving, etc. Then you need to be able to run SQL commands on top of that to check if the results are consistent with what you want.
A possible input for a test that users could provide would be something like that:
# Node information for tests
nodes
{
{node1, postgresql.conf params, recovery.conf params}
{node2, postgresql.conf params, recovery.conf params, slave of node1}
}
# Run test
init node1
run_sql node1 file1.sql
# Check output
init node2
run_sql node2 file2.sql
The main problem is actually how to do that. Having some smart shell infrastructure would be simple and would facilitate (?) the maintenance of code used to run the tests. On the contrary having a C program would make the maintenance of code to run the tests more difficult (?) for a trade with more readable test suite input like the one I wrote above. This might also make the test input more readable for a human eye, in the shape of what is already available in src/test/isolation.
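As one data point for the shell option, a minimal interpreter for the hypothetical input format above could look something like this (the command names come from the example, but the node handling is stubbed out with echo so the sketch stays self-contained):

```shell
# Hypothetical driver sketch for the test input format above.
# Real "init" would run initdb and start the node; real "run_sql"
# would feed the file through psql. Both are stubbed here.
run_test() {
    while read -r cmd node arg; do
        case "$cmd" in
            ''|'#'*) ;;                          # skip blank lines and comments
            init)    echo "init cluster: $node" ;;
            run_sql) echo "run $arg on $node" ;;
            *)       echo "unknown command: $cmd" >&2 ;;
        esac
    done
}

run_test <<'EOF'
# Run test
init node1
run_sql node1 file1.sql
init node2
run_sql node2 file2.sql
EOF
```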
Another possibility could be also to integrate directly a recovery/backup manager in PG core, and have some tests for it, or even include those tests directly with pg_basebackup or an upper layer of it.
> There of course is room to add as many replication tests as you like,
> and the main 141 tests fed into the master could be extended to feed
> more data and such.
>
> The main drawbacks that I don't care for are:
>
> 1) 'make check' becomes 'sudo make check' because it needs permission
> to run chroot.
-1 for that: developers should not need to use root to run the regression suite.
> 2) I have no win32 version of the logic
For a first shot I am not sure that it matters much.
> The main advantages that I like about this design are:
>
> 1) Only one system is required. The developer does not need network
> access to a second replication system. Moreover, multiple database
> clusters can be established with interesting replication hierarchies between
> them, and the cost of each additional cluster is just another chroot
> environment
An assumption of the test suite is I think to allow developers to check for bugs on a local server only. This facilitates how the test suite is written and you don't need to enter in things like VM settings or cross-environment tests, things that could be done already nicely by frameworks of the type Jenkins. What I think people would like to have is that:
cd src/test/replication && make check/installcheck
And have the test run for them.
Regards,
--
Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Steve Singer
Date:
On 01/05/2014 09:13 PM, Michael Paquier wrote:
> On Mon, Jan 6, 2014 at 4:51 AM, Mark Dilger <markdilger@yahoo.com> wrote:
>> I am building a regression test system for replication and came across
>> this email thread. I have gotten pretty far into my implementation, but
>> would be happy to make modifications if folks have improvements to
>> suggest. If the community likes my design, or a modified version based
>> on your feedback, I'd be happy to submit a patch.
> Yeah, this would be nice to look at, core code definitely needs to
> have some more infrastructure for such a test suite. I didn't get the
> time to go back to it since I began this thread though :)
>> Currently I am cannibalizing src/test/pg_regress.c, but that could instead
>> be copied to src/test/pg_regress_replication.c or whatever. The regression
>> test creates and configures multiple database clusters, sets up the
>> replication configuration for them, runs them each in nonprivileged mode
>> and bound to different ports, feeds all the existing 141 regression tests
>> into the master database with the usual checking that all the right results
>> are obtained, and then checks that the standbys have the expected
>> data. This is possible all on one system because the database clusters
>> are chroot'ed to see their own /data directory and not the /data directory
>> of the other chroot'ed clusters, although the rest of the system, like /bin
>> and /etc and /dev are all bind mounted and visible to each cluster.
> Having vanilla regressions run in a cluster with multiple nodes and
> check the results on a standby is the tip of the iceberg though. What
> I had in mind when I began this thread was to have more than a
> copy/paste of pg_regress, but an infrastructure that people could use
> to create and customize tests by having an additional control layer on
> the cluster itself. For example, testing replication is not only a
> matter of creating and setting up the nodes, but you might want to be
> able to initialize, add, remove nodes during the tests. Node addition
> would be either a new fresh master (this would be damn useful for a
> test suite for logical replication I think), or a slave node with
> custom recovery parameters to test replication, as well as PITR,
> archiving, etc. Then you need to be able to run SQL commands on top of
> that to check if the results are consistent with what you want.

I'd encourage anyone looking at implementing a testing suite for
replication to look at the stuff we did for Slony, at least to get some
ideas. We wrote a test driver framework (clustertest -
https://github.com/clustertest/clustertest-framework) and then some
Javascript base classes for common types of operations. An individual
test is then written in Javascript that invokes methods either in the
framework or base classes to do most of the interesting work. See
http://git.postgresql.org/gitweb/?p=slony1-engine.git;a=blob;f=clustertest/disorder/tests/EmptySet.js;h=7b4850c1d24036067f5a659b990c7f05415ed967;hb=HEAD
as an example.

> A possible input for a test that users could provide would be
> something like that:
> # Node information for tests
> nodes
> {
> {node1, postgresql.conf params, recovery.conf params}
> {node2, postgresql.conf params, recovery.conf params, slave of node1}
> }
> # Run test
> init node1
> run_sql node1 file1.sql
> # Check output
> init node2
> run_sql node2 file2.sql
> # Check that results are fine
>
> The main problem is actually how to do that. Having some smart shell
> infrastructure would be simple and would facilitate (?) the
> maintenance of code used to run the tests. On the contrary having a C
> program would make the maintenance of code to run the tests more
> difficult (?) for a trade with more readable test suite input like the
> one I wrote above. This might also make the test input more readable
> for a human eye, in the shape of what is already available in
> src/test/isolation.
>
> Another possibility could be also to integrate directly a
> recovery/backup manager in PG core, and have some tests for it, or
> even include those tests directly with pg_basebackup or an upper layer
> of it.
>> There of course is room to add as many replication tests as you like,
>> and the main 141 tests fed into the master could be extended to feed
>> more data and such.
>>
>> The main drawbacks that I don't care for are:
>>
>> 1) 'make check' becomes 'sudo make check' because it needs permission
>> to run chroot.
> -1 for that: developers should not need to use root to run the
> regression suite.
>> 2) I have no win32 version of the logic
> For a first shot I am not sure that it matters much.
>> The main advantages that I like about this design are:
>>
>> 1) Only one system is required. The developer does not need network
>> access to a second replication system. Moreover, multiple database
>> clusters can be established with interesting replication hierarchies
>> between them, and the cost of each additional cluster is just another
>> chroot environment
> An assumption of the test suite is I think to allow developers to
> check for bugs on a local server only. This facilitates how the test
> suite is written and you don't need to enter in things like VM
> settings or cross-environment tests, things that could be done already
> nicely by frameworks of the type Jenkins. What I think people would
> like to have is that:
> cd src/test/replication && make check/installcheck
> And have the test run for them.
>
> Regards,
> --
> Michael
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Andres Freund
Date:
On 2014-01-06 01:25:57 +0000, Greg Stark wrote:
> On 5 Jan 2014 14:54, "Mark Dilger" <markdilger@yahoo.com> wrote:
> >
> > I am building a regression test system for replication and came across
> > this email thread. I have gotten pretty far into my implementation, but
> > would be happy to make modifications if folks have improvements to
> > suggest. If the community likes my design, or a modified version based
> > on your feedback, I'd be happy to submit a patch.
>
> This sounds pretty cool. The real trick will be in testing concurrent
> behaviour -- i.e. queries on the slave when it's replaying logs at a
> certain point. But right now we have nothing so anything would be an
> improvement.

Abhijit Menon-Sen (CCed) has prototyped an isolationtester version that
can connect to multiple nodes. Once we've got automated setup of
multiple nodes, pursuing that makes sense again.

> > This is possible all on one system because the database clusters
> > are chroot'ed to see their own /data directory and not the /data directory
> > of the other chroot'ed clusters, although the rest of the system, like /bin
> > and /etc and /dev are all bind mounted and visible to each cluster.
>
> This isn't necessary. You can use the same binaries and run initdb with a
> different location just fine. Then start up the database with -D to specify
> the directory.

Very emphatically seconded. It should absolutely not be necessary to use
different chroots. Pretty much the only case that will require that is
tablespaces, unless you do some pretty ugly hackery... In almost all
scenarios you'll have to change either unix_socket_directory or port
(recommended) in addition to the datadir - but that's not a problem.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Mark Dilger
Date:
The reason I was going to all the trouble of creating
chrooted environments was to be able to replicate
clusters that have tablespaces. Not doing so makes
the test code simpler at the expense of reducing
test coverage.
I am using the same binaries. The chroot directories
are not "chroot jails". I'm intentionally bind mounting
out to all the other directories on the system, except
the other clusters' data directories and tablespace
directories. The purpose of the chroot is to make the
paths the same on all clusters without the clusters
clobbering each other.
So:
(the '->' means "is bind mounted to")
/master/bin -> /bin
/master/dev -> /dev
/master/etc -> /etc
/master/lib -> /lib
/master/usr -> /usr
/master/data
/master/tablespace
/hotstandby/bin -> /bin
/hotstandby/dev -> /dev
/hotstandby/etc -> /etc
/hotstandby/lib -> /lib
/hotstandby/usr -> /usr
/hotstandby/data
/hotstandby/tablespace
So from inside the master chroot, you see the system's
/bin as /bin, the system's /dev as /dev, etc, but what
you see as /data and /tablespace are your own private
ones. Likewise from the hotstandby chroot. But since
the binaries are in something like
/home/myuser/postgresql/src/test/regress/tmp_check/install/usr/local/pgsql/bin
each cluster uses the same binaries, referred to by the
same path.
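For reference, the bind mounts in the layout above can be expressed as /etc/fstab entries; a sketch that prints them (using the /master and /hotstandby roots from this message -- a real setup would install these once, as described, rather than generate them per run):

```shell
# Print fstab-style bind-mount entries for one cluster root.
# Format: <source> <target> <fstype> <options> <dump> <pass>
gen_fstab() {
    root=$1
    for dir in bin dev etc lib usr; do
        printf '/%s %s/%s none bind 0 0\n' "$dir" "$root" "$dir"
    done
}

gen_fstab /master
gen_fstab /hotstandby
```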
On Sunday, January 5, 2014 5:25 PM, Greg Stark <stark@mit.edu> wrote:
--
greg
On 5 Jan 2014 14:54, "Mark Dilger" <markdilger@yahoo.com> wrote:
>
> I am building a regression test system for replication and came across
> this email thread. I have gotten pretty far into my implementation, but
> would be happy to make modifications if folks have improvements to
> suggest. If the community likes my design, or a modified version based
> on your feedback, I'd be happy to submit a patch.
This sounds pretty cool. The real trick will be in testing concurrent behaviour -- I.e. queries on the slave when it's replaying logs at a certain point. But right now we have nothing so anything would be an improvement.
> This is possible all on one system because the database clusters
> are chroot'ed to see their own /data directory and not the /data directory
> of the other chroot'ed clusters, although the rest of the system, like /bin
> and /etc and /dev are all bind mounted and visible to each cluster.
This isn't necessary. You can use the same binaries and run initdb with a different location just fine. Then start up the database with -D to specify the directory.
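A sketch of that suggestion, with two clusters sharing one set of binaries and differing only in data directory and port (paths and ports are made up, and the commands are printed rather than executed so the sketch is self-contained):

```shell
# Same binaries, two data directories, two ports -- no chroot needed.
run() { echo "+ $*"; }            # print instead of executing, for illustration
BIN=/usr/local/pgsql/bin          # hypothetical install location

run "$BIN/initdb" -D /tmp/master_data
run "$BIN/initdb" -D /tmp/standby_data
run "$BIN/pg_ctl" -D /tmp/master_data -o "-p 5433" start
run "$BIN/pg_ctl" -D /tmp/standby_data -o "-p 5434" start
```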
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Mark Dilger
Date:
I agree that merely setting up masters and slaves is
the tip of the iceberg. It seems to be what needs
to be tackled first, though, because until we have
a common framework, we cannot all contribute
tests to it.
I imagine setting up a whole hierarchy of master,
hotstandbys, warmstandbys, etc., and having
over the course of the test, base backups made,
new clusters spun up from those backups,
masters stopped and standbys promoted to
master, etc.
But I also imagine there needs to be SQL run
on the master that changes the data, so that
replication of those changes can be confirmed.
There are lots of ways to change data, such as
through the large object interface. The current
'make check' test suite exercises all those
code paths. If we incorporate them into our
replication testing suite, then we get the
advantage of knowing that all those paths are
being tested in our suite as well. And if some
new interface, call it huge object, ever gets
made, then there should be a hugeobject.sql
in src/test/regress/sql, and we automatically
get that in our replication tests.
mark
On Sunday, January 5, 2014 6:13 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Mon, Jan 6, 2014 at 4:51 AM, Mark Dilger <markdilger@yahoo.com> wrote:
> I am building a regression test system for replication and came across
> this email thread. I have gotten pretty far into my implementation, but
> would be happy to make modifications if folks have improvements to
> suggest. If the community likes my design, or a modified version based
> on your feedback, I'd be happy to submit a patch.
Yeah, this would be nice to look at, core code definitely needs to have some more infrastructure for such a test suite. I didn't get the time to go back to it since I began this thread though :)
> Currently I am cannibalizing src/test/pg_regress.c, but that could instead
> be copied to src/test/pg_regress_replication.c or whatever. The regression
> test creates and configures multiple database clusters, sets up the
> replication configuration for them, runs them each in nonprivileged mode
> and bound to different ports, feeds all the existing 141 regression tests
> into the master database with the usual checking that all the right results
> are obtained, and then checks that the standbys have the expected
> data. This is possible all on one system because the database clusters
> are chroot'ed to see their own /data directory and not the /data directory
> of the other chroot'ed clusters, although the rest of the system, like /bin
> and /etc and /dev are all bind mounted and visible to each cluster.
Having vanilla regressions run in a cluster with multiple nodes and check the results on a standby is the tip of the iceberg though. What I had in mind when I began this thread was to have more than a copy/paste of pg_regress, but an infrastructure that people could use to create and customize tests by having an additional control layer on the cluster itself. For example, testing replication is not only a matter of creating and setting up the nodes, but you might want to be able to initialize, add, remove nodes during the tests. Node addition would be either a new fresh master (this would be damn useful for a test suite for logical replication I think), or a slave node with custom recovery parameters to test replication, as well as PITR, archiving, etc. Then you need to be able to run SQL commands on top of that to check if the results are consistent with what you want.
A possible input for a test that users could provide would be something like that:
# Node information for tests
nodes
{
{node1, postgresql.conf params, recovery.conf params}
{node2, postgresql.conf params, recovery.conf params, slave of node1}
}
# Run test
init node1
run_sql node1 file1.sql
# Check output
init node2
run_sql node2 file2.sql
The main problem is actually how to do that. Having some smart shell infrastructure would be simple and would facilitate (?) the maintenance of code used to run the tests. On the contrary having a C program would make the maintenance of code to run the tests more difficult (?) for a trade with more readable test suite input like the one I wrote above. This might also make the test input more readable for a human eye, in the shape of what is already available in src/test/isolation.
Another possibility could be also to integrate directly a recovery/backup manager in PG core, and have some tests for it, or even include those tests directly with pg_basebackup or an upper layer of it.
> There of course is room to add as many replication tests as you like,
> and the main 141 tests fed into the master could be extended to feed
> more data and such.
>
> The main drawbacks that I don't care for are:
>
> 1) 'make check' becomes 'sudo make check' because it needs permission
> to run chroot.
-1 for that: developers should not need to use root to run the regression suite.
> 2) I have no win32 version of the logic
For a first shot I am not sure that it matters much.
> The main advantages that I like about this design are:
>
> 1) Only one system is required. The developer does not need network
> access to a second replication system. Moreover, multiple database
> clusters can be established with interesting replication hierarchies between
> them, and the cost of each additional cluster is just another chroot
> environment
An assumption of the test suite is I think to allow developers to check for bugs on a local server only. This facilitates how the test suite is written and you don't need to enter in things like VM settings or cross-environment tests, things that could be done already nicely by frameworks of the type Jenkins. What I think people would like to have is that:
cd src/test/replication && make check/installcheck
And have the test run for them.
Regards,
--
Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Andres Freund
Date:
On 2014-01-06 09:12:03 -0800, Mark Dilger wrote:
> The reason I was going to all the trouble of creating
> chrooted environments was to be able to replicate
> clusters that have tablespaces. Not doing so makes
> the test code simpler at the expense of reducing
> test coverage.
>
> I am using the same binaries. The chroot directories
> are not "chroot jails". I'm intentionally bind mounting
> out to all the other directories on the system, except
> the other clusters' data directories and tablespace
> directories. The purpose of the chroot is to make the
> paths the same on all clusters without the clusters
> clobbering each other.

I don't think the benefit of being able to test tablespaces without
restarts comes even close to offsetting the cost of requiring sudo
permissions and introducing OS dependencies. E.g. there's pretty much no
hope of making this work sensibly on windows.

So I'd just leave out that part.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
David Johnston
Date:
Andres Freund-3 wrote
> On 2014-01-06 09:12:03 -0800, Mark Dilger wrote:
>> The reason I was going to all the trouble of creating
>> chrooted environments was to be able to replicate
>> clusters that have tablespaces. Not doing so makes
>> the test code simpler at the expense of reducing
>> test coverage.
>
>> I am using the same binaries. The chroot directories
>> are not "chroot jails". I'm intentionally bind mounting
>> out to all the other directories on the system, except
>> the other clusters' data directories and tablespace
>> directories. The purpose of the chroot is to make the
>> paths the same on all clusters without the clusters
>> clobbering each other.
>
> I don't think the benefit of being able to test tablespaces without
> restarts comes even close to offsetting the cost of requiring sudo
> permissions and introducing OS dependencies. E.g. there's pretty much no
> hope of making this work sensibly on windows.
>
> So I'd just leave out that part.
Only skimming this thread, but even if only a handful of buildfarm animals
can run this extended test bundle because of the restrictive requirements,
it is likely better than discarding the tests altogether. The main thing in
this case is to segregate out this routine so that it has to be invoked
explicitly, and ideally in an "ignore if pre-reqs are missing" manner.
Increasing the likelihood and frequency of test runs on what is a fairly
popular platform, covering non-OS-specific code as well, has benefits. As
long as it doesn't poison anything else, I don't see much harm coming of it.
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Re-In-core-regression-tests-for-replication-cascading-archiving-PITR-etc-Michael-Paquier-tp5785400p5785555.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Mark Dilger
Date:
I was already starting to consider making the chroot logic optional, based on the resistance expressed by folks on this thread.
How about the following:
During the configure phase, it checks for chroot and setuid and friends that it will need.
The regression suite has config parameters to specify where the chroot directories are to live, defaulting to something sensible.
We have two almost identical make targets, called something like 'replicationcheck' and 'sudofullreplicationcheck', and only do the chroot stuff if uid=0, the directories exist, the bind mounts exist, and the make target is 'sudofullreplicationcheck'. The tablespace tests would have to be optional, only running in the full test and not the non-full test, which complicates things by requiring two different expectations (in the sense of the results/ vs. expected/ directories).
I'm inclined to change the name of the tests from 'replicationtest' or 'replicationcheck' to something broader like 'clustercheck', owing to the expectation that more than replication could be tested in this framework. The "sudofull" prefix is just a placeholder -- I don't like that naming convention. Not sure about the name to use.
mark
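The configure-phase probe Mark describes could be quite small. A sketch, assuming standard autoconf macros (these exact probes are hypothetical and not taken from PostgreSQL's actual configure.in):

```
# Hypothetical configure.in fragment: probe for the calls the optional
# chroot mode needs, so the test driver can skip those tests when absent.
AC_CHECK_FUNCS([chroot setuid])
```

The test driver would then consult the resulting HAVE_CHROOT / HAVE_SETUID defines, in addition to checking for uid=0 and the bind mounts, before attempting the full mode.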
On Monday, January 6, 2014 10:17 AM, David Johnston <polobo@yahoo.com> wrote:
Andres Freund-3 wrote
> On 2014-01-06 09:12:03 -0800, Mark Dilger wrote:
>> The reason I was going to all the trouble of creating
>> chrooted environments was to be able to replicate
>> clusters that have tablespaces. Not doing so makes
>> the test code simpler at the expense of reducing
>> test coverage.
>
>> I am using the same binaries. The chroot directories
>> are not "chroot jails". I'm intentionally bind mounting
>> out to all the other directories on the system, except
>> the other clusters' data directories and tablespace
>> directories. The purpose of the chroot is to make the
>> paths the same on all clusters without the clusters
>> clobbering each other.
>
> I don't think the benefit of being able to test tablespaces without
> restarts comes even close to offsetting the cost of requiring sudo
> permissions and introducing OS dependencies. E.g. there's pretty much no
> hope of making this work sensibly on windows.
>
> So I'd just leave out that part.
Only skimming this thread but even if only a handful of buildfarm animals
can run this extended test bundle because of the restrictive requirements it
is likely better than discarding them altogether. The main thing in this
case is to segregate out this routine so that it has to be invoked
explicitly and ideally in an "ignore if pre-reqs are missing" manner.
Increasing the likelihood and frequency of test runs in what is a fairly
popular platform and that covers non-OS specific code as well has benefits.
As long as it doesn't poison anything else I don't see that much harm coming
of it.
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Re-In-core-regression-tests-for-replication-cascading-archiving-PITR-etc-Michael-Paquier-tp5785400p5785555.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Heikki Linnakangas
Date:
On 01/06/2014 07:12 PM, Mark Dilger wrote:
> The reason I was going to all the trouble of creating
> chrooted environments was to be able to replicate
> clusters that have tablespaces.
You can remove and recreate the symlink in the pg_tblspc directory, after
creating the cluster, to point it to a different location. It might be a
bit tricky to do that if you have two clusters running at the same time,
but it's probably easier than chrooting anyway. For example:
1. stop the standby
2. create the tablespace in master
3. stop master
4. mv the tablespace directory, and modify the symlink in master to point to the new location
5. start standby. It will replay the tablespace creation in the original location
6. restart master. You now have the same tablespace in master and standby, but they point to different locations.
This doesn't allow dynamically creating and dropping tablespaces during
tests, but at least it gives you one tablespace to use.
Another idea would be to do something like chroot, but more lightweight,
using FUSE, private mount namespaces, or cgroups.
- Heikki
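The symlink trick in step 4 can be illustrated without any real clusters. The sketch below fakes the layout (the tablespace OID 16385 and all paths are invented for illustration) just to show the pg_tblspc link being repointed while its entry name stays the same:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()

# Fake the master's data directory layout and the original tablespace dir.
os.makedirs(os.path.join(base, "master", "pg_tblspc"))
orig = os.path.join(base, "ts_orig")
os.makedirs(orig)

# CREATE TABLESPACE would leave a symlink named after the tablespace OID.
link = os.path.join(base, "master", "pg_tblspc", "16385")
os.symlink(orig, link)

# Step 4: relocate the directory, then repoint the master's symlink.
new = os.path.join(base, "ts_master")
shutil.move(orig, new)
os.unlink(link)
os.symlink(new, link)

# The standby would still replay the creation at ts_orig, while the
# master's pg_tblspc entry now resolves to ts_master.
print(os.readlink(link))
```

The same entry name with diverging targets is exactly what lets the master and standby coexist on one machine.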
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> Another idea would be to do something like chroot, but more lightweight,
> using FUSE, private mount namespaces, or cgroups.
I thought the goal here was to have a testing framework that (a) is
portable to every platform we support and (b) doesn't require root
privileges to run. None of those options sound like they'll help meet
those requirements.
regards, tom lane
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Andres Freund
Date:
On 2014-01-07 10:27:14 -0500, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> Another idea would be to do something like chroot, but more lightweight,
>> using FUSE, private mount namespaces, or cgroups.
>
> I thought the goal here was to have a testing framework that (a) is
> portable to every platform we support and (b) doesn't require root
> privileges to run. None of those options sound like they'll help meet
> those requirements.
Seconded. Perhaps the solution is to simply introduce tablespaces located
relative to PGDATA? That'd be fracking useful anyway.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Mark Dilger
Date:
On Tuesday, January 7, 2014 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> Another idea would be to do something like chroot, but more lightweight,
>> using FUSE, private mount namespaces, or cgroups.
>
> I thought the goal here was to have a testing framework that (a) is
> portable to every platform we support and (b) doesn't require root
> privileges to run. None of those options sound like they'll help meet
> those requirements.
If I drop the idea of sudo/chroot and punt for now on testing tablespaces
under replication, it should be possible to test the rest of the
replication system in a way that meets (a) and (b). Perhaps Andres' idea of
tablespaces relative to the data directory will get implemented some day,
at which point we wouldn't be punting quite so much. But until then, punt.
Would it make sense for this to just be part of 'make check'? That would
require creating multiple database clusters under multiple data
directories, and having them bind to multiple ports or unix domain sockets.
Is that a problem?
What's the logic of having replication testing separated from the other
pg_regress tests? Granted, not every user of postgres uses replication, but
that's true for lots of features, and we don't split things like json into
separate test suites. Vendors who run 'make check' as part of their
packaging of postgresql would probably benefit from knowing if replication
doesn't work on their distro, and they may not change their packaging
systems to include a second 'make replicationcheck' step.
mark
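Two clusters can already coexist on one machine without chroot by giving each data directory its own port and socket directory. A sketch of the relevant settings (all names, ports, and paths here are placeholders, not part of any actual test suite):

```ini
# master_data/postgresql.conf
port = 65432
unix_socket_directories = '/tmp/pgtest_master'

# standby_data/postgresql.conf
port = 65433
unix_socket_directories = '/tmp/pgtest_standby'
```

With distinct ports and sockets, both servers run unprivileged side by side; only tablespace paths remain a problem, which is what the chroot idea was trying to solve.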
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Tom Lane
Date:
Mark Dilger <markdilger@yahoo.com> writes:
> Would it make sense for this to just be part of 'make check'?
Probably not, as (I imagine) it will take quite a bit longer than "make
check" does today. People who are not working on replication-related
features will be annoyed if a test cycle starts taking 10X longer than it
used to, for tests of no value to them.
It's already not the case that "make check" runs every available automated
test; the isolation tests, the PL tests, the contrib tests are all
separate. There is a "make check-world", which I think should reasonably
run all of these.
regards, tom lane
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Mark Dilger
Date:
Michael Paquier wrote:
> A possible input for a test that users could provide would be something
> like that:
>
> # Node information for tests
> nodes
> {
> {node1, postgresql.conf params, recovery.conf params}
> {node2, postgresql.conf params, recovery.conf params, slave of node1}
> }
> # Run test
> init node1
> run_sql node1 file1.sql
> # Check output
> init node2
> run_sql node2 file2.sql
> # Check that results are fine
> # Process
>
> The main problem is actually how to do that. Having some smart shell
> infrastructure would be simple and would facilitate (?) the maintenance
> of code used to run the tests. On the contrary having a C program would
> make the maintenance of code to run the tests more difficult (?) for a
> trade with more readable test suite input like the one I wrote above.
> This might also make the test input more readable for a human eye, in
> the shape of what is already available in src/test/isolation.
I like making this part of src/test/isolation, if folks do not object.
The core infrastructure in src/test/isolation seems applicable to
replication testing, and I'd hate to duplicate that code.
As for the node setup in your example above, I don't think it can be as
simple as defining nodes first, then running tests. The configurations
themselves may need to be changed during the execution of a test, and
services stopped and started, all under test control and specified in
the same easy format.
I have started working on this, and will post WIP patches from time to
time, unless you all feel the need to point me in a different direction.
mark
On Sunday, January 5, 2014 6:13 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Mon, Jan 6, 2014 at 4:51 AM, Mark Dilger <markdilger@yahoo.com> wrote:
> I am building a regression test system for replication and came across
> this email thread. I have gotten pretty far into my implementation, but
> would be happy to make modifications if folks have improvements to
> suggest. If the community likes my design, or a modified version based
> on your feedback, I'd be happy to submit a patch.
Yeah, this would be nice to look at, core code definitely needs to have some more infrastructure for such a test suite. I didn't get the time to go back to it since I began this thread though :)
> Currently I am cannibalizing src/test/pg_regress.c, but that could instead
> be copied to src/test/pg_regress_replication.c or whatever. The regression
> test creates and configures multiple database clusters, sets up the
> replication configuration for them, runs them each in nonprivileged mode
> and bound to different ports, feeds all the existing 141 regression tests
> into the master database with the usual checking that all the right results
> are obtained, and then checks that the standbys have the expected
> data. This is possible all on one system because the database clusters
> are chroot'ed to see their own /data directory and not the /data directory
> of the other chroot'ed clusters, although the rest of the system, like /bin
> and /etc and /dev are all bind mounted and visible to each cluster.
Having the vanilla regression tests run in a cluster with multiple nodes and checking the results on a standby is only the tip of the iceberg though. What I had in mind when I began this thread was to have more than a copy/paste of pg_regress: an infrastructure that people could use to create and customize tests by having an additional control layer on the cluster itself. For example, testing replication is not only a matter of creating and setting up the nodes; you might want to be able to initialize, add, and remove nodes during the tests. Node addition would be either a new fresh master (this would be damn useful for a test suite for logical replication I think), or a slave node with custom recovery parameters to test replication, as well as PITR, archiving, etc. Then you need to be able to run SQL commands on top of that to check if the results are consistent with what you want.
A possible input for a test that users could provide would be something like that:
# Node information for tests
nodes
{
{node1, postgresql.conf params, recovery.conf params}
{node2, postgresql.conf params, recovery.conf params, slave of node1}
}
# Run test
init node1
run_sql node1 file1.sql
# Check output
init node2
run_sql node2 file2.sql
The main problem is actually how to do that. Having some smart shell infrastructure would be simple and would facilitate (?) the maintenance of code used to run the tests. On the contrary having a C program would make the maintenance of code to run the tests more difficult (?) for a trade with more readable test suite input like the one I wrote above. This might also make the test input more readable for a human eye, in the shape of what is already available in src/test/isolation.
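Whatever the driver is written in, the input format itself is tiny. As a neutral illustration of how little machinery the proposed format needs, here is a hypothetical parser for it (the node syntax and action names are only the sketch above, not an existing PostgreSQL tool):

```python
import re
import shlex

def parse_spec(text):
    """Parse the hypothetical node/action test format from this thread."""
    nodes, actions = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and the structural tokens of the node block.
        if not line or line.startswith("#") or line in ("nodes", "{", "}"):
            continue
        m = re.match(r"\{(\w+),\s*(.*)\}$", line)
        if m:  # a node definition such as {node2, ..., slave of node1}
            name, rest = m.groups()
            slave = re.search(r"slave of (\w+)", rest)
            nodes[name] = {"conf": rest,
                           "master": slave.group(1) if slave else None}
        else:  # an action line such as "run_sql node1 file1.sql"
            verb, *args = shlex.split(line)
            actions.append((verb, args))
    return nodes, actions

spec = """\
# Node information for tests
nodes
{
{node1, postgresql.conf params, recovery.conf params}
{node2, postgresql.conf params, recovery.conf params, slave of node1}
}
# Run test
init node1
run_sql node1 file1.sql
init node2
run_sql node2 file2.sql
"""
nodes, actions = parse_spec(spec)
```

The hard part, as the thread notes, is not parsing but executing the actions (init, start, stop, run_sql) against real clusters in a portable way.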
Another possibility could be also to integrate directly a recovery/backup manager in PG core, and have some tests for it, or even include those tests directly with pg_basebackup or an upper layer of it.
> There of course is room to add as many replication tests as you like,
> and the main 141 tests fed into the master could be extended to feed
> more data and such.
>
> The main drawbacks that I don't care for are:
>
> 1) 'make check' becomes 'sudo make check' because it needs permission
> to run chroot.
-1 for that; developers should not need to use root to run the regression suite.
> 2) I have no win32 version of the logic
For a first shot I am not sure that it matters much.
> The main advantages that I like about this design are:
>
> 1) Only one system is required. The developer does not need network
> access to a second replication system. Moreover, multiple database
> clusters can be established with interesting replication hierarchies between
> them, and the cost of each additional cluster is just another chroot
> environment
An assumption of the test suite, I think, is that developers should be able to check for bugs on a local server only. This keeps the test suite simple to write: you don't need to get into things like VM settings or cross-environment tests, which frameworks of the Jenkins type already handle nicely. What I think people would like to have is:
cd src/test/replication && make check/installcheck
And have the test run for them.
Regards,
--
Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
Michael Paquier
Date:
On Thu, Jan 9, 2014 at 12:34 PM, Mark Dilger <markdilger@yahoo.com> wrote:
> Michael Paquier wrote:
>> A possible input for a test that users could provide would be something
>> like that:
>>
>> # Node information for tests
>> nodes
>> {
>> {node1, postgresql.conf params, recovery.conf params}
>> {node2, postgresql.conf params, recovery.conf params, slave of node1}
>> }
>> # Run test
>> init node1
>> run_sql node1 file1.sql
>> # Check output
>> init node2
>> run_sql node2 file2.sql
>> # Check that results are fine
>> # Process
>>
>> The main problem is actually how to do that. Having some smart shell
>> infrastructure would be simple and would facilitate (?) the maintenance
>> of code used to run the tests. On the contrary having a C program would
>> make the maintenance of code to run the tests more difficult (?) for a
>> trade with more readable test suite input like the one I wrote above.
>> This might also make the test input more readable for a human eye, in
>> the shape of what is already available in src/test/isolation.
>
> I like making this part of src/test/isolation, if folks do not object.
> The core infrastructure in src/test/isolation seems applicable to
> replication testing, and I'd hate to duplicate that code.
>
> As for the node setup in your example above, I don't think it can be as
> simple as defining nodes first, then running tests. The configurations
> themselves may need to be changed during the execution of a test, and
> services stopped and started, all under test control and specified in
> the same easy format.
Yes, my example was very basic :). What you actually need is the
possibility to perform actions on nodes during a test run, basically:
stop, start, init, reload, run SQL, change params/create new conf files
(like putting a node in recovery could be = create recovery.conf +
restart).
The place of the code does not matter much, but I don't think that it
should be part of isolation, as clustering and isolation are two different
test suites. I would have for example seen that as src/test/cluster, with
src/test/common for things that are shared between test infrastructures.
As mentioned by Steve, the test suite of Slony might be interesting to
look at to get some ideas.
Regards,
--
Michael
Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
From
"Greg Sabino Mullane"
Date:
> I thought the goal here was to have a testing framework that (a) is
> portable to every platform we support and (b) doesn't require root
> privileges to run. None of those options sound like they'll help meet
> those requirements.
FWIW, I hacked up a Perl-based testing system as a proof of concept some
time ago. I can dust it off if anyone is interested. Perl has a very nice
testing ecosystem and is probably the most portable language we support,
other than C. My quick goals for the project were:
* allow granular testing (ala Andrew's recent email, which reminded me of this)
* allow stackable methods and dependencies
* make it very easy to write new tests
* test various features that are way too difficult in our existing system (e.g. PITR, fdws)
* get some automated code coverage metrics (this one was tricky)
* allow future git integration based on subsystems
--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201401261211 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8