Thread: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier

I am building a regression test system for replication and came across
this email thread.  I have gotten pretty far into my implementation, but
would be happy to make modifications if folks have improvements to
suggest.  If the community likes my design, or a modified version based
on your feedback, I'd be happy to submit a patch.

Currently I am cannibalizing src/test/pg_regress.c, but that could instead
be copied to src/test/pg_regress_replication.c or whatever.  The regression
test creates and configures multiple database clusters, sets up the
replication configuration for them, runs them each in nonprivileged mode
and bound to different ports, feeds all the existing 141 regression tests
into the master database with the usual checking that all the right results
are obtained, and then checks that the standbys have the expected
data.  This is possible all on one system because the database clusters
are chroot'ed to see their own /data directory and not the /data directory
of the other chroot'ed clusters, although the rest of the system, like /bin
and /etc and /dev are all bind mounted and visible to each cluster.

There of course is room to add as many replication tests as you like,
and the main 141 tests fed into the master could be extended to feed
more data and such.
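
To make the last step concrete, the check that the standbys have the
expected data could be as simple as diffing dumps taken from the master
and from a hot standby once it has caught up.  A rough, untested sketch
(port numbers are invented; 65432 for the master, 65433 for the standby):

    pg_dumpall -p 65432 > master.dump
    # A real test should wait until pg_last_xlog_replay_location() on the
    # standby reaches the master's pg_current_xlog_location(); a sleep is
    # only good enough for a sketch.
    sleep 5
    pg_dumpall -p 65433 > standby.dump
    diff master.dump standby.dump && echo "standby matches master"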

The main drawbacks that I don't care for are:

1) 'make check' becomes 'sudo make check' because it needs permission
to run chroot.

2) I have no win32 version of the logic

3) Bind mounts either have to be created by the privileged pg_regress
process or have to be pre-existing on the system


#1 would not be as bad if pg_regress became pg_regress_replication, as
we could make the mantra into 'sudo make replicationcheck' or similar.
Splitting it from 'make check' also means IMHO that it could have heavier
tests that take longer to run, since people merely interested in building
and installing postgres would not be impacted by this.

#2 might be fixed by someone more familiar with win32 programming
than I am.

#3 cannot be avoided as far as I can tell, but we could choose between
the two options.  So far, I have chosen to set up the directory structure
and add the bind mount logic to my /etc/fstab only once, rather than
having this get recreated every time I invoke 'sudo make check'.  The
community might prefer to go the other way, and have the directories
and bind mounts get set up each invocation; I have avoided that thus
far as I don't want 'sudo make check' (or 'sudo make replicationcheck')
to abuse its raised privileges and muck with the filesystem in a way
that could cause the user unexpected problems.
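
For what it's worth, the bind mount entries in /etc/fstab are nothing
exotic -- one block per cluster, roughly like this for a cluster rooted
at /master (paths abbreviated for illustration):

    /bin   /master/bin   none   bind   0 0
    /dev   /master/dev   none   bind   0 0
    /etc   /master/etc   none   bind   0 0
    /lib   /master/lib   none   bind   0 0
    /usr   /master/usr   none   bind   0 0
    # /master/data and /master/tablespace stay ordinary, private directories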


The main advantages that I like about this design are:

1) Only one system is required.  The developer does not need network
access to a second replication system.  Moreover, multiple database
clusters can be established with interesting replication hierarchies between
them, and the cost of each additional cluster is just another chroot
environment

2) Checking out the sources from git and then running

    ./configure && make && sudo make replicationtest

is not particularly difficult, assuming the directories and mounts are
in place, or alternatively assuming that 'sudo make replicationcheck'
creates them for you if they don't already exist.

Comments and advice sincerely solicited,

mark


-- 
greg

On 5 Jan 2014 14:54, "Mark Dilger" <markdilger@yahoo.com> wrote:
>
> I am building a regression test system for replication and came across
> this email thread.  I have gotten pretty far into my implementation, but
> would be happy to make modifications if folks have improvements to
> suggest.  If the community likes my design, or a modified version based
> on your feedback, I'd be happy to submit a patch.

This sounds pretty cool. The real trick will be in testing concurrent
behaviour -- I.e. queries on the slave when it's replaying logs at a
certain point. But right now we have nothing so anything would be an
improvement.

>  This is possible all on one system because the database clusters
> are chroot'ed to see their own /data directory and not the /data directory
> of the other chroot'ed clusters, although the rest of the system, like /bin
> and /etc and /dev are all bind mounted and visible to each cluster.

This isn't necessary. You can use the same binaries and run initdb with a
different location just fine. Then start up the database with -D to specify
the directory.


On Mon, Jan 6, 2014 at 4:51 AM, Mark Dilger <markdilger@yahoo.com> wrote:
> I am building a regression test system for replication and came across
> this email thread.  I have gotten pretty far into my implementation, but
> would be happy to make modifications if folks have improvements to
> suggest.  If the community likes my design, or a modified version based
> on your feedback, I'd be happy to submit a patch.
Yeah, this would be nice to look at; the core code definitely needs some more infrastructure for such a test suite. I haven't had the time to get back to it since I began this thread though :)

> Currently I am canibalizing src/test/pg_regress.c, but that could instead
> be copied to src/test/pg_regress_replication.c or whatever.  The regression
> test creates and configures multiple database clusters, sets up the
> replication configuration for them, runs them each in nonprivileged mode
> and bound to different ports, feeds all the existing 141 regression tests
> into the master database with the usual checking that all the right results
> are obtained, and then checks that the standbys have the expected
> data.  This is possible all on one system because the database clusters
> are chroot'ed to see their own /data directory and not the /data directory
> of the other chroot'ed clusters, although the rest of the system, like /bin
> and /etc and /dev are all bind mounted and visible to each cluster.
Having the vanilla regression tests run in a cluster with multiple nodes and checking the results on a standby is only the tip of the iceberg though. What I had in mind when I began this thread was to have more than a copy/paste of pg_regress: an infrastructure that people could use to create and customize tests by having an additional control layer on the cluster itself. For example, testing replication is not only a matter of creating and setting up the nodes; you might want to be able to initialize, add and remove nodes during the tests. A node addition would be either a fresh new master (this would be damn useful for a test suite for logical replication, I think), or a slave node with custom recovery parameters to test replication, as well as PITR, archiving, etc. Then you need to be able to run SQL commands on top of that to check that the results are consistent with what you want.

A possible input for a test that users could provide would be something like that:
# Node information for tests
nodes
{
    {node1, postgresql.conf params, recovery.conf params}
    {node2, postgresql.conf params, recovery.conf params, slave of node1}
}
# Run test
init node1
run_sql node1 file1.sql
# Check output
init node2
run_sql node2 file2.sql
# Check that results are fine
# Process

The main problem is actually how to do that. Having some smart shell infrastructure would be simple and would facilitate (?) the maintenance of the code used to run the tests. On the other hand, a C program would make the test-running code harder to maintain (?), in exchange for more readable test suite input like the one I wrote above. It might also make the test input more readable to the human eye, along the lines of what is already available in src/test/isolation.
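
To illustrate the shell option, a driver for an input like the above could start out as small as this (completely untested; all names, paths and port numbers are invented):

    #!/bin/sh
    BASEDIR=$PWD/tmp_check
    mkdir -p "$BASEDIR" results

    init_node () {                  # init_node <name> <port>
        initdb -D "$BASEDIR/$1" >/dev/null
        echo "port = $2" >> "$BASEDIR/$1/postgresql.conf"
        pg_ctl -w -D "$BASEDIR/$1" -l "$BASEDIR/$1.log" start
    }

    run_sql () {                    # run_sql <port> <file.sql>
        psql -X -p "$1" -d postgres -f "$2" > "results/${2%.sql}.out" 2>&1
    }

    init_node node1 65432
    run_sql 65432 file1.sql
    diff expected/file1.out results/file1.out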

Another possibility could be also to integrate directly a recovery/backup manager in PG core, and have some tests for it, or even include those tests directly with pg_basebackup or an upper layer of it.

> There of course is room to add as many replication tests as you like,
> and the main 141 tests fed into the master could be extended to feed
> more data and such.
>
> The main drawbacks that I don't care for are:
>
> 1) 'make check' becomes 'sudo make check' because it needs permission
> to run chroot.
-1 for that; developers should not need to use root to run the regression suite.

> 2) I have no win32 version of the logic
For a first shot I am not sure that it matters much.

> The main advantages that I like about this design are:
>
> 1) Only one system is required.  The developer does not need network
> access to a second replication system.  Moreover, multiple database
> clusters can be established with interesting replication hierarchies between
> them, and the cost of each additional cluster is just another chroot
> environment
An assumption of the test suite is, I think, that it lets developers check for bugs on a local server only. This simplifies how the test suite is written, and you don't need to get into things like VM settings or cross-environment tests, which can already be done nicely by frameworks of the Jenkins type. What I think people would like to have is this:
cd src/test/replication && make check/installcheck
and have the tests run for them.

Regards,
--
Michael
On 01/05/2014 09:13 PM, Michael Paquier wrote:
>
>
> On Mon, Jan 6, 2014 at 4:51 AM, Mark Dilger <markdilger@yahoo.com> wrote:
> > I am building a regression test system for replication and came across
> > this email thread.  I have gotten pretty far into my implementation, but
> > would be happy to make modifications if folks have improvements to
> > suggest.  If the community likes my design, or a modified version based
> > on your feedback, I'd be happy to submit a patch.
> Yeah, this would be nice to look at, core code definitely needs to 
> have some more infrastructure for such a test suite. I didn't get the 
> time to go back to it since I began this thread though :)
>
> > Currently I am canibalizing src/test/pg_regress.c, but that could 
> instead
> > be copied to src/test/pg_regress_replication.c or whatever.  The 
> regression
> > test creates and configures multiple database clusters, sets up the
> > replication configuration for them, runs them each in nonprivileged mode
> > and bound to different ports, feeds all the existing 141 regression 
> tests
> > into the master database with the usual checking that all the right 
> results
> > are obtained, and then checks that the standbys have the expected
> > data.  This is possible all on one system because the database clusters
> > are chroot'ed to see their own /data directory and not the /data 
> directory
> > of the other chroot'ed clusters, although the rest of the system, 
> like /bin
> > and /etc and /dev are all bind mounted and visible to each cluster.
> Having vanilla regressions run in a cluster with multiple nodes and 
> check the results on a standby is the top of the iceberg though. What 
> I had in mind when I began this thread was to have more than a 
> copy/paste of pg_regress, but an infrastructure that people could use 
> to create and customize tests by having an additional control layer on 
> the cluster itself. For example, testing replication is not only a 
> matter of creating and setting up the nodes, but you might want to be 
> able to initialize, add, remove nodes during the tests. Node addition 
> would be either a new fresh master (this would be damn useful for a 
> test suite for logical replication I think), or a slave node with 
> custom recovery parameters to test replication, as well as PITR, 
> archiving, etc. Then you need to be able to run SQL commands on top of 
> that to check if the results are consistent with what you want.
>

I'd encourage anyone looking at implementing a testing suite for 
replication to look at the stuff we did for Slony, at least to get some 
ideas.

We wrote a test driver framework (clustertest - 
https://github.com/clustertest/clustertest-framework) and then some 
JavaScript base classes for common types of operations.  An individual 
test is then written in JavaScript and invokes methods in the 
framework or the base classes to do most of the interesting work.

See, as an example:

http://git.postgresql.org/gitweb/?p=slony1-engine.git;a=blob;f=clustertest/disorder/tests/EmptySet.js;h=7b4850c1d24036067f5a659b990c7f05415ed967;hb=HEAD








On 2014-01-06 01:25:57 +0000, Greg Stark wrote:
> -- 
> greg
> On 5 Jan 2014 14:54, "Mark Dilger" <markdilger@yahoo.com> wrote:
> >
> > I am building a regression test system for replication and came across
> > this email thread.  I have gotten pretty far into my implementation, but
> > would be happy to make modifications if folks have improvements to
> > suggest.  If the community likes my design, or a modified version based
> > on your feedback, I'd be happy to submit a patch.
> 
> This sounds pretty cool. The real trick will be in testing concurrent
> behaviour -- I.e. queries on the slave when it's replaying logs at a
> certain point. But right now we have nothing so anything would be an
> improvement.

Abhijit Menon-Sen (CCed) has prototyped an isolationtester version that
can connect to multiple nodes. Once we've got automated setup of
multiple nodes, pursuing that makes sense again.

> >  This is possible all on one system because the database clusters
> > are chroot'ed to see their own /data directory and not the /data directory
> > of the other chroot'ed clusters, although the rest of the system, like
> /bin
> > and /etc and /dev are all bind mounted and visible to each cluster.
> 
> This isn't necessary. You can use the same binaries and run initdb with a
> different location just fine. Then start up the database with -D to specify
> the directory.

Very emphatically seconded. It should absolutely not be necessary to
use different chroots. Pretty much the only case that will require that
is tablespaces unless you do some pretty ugly hackery...

In almost all scenarios you'll have to change either
unix_socket_directory or port (recommended) in addition to the datadir -
but that's not a problem.
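
In other words, something like this is all it takes (port picked arbitrarily):

    initdb -D /tmp/node2
    echo "port = 65433" >> /tmp/node2/postgresql.conf
    pg_ctl -w -D /tmp/node2 -l /tmp/node2.log start
    psql -p 65433 -d postgres -c 'SELECT 1'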

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



The reason I was going to all the trouble of creating
chrooted environments was to be able to replicate
clusters that have tablespaces.  Not doing so makes
the test code simpler at the expense of reducing
test coverage.

I am using the same binaries.  The chroot directories
are not "chroot jails".  I'm intentionally bind mounting
out to all the other directories on the system, except
the other clusters' data directories and tablespace
directories.  The purpose of the chroot is to make the
paths the same on all clusters without the clusters
clobbering each other.

So:

(the '->' means "is bind mounted to")

/master/bin -> /bin
/master/dev -> /dev
/master/etc -> /etc
/master/lib -> /lib
/master/usr -> /usr
/master/data
/master/tablespace

/hotstandby/bin -> /bin
/hotstandby/dev -> /dev
/hotstandby/etc -> /etc
/hotstandby/lib -> /lib
/hotstandby/usr -> /usr
/hotstandby/data
/hotstandby/tablespace

So from inside the master chroot, you see the system's
/bin as /bin, the system's /dev as /dev, etc, but what
you see as /data and /tablespace are your own private
ones.  Likewise from the hotstandby chroot.  But since
the binaries are in something like

/home/myuser/postgresql/src/test/regress/tmp_check/install/usr/local/pgsql/bin

each cluster uses the same binaries, referred to by the
same path.






I agree that merely setting up masters and slaves is
the tip of the iceberg.  It seems to be what needs
to be tackled first, though, because until we have
a common framework, we cannot all contribute
tests to it.

I imagine setting up a whole hierarchy of master,
hot standbys, warm standbys, etc., and having,
over the course of the test, base backups made,
new clusters spun up from those backups,
masters stopped and standbys promoted to
master, etc.

But I also imagine there needs to be SQL run
on the master that changes the data, so that
replication of those changes can be confirmed.
There are lots of ways to change data, such as
through the large object interface.  The current
'make check' test suite exercises all those
code paths.  If we incorporate them into our
replication testing suite, then we get the
advantage of knowing that all those paths are
being tested in our suite as well.  And if some
new interface, call it huge object, ever gets
made, then there should be a hugeobject.sql
in src/test/regress/sql, and we automatically
get that in our replication tests.

mark




On 2014-01-06 09:12:03 -0800, Mark Dilger wrote:
> The reason I was going to all the trouble of creating
> chrooted environments was to be able to replicate
> clusters that have tablespaces.  Not doing so makes
> the test code simpler at the expense of reducing
> test coverage.

> I am using the same binaries.  The chroot directories
> are not "chroot jails".  I'm intentionally bind mounting
> out to all the other directories on the system, except
> the other clusters' data directories and tablespace
> directories.  The purpose of the chroot is to make the
> paths the same on all clusters without the clusters
> clobbering each other.

I don't think the benefit of being able to test tablespaces without
restarts comes even close to offsetting the cost of requiring sudo
permissions and introducing OS dependencies. E.g. there's pretty much no
hope of making this work sensibly on windows.

So I'd just leave out that part.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Andres Freund wrote:
> On 2014-01-06 09:12:03 -0800, Mark Dilger wrote:
>> The reason I was going to all the trouble of creating
>> chrooted environments was to be able to replicate
>> clusters that have tablespaces.  Not doing so makes
>> the test code simpler at the expense of reducing
>> test coverage.
>
>> I am using the same binaries.  The chroot directories
>> are not "chroot jails".  I'm intentionally bind mounting
>> out to all the other directories on the system, except
>> the other clusters' data directories and tablespace
>> directories.  The purpose of the chroot is to make the
>> paths the same on all clusters without the clusters
>> clobbering each other.
>
> I don't think the benefit of being able to test tablespaces without
> restarts comes even close to offsetting the cost of requiring sudo
> permissions and introducing OS dependencies. E.g. there's pretty much no
> hope of making this work sensibly on windows.
>
> So I'd just leave out that part.

Only skimming this thread, but even if only a handful of buildfarm animals
can run this extended test bundle because of the restrictive requirements,
that is likely better than discarding the tests altogether.  The main thing
in this case is to segregate out this routine so that it has to be invoked
explicitly, and ideally in an "ignore if prerequisites are missing" manner.

Increasing the likelihood and frequency of test runs on what is a fairly
popular platform, covering non-OS-specific code as well, has benefits.
As long as it doesn't poison anything else I don't see much harm coming
of it.

David J.







I was already starting to consider making the chroot logic optional, based on the resistance expressed by folks on this thread.

How about the following:

During the configure phase, it checks for chroot and setuid and friends that it will need.

The regression suite has config parameters to specify where the chroot directories are to live, defaulting to something sensible.

We have two almost identical make targets, called something like 'replicationcheck' and 'sudofullreplicationcheck', and only do the chroot stuff if uid=0, the directories exist, the bind mounts exist, and the make target is 'sudofullreplicationcheck'.  The tablespace tests would have to be optional, running only in the full test and not the non-full test, which creates some complications by requiring two different sets of expectations (in the sense of the results/ vs. expected/ directories).

I'm inclined to change the name of the tests from 'replicationtest' or 'replicationcheck' to something broader like 'clustercheck', owing to the expectation that more than replication could be tested in this framework.  The "sudofull" prefix is just a placeholder -- I don't like that naming convention.  Not sure about the name to use.


mark





On 01/06/2014 07:12 PM, Mark Dilger wrote:
> The reason I was going to all the trouble of creating
> chrooted environments was to be able to replicate
> clusters that have tablespaces.

You can remove and recreate the symlink in the pg_tblspc directory, after 
creating the cluster, to point it to a different location. It might be a 
bit tricky to do that if you have two clusters running at the same time, 
but it's probably easier than chrooting anyway. For example:

1. stop the standby
2. create the tablespace in master
3. stop master
4. mv the tablespace directory, and modify the symlink in master to 
point to the new location
5. start standby. It will replay the tablespace creation in the original 
location
6. restart master.

You now have the same tablespace in master and standby, but they point 
to different locations. This doesn't allow dynamically creating and 
dropping tablespaces during tests, but at least it gives you one 
tablespace to use.
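
A rough shell transcription of the steps above, assuming a single 
tablespace and invented paths (master in /tmp/master on port 65432, 
standby in /tmp/standby):

    pg_ctl -D /tmp/standby stop -m fast                    # 1. stop the standby
    psql -p 65432 -d postgres -c "CREATE TABLESPACE ts LOCATION '/tmp/ts'"   # 2. on the master
    pg_ctl -D /tmp/master stop -m fast                     # 3. stop the master
    mv /tmp/ts /tmp/ts_master                              # 4. move the directory and
    for link in /tmp/master/pg_tblspc/*; do                #    repoint the master's symlink
        rm "$link" && ln -s /tmp/ts_master "$link"
    done
    pg_ctl -w -D /tmp/standby -l /tmp/standby.log start    # 5. standby replays the creation
    pg_ctl -w -D /tmp/master -l /tmp/master.log start      # 6. restart the master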

Another idea would be to do something like chroot, but more lightweight, 
using FUSE, private mount namespaces, or cgroups.

- Heikki



Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> Another idea would be to do something like chroot, but more lightweight, 
> using FUSE, private mount namespaces, or cgroups.

I thought the goal here was to have a testing framework that (a) is
portable to every platform we support and (b) doesn't require root
privileges to run.  None of those options sound like they'll help meet
those requirements.
        regards, tom lane



On 2014-01-07 10:27:14 -0500, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > Another idea would be to do something like chroot, but more lightweight, 
> > using FUSE, private mount namespaces, or cgroups.
> 
> I thought the goal here was to have a testing framework that (a) is
> portable to every platform we support and (b) doesn't require root
> privileges to run.  None of those options sound like they'll help meet
> those requirements.

Seconded.

Perhaps the solution is to simply introduce tablespaces located relative
to PGDATA? That'd be fracking useful anyway.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services








If I drop the idea of sudo/chroot and punt for now on testing
tablespaces under replication, it should be possible to test
the rest of the replication system in a way that meets (a) and
(b).  Perhaps Andres' idea of tablespaces relative to the
data directory will get implemented some day, at which point
we wouldn't be punting quite so much.  But until then, punt.

Would it make sense for this to just be part of 'make check'?
That would require creating multiple database clusters under
multiple data directories, and having them bind to multiple
ports or unix domain sockets.  Is that a problem?

What's the logic of having replication testing separated from
the other pg_regress tests?  Granted, not every user of postgres
uses replication, but that's true for lots of features, and
we don't split things like json into separate test suites.
Vendors who run 'make check' as part of their packaging of
postgresql would probably benefit from knowing if replication
doesn't work on their distro, and they may not change their
packaging systems to include a second 'make replicationcheck'
step.

mark




Mark Dilger <markdilger@yahoo.com> writes:
> Would it make sense for this to just be part of 'make check'?

Probably not, as (I imagine) it will take quite a bit longer than
"make check" does today.  People who are not working on replication
related features will be annoyed if a test cycle starts taking 10X
longer than it used to, for tests of no value to them.

It's already not the case that "make check" runs every available
automated test; the isolation tests, the PL tests, the contrib
tests are all separate.

There is a make check-world, which I think should reasonably run
all of these.
        regards, tom lane



Michael Paquier wrote:
> A possible input for a test that users could provide would be something
> like that:
>
> # Node information for tests
> nodes
> {
>     {node1, postgresql.conf params, recovery.conf params}
>     {node2, postgresql.conf params, recovery.conf params, slave of node1}
> }
> # Run test
> init node1
> run_sql node1 file1.sql
> # Check output
> init node2
> run_sql node2 file2.sql
> # Check that results are fine
> # Process
>
> The main problem is actually how to do that. Having some smart shell
> infrastructure would be simple and would facilitate (?) the maintenance
> of code used to run the tests. On the contrary having a C program would
> make the maintenance of code to run the tests more difficult (?) for a
> trade with more readable test suite input like the one I wrote above.
> This might also make the test input more readable for a human eye, in
> the shape of what is already available in src/test/isolation.

I like making this part of src/test/isolation, if folks do not object.
The core infrastructure in src/test/isolation seems applicable to
replication testing, and I'd hate to duplicate that code.

As for the node setup in your example above, I don't think it can be as
simple as defining nodes first, then running tests.  The configurations
themselves may need to be changed during the execution of a test, and
services stopped and started, all under test control and specified in
the same easy format.
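
In other words, the input format probably needs verbs beyond init and
run_sql.  Something in this direction, where every keyword is of course
just a placeholder:

    init node1
    run_sql node1 setup.sql
    backup node1 base1                  # take a base backup
    init node2 from base1               # spin up a standby from it
    run_sql node2 check_standby.sql
    stop node1
    promote node2
    reconfigure node2 {postgresql.conf params}
    restart node2
    run_sql node2 check_promoted.sql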

I have started working on this, and will post WIP patches from time to
time, unless you all feel the need to point me in a different direction.


mark






On Thu, Jan 9, 2014 at 12:34 PM, Mark Dilger <markdilger@yahoo.com> wrote:
> Michael Paquier wrote:
>> A possible input for a test that users could provide would be something
>> like that:
>>
>> # Node information for tests
>> nodes
>> {
>>     {node1, postgresql.conf params, recovery.conf params}
>>     {node2, postgresql.conf params, recovery.conf params, slave of node1}
>> }
>> # Run test
>> init node1
>> run_sql node1 file1.sql
>> # Check output
>> init node2
>> run_sql node2 file2.sql
>> # Check that results are fine
>> # Process
>>
>> The main problem is actually how to do that. Having some smart shell
>> infrastructure would be simple and would facilitate (?) the maintenance
>> of code used to run the tests. On the contrary having a C program would
>> make the maintenance of code to run the tests more difficult (?) for a
>> trade with more readable test suite input like the one I wrote above.
>> This might also make the test input more readable for a human eye, in
>> the shape of what is already available in src/test/isolation.
>
> I like making this part of src/test/isolation, if folks do not object.
> The core infrastructure in src/test/isolation seems applicable to
> replication testing, and I'd hate to duplicate that code.
>
> As for the node setup in your example above, I don't think it can be as
> simple as defining nodes first, then running tests.  The configurations
> themselves may need to be changed during the execution of a test, and
> services stopped and started, all under test control and specified in
> the same easy format.
Yes, my example was very basic :). What you actually need is the
ability to perform actions on nodes during a test run, basically:
stop, start, init, reload, run SQL, change params/create new conf
files (putting a node in recovery, for example, could be = create
recovery.conf + restart). The place of the code does not matter much,
but I don't think it should be part of isolation, as clustering and
isolation are quite different test suites. I would, for example, see
that as src/test/cluster, with src/test/common for things that are
shared between test infrastructures.
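
"Putting a node in recovery" would then boil down to a handful of
commands, roughly like this (assuming node1 already runs on port 65432
with wal_level = hot_standby, max_wal_senders > 0 and a replication
entry in pg_hba.conf; everything else here is invented):

    pg_basebackup -D /tmp/standby1 -p 65432
    echo "port = 65433"     >> /tmp/standby1/postgresql.conf
    echo "hot_standby = on" >> /tmp/standby1/postgresql.conf
    echo "standby_mode = 'on'" > /tmp/standby1/recovery.conf
    echo "primary_conninfo = 'host=localhost port=65432'" >> /tmp/standby1/recovery.conf
    pg_ctl -w -D /tmp/standby1 -l /tmp/standby1.log start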

As mentioned by Steve, the test suite of Slony might be interesting to
look at to get some ideas.

Regards,
-- 
Michael





> I thought the goal here was to have a testing framework that (a) is
> portable to every platform we support and (b) doesn't require root
> privileges to run.  None of those options sound like they'll help meet
> those requirements.

FWIW, I hacked up a Perl-based testing system as a proof of concept 
some time ago. I can dust it off if anyone is interested. Perl has 
a very nice testing ecosystem and is probably the most portable 
language we support, other than C. My quick goals for the project were:

* allow granular testing (a la Andrew's recent email, which reminded me of this)
* allow stackable methods and dependencies
* make it very easy to write new tests
* test various features that are way too difficult to test in our existing system (e.g. PITR, FDWs)
* get some automated code coverage metrics (this one was tricky)
* allow future git integration based on subsystems

-- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201401261211
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8