Thread: Re: Rewriting the test of pg_upgrade as a TAP test

Re: Rewriting the test of pg_upgrade as a TAP test

From

Stephen Frost

Date:

05 April 2017, 19:52:12

Michael,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Tue, Apr 4, 2017 at 10:52 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 4/3/17 11:32, Andres Freund wrote:
> >> That doesn't strike as particularly future proof.  We intentionally
> >> leave objects behind pg_regress runs, but that only works if we actually
> >> run them...
> >
> > I generally agree with the sentiments expressed later in this thread.
> > But just to clarify what I meant here:  We don't need to run a, say,
> > 1-minute serial test to load a few "left behind" objects for the
> > pg_upgrade test, if we can load the same set of objects using dedicated
> > scripting in say 2 seconds.  This would make both the pg_upgrade tests
> > faster and would reduce the hidden dependencies in the main tests about
> > which kinds of objects need to be left behind.
>
> Making the tests run shorter while maintaining the current code
> coverage is nice. But this makes more complicated the test suite
> maintenance as this needs either a dedicated regression schedule or an
> extra test suite where objects are created just for the sake of
> pg_upgrade. This increases the risks of getting a rotten test suite
> with the time if patch makers and reviewers are not careful.

I believe that what Peter was getting at is that the pg_dump TAP tests
create a whole slew of objects in just a few seconds and are able to
then exercise those code-paths in pg_dump, without needing to run the
entire serial regression test run.

I'm still not completely convinced that we actually need to
independently test pg_upgrade by creating all the objects which the
pg_dump TAP tests do, given that pg_upgrade just runs pg_dump
underneath.  If we really want to do that, however, what we should do is
abstract out the pg_dump set of tests into a place that both the pg_dump
and pg_upgrade TAP tests could use them to create all the types of
objects which are supported to perform their tests.

Thanks!

Stephen

Re: Rewriting the test of pg_upgrade as a TAP test

From

Tom Lane

Date:

05 April 2017, 20:15:13

Stephen Frost <sfrost@snowman.net> writes:
> I believe that what Peter was getting at is that the pg_dump TAP tests
> create a whole slew of objects in just a few seconds and are able to
> then exercise those code-paths in pg_dump, without needing to run the
> entire serial regression test run.

Right.  But there's a certain amount of serendipity involved in using the
core regression tests' final results.  For example, I don't know how long
it would've taken us to understand the problems around dumping and
reloading child tables with inconsistent column orders, had there not been
examples of that in the regression tests.  I worry that creating a sterile
set of objects for testing pg_dump will leave blind spots, because it will
mean that we only test cases that we explicitly created test cases for.

> I'm still not completely convinced that we actually need to
> independently test pg_upgrade by creating all the objects which the
> pg_dump TAP tests do, given that pg_upgrade just runs pg_dump
> underneath.  If we really want to do that, however, what we should do is
> abstract out the pg_dump set of tests into a place that both the pg_dump
> and pg_upgrade TAP tests could use them to create all the types of
> objects which are supported to perform their tests.

I think it's largely pointless to test pg_dump --binary-upgrade except
as a part of pg_upgrade.
        regards, tom lane

Re: Rewriting the test of pg_upgrade as a TAP test

From

Stephen Frost

Date:

05 April 2017, 20:40:41

Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I believe that what Peter was getting at is that the pg_dump TAP tests
> > create a whole slew of objects in just a few seconds and are able to
> > then exercise those code-paths in pg_dump, without needing to run the
> > entire serial regression test run.
>
> Right.  But there's a certain amount of serendipity involved in using the
> core regression tests' final results.  For example, I don't know how long
> it would've taken us to understand the problems around dumping and
> reloading child tables with inconsistent column orders, had there not been
> examples of that in the regression tests.  I worry that creating a sterile
> set of objects for testing pg_dump will leave blind spots, because it will
> mean that we only test cases that we explicitly created test cases for.

We don't need to only create sterile sets of objects in the pg_dump TAP
tests.  I don't believe we need to populate GIN indexes or vacuum them
to test pg_dump/pg_upgrade either (at least, not if we're going to stick
to the pg_upgrade test basically being if pg_dump returns the same
results before-and-after).

I'm all for adding tests into pg_dump which do things like drop columns
and change column names and other cases which could impact if the
pg_dump is correct or not, and there's nothing preventing those tests
from being added in the existing structure.  Certainly, before we remove
the coverage provided by running the serial test suite and then using
pg_upgrade, we should analyze what is being tested and ensure that we're
providing that same set of testing in the pg_dump TAP tests.

> > I'm still not completely convinced that we actually need to
> > independently test pg_upgrade by creating all the objects which the
> > pg_dump TAP tests do, given that pg_upgrade just runs pg_dump
> > underneath.  If we really want to do that, however, what we should do is
> > abstract out the pg_dump set of tests into a place that both the pg_dump
> > and pg_upgrade TAP tests could use them to create all the types of
> > objects which are supported to perform their tests.
>
> I think it's largely pointless to test pg_dump --binary-upgrade except
> as a part of pg_upgrade.

That's how I discovered that comments and security labels weren't being
pulled through to the new cluster for blobs, so I would have to disagree
with this.  Frankly, it's also much more straight-forward to run
pg_dump --binary-upgrade than it is to get pg_upgrade to do the same.

Still, I'm not actually against centralizing the tests done with pg_dump
such that they could be used by pg_upgrade also.  Creating all those
objects takes less than a second, at least on my system, so it would
still be quite a bit faster than running the serial regression suite.

We might also consider if there's a way to change the format for those
tests to make them a bit less impenetrable for non-Perl folks to work
with and to make it simpler to add new tests as new features are added.

Thanks!

Stephen

Re: Rewriting the test of pg_upgrade as a TAP test

From

Andres Freund

Date:

05 April 2017, 20:45:21

Hi,

On 2017-04-05 10:40:41 -0400, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> > > I believe that what Peter was getting at is that the pg_dump TAP tests
> > > create a whole slew of objects in just a few seconds and are able to
> > > then exercise those code-paths in pg_dump, without needing to run the
> > > entire serial regression test run.
> > 
> > Right.  But there's a certain amount of serendipity involved in using the
> > core regression tests' final results.  For example, I don't know how long
> > it would've taken us to understand the problems around dumping and
> > reloading child tables with inconsistent column orders, had there not been
> > examples of that in the regression tests.  I worry that creating a sterile
> > set of objects for testing pg_dump will leave blind spots, because it will
> > mean that we only test cases that we explicitly created test cases for.
> 
> We don't need to only create sterile sets of objects in the pg_dump TAP
> tests.

I really, really don't understand why we're conflating making pg_upgrade
tests less fragile / duplicative with changing what we use to test it.
This seems to have the sole result that we're not going to get anywhere.


> I don't believe we need to populate GIN indexes or vacuum them
> to test pg_dump/pg_upgrade either (at least, not if we're going to stick
> to the pg_upgrade test basically being if pg_dump returns the same
> results before-and-after).

I think we *should* have populated GIN indexes. Yes, the coverage isn't
perfect, but the VACUUM definitely gives a decent amount of coverage
whether the gin index looks halfway sane after the upgrade.


Greetings,

Andres Freund

Re: Rewriting the test of pg_upgrade as a TAP test

From

Stephen Frost

Date:

05 April 2017, 20:50:19

Andres,

* Andres Freund (andres@anarazel.de) wrote:
> On 2017-04-05 10:40:41 -0400, Stephen Frost wrote:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > > Stephen Frost <sfrost@snowman.net> writes:
> > > > I believe that what Peter was getting at is that the pg_dump TAP tests
> > > > create a whole slew of objects in just a few seconds and are able to
> > > > then exercise those code-paths in pg_dump, without needing to run the
> > > > entire serial regression test run.
> > >
> > > Right.  But there's a certain amount of serendipity involved in using the
> > > core regression tests' final results.  For example, I don't know how long
> > > it would've taken us to understand the problems around dumping and
> > > reloading child tables with inconsistent column orders, had there not been
> > > examples of that in the regression tests.  I worry that creating a sterile
> > > set of objects for testing pg_dump will leave blind spots, because it will
> > > mean that we only test cases that we explicitly created test cases for.
> >
> > We don't need to only create sterile sets of objects in the pg_dump TAP
> > tests.
>
> I really, really don't understand why we're conflating making pg_upgrade
> tests less fragile / duplicative with changing what we use to test it.
> This seems to have the sole result that we're not going to get anywhere.

Probably because the point was brought up that the regression tests for
pg_upgrade spend a bunch of time doing something which, ultimately,
don't actually add any real value.  Yes, there are bits of the core
regression tests that currently add value over what we have through
other approaches, but that's not where the bulk of running those tests
go.

> > I don't believe we need to populate GIN indexes or vacuum them
> > to test pg_dump/pg_upgrade either (at least, not if we're going to stick
> > to the pg_upgrade test basically being if pg_dump returns the same
> > results before-and-after).
>
> I think we *should* have populated GIN indexes. Yes, the coverage isn't
> perfect, but the VACUUM definitely gives a decent amount of coverage
> whether the gin index looks halfway sane after the upgrade.

We don't look at the gin index after the upgrade in the current
pg_upgrade testing, so I don't see why you feel it's at all valuable.
If we *did* do that (and I'm all for adding such tests), then perhaps
this argument would make sense, but we don't today and I haven't seen
anyone propose changing that.

Thanks!

Stephen

Re: Rewriting the test of pg_upgrade as a TAP test

From

Andres Freund

Date:

05 April 2017, 21:01:34

On 2017-04-05 10:50:19 -0400, Stephen Frost wrote:
> Andres,
> 
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2017-04-05 10:40:41 -0400, Stephen Frost wrote:
> > > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > > > Stephen Frost <sfrost@snowman.net> writes:
> > > > > I believe that what Peter was getting at is that the pg_dump TAP tests
> > > > > create a whole slew of objects in just a few seconds and are able to
> > > > > then exercise those code-paths in pg_dump, without needing to run the
> > > > > entire serial regression test run.
> > > > 
> > > > Right.  But there's a certain amount of serendipity involved in using the
> > > > core regression tests' final results.  For example, I don't know how long
> > > > it would've taken us to understand the problems around dumping and
> > > > reloading child tables with inconsistent column orders, had there not been
> > > > examples of that in the regression tests.  I worry that creating a sterile
> > > > set of objects for testing pg_dump will leave blind spots, because it will
> > > > mean that we only test cases that we explicitly created test cases for.
> > > 
> > > We don't need to only create sterile sets of objects in the pg_dump TAP
> > > tests.
> > 
> > I really, really don't understand why we're conflating making pg_upgrade
> > tests less fragile / duplicative with changing what we use to test it.
> > This seems to have the sole result that we're not going to get anywhere.
> 
> Probably because the point was brought up that the regression tests for
> pg_upgrade spend a bunch of time doing something which, ultimately,
> don't actually add any real value.  Yes, there are bits of the core
> regression tests that currently add value over what we have through
> other approaches, but that's not where the bulk of running those tests
> go.

Create a separate patch [& thread] about that, don't conflate the
topics.  I'm very much in favor of this rewrite, I'm very much not in
favor of only using some targeted testsuite.  By combining two
independent changes, you're just making it less likely that anything
happens.


> > > I don't believe we need to populate GIN indexes or vacuum them
> > > to test pg_dump/pg_upgrade either (at least, not if we're going to stick
> > > to the pg_upgrade test basically being if pg_dump returns the same
> > > results before-and-after).
> > 
> > I think we *should* have populated GIN indexes. Yes, the coverage isn't
> > perfect, but the VACUUM definitely gives a decent amount of coverage
> > whether the gin index looks halfway sane after the upgrade.
> 
> We don't look at the gin index after the upgrade in the current
> pg_upgrade testing, so I don't see why you feel it's at all valuable.

It's be trivial to add a VACUUM to the point where analyze_new_cluster
is currently run.  And I've previously run more manual tests.  Is that
perfect - no, definitely not.


Greetings,

Andres Freund

Re: Rewriting the test of pg_upgrade as a TAP test

From

Tom Lane

Date:

05 April 2017, 21:02:09

Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Right.  But there's a certain amount of serendipity involved in using the
>> core regression tests' final results.  For example, I don't know how long
>> it would've taken us to understand the problems around dumping and
>> reloading child tables with inconsistent column orders, had there not been
>> examples of that in the regression tests.  I worry that creating a sterile
>> set of objects for testing pg_dump will leave blind spots, because it will
>> mean that we only test cases that we explicitly created test cases for.

> I'm all for adding tests into pg_dump which do things like drop columns
> and change column names and other cases which could impact if the
> pg_dump is correct or not, and there's nothing preventing those tests
> from being added in the existing structure.  Certainly, before we remove
> the coverage provided by running the serial test suite and then using
> pg_upgrade, we should analyze what is being tested and ensure that we're
> providing that same set of testing in the pg_dump TAP tests.

I don't think you grasped my basic point, which is that I'm worried about
emergent cases that we don't foresee needing to test (and that no amount
of code coverage checking would have shown up as being overlooked).
Admittedly, relying on the core regression tests to trigger such cases is
a pretty haphazard strategy, but it's way better than no strategy at all.

> We might also consider if there's a way to change the format for those
> tests to make them a bit less impenetrable for non-Perl folks to work
> with and to make it simpler to add new tests as new features are added.

TBH, that's part of my allergy to this concept, ie that this test
mechanism seems pretty write-only.  I do not think that people will add
pg_dump test cases except when required to by project policy, so that
we will end up with a very skeletal set of tests that won't find any
unforeseen behaviors.

The TAP tests in general are utterly developer-unfriendly from where
I sit: not only is the code pretty unreadable, but god help you when
you need to try to debug a failure.  I think that some serious effort
needs to be spent on improving that situation before we imagine that
we can throw away other test mechanisms we have today in favor of
TAP tests.
        regards, tom lane

Re: Rewriting the test of pg_upgrade as a TAP test

From

Stephen Frost

Date:

05 April 2017, 21:07:02

Andres,

* Andres Freund (andres@anarazel.de) wrote:
> On 2017-04-05 10:50:19 -0400, Stephen Frost wrote:
> > Probably because the point was brought up that the regression tests for
> > pg_upgrade spend a bunch of time doing something which, ultimately,
> > don't actually add any real value.  Yes, there are bits of the core
> > regression tests that currently add value over what we have through
> > other approaches, but that's not where the bulk of running those tests
> > go.
>
> Create a separate patch [& thread] about that, don't conflate the
> topics.  I'm very much in favor of this rewrite, I'm very much not in
> favor of only using some targeted testsuite.  By combining two
> independent changes, you're just making it less likely that anything
> happens.

I've made it clear, I thought, a couple of times that I agree with the
rewrite and that we should move forward with it.  Nothing on this
sub-thread changes that.  It's also registered in the 2017-07
commitfest, so I wouldn't think that there's a risk of it being
forgotten or that we need to cut off all discussion about what may
change between now and July that would be relevant to this patch.

> > We don't look at the gin index after the upgrade in the current
> > pg_upgrade testing, so I don't see why you feel it's at all valuable.
>
> It's be trivial to add a VACUUM to the point where analyze_new_cluster
> is currently run.  And I've previously run more manual tests.  Is that
> perfect - no, definitely not.

Being trivial doesn't mean it's something we're actually doing today.

Given that we aren't actually changing anything in the index during a
same-version pg_upgrade, nor are we changing the code that's run by
that VACUUM, I'm curious just what we're ending up testing that's
different from just restarting the existing cluster and running a new
VACUUM.

Thanks!

Stephen

Re: Rewriting the test of pg_upgrade as a TAP test

From

Stephen Frost

Date:

05 April 2017, 21:13:33

Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I'm all for adding tests into pg_dump which do things like drop columns
> > and change column names and other cases which could impact if the
> > pg_dump is correct or not, and there's nothing preventing those tests
> > from being added in the existing structure.  Certainly, before we remove
> > the coverage provided by running the serial test suite and then using
> > pg_upgrade, we should analyze what is being tested and ensure that we're
> > providing that same set of testing in the pg_dump TAP tests.
>
> I don't think you grasped my basic point, which is that I'm worried about
> emergent cases that we don't foresee needing to test (and that no amount
> of code coverage checking would have shown up as being overlooked).
> Admittedly, relying on the core regression tests to trigger such cases is
> a pretty haphazard strategy, but it's way better than no strategy at all.

The tests that were added to the core regression suite were done so for
a reason and hopefully we can identify cases where it'd make sense to
also run those tests for pg_upgrade/pg_dump testing.  More-or-less
anything that materially changes the catalog should be included, I would
think.  Things that are only really only working with the heap/index
files don't really need to be performed because the pg_upgrade process
doesn't change those.

> > We might also consider if there's a way to change the format for those
> > tests to make them a bit less impenetrable for non-Perl folks to work
> > with and to make it simpler to add new tests as new features are added.
>
> TBH, that's part of my allergy to this concept, ie that this test
> mechanism seems pretty write-only.  I do not think that people will add
> pg_dump test cases except when required to by project policy, so that
> we will end up with a very skeletal set of tests that won't find any
> unforeseen behaviors.

I certainly agree that the current structure for the tests isn't trivial
to work with and would welcome suggestions as to how to improve it.  Now
that we've had this testing structure for a year and have added quite a
bit more to it, it's definitely clear that we need to find a more
developer-friendly approach.

> The TAP tests in general are utterly developer-unfriendly from where
> I sit: not only is the code pretty unreadable, but god help you when
> you need to try to debug a failure.  I think that some serious effort
> needs to be spent on improving that situation before we imagine that
> we can throw away other test mechanisms we have today in favor of
> TAP tests.

Agreed.

Thanks!

Stephen

Re: Rewriting the test of pg_upgrade as a TAP test

From

Noah Misch

Date:

06 April 2017, 07:30:38

On Wed, Apr 05, 2017 at 11:13:33AM -0400, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> > > I'm all for adding tests into pg_dump which do things like drop columns
> > > and change column names and other cases which could impact if the
> > > pg_dump is correct or not, and there's nothing preventing those tests
> > > from being added in the existing structure.  Certainly, before we remove
> > > the coverage provided by running the serial test suite and then using
> > > pg_upgrade, we should analyze what is being tested and ensure that we're
> > > providing that same set of testing in the pg_dump TAP tests.
> > 
> > I don't think you grasped my basic point, which is that I'm worried about
> > emergent cases that we don't foresee needing to test (and that no amount
> > of code coverage checking would have shown up as being overlooked).
> > Admittedly, relying on the core regression tests to trigger such cases is
> > a pretty haphazard strategy, but it's way better than no strategy at all.
> 
> The tests that were added to the core regression suite were done so for
> a reason and hopefully we can identify cases where it'd make sense to
> also run those tests for pg_upgrade/pg_dump testing.

I think you _are_ missing Tom's point.  We've caught pg_dump and pg_upgrade
bugs thanks to regression database objects created for purposes unrelated to
pg_dump.  It's true that there exist other test strategies that are more
efficient or catch more bugs overall.  None of them substitute 100% for the
serendipity seen in testing dump/restore on the regression database.

> More-or-less
> anything that materially changes the catalog should be included, I would
> think.  Things that are only really only working with the heap/index
> files don't really need to be performed because the pg_upgrade process
> doesn't change those.

That is formally true.


Also, I agree with Andres that this is not a thread for discussing test
changes beyond mechanical translation of the pg_upgrade test suite.