Thread: Readd use of TAP subtests

Readd use of TAP subtests

From: Peter Eisentraut

Now that subtests in TAP are supported again, I want to correct the 
great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06 
and put those subtests back.

Much more work like this is possible, of course.  I just wanted to get 
this out of the way since the code was already written and I've had this 
on my list for, uh, 7 years.
Attachment

Re: Readd use of TAP subtests

From: Dagfinn Ilmari Mannsåker

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

> Now that subtests in TAP are supported again, I want to correct the
> great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06 
> and put those subtests back.

The updated Test::More version requirement also gives us done_testing()
(added in 0.88), which saves us from the constant maintenance headache
of updating the test counts every time.  Do you fancy switching the
tests you're modifying anyway to that?
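
For reference, a minimal sketch of the two styles (the test bodies here
are just placeholders):

    use strict;
    use warnings;
    use Test::More;

    # Old style: the plan must be kept in sync by hand whenever a test
    # is added or removed:
    #   use Test::More tests => 2;

    ok(1 + 1 == 2, 'arithmetic works');
    is('foo', 'foo', 'string comparison works');

    # New style (Test::More >= 0.88): declare completion instead of a
    # count; the harness still fails if the script exits early.
    done_testing();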


- ilmari



Re: Readd use of TAP subtests

From: Daniel Gustafsson

> On 8 Dec 2021, at 14:49, Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote:
>
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>
>> Now that subtests in TAP are supported again, I want to correct the
>> great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06
>> and put those subtests back.
>
> The updated Test::More version requirement also gives us done_testing()
> (added in 0.88), which saves us from the constant maintenance headache
> of updating the test counts every time.  Do you fancy switching the
> tests you're modifying anyway to that?

We already call done_testing() in a number of tests, and have done so for a
number of years.  src/test/ssl/t/002_scram.pl is one example.

--
Daniel Gustafsson        https://vmware.com/




Re: Readd use of TAP subtests

From: Andrew Dunstan

On 12/8/21 08:26, Peter Eisentraut wrote:
>
> Now that subtests in TAP are supported again, I want to correct the
> great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06
> and put those subtests back.
>
> Much more work like this is possible, of course.  I just wanted to get
> this out of the way since the code was already written and I've had
> this on my list for, uh, 7 years.


+many as long as we cover all the cases in Cluster.pm and Utils.pm. I
suspect they have acquired some new multi-test subs in the intervening 7
years :-)


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Readd use of TAP subtests

From: Dagfinn Ilmari Mannsåker

Daniel Gustafsson <daniel@yesql.se> writes:

>> On 8 Dec 2021, at 14:49, Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote:
>> 
>> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> 
>>> Now that subtests in TAP are supported again, I want to correct the
>>> great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06 
>>> and put those subtests back.
>> 
>> The updated Test::More version requirement also gives us done_testing()
>> (added in 0.88), which saves us from the constant maintenance headache
>> of updating the test counts every time.  Do you fancy switching the
>> tests you're modifying anyway to that?
>
> We already call done_testing() in a number of tests, and have done so for a
> number of years.  src/test/ssl/t/002_scram.pl is one example.

Reading the Test::More changelog more closely, it turns out that even
though we used to depend on version 0.87, that's effectively equivalent
to 0.88, because there was no stable 0.87 release, only 0.86 and
development releases 0.87_01 through _03.

Either way, I think we should be switching tests to done_testing()
whenever we would otherwise have to adjust the test count, to avoid
having to do that again and again and again going forward.

- ilmari



Re: Readd use of TAP subtests

From: Andrew Dunstan

On 12/8/21 09:08, Dagfinn Ilmari Mannsåker wrote:
>
> Either way, I think we should be switching tests to done_testing()
> whenever we would otherwise have to adjust the test count, to avoid
> having to do that again and again and again going forward.
>

I'm not so sure. I don't think it's necessarily a bad idea to have to
declare how many tests you're going to run. I appreciate it gets hard in
some cases, which is why we have now insisted on a Test::More version
that supports subtests. I suppose we could just take the attitude that
we're happy with however many tests it actually runs, and as long as
they all pass we're good. It just seems slightly sloppy.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Readd use of TAP subtests

From: Dagfinn Ilmari Mannsåker

Andrew Dunstan <andrew@dunslane.net> writes:

> On 12/8/21 09:08, Dagfinn Ilmari Mannsåker wrote:
>>
>> Either way, I think we should be switching tests to done_testing()
>> whenever we would otherwise have to adjust the test count, to avoid
>> having to do that again and again and again going forward.
>>
>
> I'm not so sure. I don't think it's necessarily a bad idea to have to
> declare how many tests you're going to run. I appreciate it gets hard in
> some cases, which is why we have now insisted on a Test::More version
> that supports subtests. I suppose we could just take the attitude that
> we're happy with however many tests it actually runs, and as long as
> they all pass we're good. It just seems slightly sloppy.

The point of done_testing() is to additionally assert that the test
script ran to completion, so you don't get silent failures if something
should end up calling exit(0) prematurely (a non-zero exit status is
considered a failure by the test harness).

The only cases where an explicit plan adds value are when you're running
tests in a loop and care about the number of iterations, or have a
callback with a test inside that you want to make sure gets called.  For
these, it's better to explicitly assert that the list you're iterating
over is of the right length, or increment a counter in the loop or
callback and assert that it has the expected value.  This has the added
benefit of the failure coming from the relevant place and having a
helpful description, rather than a plan mismatch at the end which you
then have to hunt down the cause of.
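
A sketch of the counter pattern, with made-up test data:

    use strict;
    use warnings;
    use Test::More;

    my @cases = ([1, 1], [2, 4], [3, 9]);

    # Assert the input list has the expected length up front, rather
    # than relying on the overall plan to catch a missing case.
    is(scalar @cases, 3, 'have all three test cases');

    my $ran = 0;
    foreach my $case (@cases)
    {
        my ($input, $expected) = @$case;
        is($input * $input, $expected, "square of $input");
        $ran++;
    }

    # Assert the loop body actually ran for every case.
    is($ran, scalar @cases, 'all cases were exercised');

    done_testing();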

- ilmari



Re: Readd use of TAP subtests

From: Tom Lane

Andrew Dunstan <andrew@dunslane.net> writes:
> On 12/8/21 09:08, Dagfinn Ilmari Mannsåker wrote:
>> Either way, I think we should be switching tests to done_testing()
>> whenever we would otherwise have to adjust the test count, to avoid
>> having to do that again and again and again going forward.

> I'm not so sure. I don't think its necessarily a bad idea to have to
> declare how many tests you're going to run.

I think the main point is to make sure that the test script reached an
intended exit point, rather than dying early someplace.  It's not apparent
to me why reaching a done_testing() call is a less reliable indicator of
that than executing some specific number of tests --- and I agree with
ilmari that maintaining the test count is a serious PITA.

            regards, tom lane



Re: Readd use of TAP subtests

From: Tom Lane

Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:
> The only cases where an explicit plan adds value are when you're running
> tests in a loop and care about the number of iterations, or have a
> callback with a test inside that you want to make sure gets called.  For
> these, it's better to explicitly assert that the list you're iterating
> over is of the right length, or increment a counter in the loop or
> callback and assert that it has the expected value.  This has the added
> benefit of the failure coming from the relevant place and having a
> helpful description, rather than a plan mismatch at the end which you
> then have to hunt down the cause of.

Yeah.  A different way of stating that is that the test count adds
security only if you re-derive its proper value from first principles
every time you modify the test script.  I don't know about you guys,
but the only way I've ever adjusted those numbers is to put in whatever
the error message said was right.  I don't see how that's adding
anything but make-work; it's certainly not doing much to help verify
the script's control flow.

A question that seems pretty relevant here is: what exactly is the
point of using the subtest feature, if we aren't especially interested
in its effect on the overall test count?  I can see that it'd have
value when you wanted to use skip_all to control a subset of a test
run, but I'm not detecting where the value-added is in the cases in
Peter's proposed patch.

            regards, tom lane



Re: Readd use of TAP subtests

From: Peter Eisentraut

On 08.12.21 18:31, Tom Lane wrote:
> A question that seems pretty relevant here is: what exactly is the
> point of using the subtest feature, if we aren't especially interested
> in its effect on the overall test count?  I can see that it'd have
> value when you wanted to use skip_all to control a subset of a test
> run, but I'm not detecting where the value-added is in the cases in
> Peter's proposed patch.

It's useful if you edit a test file and add (what would appear to be) N 
tests and want to update the number.

But I'm also OK with the done_testing() style, if there are no drawbacks 
to that.

Does that call into question why we raised the Test::More version to 
begin with?  Or were there other reasons?




Re: Readd use of TAP subtests

From: Daniel Gustafsson

> On 8 Dec 2021, at 16:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think the main point is to make sure that the test script reached an
> intended exit point, rather than dying early someplace.  It's not apparent
> to me why reaching a done_testing() call is a less reliable indicator of
> that than executing some specific number of tests --- and I agree with
> ilmari that maintaining the test count is a serious PITA.

FWIW I agree with this and am in favor of using done_testing().

--
Daniel Gustafsson        https://vmware.com/




Re: Readd use of TAP subtests

From: Peter Eisentraut

Now that we have switched everything to done_testing(), the subtests 
feature isn't that relevant anymore, but it might still be useful to get 
better output when running with PROVE_FLAGS=--verbose.  Compare before:

t/001_basic.pl ..
1..8
ok 1 - vacuumlo --help exit code 0
ok 2 - vacuumlo --help goes to stdout
ok 3 - vacuumlo --help nothing to stderr
ok 4 - vacuumlo --version exit code 0
ok 5 - vacuumlo --version goes to stdout
ok 6 - vacuumlo --version nothing to stderr
ok 7 - vacuumlo with invalid option nonzero exit code
ok 8 - vacuumlo with invalid option prints error message
ok
All tests successful.
Files=1, Tests=8,  0 wallclock secs ( 0.02 usr  0.00 sys +  0.08 cusr  0.05 csys =  0.15 CPU)
Result: PASS

After (with attached patch):

t/001_basic.pl ..
# Subtest: vacuumlo --help
     ok 1 - exit code 0
     ok 2 - goes to stdout
     ok 3 - nothing to stderr
     1..3
ok 1 - vacuumlo --help
# Subtest: vacuumlo --version
     ok 1 - exit code 0
     ok 2 - goes to stdout
     ok 3 - nothing to stderr
     1..3
ok 2 - vacuumlo --version
# Subtest: vacuumlo options handling
     ok 1 - invalid option nonzero exit code
     ok 2 - invalid option prints error message
     1..2
ok 3 - vacuumlo options handling
1..3
ok
All tests successful.
Files=1, Tests=3,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.11 cusr  0.07 csys =  0.21 CPU)
Result: PASS

I think for deeply nested tests and test routines defined in other 
modules, this helps structure the test output more like the way the test
source code is laid out, which makes following the tests and locating failing
test code easier.
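
Roughly, the test code then looks like this (a simplified sketch rather
than the actual patch, which goes through the usual TestLib helpers):

    use strict;
    use warnings;
    use Test::More;
    use IPC::Run qw(run);

    subtest 'vacuumlo --help' => sub
    {
        my ($stdout, $stderr) = ('', '');
        my $ok = run [ 'vacuumlo', '--help' ],
            '>', \$stdout, '2>', \$stderr;
        ok($ok, 'exit code 0');
        like($stdout, qr/vacuumlo/, 'goes to stdout');
        is($stderr, '', 'nothing to stderr');
    };

    done_testing();

Each subtest emits its own indented plan and test lines plus a single
top-level result, as shown above.
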
Attachment

Re: Readd use of TAP subtests

From: Daniel Gustafsson

> On 24 Feb 2022, at 11:16, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

> I think for deeply nested tests and test routines defined in other modules,
> this helps structure the test output more like the way the test source code
> is laid out, which makes following the tests and locating failing test code
> easier.

I don't have any strong opinions on this, but if we go ahead with it I think
there should be a short note in src/test/perl/README about when subtests could
be considered.

--
Daniel Gustafsson        https://vmware.com/




Re: Readd use of TAP subtests

From: Andres Freund

Hi,

On 2022-02-24 11:16:03 +0100, Peter Eisentraut wrote:
> Now that we have switched everything to done_testing(), the subtests feature
> isn't that relevant anymore, but it might still be useful to get better
> output when running with PROVE_FLAGS=--verbose.

I've incidentally played with subtests yesterday, when porting
src/interfaces/libpq/test/regress.pl to a TAP test. Unfortunately it seems
that subtests aren't actually specified in the TAP format, and that different
libraries generate different output formats. The reason this matters somewhat
is that meson's test runner can parse TAP and give nicer progress / error
reports. But since subtests aren't in the spec it can't currently parse
them...

Open issue since 2015:
https://github.com/TestAnything/Specification/issues/2

The Perl ecosystem is so moribund :(.


> t/001_basic.pl ..
> # Subtest: vacuumlo --help
>     ok 1 - exit code 0
>     ok 2 - goes to stdout
>     ok 3 - nothing to stderr
>     1..3

It's clearly better.

We can approximate some of it by using is_deeply() and comparing exit, stdout,
stderr at once. Particularly for helpers like program_help() that are used in
a lot of places.
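
Something along these lines, perhaps (a rough sketch; check_help is a
made-up name):

    use strict;
    use warnings;
    use Test::More;
    use IPC::Run qw(run);

    # Check exit status, stdout and stderr in a single test, so a
    # failure reports all three at once.
    sub check_help
    {
        my ($prog) = @_;
        my ($stdout, $stderr) = ('', '');
        run [ $prog, '--help' ], '>', \$stdout, '2>', \$stderr;
        is_deeply(
            [ $? >> 8, $stdout ne '' ? 1 : 0, $stderr ],
            [ 0,       1,                     ''      ],
            "$prog --help: exit 0, output on stdout, nothing on stderr");
    }

    check_help('vacuumlo');
    done_testing();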

Greetings,

Andres Freund



Re: Readd use of TAP subtests

From: Peter Eisentraut

On 24.02.22 16:00, Andres Freund wrote:
> I've incidentally played with subtests yesterday, when porting
> src/interfaces/libpq/test/regress.pl to a TAP test. Unfortunately it seems
> that subtests aren't actually specified in the TAP format, and that different
> libraries generate different output formats. The reason this matters somewhat
> is that meson's test runner can parse TAP and give nicer progress / error
> reports. But since subtests aren't in the spec it can't currently parse
> them...

Ok that's good to know.  What exactly happens when it tries to parse 
them?  Does it not count them or does it fail somehow?  The way the 
output is structured

t/001_basic.pl ..
# Subtest: vacuumlo --help
     ok 1 - exit code 0
     ok 2 - goes to stdout
     ok 3 - nothing to stderr
     1..3
ok 1 - vacuumlo --help

it appears that it should be able to parse it nonetheless and should 
just count the non-indented lines.
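
For instance, a lenient consumer could skip the indented lines before
matching (a sketch, not meson's actual parser):

    use strict;
    use warnings;

    my ($passed, $failed, $plan) = (0, 0, undef);
    while (my $line = <STDIN>)
    {
        next if $line =~ /^\s/;    # subtest output is indented
        $plan = $1 if $line =~ /^1\.\.(\d+)/;
        $failed++  if $line =~ /^not ok\b/;
        $passed++  if $line =~ /^ok\b/;
    }
    printf "passed=%d failed=%d planned=%s\n",
        $passed, $failed, defined $plan ? $plan : 'unknown';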



Re: Readd use of TAP subtests

From: Andrew Dunstan

On 2/25/22 08:39, Peter Eisentraut wrote:
> On 24.02.22 16:00, Andres Freund wrote:
>> I've incidentally played with subtests yesterday, when porting
>> src/interfaces/libpq/test/regress.pl to a TAP test. Unfortunately it seems
>> that subtests aren't actually specified in the TAP format, and that
>> different libraries generate different output formats. The reason this
>> matters somewhat is that meson's test runner can parse TAP and give nicer
>> progress / error reports. But since subtests aren't in the spec it can't
>> currently parse them...
>
> Ok that's good to know.  What exactly happens when it tries to parse
> them?  Does it not count them or does it fail somehow?  The way the
> output is structured
>
> t/001_basic.pl ..
> # Subtest: vacuumlo --help
>     ok 1 - exit code 0
>     ok 2 - goes to stdout
>     ok 3 - nothing to stderr
>     1..3
> ok 1 - vacuumlo --help
>
> it appears that it should be able to parse it nonetheless and should
> just count the non-indented lines.


AIUI TAP consumers are supposed to ignore lines they don't understand.
The Node TAP setup produces output like this, so Perl is hardly alone
here. See <https://node-tap.org/docs/api/subtests/>


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Readd use of TAP subtests

From: Andres Freund

Hi,

On 2022-02-25 14:39:15 +0100, Peter Eisentraut wrote:
> On 24.02.22 16:00, Andres Freund wrote:
> > I've incidentally played with subtests yesterday, when porting
> > src/interfaces/libpq/test/regress.pl to a TAP test. Unfortunately it seems
> > that subtests aren't actually specified in the TAP format, and that different
> > libraries generate different output formats. The reason this matters somewhat
> > is that meson's test runner can parse TAP and give nicer progress / error
> > reports. But since subtests aren't in the spec it can't currently parse
> > them...
>
> Ok that's good to know.  What exactly happens when it tries to parse them?
> Does it not count them or does it fail somehow?  The way the output is
> structured

Says that it can't parse a line of the TAP output:
16:06:55 MALLOC_PERTURB_=156 /usr/bin/perl /tmp/meson-test/build/../subtest.pl
----------------------------------- output -----------------------------------
stdout:
# Subtest: a
    ok 1 - a: a
    ok 2 - a: b
    1..2
ok 1 - a
1..1
stderr:

TAP parsing error: unexpected input at line 4
------------------------------------------------------------------------------


> t/001_basic.pl ..
> # Subtest: vacuumlo --help
>     ok 1 - exit code 0
>     ok 2 - goes to stdout
>     ok 3 - nothing to stderr
>     1..3
> ok 1 - vacuumlo --help
>
> it appears that it should be able to parse it nonetheless and should just
> count the non-indented lines.

It looks like it's not ignoring indented lines...

Greetings,

Andres Freund



Re: Readd use of TAP subtests

From: Jacob Champion

On Fri, 2022-02-25 at 09:43 -0500, Andrew Dunstan wrote:
> AIUI TAP consumers are supposed to ignore lines they don't understand.

It's undefined behavior [1]:

> Any output that is not a version, a plan, a test line, a YAML block,
> a diagnostic or a bail out is incorrect. How a harness handles the
> incorrect line is undefined. Test::Harness silently ignores incorrect
> lines, but will become more stringent in the future. TAP::Harness
> reports TAP syntax errors at the end of a test run.

--Jacob

[1] https://testanything.org/tap-version-13-specification.html

Re: Readd use of TAP subtests

From: Jacob Champion

On Fri, 2022-02-25 at 16:35 +0000, Jacob Champion wrote:
> On Fri, 2022-02-25 at 09:43 -0500, Andrew Dunstan wrote:
> > AIUI TAP consumers are supposed to ignore lines they don't understand.
> 
> It's undefined behavior [1]:

And of course the minute I send this I notice that I've linked the v13
spec instead of the original... sorry. Assuming Perl isn't marking its
tests as version 13, you are correct:

> Any output line that is not a version, a plan, a test line, a
> diagnostic or a bail out is considered an “unknown” line. A TAP
> parser is required to not consider an unknown line as an error but
> may optionally choose to capture said line and hand it to the test
> harness, which may have custom behavior attached. This is to allow
> for forward compatability. Test::Harness silently ignores incorrect
> lines, but will become more stringent in the future. TAP::Harness
> reports TAP syntax errors at the end of a test run.

--Jacob

Re: Readd use of TAP subtests

From: Andres Freund

Hi,

On 2022-02-25 09:43:20 -0500, Andrew Dunstan wrote:
> AIUI TAP consumers are supposed to ignore lines they don't understand.

Are they?

In http://testanything.org/tap-version-13-specification.html there's:

"Lines written to standard output matching /^(not )?ok\b/ must be interpreted
as test lines. [...] All other lines must not be considered test output."

That says that all other lines aren't "test output". But what does that mean?
It certainly doesn't mean they can just be ignored, because obviously
^(TAP version|#|1..|Bail out) isn't to be ignored.

And then there's:


"
Anything else

  Any output that is not a version, a plan, a test line, a YAML block, a
  diagnostic or a bail out is incorrect. How a harness handles the incorrect
  line is undefined. Test::Harness silently ignores incorrect lines, but will
  become more stringent in the future. TAP::Harness reports TAP syntax errors at
  the end of a test run.
"

If I were to implement a TAP parser this wouldn't make me ignore lines.


Contrasting to that:
http://testanything.org/tap-specification.html

"
Anything else

  A TAP parser is required to not consider an unknown line as an error but may
  optionally choose to capture said line and hand it to the test harness,
  which may have custom behavior attached. This is to allow for forward
  compatability. Test::Harness silently ignores incorrect lines, but will
  become more stringent in the future. TAP::Harness reports TAP syntax errors
  at the end of a test run.
"

I honestly don't know what to make of that. Parsers are supposed to ignore
unknown lines, Test::Harness silently ignores, but also "TAP::Harness reports
TAP syntax errors"?  This seems like a self-contradictory mess.

Greetings,

Andres Freund



Re: Readd use of TAP subtests

From: Andrew Dunstan

On 2/25/22 11:41, Andres Freund wrote:
> Hi,
>
> On 2022-02-25 09:43:20 -0500, Andrew Dunstan wrote:
>> AIUI TAP consumers are supposed to ignore lines they don't understand.
> Are they?
>
> In http://testanything.org/tap-version-13-specification.html there's:
>
> "Lines written to standard output matching /^(not )?ok\b/ must be interpreted
> as test lines. [...] All other lines must not be considered test output."
>
> That says that all other lines aren't "test output". But what does that mean?
> It certainly doesn't mean they can just be ignored, because obviously
> ^(TAP version|#|1..|Bail out) isn't to be ignored.



I don't think we're following spec 13.


>
> And then there's:
>
>
> "
> Anything else
>
>   Any output that is not a version, a plan, a test line, a YAML block, a
>   diagnostic or a bail out is incorrect. How a harness handles the incorrect
>   line is undefined. Test::Harness silently ignores incorrect lines, but will
>   become more stringent in the future. TAP::Harness reports TAP syntax errors at
>   the end of a test run.
> "
>
> If I were to implement a TAP parser this wouldn't make me ignore lines.
>
>
> Contrasting to that:
> http://testanything.org/tap-specification.html
>
> "
> Anything else
>
>   A TAP parser is required to not consider an unknown line as an error but may
>   optionally choose to capture said line and hand it to the test harness,
>   which may have custom behavior attached. This is to allow for forward
>   compatability. Test::Harness silently ignores incorrect lines, but will
>   become more stringent in the future. TAP::Harness reports TAP syntax errors
>   at the end of a test run.
> "
>
> I honestly don't know what to make of that. Parsers are supposed to ignore
> unknown lines, Test::Harness silently ignores, but also "TAP::Harness reports
> TAP syntax errors"?  This seems like a self-contradictory mess.
>

I agree it's a mess. Both of these "specs" are incredibly loose. I guess
that happens when the spec comes after the implementation.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Readd use of TAP subtests

From: Peter Eisentraut

On 25.02.22 17:26, Andres Freund wrote:
>> Ok that's good to know.  What exactly happens when it tries to parse them?
>> Does it not count them or does it fail somehow?  The way the output is
>> structured
> Says that it can't parse a line of the TAP output:

Ok, then I suppose I'm withdrawing this.

Perhaps in another 7 years or so this will be resolved and we can make 
another attempt at this. ;-)



Re: Readd use of TAP subtests

From: Jacob Champion

On Mon, 2022-02-28 at 17:02 +0100, Peter Eisentraut wrote:
> Perhaps in another 7 years or so this will be resolved and we can make 
> another attempt at this. ;-)

For what it's worth, the TAP 14 spec was officially released today:

    https://testanything.org/tap-version-14-specification.html

--Jacob

Re: Readd use of TAP subtests

From: Daniel Gustafsson

> On 19 Apr 2022, at 21:07, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Mon, 2022-02-28 at 17:02 +0100, Peter Eisentraut wrote:
>> Perhaps in another 7 years or so this will be resolved and we can make
>> another attempt at this. ;-)
>
> For what it's worth, the TAP 14 spec was officially released today:

Interesting.  I hadn't even registered that a v14 was in the works, and come to
think of it I'm not sure I've yet seen a v13 consumer or producer.  For the TAP
support in pg_regress I've kept to the original spec.

--
Daniel Gustafsson        https://vmware.com/