Thread: Adding "large" to PG_TEST_EXTRA

Adding "large" to PG_TEST_EXTRA

From
Andres Freund
Date:
Hi,

I'm working on rebasing [1], my patch to make relation extension scale
better.

As part of that I'd like to add tests for relation extension. To be able to
test the bulk write strategy path, we need to have a few backends concurrently
load > 16MB files.

It seems pretty clear that doing that on all buildfarm machines wouldn't be
nice / welcome. And it also seems likely that this won't be the last case
where that'd be useful.

So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests
that we only want to execute on machines with sufficient resources.

Makes sense?

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20221029025420.eplyow6k7tgu6he3%40awork3.anarazel.de



Re: Adding "large" to PG_TEST_EXTRA

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> I'm working on rebasing [1], my patch to make relation extension scale
> better.
>
> As part of that I'd like to add tests for relation extension. To be able to
> test the bulk write strategy path, we need to have a few backends concurrently
> load > 16MB files.
>
> It seems pretty clear that doing that on all buildfarm machines wouldn't be
> nice / welcome. And it also seems likely that this won't be the last case
> where that'd be useful.
>
> So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests
> that we only want to execute on machines with sufficient resources.
>
> Makes sense?

+1 in general.  Are there existing tests that we should add into that
set that you're thinking of..?  I've been working with the Kerberos
tests and that's definitely one that seems to fit this description...

Thanks,

Stephen

Attachment

Re: Adding "large" to PG_TEST_EXTRA

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> As part of that I'd like to add tests for relation extension. To be able to
> test the bulk write strategy path, we need to have a few backends concurrently
> load > 16MB files.
> It seems pretty clear that doing that on all buildfarm machines wouldn't be
> nice / welcome. And it also seems likely that this won't be the last case
> where that'd be useful.
> So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests
> that we only want to execute on machines with sufficient resources.

Makes sense.  I see that this approach would result in manual check-world
runs not running such tests by default either, which sounds right.

Bikeshedding a bit ... is "large" the right name?  It's not awful but
I wonder if there is a better one; it seems like this class could
eventually include tests that run a long time but don't necessarily
eat disk space.  "resource-intensive" is too long.

            regards, tom lane



Re: Adding "large" to PG_TEST_EXTRA

From
Andres Freund
Date:
Hi,

On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:
> Are there existing tests that we should add into that set that you're
> thinking of..?  I've been working with the Kerberos tests and that's
> definitely one that seems to fit this description...

I think the kerberos tests are already opt-in, so I don't think we need to
gate it further.

Maybe the pgbench tests?

I guess there's an argument to be made that we should use this for e.g.
002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test
pretty fundamental behaviour like WAL replay, which is unfortunately is pretty
easy to break, so I'd be hesitant.

I guess we could stop running the full regression tests in 002_pg_upgrade.pl
if !large?

Greetings,

Andres Freund



Re: Adding "large" to PG_TEST_EXTRA

From
Andres Freund
Date:
Hi,

On 2023-02-13 13:54:59 -0500, Tom Lane wrote:
> Bikeshedding a bit ... is "large" the right name?  It's not awful but
> I wonder if there is a better one

I did wonder about that too. But didn't come up with something more poignant.


> it seems like this class could eventually include tests that run a long time
> but don't necessarily eat disk space.  "resource-intensive" is too long.

I'm not sure we'd want to combine time-intensive and disk-space-intensive test
in the same category. Availability of disk space and cpu cycles don't have to
correlate that well.

lotsadisk, lotsacpu? :)

Greetings,

Andres Freund



Re: Adding "large" to PG_TEST_EXTRA

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:
> > Are there existing tests that we should add into that set that you're
> > thinking of..?  I've been working with the Kerberos tests and that's
> > definitely one that seems to fit this description...
>
> I think the kerberos tests are already opt-in, so I don't think we need to
> gate it further.

I'd like to lump them in with a bunch of other tests though, to give it
more chance to run..  My issue currently is that they're *too* gated.

> Maybe the pgbench tests?

Sure.

> I guess there's an argument to be made that we should use this for e.g.
> 002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test
> pretty fundamental behaviour like WAL replay, which is unfortunately is pretty
> easy to break, so I'd be hesitant.

Hm.  If you aren't playing with that part of the code though, maybe it'd
be nice to not run them.  The pg_dump tests might also make sense to
segregate out as they can add up to be a lot, and there's more that we
could and probably should be doing there.

> I guess we could stop running the full regression tests in 002_pg_upgrade.pl
> if !large?

Perhaps... but then what are we testing?

Thanks,

Stephen

Attachment

Re: Adding "large" to PG_TEST_EXTRA

From
Andres Freund
Date:
Hi,

On 2023-02-13 14:15:24 -0500, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:
> > > Are there existing tests that we should add into that set that you're
> > > thinking of..?  I've been working with the Kerberos tests and that's
> > > definitely one that seems to fit this description...
> > 
> > I think the kerberos tests are already opt-in, so I don't think we need to
> > gate it further.
> 
> I'd like to lump them in with a bunch of other tests though, to give it
> more chance to run..  My issue currently is that they're *too* gated.

Isn't the reason that we gate them that much that the test poses a security
hazard on a multi-user system?

I don't think we should combine opting into security hazards with opting into
using disk space.


FWIW, the kerberos tests run on all CI OSs other than windows. I have
additional CI coverage for openbsd and netbsd in a separate branch, providing
further coverage - but I'm not sure we want those additional covered OSs in
core PG.

I think the tests for kerberos run frequently enough in practice. I don't know
how good the coverage they provide is, though, but that's a separate aspect to
improve anyway.


> > I guess there's an argument to be made that we should use this for e.g.
> > 002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test
> > pretty fundamental behaviour like WAL replay, which is unfortunately is pretty
> > easy to break, so I'd be hesitant.
> 
> Hm.  If you aren't playing with that part of the code though, maybe it'd
> be nice to not run them.

It's surprisingly easy to break it accidentally...


> The pg_dump tests might also make sense to segregate out as they can add up
> to be a lot, and there's more that we could and probably should be doing
> there.

IMO the main issue with the pg_dump test is their verbosity, rather than the
runtime... ~8.8k subtests is a lot.

find . -name 'regress_log*'|xargs -n 1 wc -l|sort -nr|head -n 5|less
12712 ./testrun/pg_dump/002_pg_dump/log/regress_log_002_pg_dump
5124 ./testrun/pg_rewind/002_databases/log/regress_log_002_databases
1928 ./testrun/pg_rewind/001_basic/log/regress_log_001_basic
1589 ./testrun/recovery/017_shm/log/regress_log_017_shm
1077 ./testrun/pg_rewind/004_pg_xlog_symlink/log/regress_log_004_pg_xlog_symlink


> > I guess we could stop running the full regression tests in 002_pg_upgrade.pl
> > if !large?
> 
> Perhaps... but then what are we testing?

There's plenty to pg_upgrade other than the pg_dump comparison aspect.


I'm not sure it's worth spending too much energy finding tests that we can run
less commonly than now. We've pushed back on tests using lots of resources so
far, so we don't really have them...

Greetings,

Andres Freund



Re: Adding "large" to PG_TEST_EXTRA

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2023-02-13 13:54:59 -0500, Tom Lane wrote:
>> it seems like this class could eventually include tests that run a long time
>> but don't necessarily eat disk space.  "resource-intensive" is too long.

> I'm not sure we'd want to combine time-intensive and disk-space-intensive test
> in the same category. Availability of disk space and cpu cycles don't have to
> correlate that well.

Yeah, I was thinking along the same lines.

> lotsadisk, lotsacpu? :)

bigdisk, bigcpu?

            regards, tom lane



Re: Adding "large" to PG_TEST_EXTRA

From
Andres Freund
Date:
On 2023-02-13 14:55:45 -0500, Tom Lane wrote:
> bigdisk, bigcpu?

Works for me.

I'll probably just add bigdisk as part of adding a test for bulk relation
extensions, mentioning in a comment that we might want bigcpu if we have a
test for it?



Re: Adding "large" to PG_TEST_EXTRA

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2023-02-13 14:55:45 -0500, Tom Lane wrote:
>> bigdisk, bigcpu?

> Works for me.

> I'll probably just add bigdisk as part of adding a test for bulk relation
> extensions, mentioning in a comment that we might want bigcpu if we have a
> test for it?

No objection here.

            regards, tom lane



Re: Adding "large" to PG_TEST_EXTRA

From
Andrew Dunstan
Date:


On 2023-02-13 Mo 14:34, Andres Freund wrote:
Hi,

On 2023-02-13 14:15:24 -0500, Stephen Frost wrote:
* Andres Freund (andres@anarazel.de) wrote:
On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:
Are there existing tests that we should add into that set that you're
thinking of..?  I've been working with the Kerberos tests and that's
definitely one that seems to fit this description...
I think the kerberos tests are already opt-in, so I don't think we need to
gate it further.
I'd like to lump them in with a bunch of other tests though, to give it
more chance to run..  My issue currently is that they're *too* gated.
Isn't the reason that we gate them that much that the test poses a security
hazard on a multi-user system?


That's my understanding.



I don't think we should combine opting into security hazards with opting into
using disk space.


I agree


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Adding "large" to PG_TEST_EXTRA

From
Peter Smith
Date:
On Tue, Feb 14, 2023 at 5:42 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I'm working on rebasing [1], my patch to make relation extension scale
> better.
>
> As part of that I'd like to add tests for relation extension. To be able to
> test the bulk write strategy path, we need to have a few backends concurrently
> load > 16MB files.
>
> It seems pretty clear that doing that on all buildfarm machines wouldn't be
> nice / welcome. And it also seems likely that this won't be the last case
> where that'd be useful.
>
> So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests
> that we only want to execute on machines with sufficient resources.
>

Oh, I was been thinking about a similar topic recently, but I was
unaware of PG_TEST_EXTRA  [1]

I've observed suggested test cases get rejected as being overkill, or
because they would add precious seconds to the test execution. OTOH, I
felt such tests would still help gain some additional percentages from
the "code coverage" stats. The kind of tests I am thinking of don't
necessarily need a huge disk/CPU - but they just take longer to run
than anyone has wanted to burden the build-farm with.

~

Sorry for the thread interruption -- but I thought this might be the
right place to ask: What is the recommended way to deal with such
tests intended primarily for better code coverage?

I didn't see anything that looked pre-built for 'coverage'. Did I miss
something, or is it a case of just invent-your-own extra tests/values
for your own ad-hoc requirements?

e.g.
make check EXTRA_TESTS=extra_regress_for_coverage
make check-world PG_TEST_EXTRA='extra_tap_tests_for_coverage'

Thanks!

------
[1] https://www.postgresql.org/docs/devel/regress-run.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Adding "large" to PG_TEST_EXTRA

From
Andres Freund
Date:
Hi,

On 2023-02-14 09:26:47 +1100, Peter Smith wrote:
> I've observed suggested test cases get rejected as being overkill, or
> because they would add precious seconds to the test execution. OTOH, I
> felt such tests would still help gain some additional percentages from
> the "code coverage" stats. The kind of tests I am thinking of don't
> necessarily need a huge disk/CPU - but they just take longer to run
> than anyone has wanted to burden the build-farm with.

I'd say it depend on the test whether it's worth adding. Code coverage for its
own sake isn't that useful, they have to actually test something useful. And
tests have costs beyond runtime, e.g. new tests tend to fail in some edge
cases.

E.g. just having tests hit more lines, without verifying that the behaviour is
actually correct, only provides limited additional assurance.  It's also not
very useful to add a very expensive test that provides only a very small
additional amount of coverage.

IOW, even if we add more test categories, it'll still be a tradeoff.


> Sorry for the thread interruption -- but I thought this might be the
> right place to ask: What is the recommended way to deal with such
> tests intended primarily for better code coverage?

I don't think that exists today.

Do you have an example of the kind of test you're thinking of?


Greetings,

Andres Freund



Re: Adding "large" to PG_TEST_EXTRA

From
Peter Smith
Date:
On Tue, Feb 14, 2023 at 10:44 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-02-14 09:26:47 +1100, Peter Smith wrote:
> > I've observed suggested test cases get rejected as being overkill, or
> > because they would add precious seconds to the test execution. OTOH, I
> > felt such tests would still help gain some additional percentages from
> > the "code coverage" stats. The kind of tests I am thinking of don't
> > necessarily need a huge disk/CPU - but they just take longer to run
> > than anyone has wanted to burden the build-farm with.
>
> I'd say it depend on the test whether it's worth adding. Code coverage for its
> own sake isn't that useful, they have to actually test something useful. And
> tests have costs beyond runtime, e.g. new tests tend to fail in some edge
> cases.
>
> E.g. just having tests hit more lines, without verifying that the behaviour is
> actually correct, only provides limited additional assurance.  It's also not
> very useful to add a very expensive test that provides only a very small
> additional amount of coverage.
>
> IOW, even if we add more test categories, it'll still be a tradeoff.
>
>
> > Sorry for the thread interruption -- but I thought this might be the
> > right place to ask: What is the recommended way to deal with such
> > tests intended primarily for better code coverage?
>
> I don't think that exists today.
>
> Do you have an example of the kind of test you're thinking of?

No, nothing specific in mind. But maybe like these:
- tests for causing obscure errors that would never otherwise be
reached without something deliberately designed to fail a certain way
- tests for trivial user errors apparently deemed not worth bloating
the regression tests with -- e.g. many errorConflictingDefElem not
being called [1].
- timing-related or error tests where some long (multi-second) delay
is a necessary part of the setup.

------
[1] https://coverage.postgresql.org/src/backend/commands/subscriptioncmds.c.gcov.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Adding "large" to PG_TEST_EXTRA

From
Andres Freund
Date:
Hi,

On 2023-02-14 11:38:06 +1100, Peter Smith wrote:
> No, nothing specific in mind. But maybe like these:
> - tests for causing obscure errors that would never otherwise be
> reached without something deliberately designed to fail a certain way

I think there's some cases around this that could be usefu, but also a lot
that wouldn't.


> - tests for trivial user errors apparently deemed not worth bloating
> the regression tests with -- e.g. many errorConflictingDefElem not
> being called [1].

I don't think it's worth adding a tests for all of these. The likelihood of
catching a problem seems quite small.


> - timing-related or error tests where some long (multi-second) delay
> is a necessary part of the setup.

IME that's almost always a sign that the test wouldn't be stable anyway.

Greetings,

Andres Freund