Thread: Adding "large" to PG_TEST_EXTRA
Hi, I'm working on rebasing [1], my patch to make relation extension scale better. As part of that I'd like to add tests for relation extension. To be able to test the bulk write strategy path, we need to have a few backends concurrently load > 16MB files. It seems pretty clear that doing that on all buildfarm machines wouldn't be nice / welcome. And it also seems likely that this won't be the last case where that'd be useful. So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests that we only want to execute on machines with sufficient resources. Makes sense? Greetings, Andres Freund [1] https://www.postgresql.org/message-id/20221029025420.eplyow6k7tgu6he3%40awork3.anarazel.de
Greetings, * Andres Freund (andres@anarazel.de) wrote: > I'm working on rebasing [1], my patch to make relation extension scale > better. > > As part of that I'd like to add tests for relation extension. To be able to > test the bulk write strategy path, we need to have a few backends concurrently > load > 16MB files. > > It seems pretty clear that doing that on all buildfarm machines wouldn't be > nice / welcome. And it also seems likely that this won't be the last case > where that'd be useful. > > So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests > that we only want to execute on machines with sufficient resources. > > Makes sense? +1 in general. Are there existing tests that we should add into that set that you're thinking of..? I've been working with the Kerberos tests and that's definitely one that seems to fit this description... Thanks, Stephen
Attachment
Andres Freund <andres@anarazel.de> writes: > As part of that I'd like to add tests for relation extension. To be able to > test the bulk write strategy path, we need to have a few backends concurrently > load > 16MB files. > It seems pretty clear that doing that on all buildfarm machines wouldn't be > nice / welcome. And it also seems likely that this won't be the last case > where that'd be useful. > So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests > that we only want to execute on machines with sufficient resources. Makes sense. I see that this approach would result in manual check-world runs not running such tests by default either, which sounds right. Bikeshedding a bit ... is "large" the right name? It's not awful but I wonder if there is a better one; it seems like this class could eventually include tests that run a long time but don't necessarily eat disk space. "resource-intensive" is too long. regards, tom lane
Hi, On 2023-02-13 13:45:41 -0500, Stephen Frost wrote: > Are there existing tests that we should add into that set that you're > thinking of..? I've been working with the Kerberos tests and that's > definitely one that seems to fit this description... I think the kerberos tests are already opt-in, so I don't think we need to gate it further. Maybe the pgbench tests? I guess there's an argument to be made that we should use this for e.g. 002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test pretty fundamental behaviour like WAL replay, which is unfortunately is pretty easy to break, so I'd be hesitant. I guess we could stop running the full regression tests in 002_pg_upgrade.pl if !large? Greetings, Andres Freund
Hi, On 2023-02-13 13:54:59 -0500, Tom Lane wrote: > Bikeshedding a bit ... is "large" the right name? It's not awful but > I wonder if there is a better one I did wonder about that too. But didn't come up with something more poignant. > it seems like this class could eventually include tests that run a long time > but don't necessarily eat disk space. "resource-intensive" is too long. I'm not sure we'd want to combine time-intensive and disk-space-intensive test in the same category. Availability of disk space and cpu cycles don't have to correlate that well. lotsadisk, lotsacpu? :) Greetings, Andres Freund
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2023-02-13 13:45:41 -0500, Stephen Frost wrote: > > Are there existing tests that we should add into that set that you're > > thinking of..? I've been working with the Kerberos tests and that's > > definitely one that seems to fit this description... > > I think the kerberos tests are already opt-in, so I don't think we need to > gate it further. I'd like to lump them in with a bunch of other tests though, to give it more chance to run.. My issue currently is that they're *too* gated. > Maybe the pgbench tests? Sure. > I guess there's an argument to be made that we should use this for e.g. > 002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test > pretty fundamental behaviour like WAL replay, which is unfortunately is pretty > easy to break, so I'd be hesitant. Hm. If you aren't playing with that part of the code though, maybe it'd be nice to not run them. The pg_dump tests might also make sense to segregate out as they can add up to be a lot, and there's more that we could and probably should be doing there. > I guess we could stop running the full regression tests in 002_pg_upgrade.pl > if !large? Perhaps... but then what are we testing? Thanks, Stephen
Attachment
Hi, On 2023-02-13 14:15:24 -0500, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > On 2023-02-13 13:45:41 -0500, Stephen Frost wrote: > > > Are there existing tests that we should add into that set that you're > > > thinking of..? I've been working with the Kerberos tests and that's > > > definitely one that seems to fit this description... > > > > I think the kerberos tests are already opt-in, so I don't think we need to > > gate it further. > > I'd like to lump them in with a bunch of other tests though, to give it > more chance to run.. My issue currently is that they're *too* gated. Isn't the reason that we gate them that much that the test poses a security hazard on a multi-user system? I don't think we should combine opting into security hazards with opting into using disk space. FWIW, the kerberos tests run on all CI OSs other than windows. I have additional CI coverage for openbsd and netbsd in a separate branch, providing further coverage - but I'm not sure we want those additional covered OSs in core PG. I think the tests for kerberos run frequently enough in practice. I don't know how good the coverage they provide is, though, but that's a separate aspect to improve anyway. > > I guess there's an argument to be made that we should use this for e.g. > > 002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test > > pretty fundamental behaviour like WAL replay, which is unfortunately is pretty > > easy to break, so I'd be hesitant. > > Hm. If you aren't playing with that part of the code though, maybe it'd > be nice to not run them. It's surprisingly easy to break it accidentally... > The pg_dump tests might also make sense to segregate out as they can add up > to be a lot, and there's more that we could and probably should be doing > there. IMO the main issue with the pg_dump test is their verbosity, rather than the runtime... ~8.8k subtests is a lot. find . -name 'regress_log*'|xargs -n 1 wc -l|sort -nr|head -n 5|less 12712 ./testrun/pg_dump/002_pg_dump/log/regress_log_002_pg_dump 5124 ./testrun/pg_rewind/002_databases/log/regress_log_002_databases 1928 ./testrun/pg_rewind/001_basic/log/regress_log_001_basic 1589 ./testrun/recovery/017_shm/log/regress_log_017_shm 1077 ./testrun/pg_rewind/004_pg_xlog_symlink/log/regress_log_004_pg_xlog_symlink > > I guess we could stop running the full regression tests in 002_pg_upgrade.pl > > if !large? > > Perhaps... but then what are we testing? There's plenty to pg_upgrade other than the pg_dump comparison aspect. I'm not sure it's worth spending too much energy finding tests that we can run less commonly than now. We've pushed back on tests using lots of resources so far, so we don't really have them... Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2023-02-13 13:54:59 -0500, Tom Lane wrote: >> it seems like this class could eventually include tests that run a long time >> but don't necessarily eat disk space. "resource-intensive" is too long. > I'm not sure we'd want to combine time-intensive and disk-space-intensive test > in the same category. Availability of disk space and cpu cycles don't have to > correlate that well. Yeah, I was thinking along the same lines. > lotsadisk, lotsacpu? :) bigdisk, bigcpu? regards, tom lane
On 2023-02-13 14:55:45 -0500, Tom Lane wrote: > bigdisk, bigcpu? Works for me. I'll probably just add bigdisk as part of adding a test for bulk relation extensions, mentioning in a comment that we might want bigcpu if we have a test for it?
Andres Freund <andres@anarazel.de> writes: > On 2023-02-13 14:55:45 -0500, Tom Lane wrote: >> bigdisk, bigcpu? > Works for me. > I'll probably just add bigdisk as part of adding a test for bulk relation > extensions, mentioning in a comment that we might want bigcpu if we have a > test for it? No objection here. regards, tom lane
On 2023-02-13 Mo 14:34, Andres Freund wrote:
Hi, On 2023-02-13 14:15:24 -0500, Stephen Frost wrote:* Andres Freund (andres@anarazel.de) wrote:On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:Are there existing tests that we should add into that set that you're thinking of..? I've been working with the Kerberos tests and that's definitely one that seems to fit this description...I think the kerberos tests are already opt-in, so I don't think we need to gate it further.I'd like to lump them in with a bunch of other tests though, to give it more chance to run.. My issue currently is that they're *too* gated.Isn't the reason that we gate them that much that the test poses a security hazard on a multi-user system?
That's my understanding.
I don't think we should combine opting into security hazards with opting into using disk space.
I agree
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, Feb 14, 2023 at 5:42 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > I'm working on rebasing [1], my patch to make relation extension scale > better. > > As part of that I'd like to add tests for relation extension. To be able to > test the bulk write strategy path, we need to have a few backends concurrently > load > 16MB files. > > It seems pretty clear that doing that on all buildfarm machines wouldn't be > nice / welcome. And it also seems likely that this won't be the last case > where that'd be useful. > > So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests > that we only want to execute on machines with sufficient resources. > Oh, I was been thinking about a similar topic recently, but I was unaware of PG_TEST_EXTRA [1] I've observed suggested test cases get rejected as being overkill, or because they would add precious seconds to the test execution. OTOH, I felt such tests would still help gain some additional percentages from the "code coverage" stats. The kind of tests I am thinking of don't necessarily need a huge disk/CPU - but they just take longer to run than anyone has wanted to burden the build-farm with. ~ Sorry for the thread interruption -- but I thought this might be the right place to ask: What is the recommended way to deal with such tests intended primarily for better code coverage? I didn't see anything that looked pre-built for 'coverage'. Did I miss something, or is it a case of just invent-your-own extra tests/values for your own ad-hoc requirements? e.g. make check EXTRA_TESTS=extra_regress_for_coverage make check-world PG_TEST_EXTRA='extra_tap_tests_for_coverage' Thanks! ------ [1] https://www.postgresql.org/docs/devel/regress-run.html Kind Regards, Peter Smith. Fujitsu Australia
Hi, On 2023-02-14 09:26:47 +1100, Peter Smith wrote: > I've observed suggested test cases get rejected as being overkill, or > because they would add precious seconds to the test execution. OTOH, I > felt such tests would still help gain some additional percentages from > the "code coverage" stats. The kind of tests I am thinking of don't > necessarily need a huge disk/CPU - but they just take longer to run > than anyone has wanted to burden the build-farm with. I'd say it depend on the test whether it's worth adding. Code coverage for its own sake isn't that useful, they have to actually test something useful. And tests have costs beyond runtime, e.g. new tests tend to fail in some edge cases. E.g. just having tests hit more lines, without verifying that the behaviour is actually correct, only provides limited additional assurance. It's also not very useful to add a very expensive test that provides only a very small additional amount of coverage. IOW, even if we add more test categories, it'll still be a tradeoff. > Sorry for the thread interruption -- but I thought this might be the > right place to ask: What is the recommended way to deal with such > tests intended primarily for better code coverage? I don't think that exists today. Do you have an example of the kind of test you're thinking of? Greetings, Andres Freund
On Tue, Feb 14, 2023 at 10:44 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-02-14 09:26:47 +1100, Peter Smith wrote: > > I've observed suggested test cases get rejected as being overkill, or > > because they would add precious seconds to the test execution. OTOH, I > > felt such tests would still help gain some additional percentages from > > the "code coverage" stats. The kind of tests I am thinking of don't > > necessarily need a huge disk/CPU - but they just take longer to run > > than anyone has wanted to burden the build-farm with. > > I'd say it depend on the test whether it's worth adding. Code coverage for its > own sake isn't that useful, they have to actually test something useful. And > tests have costs beyond runtime, e.g. new tests tend to fail in some edge > cases. > > E.g. just having tests hit more lines, without verifying that the behaviour is > actually correct, only provides limited additional assurance. It's also not > very useful to add a very expensive test that provides only a very small > additional amount of coverage. > > IOW, even if we add more test categories, it'll still be a tradeoff. > > > > Sorry for the thread interruption -- but I thought this might be the > > right place to ask: What is the recommended way to deal with such > > tests intended primarily for better code coverage? > > I don't think that exists today. > > Do you have an example of the kind of test you're thinking of? No, nothing specific in mind. But maybe like these: - tests for causing obscure errors that would never otherwise be reached without something deliberately designed to fail a certain way - tests for trivial user errors apparently deemed not worth bloating the regression tests with -- e.g. many errorConflictingDefElem not being called [1]. - timing-related or error tests where some long (multi-second) delay is a necessary part of the setup. ------ [1] https://coverage.postgresql.org/src/backend/commands/subscriptioncmds.c.gcov.html Kind Regards, Peter Smith. Fujitsu Australia
Hi, On 2023-02-14 11:38:06 +1100, Peter Smith wrote: > No, nothing specific in mind. But maybe like these: > - tests for causing obscure errors that would never otherwise be > reached without something deliberately designed to fail a certain way I think there's some cases around this that could be usefu, but also a lot that wouldn't. > - tests for trivial user errors apparently deemed not worth bloating > the regression tests with -- e.g. many errorConflictingDefElem not > being called [1]. I don't think it's worth adding a tests for all of these. The likelihood of catching a problem seems quite small. > - timing-related or error tests where some long (multi-second) delay > is a necessary part of the setup. IME that's almost always a sign that the test wouldn't be stable anyway. Greetings, Andres Freund