Thread: numeric_big in make check?
numeric_big has been left out of parallel_schedule, requiring EXTRA_TESTS to run it, ever since it went in back in 1999 (AFAICT it was even the reason EXTRA_TESTS was invented). The original commit states that it's huge, and it probably was. Today it runs faster than many tests we have in parallel_schedule, even on slow hardware like my ~5 year old laptop. Repeated runs in CI in various parallel groups place it at about 50% of the runtime of triggers.sql and 25% of brin.sql.

To make sure it's executed and doesn't silently break, is it time to add this to the regular make check?

--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
> numeric_big has been left out of parallel_schedule, requiring EXTRA_TESTS to run it, ever since it went in back in 1999 (AFAICT it was even the reason EXTRA_TESTS was invented). The original commit states that it's huge, and it probably was. Today it runs faster than many tests we have in parallel_schedule, even on slow hardware like my ~5 year old laptop. Repeated runs in CI in various parallel groups place it at about 50% of the runtime of triggers.sql and 25% of brin.sql.
> To make sure it's executed and doesn't silently break, is it time to add this to the regular make check?

Or we could just flush it. It's never detected a bug, and I think you'd find that it adds zero code coverage (or if not, we could fix that in a far more surgical and less expensive manner).

			regards, tom lane
> On 19 Feb 2024, at 12:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> To make sure it's executed and doesn't silently break, is it time to add this to the regular make check?
>
> Or we could just flush it. It's never detected a bug, and I think you'd find that it adds zero code coverage (or if not, we could fix that in a far more surgical and less expensive manner).

I don't have a problem with that; there isn't much value in keeping it (especially when it's not connected to make check so that it actually runs). That also means we can remove two make targets which hadn't been ported to meson, getting us a hair closer to parity.

--
Daniel Gustafsson
> > On 19 Feb 2024, at 12:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Or we could just flush it. It's never detected a bug, and I think you'd find that it adds zero code coverage (or if not, we could fix that in a far more surgical and less expensive manner).

Off the top of my head, I can't say to what extent that's true, but it wouldn't surprise me if at least some of the tests added in the last 4 commits to touch that file exercise cases that aren't covered by tests elsewhere. Indeed, that certainly looks like the case for 18a02ad2a5. I'm sure those tests could be pared down, though.

Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 19 Feb 2024, at 12:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Or we could just flush it. It's never detected a bug, and I think you'd find that it adds zero code coverage (or if not, we could fix that in a far more surgical and less expensive manner).

> Off the top of my head, I can't say to what extent that's true, but it wouldn't surprise me if at least some of the tests added in the last 4 commits to touch that file exercise cases that aren't covered by tests elsewhere. Indeed, that certainly looks like the case for 18a02ad2a5. I'm sure those tests could be pared down, though.

I thought I'd try to acquire some actual facts here, so I compared the code coverage shown by "make check" as of HEAD, versus "make check" after adding numeric_big to parallel_schedule. I saw the following lines of numeric.c as being covered in the second run and not the first:

numeric():
1285                  || !NUMERIC_IS_SHORT(num)))
1293          new->choice.n_long.n_sign_dscale = NUMERIC_SIGN(new) |
1294              ((uint16) dscale & NUMERIC_DSCALE_MASK);

div_var_fast():
9185              idivisor = idivisor * NBASE + var2->digits[1];
9186              idivisor_weight--;

sqrt_var():
10073         res_ndigits = res_weight + 1 - (-rscale - 1) / DEC_DIGITS;

Pretty poor return on investment for the runtime consumed. I don't object to adding something to numeric.sql that's targeted at adding coverage for these (or anyplace else that's not covered), but let's not just throw cycles at the problem.

Oddly, there were a few lines in numeric_poly_combine and int8_avg_combine that were hit in the first run and not the second. Apparently our tests of parallel aggregation aren't as reproducible as one could wish.

			regards, tom lane
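For illustration only, a statement along these lines might provide the sort of targeted numeric.sql coverage Tom describes for the numeric() typmod path. This is a hedged, unverified sketch: it assumes that more than 63 fractional digits forces the long numeric representation, the typmod and digit counts are made up, and an actual coverage run would be needed to confirm it hits the lines listed above.

-- Hedged sketch (unverified): aims at the long-format branch in numeric()
-- listed above.  Assumption: a dscale above the short-format limit keeps the
-- input value in the long representation, so applying a wider typmod should
-- reach the !NUMERIC_IS_SHORT(num) case.  Confirm with a coverage run.
SELECT (('0.' || repeat('7', 70))::numeric)::numeric(200, 100);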
On Mon, 19 Feb 2024 at 15:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I thought I'd try to acquire some actual facts here, so I compared the code coverage shown by "make check" as of HEAD, versus "make check" after adding numeric_big to parallel_schedule. I saw the following lines of numeric.c as being covered in the second run and not the first:
>

I don't think that tells the whole story. Code coverage only tells you whether a particular line of code has been hit, not whether it has been properly tested with all the values that might lead to different cases. For example:

> sqrt_var():
> 10073         res_ndigits = res_weight + 1 - (-rscale - 1) / DEC_DIGITS;
>

To get code coverage of this line, all you need to do is ensure that sqrt_var() is called with rscale < -1 (which can only happen from the range-reduction code in ln_var()). You could do that by computing ln(1e50), which results in calling sqrt_var() with rscale = -2, causing that line of code in sqrt_var() to be hit. That would satisfy code coverage, but I would argue that you've only really tested that line of code properly if you also hit it with rscale = -3, and probably a few more values, to check that the round-down division is working as intended.

Similarly, looking at commit 18a02ad2a5, the crucial code change was the following in power_var():

-       val = Max(val, -NUMERIC_MAX_RESULT_SCALE);
-       val = Min(val, NUMERIC_MAX_RESULT_SCALE);
        val *= 0.434294481903252;       /* approximate decimal result weight */

Any test that calls numeric_power() is going to hit those lines of code, but to see a failure, it was necessary to hit them with the absolute value of "val" greater than NUMERIC_MAX_RESULT_SCALE, which is why that commit added 2 new test cases to numeric_big, calling power_var() with "val" outside that range. Code coverage is never going to tell you whether or not that is being tested, since the code change was to delete lines. Even if that weren't the case, any line of code involving Min() or Max() has 2 branches, and code coverage won't tell you if you've hit both of them.

> Pretty poor return on investment for the runtime consumed. I don't object to adding something to numeric.sql that's targeted at adding coverage for these (or anyplace else that's not covered), but let's not just throw cycles at the problem.
>

I agree that blindly performing a bunch of large computations (just throwing cycles at the problem) is not a good approach to testing. And maybe that's a fair characterisation of parts of numeric_big. However, it also contains some fairly well-targeted tests intended to test specific edge cases that only occur with particular ranges of inputs, which don't necessarily show up as increased code coverage. So I think this requires more careful analysis. Simply deleting numeric_big and adding tests to numeric.sql until the same level of code coverage is achieved will not give the same level of testing.

It's also worth noting that the cost of running numeric_big has come down very significantly over the last few years, as can be seen by running the current numeric_big script against old backends. There's a lot of random variation in the timings, but the trend is very clear:

 9.5    1.641s
 9.6    0.856s
 10     0.845s
 11     0.750s
 12     0.760s
 13     0.672s
 14     0.430s
 15     0.347s
 16     0.336s

Arguably, this is a useful benchmark to spot performance regressions and test proposed performance-improving patches.
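To make the shape of such targeted tests concrete, here is a hedged sketch, not the actual numeric_big cases: the ln(1e50) input comes from the explanation above, while the power() operands are made-up values assumed to push the estimated decimal result weight past NUMERIC_MAX_RESULT_SCALE.

-- Hedged sketch only: illustrative queries of the kind discussed above,
-- with made-up operands, not the numeric_big test cases themselves.

-- Per the explanation above, ln(1e50) reaches sqrt_var() via ln_var()'s
-- range reduction with rscale = -2; a thorough test would also want inputs
-- that produce rscale = -3 and beyond (thresholds not verified here).
SELECT ln(1e50);

-- Illustrative power() call assumed to make the estimated decimal result
-- weight ("val" above) exceed NUMERIC_MAX_RESULT_SCALE; the result is an
-- integer of roughly 2160 digits.
SELECT 1.01 ^ 500000;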
If I run "EXTRA_TESTS=numeric_big make check | grep 'ok ' | sort -nrk5", numeric_big is not in the top 10 most expensive tests (it's usually down at around 15'th place). Looking at the script itself, the addition, subtraction, multiplication and division tests at the top are probably pointless, since I would expect those operations to be tested adequately (and probably more thoroughly) by the transcendental test cases. In fact, I think it would probably be OK to delete everything above line 650, and just keep the bottom half of the script -- the pow(), exp(), ln() and log() tests, which cover various edge cases, as well as exercising basic arithmetic operations internally. We might want to check that I/O of large numerics is still being tested properly though. If we did that, numeric_big would be even further down the list of expensive tests, and I'd say it should be run by default. Regards, Dean
> On 20 Feb 2024, at 14:23, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> If we did that, numeric_big would be even further down the list of expensive tests, and I'd say it should be run by default.

My motivation for raising this was to get a test which is executed as part of parallel_schedule, to make sure failures aren't missed. If we get there by slimming down numeric_big to keep the unique coverage, then that sounds like a good plan to me.

--
Daniel Gustafsson
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Looking at the script itself, the addition, subtraction, multiplication and division tests at the top are probably pointless, since I would expect those operations to be tested adequately (and probably more thoroughly) by the transcendental test cases. In fact, I think it would probably be OK to delete everything above line 650, and just keep the bottom half of the script -- the pow(), exp(), ln() and log() tests, which cover various edge cases, as well as exercising basic arithmetic operations internally.

I could go with that, but let's just transpose those into numeric.

			regards, tom lane
On Tue, 20 Feb 2024 at 15:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> > Looking at the script itself, the addition, subtraction, multiplication and division tests at the top are probably pointless, since I would expect those operations to be tested adequately (and probably more thoroughly) by the transcendental test cases. In fact, I think it would probably be OK to delete everything above line 650, and just keep the bottom half of the script -- the pow(), exp(), ln() and log() tests, which cover various edge cases, as well as exercising basic arithmetic operations internally.
>
> I could go with that, but let's just transpose those into numeric.

Works for me.

Regards,
Dean