Thread: Re: how to speed up 002_pg_upgrade.pl and 025_stream_regress.pl under valgrind
Tomas Vondra <tomas@vondra.me> writes:
> [ 002_pg_upgrade and 027_stream_regress are slow ]

> I don't have a great idea how to speed up these tests, unfortunately.
> But one of the problems is that all the TAP tests run serially - one
> after each other. Could we instead run them in parallel? The tests setup
> their "private" clusters anyway, right?

But there's parallelism within those two tests already, or I would
hope so at least. If you run them in parallel then you are probably
causing 40 backends instead of 20 to be running at once (plus 40
valgrind instances). Maybe you have a machine beefy enough to make
that useful, but I don't.

Really the way to fix those two tests would be to rewrite them to not
depend on the core regression tests. The core tests do a lot of work
that's not especially useful for the purposes of those tests, and it's
not even clear that they are exercising all that we'd like to have
exercised for those purposes. In the case of 002_pg_upgrade, all
we really need to do is create objects that will stress all of
pg_dump. It's a little harder to scope out what we want to test for
027_stream_regress, but it's still clear that the core tests do a lot
of work that's not helpful.

			regards, tom lane

From: Tomas Vondra

On 9/15/24 20:31, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> [ 002_pg_upgrade and 027_stream_regress are slow ]
>
>> I don't have a great idea how to speed up these tests, unfortunately.
>> But one of the problems is that all the TAP tests run serially - one
>> after each other. Could we instead run them in parallel? The tests setup
>> their "private" clusters anyway, right?
>
> But there's parallelism within those two tests already, or I would
> hope so at least. If you run them in parallel then you are probably
> causing 40 backends instead of 20 to be running at once (plus 40
> valgrind instances). Maybe you have a machine beefy enough to make
> that useful, but I don't.
>

I did look into that for both tests, albeit not very thoroughly, and
most of the time there were only 1-2 valgrind processes using CPU. The
stream_regress seems more aggressive, but even for that the CPU spikes
are short, and the machine could easily do something else in parallel.

I'll try to do better analysis and some charts to visualize this ...

> Really the way to fix those two tests would be to rewrite them to not
> depend on the core regression tests. The core tests do a lot of work
> that's not especially useful for the purposes of those tests, and it's
> not even clear that they are exercising all that we'd like to have
> exercised for those purposes. In the case of 002_pg_upgrade, all
> we really need to do is create objects that will stress all of
> pg_dump. It's a little harder to scope out what we want to test for
> 027_stream_regress, but it's still clear that the core tests do a lot
> of work that's not helpful.
>

Perhaps, but that's a lot of work and time, and tricky - it seems we
might easily remove some useful test, even if it's not the original
purpose of that particular script.

regards

-- 
Tomas Vondra
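
The "only 1-2 valgrind processes using CPU" observation above could be
checked with something like the following. This is a minimal sketch, not
the method used in the thread: it assumes the process list was captured
as text from `ps -eo pcpu,args`, that valgrind-run backends can be
recognized by the substring "valgrind" in their command line, and that
5% CPU is a sensible cutoff for "busy". The function name, marker, and
threshold are all hypothetical.

```python
def count_busy(ps_output, marker="valgrind", min_cpu=5.0):
    """Count processes whose command line contains `marker` and whose
    %CPU column is at least `min_cpu`.

    `ps_output` is text in the two-column layout of `ps -eo pcpu,args`.
    Both `marker` and `min_cpu` are illustrative assumptions.
    """
    busy = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # %CPU, then the rest of the line
        if len(parts) != 2:
            continue
        try:
            cpu = float(parts[0])
        except ValueError:
            continue  # header line such as "%CPU COMMAND"
        if marker in parts[1] and cpu >= min_cpu:
            busy += 1
    return busy


if __name__ == "__main__":
    # Made-up sample, not measured data.
    sample = (
        "%CPU COMMAND\n"
        " 98.0 valgrind --quiet postgres\n"
        "  0.3 valgrind --quiet postgres\n"
        " 12.5 bash\n"
    )
    print(count_busy(sample))
```

Note that `ps` reports CPU use averaged over each process's lifetime, so
short spikes like the ones described above would need repeated sampling
of per-interval figures to show up.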

From: Thomas Munro

On Mon, Sep 16, 2024 at 6:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Really the way to fix those two tests would be to rewrite them to not
> depend on the core regression tests. The core tests do a lot of work
> that's not especially useful for the purposes of those tests, and it's
> not even clear that they are exercising all that we'd like to have
> exercised for those purposes. In the case of 002_pg_upgrade, all
> we really need to do is create objects that will stress all of
> pg_dump. It's a little harder to scope out what we want to test for
> 027_stream_regress, but it's still clear that the core tests do a lot
> of work that's not helpful.

027_stream_regress wants to get test coverage for the _redo routines
and replay subsystem, so I've wondered about defining a
src/test/regress/redo_schedule that removes what can be removed
without reducing _redo coverage. For example, join_hash.sql must eat
a *lot* of valgrind CPU cycles, and contributes nothing to redo
testing.

Thinking along the same lines, 002_pg_upgrade wants to create database
objects to dump, so I was thinking you could have a dump_schedule that
removes anything that doesn't leave objects behind. But you might be
right that it'd be better to start from scratch for that with that
goal in mind, and arguably also for the other.

(An interesting archeological detail about the regression tests is
that they seem to derive from the Wisconsin benchmark, famous for
benchmark wars and Oracle lawyers[1]. It seems quaint now that 'tenk'
was a lot of tuples, but I guess that Ingres on a PDP 11, which caused
offence by running that benchmark 5x faster, ran in something like
128kB of memory[2], so I can only guess the buffer pool must have been
something like 8 buffers or not much more in total?)

[1] https://jimgray.azurewebsites.net/BenchmarkHandbook/chapter4.pdf
[2] https://www.seas.upenn.edu/~zives/cis650/papers/INGRES.PDF
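
The redo_schedule / dump_schedule idea above could be prototyped by
filtering an existing schedule file. This is a minimal sketch under
stated assumptions: it assumes the schedule uses lines of the form
`test: name1 name2 ...` as in src/test/regress/parallel_schedule, the
function name `filter_schedule` is hypothetical, and the exclusion list
is only an example (join_hash is the one test named above). It does not
check whether a removed test creates objects that later tests rely on,
which would have to be verified by hand.

```python
def filter_schedule(schedule_text, excluded):
    """Return schedule text with the tests in `excluded` removed.

    Lines that do not start with "test:" (comments, blank lines) are
    kept unchanged. A "test:" group that ends up empty is dropped.
    """
    out = []
    for line in schedule_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("test:"):
            out.append(line)
            continue
        names = stripped[len("test:"):].split()
        kept = [name for name in names if name not in excluded]
        if kept:
            out.append("test: " + " ".join(kept))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Made-up schedule fragment for illustration only.
    sample = (
        "# example fragment\n"
        "test: boolean char\n"
        "test: join_hash\n"
        "test: select join_hash brin\n"
    )
    print(filter_schedule(sample, {"join_hash"}), end="")
```

Deciding which tests actually belong on the exclusion list is the hard
part and is not addressed here; it would need redo coverage to be
compared before and after each removal.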

Thomas Munro <thomas.munro@gmail.com> writes:
> (An interesting archeological detail about the regression tests is
> that they seem to derive from the Wisconsin benchmark, famous for
> benchmark wars and Oracle lawyers[1].

This is quite off-topic for the thread, but ... we actually had an
implementation of the Wisconsin benchmark in src/test/bench, which
we eventually removed (a05a4b478). It does look like the modern
regression tests borrowed the definitions of "tenk1" and some related
tables from there, but I think it'd be a stretch to say the tests
descended from it.

			regards, tom lane