On Wed, Apr 19, 2023 at 3:20 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote: > On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote: > > Ultimately this is probably fine. If we wanted to modify one of the > > existing tests to cover the multi-batch case, changing the select > > count(*) to a select * would do the trick. I imagine we wouldn't want to > > do this because of the excessive output this would produce. I wondered > > if there was a pattern in the tests for getting around this. > > You could use explain (ANALYZE). But the output is machine-dependant in > various ways (which is why the tests use "explain analyze so rarely).
I think with sufficient options it's not machine specific. We have a bunch of EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) .. in our tests.
Cool. Yea, so ultimately these options are almost enough but memory usage changes from execution to execution. There are some tests which do regexp_replace() on the memory usage part of the EXPLAIN ANALYZE output to allow us to still compare the plans. However, I figured if I was already going to go to the trouble of using regexp_replace(), I might as well write a function that returns the "Actual Rows" field from the EXPLAIN ANALYZE output.
The attached patch does that. I admittedly mostly copy-pasted the plpgsql function from similar examples in other tests, and I suspect it may be overkill and also poorly written.
The nice thing about this approach is that we can modify some of the existing tests in join_hash.sql to use this function and cover the code to reset the matchbit for serial hashjoin, single batch parallel hashjoin, and all batches of parallel multi-batch hashjoin without any additional queries. (I'll leave testing match bit resetting with the skew hashtable and match bit resetting in case of a rescan for another day.)
I was able to delete the tests added in 558c9d75fe, as they became redundant.
I wonder if any other tests are in need of an EXPLAIN (ANALYZE, MEMORY_USAGE OFF) option? Perhaps it is quite unusual to only require a deterministic field like 'Actual Rows'. If we had that option we could also remove the extra EXPLAIN invocations before the actual query executions.