Re: Wrong results from Parallel Hash Full Join - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Wrong results from Parallel Hash Full Join
Date
Msg-id CAAKRu_bdwDN_aHVctHcc9VoDP9av7LUMeuLbch1fHD2ESouw1g@mail.gmail.com
Whole thread Raw
In response to Re: Wrong results from Parallel Hash Full Join  (Andres Freund <andres@anarazel.de>)
Responses Re: Wrong results from Parallel Hash Full Join
List pgsql-hackers

On Wed, Apr 19, 2023 at 3:20 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote:
> On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote:
> > Ultimately this is probably fine. If we wanted to modify one of the
> > existing tests to cover the multi-batch case, changing the select
> > count(*) to a select * would do the trick. I imagine we wouldn't want to
> > do this because of the excessive output this would produce. I wondered
> > if there was a pattern in the tests for getting around this.
>
> You could use explain (ANALYZE).  But the output is machine-dependant in
> various ways (which is why the tests use "explain analyze so rarely).

I think with sufficient options it's not machine specific. We have a bunch of
 EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) ..
in our tests.

Cool. Yea, so ultimately these options are almost enough but memory
usage changes from execution to execution. There are some tests which do
regexp_replace() on the memory usage part of the EXPLAIN ANALYZE output
to allow us to still compare the plans. However, I figured if I was
already going to go to the trouble of using regexp_replace(), I might as
well write a function that returns the "Actual Rows" field from the
EXPLAIN ANALYZE output.

The attached patch does that. I admittedly mostly copy-pasted the
plpgsql function from similar examples in other tests, and I suspect it
may be overkill and also poorly written.

The nice thing about this approach is that we can modify some of the
existing tests in join_hash.sql to use this function and cover the code
to reset the matchbit for serial hashjoin, single batch parallel
hashjoin, and all batches of parallel multi-batch hashjoin without any
additional queries. (I'll leave testing match bit resetting with the
skew hashtable and match bit resetting in case of a rescan for another
day.)

I was able to delete the tests added in 558c9d75fe, as they became
redundant.

I wonder if any other tests are in need of an EXPLAIN (ANALYZE,
MEMORY_USAGE OFF) option? Perhaps it is quite unusual to only require a
deterministic field like 'Actual Rows'. If we had that option we could
also remove the extra EXPLAIN invocations before the actual query
executions.

- Melanie
Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Remove io prefix from pg_stat_io columns
Next
From: Melanie Plageman
Date:
Subject: Re: Wrong results from Parallel Hash Full Join