Re: Wrong results from Parallel Hash Full Join - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Wrong results from Parallel Hash Full Join
Date
Msg-id 20230419151704.xkkobswffutslj34@liskov
In response to Re: Wrong results from Parallel Hash Full Join  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Wrong results from Parallel Hash Full Join
List pgsql-hackers
On Wed, Apr 12, 2023 at 08:31:26PM -0400, Melanie Plageman wrote:
> On Wed, Apr 12, 2023 at 6:50 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > And if we're going to
> > exercise/test that case, should we do the non-parallel version too?
> 
> I've added this. I thought if we were adding the serial case, we might
> as well add the multi-batch case as well. However, that proved a bit
> more challenging. We can get a HOT tuple in one of the existing tables
> with no issues. Doing this and then deleting the reset match bit code
> doesn't cause any of the tests to fail, however, because we use this
> expression as the join condition when we want to emit NULL-extended
> unmatched tuples.
> 
> select  count(*) from simple r full outer join simple s on (r.id = 0 - s.id);
> 
> I don't think we want to add yet another time-consuming test to this
> test file. So, I was trying to decide if it was worth changing these
> existing tests so that they would fail when the match bit wasn't reset.
> I'm not sure.

I couldn't stop thinking about how my explanation for why this test
didn't fail sounded wrong.

After some further investigation, I found the real reason that the HOT
bit is already cleared in the tuples inserted into the hashtable for
this query. The tuple descriptor for the relation "simple" and the
target list for the scan node are not identical (we only need to retain
a single column from simple in order to eventually do count(*)), so we
build projection info for the scan node and make a new virtual tuple.
The virtual tuple doesn't have the HOT bit set (the buffer heap tuple
would have). Since the match bit reuses the HOT bit, it is already clear
when the tuple is inserted, so this query can't fail when the code
clearing the match bit is removed.
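
To make that concrete, here is a toy model of the bit sharing. It is only
a sketch, not PostgreSQL code: ToyTuple, scan_passthrough,
scan_project_id, hash_insert and emitted_as_unmatched are hypothetical
names made up for illustration. The two flag definitions mirror
htup_details.h, where HEAP_TUPLE_HAS_MATCH is defined as HEAP_ONLY_TUPLE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors htup_details.h: the hash join match flag reuses the HOT bit. */
#define HEAP_ONLY_TUPLE      0x8000
#define HEAP_TUPLE_HAS_MATCH HEAP_ONLY_TUPLE

/* Hypothetical, heavily simplified tuple: header flags plus two columns. */
typedef struct ToyTuple
{
    uint16_t t_infomask2;
    int      id;
    int      payload;
} ToyTuple;

/*
 * Scan output when the target list matches the table's row layout:
 * the stored tuple is passed through as is, header flags included.
 */
static inline ToyTuple
scan_passthrough(const ToyTuple *stored)
{
    return *stored;
}

/*
 * Scan output when a projection is needed (e.g. only "id" is retained):
 * a new tuple is formed from column values, so its flags start out zero.
 */
static inline ToyTuple
scan_project_id(const ToyTuple *stored)
{
    ToyTuple t = {0};

    t.id = stored->id;
    return t;
}

/* Hash table insert; clear_match models the "reset match bit" code. */
static inline ToyTuple
hash_insert(ToyTuple t, bool clear_match)
{
    if (clear_match)
        t.t_infomask2 &= (uint16_t) ~HEAP_TUPLE_HAS_MATCH;
    return t;
}

/* The unmatched scan of a full join emits only tuples without the flag. */
static inline bool
emitted_as_unmatched(const ToyTuple *t)
{
    return (t->t_infomask2 & HEAP_TUPLE_HAS_MATCH) == 0;
}
```

In this model, a HOT tuple that goes through scan_passthrough and is
inserted without the reset looks "matched" and is skipped by the
unmatched scan, which is the wrong-results case. The same tuple going
through scan_project_id comes out with clear flags, so the reset makes no
observable difference, as with the count(*) query.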

Ultimately this is probably fine. If we wanted to modify one of the
existing tests to cover the multi-batch case, changing the select
count(*) to a select * would do the trick. I imagine we wouldn't want to
do this because of the excessive output it would produce. I wondered
whether the tests already have a pattern for getting around this. But
perhaps we don't care enough to cover this code.

- Melanie


