Home > mailing lists

Re: Using each rel as both outer and inner for JOIN_ANTI - Mailing list pgsql-hackers

From	Richard Guo
Subject	Re: Using each rel as both outer and inner for JOIN_ANTI
Date	June 30, 2021 07:05:53
Msg-id	CAMbWs49SrEOeq8Vt4XXTJr9Biz3iyNO3qToe-gd1-nAfUDZnrw@mail.gmail.com Whole thread Raw
In response to	Re: Using each rel as both outer and inner for JOIN_ANTI (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses	Re: Using each rel as both outer and inner for JOIN_ANTI
List	pgsql-hackers

Tree view

On Tue, Jun 29, 2021 at 10:41 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

Le mardi 29 juin 2021, 10:55:59 CEST Richard Guo a écrit :
> On Tue, Jun 29, 2021 at 3:55 PM Emre Hasegeli <emre@hasegeli.com> wrote:
> > > Thanks for the explanation. Attached is a demo code for the hash-join
> > > case, which is only for PoC to show how we can make it work. It's far
> > > from complete, at least we need to adjust the cost calculation for this
> > > 'right anti join'.
> >
> > I applied the patch and executed some queries. Hash Right Anti Joins
> > seem to be working correctly. Though, some of the tests are failing.
> > I guessed it's because the other join algorithms do not support right
> > anti join, but I couldn't reproduce it.
>
> Thanks for verifying this patch.

I also ran the tests on this patch, and can confirm the tests are failing.

The reason for that is that you request a new outer tuple whenever we have a
match, even when the outer tuple could match several tuples from the hash
table: we end up emitting the duplicates because we switched to another tuple
after the first match.

Yes, thanks! I was making a big mistake here thinking the executor can
stop after the first match. That's not true. We need to use each outer
tuple to find all the matches and mark the corresponding hashtable
entries. I have updated the patch with the fix.

> I think we can basically use the same cost calculation as with anti
> joins, since they share the fact that the executor will stop after the
> first match. However, there are still some differences. Such as when we
> consider the number of tuples that will pass the basic join, I think we
> need to use unmatched inner rows, rather than unmatched outer rows.

Due to the fact we cannot just skip at the first match, I'm not sure this works
either.

This is not correct any more since the fact that the executor will stop
after the first match does not hold true. A brief thought show me that

we can use the same cost calculation as with right joins.

Thanks

Richard

Attachment

v2-0001-Using-each-rel-as-both-outer-and-inner-for-anti-join.patch

pgsql-hackers by date:

From: Amit Kapila
Date: 30 June 2021, 06:59:30
Subject: Re: Allow streaming the changes after speculative aborts.

From: Dilip Kumar
Date: 30 June 2021, 07:25:26
Subject: Re: Allow streaming the changes after speculative aborts.

Re: Using each rel as both outer and inner for JOIN_ANTI - Mailing list pgsql-hackers

Attachment

Previous

Next