Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date
Msg-id CA+TgmobqYECs_+bv+GuMZ+dqUPJpD3RpcZnXe6MyCrJ4MGy7QA@mail.gmail.com
Whole thread Raw
In response to Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Avoiding hash join batch explosions with extreme skew and weird stats
List pgsql-hackers
On Thu, Jun 6, 2019 at 7:31 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I'm not sure I understand why you would need to compare the original
> tuples to the unmatched tuples file.

I think I was confused.  Actually, I'm still not sure I understand this part:

> Then, you iterate again through the outer side a third time to join it
> to the unmatched tuples in the unmatched tuples file (from the first
> chunk) and write the following to a new unmatched tuples file:
> 5
> 9
> 11
> 11

and likewise here

> Then you iterate a fifth time through the outer side to join it to the
> unmatched tuples in the unmatched tuples file (from the second chunk)
> and write the following to a new unmatched tuples file:
> 11
> 11

So you refer to joining the outer side to the unmatched tuples file,
but how would that tell you which outer tuples had no matches on the
inner side?  I think what you'd need to do is anti-join the unmatched
tuples file to the current inner batch.  So the algorithm would be
something like:

for each inner batch:
  for each outer tuple:
    if tuple matches inner batch then emit match
    if tuple does not match inner batch and this is the first inner batch:
      write tuple to unmatched tuples file
  if this is not the first inner batch:
    for each tuple from the unmatched tuples file:
      if tuple does not match inner batch:
        write to new unmatched tuples file
    discard previous unmatched tuples file and use the new one for the
next iteration

for each tuple in the final unmatched tuples file:
  null-extend and emit

If that's not what you have in mind, maybe you could provide some
similar pseudocode?  Or you can just ignore me. I'm not trying to
interfere with an otherwise-fruitful discussion by being the only one
in the room who is confused...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Avoiding hash join batch explosions with extreme skew and weirdstats
Next
From: Robert Haas
Date:
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats