Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date
Msg-id CAAKRu_YsWm7gc_b2nBGWFPE6wuhdOLfc1LBZ786DUzaCPUDXCA@mail.gmail.com
Whole thread Raw
In response to Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
So, I finally have a prototype to share of parallel hashloop fallback.

See the commit message for a full description of the functionality of the
patch.

This patch does contain refactoring of nodeHashjoin.

I have split the Parallel HashJoin and Serial HashJoin state machines
up, as they were diverging in my patch to a point that made for a
really cluttered ExecHashJoinImpl() (ExecHashJoinImpl() is now gone).

The reason I didn't do this refactoring in one patch and then put the
adaptive hashjoin code on top of it is that I might like to make
Parallel HashJoin and Serial HashJoin different nodes.

I think that has been discussed elsewhere and was looking to
understand the rationale for keeping them in the same node.

The patch is a rough prototype. Below are some of the high-level
pieces of work that I plan to do next. (there are many TODOs in the
code as well).

Some of the major outstanding work:

- correctness:
  - haven't tried it with anti-joins and don't think it works
  - number of batches is not deterministic from run-to-run

- performance:
  - join_hash.sql is *much* slower.
    While there are loads of performance fixes needed in the patch,
    the basic criteria for "falling back" is likely the culprit here.
  - There are many bottlenecks (there are several places where a
    barrier could be moved to somewhere less hot, an atomic used
    instead of a lock, or a method of coordination could be used to
    allow workers to do backend-local accounting and aggregate it)
  - need to make sure it does not create outer match status files when
    it shouldn't (inner joins, for example)

- testing:
  - many unexercised cases
  - add number of chunks to EXPLAIN (for users and for testing)

- refactoring:
  - The match status bitmap should have its own API or, at least,
    manipulation of it should be done in a centralized set of
    functions
  - Rename "chunk" (as in chunks of inner side) to something that is
    not already used in the context of memory chunks and, more
    importantly, SharedTuplestoreChunk
  - Make references to "hashloop fallback" and "adaptive hashjoin"
    more consistent
  - Rename adaptiveHashjoin.h/.c files and change what is in the files
    which are separate from nodeHashjoin.h/.c (depending on outcome of
    "new node")
  - The state machines are big and unwieldy now, so, there is probably
    some larger restructuring that could be done
  - Should probably use the ParallelHashJoinBatchAccessor to access
    the ParallelHashJoinBatch everywhere (realized this recently)

--
Melanie Plageman
Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Block level parallel vacuum
Next
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Block level parallel vacuum