Home > mailing lists

Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From	Melanie Plageman
Subject	Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date	December 30, 2019 06:34:02
Msg-id	CAAKRu_YsWm7gc_b2nBGWFPE6wuhdOLfc1LBZ786DUzaCPUDXCA@mail.gmail.com Whole thread Raw
In response to	Re: Avoiding hash join batch explosions with extreme skew and weird stats (Melanie Plageman <melanieplageman@gmail.com>)
Responses	Re: Avoiding hash join batch explosions with extreme skew and weird stats (Thomas Munro <thomas.munro@gmail.com>)
List	pgsql-hackers

Tree view

So, I finally have a prototype to share of parallel hashloop fallback.

See the commit message for a full description of the functionality of the
patch.

This patch does contain refactoring of nodeHashjoin.

I have split the Parallel HashJoin and Serial HashJoin state machines
up, as they were diverging in my patch to a point that made for a
really cluttered ExecHashJoinImpl() (ExecHashJoinImpl() is now gone).

The reason I didn't do this refactoring in one patch and then put the
adaptive hashjoin code on top of it is that I might like to make
Parallel HashJoin and Serial HashJoin different nodes.

I think that has been discussed elsewhere and was looking to
understand the rationale for keeping them in the same node.

The patch is a rough prototype. Below are some of the high-level
pieces of work that I plan to do next. (there are many TODOs in the
code as well).

Some of the major outstanding work:

- correctness:
- haven't tried it with anti-joins and don't think it works
- number of batches is not deterministic from run-to-run

- performance:
- join_hash.sql is *much* slower.
While there are loads of performance fixes needed in the patch,
the basic criteria for "falling back" is likely the culprit here.
- There are many bottlenecks (there are several places where a
barrier could be moved to somewhere less hot, an atomic used
instead of a lock, or a method of coordination could be used to
allow workers to do backend-local accounting and aggregate it)
- need to make sure it does not create outer match status files when
it shouldn't (inner joins, for example)

- testing:
- many unexercised cases
- add number of chunks to EXPLAIN (for users and for testing)

- refactoring:
- The match status bitmap should have its own API or, at least,
manipulation of it should be done in a centralized set of
functions
- Rename "chunk" (as in chunks of inner side) to something that is
not already used in the context of memory chunks and, more
importantly, SharedTuplestoreChunk
- Make references to "hashloop fallback" and "adaptive hashjoin"
more consistent
- Rename adaptiveHashjoin.h/.c files and change what is in the files
which are separate from nodeHashjoin.h/.c (depending on outcome of
"new node")
- The state machines are big and unwieldy now, so, there is probably
some larger restructuring that could be done
- Should probably use the ParallelHashJoinBatchAccessor to access
the ParallelHashJoinBatch everywhere (realized this recently)

Melanie Plageman

Attachment

v3-0001-hashloop-fallback.patch

pgsql-hackers by date:

From: Amit Kapila
Date: 30 December 2019, 05:55:28
Subject: Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila
Date: 30 December 2019, 08:10:39
Subject: Re: [HACKERS] Block level parallel vacuum

Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

Attachment

Previous

Next