Re: Asymmetric partition-wise JOIN - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: Asymmetric partition-wise JOIN
Date
Msg-id CAOP8fzaHqO647xSWsWaA7Tr9Z3kd+CuAKkCwiWwnC+Moh-567Q@mail.gmail.com
Whole thread Raw
In response to Re: Asymmetric partition-wise JOIN  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Asymmetric partition-wise JOIN
Re: Asymmetric partition-wise JOIN
List pgsql-hackers
2019年8月24日(土) 7:02 Thomas Munro <thomas.munro@gmail.com>:
>
> On Fri, Aug 23, 2019 at 4:05 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > We can consider the table join ptable X t1 above is equivalent to:
> >   (ptable_p0 + ptable_p1 + ptable_p2) X t1
> > = (ptable_p0 X t1) + (ptable_p1 X t1) + (ptable_p2 X t1)
> > It returns an equivalent result, however, rows are already reduced by HashJoin
> > in the individual leaf of Append, so CPU-cycles consumed by Append node can
> > be cheaper.
> >
> > On the other hands, it has a downside because t1 must be read 3 times and
> > hash table also must be built 3 times. It increases the expected cost,
> > so planner
> > may not choose the asymmetric partition-wise join plan.
>
> What if you include the partition constraint as a filter on t1?  So you get:
>
> ptable X t1 =
>   (ptable_p0 X (σ hash(dist)%4=0 (t1))) +
>   (ptable_p1 X (σ hash(dist)%4=1 (t1))) +
>   (ptable_p2 X (σ hash(dist)%4=2 (t1))) +
>   (ptable_p3 X (σ hash(dist)%4=3 (t1)))
>
> Pros:
> 1.  The hash tables will not contain unnecessary junk.
> 2.  You'll get the right answer if t1 is on the outer side of an outer join.
> 3.  If this runs underneath a Parallel Append and t1 is big enough
> then workers will hopefully cooperate and do a synchronised scan of
> t1.
> 4.  The filter might enable a selective and efficient plan like an index scan.
>
> Cons:
> 1.  The filter might not enable a selective and efficient plan, and
> therefore cause extra work.
>
> (It's a little weird in this example because don't usually see hash
> functions in WHERE clauses, but that could just as easily be dist
> BETWEEN 1 AND 99 or any other partition constraint.)
>
It requires the join-key must include the partition key and also must be
equality-join, doesn't it?
If ptable and t1 are joined using ptable.dist = t1.foo, we can distribute
t1 for each leaf table with "WHERE hash(foo)%4 = xxx" according to
the partition bounds, indeed.

In case when some of partition leafs are pruned, it is exactly beneficial
because relevant rows to be referenced by the pruned child relations
are waste of memory.

On the other hands, it eventually consumes almost equivalent amount
of memory to load the inner relations, if no leafs are pruned, and if we
could extend the Hash-node to share the hash-table with sibling join-nodess.

> > One idea I have is, sibling HashJoin shares a hash table that was built once
> > by any of the sibling Hash plan. Right now, it is not implemented yet.
>
> Yeah, I've thought a little bit about that in the context of Parallel
> Repartition.  I'm interested in combining intra-node partitioning
> (where a single plan node repartitions data among workers on the fly)
> with inter-node partitioning (like PWJ, where partitions are handled
> by different parts of the plan, considered at planning time); you
> finish up needing to have nodes in the plan that 'receive' tuples for
> each partition, to match up with the PWJ plan structure.  That's not
> entirely unlike CTE references, and not entirely unlike your idea of
> somehow sharing the same hash table.  I ran into a number of problems
> while thinking about that, which I should write about in another
> thread.
>
Hmm. Do you intend the inner-path may have different behavior according
to the partition bounds definition where the outer-path to be joined?
Let me investigate its pros & cons.

The reasons why I think the idea of sharing the same hash table is reasonable
in this scenario are:
1. We can easily extend the idea for parallel optimization. A hash table on DSM
    segment, once built, can be shared by all the siblings in all the
parallel workers.
2. We can save the memory consumption regardless of the join-keys and
    partition-keys, even if these are not involved in the query.

On the other hands, below are the downside. Potentially, combined use of
your idea may help these cases:
3. Distributed inner-relation cannot be outer side of XXX OUTER JOIN.
4. Hash table contains rows to be referenced by only pruned partition leafs.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Cleanup isolation specs from unused steps
Next
From: Peter Eisentraut
Date:
Subject: Re: using explicit_bzero