Re: parallel joins, and better parallel explain - Mailing list pgsql-hackers

From Robert Haas
Subject Re: parallel joins, and better parallel explain
Date
Msg-id CA+Tgmoai=NgNqdtE1aXP=YMntvkXHa_Knn1GLs4O7khz_=rHJw@mail.gmail.com
In response to Re: parallel joins, and better parallel explain  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: parallel joins, and better parallel explain  (Greg Stark <stark@mit.edu>)
Re: parallel joins, and better parallel explain  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Thu, Nov 26, 2015 at 3:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Sounds like good progress.

Thanks.

> This gives us multiple copies of the hash table, which means we must either
> use N * work_mem, or we must limit the hash table to work_mem / N per
> partial plan.

We use N * work_mem in this patch.  The other option would be a lot
more expensive in terms of planning time, because we'd have to
generate one set of hash join paths (or at least hash join costs) for
use in serial plans and another set for use in parallel plans.  As it
is, the parallel stuff just piggybacks on the plans we're generating
anyway.  We might have to change that at some point, but I think we'll
do well to put that point off as long as possible.
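To make the trade-off concrete, here is a minimal sketch of the two accounting options, with hypothetical numbers rather than anything from PostgreSQL's actual costing code:

```python
# Sketch of the two memory-accounting options discussed above.
# All numbers are illustrative assumptions, not PostgreSQL internals.

work_mem_kb = 4 * 1024   # suppose work_mem = 4MB
workers = 4              # N workers, each building its own hash table

# Option taken by the patch: every worker builds a full copy, so total
# memory is N * work_mem, but the same hash join costing can be reused
# for both serial and parallel plans.
total_with_copies = workers * work_mem_kb

# Rejected option: cap each copy at work_mem / N, keeping total memory
# at work_mem, but requiring a second, separately costed set of hash
# join paths (a smaller hash table may spill to more batches, which
# changes the cost).
per_worker_capped = work_mem_kb // workers

print(total_with_copies)  # 16384 KB total across workers
print(per_worker_capped)  # 1024 KB per worker
```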

> How can the partial paths share a hash table?

I'm glad you asked that question.  For that, we need an allocator that
can work with dynamic shared memory segments, and a hash table built
on top of that allocator.  It's relatively complicated because the
same DSM segments can be mapped at different addresses in different
processes, so you can't use native pointers.  However, I'm pleased to
report that my colleague Thomas Munro is working on this problem, and
that he will submit this work to pgsql-hackers when it's ready, which
may be soon enough that we can consider including this in 9.6 if the
design meets with general approval.
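The "no native pointers" constraint can be illustrated with a toy sketch: if each process may map the same segment at a different base address, linked structures must store offsets within the segment rather than raw addresses. The names below are hypothetical and the real allocator's design will of course differ:

```python
# Toy illustration of offset-based ("relative") pointers in a shared
# segment. A bytearray stands in for a DSM segment; each list node
# stores a value and the OFFSET of the next node, never a raw address,
# so the chain can be followed no matter where the segment is mapped.
import struct

SEGMENT_SIZE = 64
NO_NEXT = -1  # sentinel offset meaning "end of chain"

def write_node(segment, offset, value, next_offset):
    # Two 64-bit ints per node: (value, offset-of-next-node).
    struct.pack_into("qq", segment, offset, value, next_offset)

def read_node(segment, offset):
    return struct.unpack_from("qq", segment, offset)

segment = bytearray(SEGMENT_SIZE)
write_node(segment, 16, 42, NO_NEXT)  # tail node lives at offset 16
write_node(segment, 0, 7, 16)         # head node at offset 0 links to 16

# Any process mapping this segment can walk the chain by offsets:
off, values = 0, []
while off != NO_NEXT:
    value, off = read_node(segment, off)
    values.append(value)
print(values)  # [7, 42]
```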

As you may recall, I previously proposed an allocator framework,
still a work in progress at that time, and the reaction here was
a bit lukewarm, which is why I shifted from parallel sort to parallel
seq scan as a first project.  I now think that was a good decision,
and as a result of Peter Geoghegan's work on sorting and my own
experience further developing the parallelism code, I no longer think
that allocator is the right solution for parallel sort anyway. But I
still think it might be the right solution for a parallel hash join
with a shared hash table.

My idea is that you'd end up with a plan like this:

Gather
-> Hash Join
  -> Parallel Seq Scan
  -> Parallel Hash
    -> Parallel Seq Scan

Not only does this build only one copy of the hash table instead of N
copies, but we can parallelize the hash table construction itself by
having all workers insert in parallel, which is pretty cool.  However,
I don't expect this to be better than an unshared hash table in all
cases.  We have a fair amount of evidence that accessing
backend-private data structures can sometimes be much faster than
accessing shared data structures - cf. not only the syscaches and
relcache, but also the use of Materialize nodes by the planner in
certain cases even when there are no filtering quals underneath.  So
there's probably going to be a need to consider both types of plans
and decide between them based on memory usage and expected runtime.
The details are not all clear to me yet, and most likely we'll have to
postpone some of the decisions until the code is written and can be
performance-tested to see in which cases one strategy performs better
or worse than the other.  What I can confirm at this point is that
I've thought about the problem you're asking about here, and that
EnterpriseDB intends to contribute code to address it.
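The shared-build idea above can be sketched in miniature: workers each consume a slice of the inner relation but insert into one shared table, instead of each building a private copy. This is purely illustrative; the real design would use DSM and a concurrent hash table, not Python threads and a lock:

```python
# Toy sketch of a shared parallel hash build: 4 workers partition the
# inner relation and all insert into ONE shared table. Illustrative
# only; not the proposed PostgreSQL implementation.
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

inner_rows = [(i, f"val{i}") for i in range(1000)]
shared_table = {}
lock = Lock()

def build_worker(chunk):
    # Each worker hashes its chunk into the single shared table.
    for key, val in chunk:
        with lock:
            shared_table.setdefault(key, []).append(val)

chunks = [inner_rows[i::4] for i in range(4)]  # strided slices, 4 workers
with ThreadPoolExecutor(max_workers=4) as pool:
    pool.map(build_worker, chunks)

print(len(shared_table))  # 1000 keys, built once, shared by all workers
```

The memory win is exactly the one discussed above: one table of size work_mem instead of N private copies, at the price of contended shared access.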

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


