Thread: a funnel by any other name
I have discovered that I have reinvented the wheel. In http://www.postgresql.org/message-id/CA+TgmobM7X6jgre442638b+33h1EWa=vcZqnsvzEdX057ZHVuw@mail.gmail.com I invented an operator called Funnel, whose job it is to fire up a bunch of workers and run a plan in all of them, so that we can eventually generate plans like this: Funnel Number of Workers: 4 -> Hash Join -> Partial Seq Scan on a -> Hash -> Seq Scan on b The idea is that each worker will read all of b and build a hash table, and about 1/4 of a, and do the join of its portion of a to all of b, and then the funnel will combine the separate flows of tuples into one. Funny thing though, it turns out that I'm not the first person to invent this operator. Greenplum has it, and there it's called "Gather Motion".[1] However, it seems like the most systems use the name "Exchange" to refer to a more powerful operator which can do this thing among others.[2,3] The Volcano paper[3] is particularly interesting for a couple of reasons. First, it's about 20 years old. Second, it may be the earliest reference to the exchange operator; the authors speak as if they invented it. Third, it describes several variants of exchange: 1. One parent, one child (see p. 11, under A. Vertical Parallelism). The child can execute in parallel with the parent. This is basically the same as a Funnel if you had only one worker and didn't ever let the parent execute the stuff under the funnel. 2. One parent, multiple children (see p. 12, last paragraph). Classic funnel. 3. Multiple parents, multiple children, with a copy to each parent (see p. 13, under "Variants of the Exchange Operator"). 4. Exchange-merge (see p. 13, column two). This is like what I described as a funnel, but instead of returning all of the results in the order they're returned, it merges a set of streams into a single stream. 5. Interchange (see end of p. 13, continuing on p. 14). This is an operator for "repartitioning" or "distributed shuffle" or whatever you want to call it. The portion of the plan tree below the operator is run to generate tuples, which are then consumed by the portion of the plan tree above the operator. But between the two the tuples are shuffled back and forth between cooperating workers so that each worker ends up with the tuples that map to some partition. For example, you can imagine that below the interchange you might have a parallel sequential scan of a table with a column x. Let h be a hash function. The interchange operator will move the tuples around so that, above the interchange, each worker will see all and only those tuples where h(x) % number_of_workers = my_worker_number. This paper basically calls all of these things an Exchange, but they're clearly all somewhat different from each other, so I'm not sure it's a good idea to use the name Exchange for our node. At the same time, I'm not sure it's a good idea to use terminology that I invented completely out of whole cloth in preference to terminology that's seems to be somewhat standard. One idea is to call them all Exchange nodes but with a subtype. For example, using the Volcano's paper's terminology, we could call these: 1. Exchange Bushy 2. Exchange Inter-Operator (this is what's currently implemented) 3. Exchange Replicate 4. Exchange Merge 5. Interchange Or taking inspiration from Greenplum, we could go with: 1. ? 2. Gather 3. Broadcast (sorta) 4. Gather Merge 5. Redistribute Or maybe something like this: 1. Parallel Child 2. Parallel Gather 3. Parallel Replicate 4. Parallel Merge 5. Parallel Redistribute Or, yet another option, we could combine the similar operators under one umbrella while keeping the things that are more different as separate nodes: 1, 2. Exchange (or Gather or Funnel) 3, 5. Distribute (or Redistribute or Interchange or Exchange) 4. Exchange Merge (or Gather Merge or Funnel Merge) Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] http://www.ndm.net/datawarehouse/Greenplum/parallel-query-optimizer [2] http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1570/html/queryprocessing/CHDHHIIF.htm [3] http://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf
On 2015-09-17 04:39, Robert Haas wrote: > > 1. Exchange Bushy > 2. Exchange Inter-Operator (this is what's currently implemented) > 3. Exchange Replicate > 4. Exchange Merge > 5. Interchange > > Or taking inspiration from Greenplum, we could go with: > > 1. ? > 2. Gather > 3. Broadcast (sorta) > 4. Gather Merge > 5. Redistribute > > Or maybe something like this: > > 1. Parallel Child > 2. Parallel Gather > 3. Parallel Replicate > 4. Parallel Merge > 5. Parallel Redistribute > > Or, yet another option, we could combine the similar operators under > one umbrella while keeping the things that are more different as > separate nodes: > > 1, 2. Exchange (or Gather or Funnel) > 3, 5. Distribute (or Redistribute or Interchange or Exchange) > 4. Exchange Merge (or Gather Merge or Funnel Merge) > > Thoughts? > Interesting read. I think 1 and 2 are similar enough to be same node (Exchange sounds good to me). Exchange Merge for 4 also sounds good. About 3 and 5, if I understand correctly those are similar with the main difference being that in 3 all parents get copy of every tuple while in 5 the tuples are partitioned between the parents. Sounds reasonable to have Redistribute/Interchange or something like that for both with some additional info saying if tuples are being partitioned or duplicated. In any case, let's not name any of the nodes as "Replicate". -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 17, 2015 at 8:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Or, yet another option, we could combine the similar operators under
> one umbrella while keeping the things that are more different as
> separate nodes:
>
> 1, 2. Exchange (or Gather or Funnel)
> 3, 5. Distribute (or Redistribute or Interchange or Exchange)
> 4. Exchange Merge (or Gather Merge or Funnel Merge)
>
With Regards,
Amit Kapila.
>
> Or, yet another option, we could combine the similar operators under
> one umbrella while keeping the things that are more different as
> separate nodes:
>
> 1, 2. Exchange (or Gather or Funnel)
> 3, 5. Distribute (or Redistribute or Interchange or Exchange)
> 4. Exchange Merge (or Gather Merge or Funnel Merge)
>
+1 for combining, but it seems better to call 1,2 as Parallel Gather
and similarly for others. Adding Parallel to Gather makes it
self-explanatory.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
2015-09-17 Robert Haas <robertmhaas@gmail.com>: > 1. Exchange Bushy > 2. Exchange Inter-Operator (this is what's currently implemented) > 3. Exchange Replicate > 4. Exchange Merge > 5. Interchange > 1. ? > 2. Gather > 3. Broadcast (sorta) > 4. Gather Merge > 5. Redistribute > 1. Parallel Child > 2. Parallel Gather > 3. Parallel Replicate > 4. Parallel Merge > 5. Parallel Redistribute FYI, SQL Server has these in its execution plans: * Distribute Streams: read from one thread, write to multiple threads * Repartition Streams: both read and write from/to multiple threads * Gather Streams: read from multiple threads, write to one thread Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
On 17 September 2015 at 05:07, Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
--
2015-09-17 Robert Haas <robertmhaas@gmail.com>:
> 1. Exchange Bushy
> 2. Exchange Inter-Operator (this is what's currently implemented)
> 3. Exchange Replicate
> 4. Exchange Merge
> 5. Interchange
> 1. ?
> 2. Gather
> 3. Broadcast (sorta)
> 4. Gather Merge
> 5. Redistribute
> 1. Parallel Child
> 2. Parallel Gather
> 3. Parallel Replicate
> 4. Parallel Merge
> 5. Parallel Redistribute
FYI, SQL Server has these in its execution plans:
* Distribute Streams: read from one thread, write to multiple threads
* Repartition Streams: both read and write from/to multiple threads
* Gather Streams: read from multiple threads, write to one thread
Robert, thanks for asking. We'll be stuck with these words for some time, user visible via EXPLAIN so this is important.
In general we should stick to words already used in other similar situations, which could include DBMS and parallel ETL tools, of which there are many more than mentioned here.
I would be against using any of these words: Funnel, Motion, Bushy because I don't find them very descriptive (I think of spiders, bowels and shrubs respectively, sorry).
These words are liable to confusion with other concepts: Replicate, Duplicate, Distribute, Partition, Repartition, MERGE.
I've seen this concept called Fan-In/Fan-Out and Scatter/Gather
The main operations are the 3 mentioned by Nicolas:
1. Send data from many to one - which has subtypes for Unsorted, Sorted and Evenly balanced (but unsorted)
2. Send data from one process to many
3. Send data from many to many
My preferences for this would be
1. Gather (but not Gather Motion) e.g. Gather, Gather Sorted
2. Scatter (since Broadcast only makes sense in the context of a distributed query, it sounds weird for intra-node query)
3. Redistribution - which implies the description of how we spread data across nodes is "Distribution" (or DISTRIBUTED BY)
For 3 we should definitely use Redistribute, since this is what Teradata has been calling it for 30 years, which is where Greenplum got it from.
For 1, Gather makes most sense.
For 2, it could be either Scatter or Distribute. The former works well with Gather, the latter works well with Redistribute.
Sorry for my absence for further review on parallel ops.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 22, 2015 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Robert, thanks for asking. We'll be stuck with these words for some time, > user visible via EXPLAIN so this is important. I agree, thanks for taking an interest. > The main operations are the 3 mentioned by Nicolas: > 1. Send data from many to one - which has subtypes for Unsorted, Sorted and > Evenly balanced (but unsorted) > 2. Send data from one process to many > 3. Send data from many to many > > My preferences for this would be > 1. Gather (but not Gather Motion) e.g. Gather, Gather Sorted > 2. Scatter (since Broadcast only makes sense in the context of a distributed > query, it sounds weird for intra-node query) > 3. Redistribution - which implies the description of how we spread data > across nodes is "Distribution" (or DISTRIBUTED BY) "Scatter" isn't one of the things that I mentioned in my original email. Not sure where we'd use that, although there might be somewhere. > For 3 we should definitely use Redistribute, since this is what Teradata has > been calling it for 30 years, which is where Greenplum got it from. That's a reasonable option. We can bikeshed it some more when we get that far. > For 1, Gather makes most sense. Yeah, I'm leaning that way myself. Amit argued for "Parallel Gather" but I think that's overkill. There can't be a non-parallel gather, and long names are a pain. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Tue, Sep 22, 2015 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > For 1, Gather makes most sense. > > Yeah, I'm leaning that way myself. Amit argued for "Parallel Gather" > but I think that's overkill. There can't be a non-parallel gather, > and long names are a pain. "Gather" seems a pretty decent choice to me too, even if we only have a single worker (your "1"). I don't think there's much need to distinguish 1 from 2, is there? We can bikeshed the other names when the time comes; the insight in the thread is good to have. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 22 September 2015 at 20:34, Robert Haas <robertmhaas@gmail.com> wrote:
--
On Tue, Sep 22, 2015 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Robert, thanks for asking. We'll be stuck with these words for some time,
> user visible via EXPLAIN so this is important.
I agree, thanks for taking an interest.
> The main operations are the 3 mentioned by Nicolas:
> 1. Send data from many to one - which has subtypes for Unsorted, Sorted and
> Evenly balanced (but unsorted)
> 2. Send data from one process to many
> 3. Send data from many to many
>
> My preferences for this would be
> 1. Gather (but not Gather Motion) e.g. Gather, Gather Sorted
> 2. Scatter (since Broadcast only makes sense in the context of a distributed
> query, it sounds weird for intra-node query)
> 3. Redistribution - which implies the description of how we spread data
> across nodes is "Distribution" (or DISTRIBUTED BY)
"Scatter" isn't one of the things that I mentioned in my original
email. Not sure where we'd use that, although there might be
somewhere.
Understood. Thought it best to cover all the phrases we'll use in the future now in one discussion.
> For 3 we should definitely use Redistribute, since this is what Teradata has
> been calling it for 30 years, which is where Greenplum got it from.
That's a reasonable option. We can bikeshed it some more when we get that far.
Sure
> For 1, Gather makes most sense.
Yeah, I'm leaning that way myself. Amit argued for "Parallel Gather"
but I think that's overkill. There can't be a non-parallel gather,
and long names are a pain.
Agreed
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 22 September 2015 at 21:14, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
--
Robert Haas wrote:
> On Tue, Sep 22, 2015 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > For 1, Gather makes most sense.
>
> Yeah, I'm leaning that way myself. Amit argued for "Parallel Gather"
> but I think that's overkill. There can't be a non-parallel gather,
> and long names are a pain.
"Gather" seems a pretty decent choice to me too, even if we only have a
single worker (your "1"). I don't think there's much need to
distinguish 1 from 2, is there?
I think so. 1 is Many->1 and the other is 1->Many.
You may wish to do an operation like a parallel merge join.
Parallel Sort -> Scatter -> Parallel Merge
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services