* Distribute Streams: read from one thread, write to multiple threads
* Repartition Streams: both read and write from/to multiple threads
* Gather Streams: read from multiple threads, write to one thread
Robert, thanks for asking. We'll be stuck with these words for some time, and they are user-visible via EXPLAIN, so this is important.
In general we should stick to words already used in similar situations elsewhere, which could include other DBMSs and parallel ETL tools, of which there are many more than those mentioned here.
I would be against using any of these words: Funnel, Motion, Bushy, because I don't find them very descriptive (I think of spiders, bowels and shrubs respectively, sorry).
These words are liable to be confused with other concepts: Replicate, Duplicate, Distribute, Partition, Repartition, MERGE.
I've seen this concept called Fan-In/Fan-Out and Scatter/Gather.
The main operations are the 3 mentioned by Nicolas:
1. Send data from many to one - which has subtypes for Unsorted, Sorted and Evenly balanced (but unsorted)
2. Send data from one process to many
3. Send data from many to many
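To make the first two kinds of data movement concrete, here is a minimal single-process sketch in Python that treats each worker's output as a plain list. It is an illustration under stated assumptions, not PostgreSQL executor code: the names `gather`, `gather_sorted` and `scatter` are hypothetical, unsorted Gather is shown as round-robin interleaving only to make the output deterministic (a real Gather returns rows in whatever order workers produce them), and Gather Sorted assumes each input is already sorted, so a k-way merge (`heapq.merge`) preserves the order.

```python
import heapq
from itertools import zip_longest
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # filler for inputs that run out early in zip_longest


def gather(inputs: List[Iterable[T]]) -> Iterator[T]:
    """Many to one, unsorted: interleave rows from all inputs round-robin."""
    for batch in zip_longest(*inputs, fillvalue=_DONE):
        for row in batch:
            if row is not _DONE:
                yield row


def gather_sorted(inputs: List[Iterable[T]]) -> Iterator[T]:
    """Many to one, sorted: k-way merge of inputs that are each already sorted."""
    return heapq.merge(*inputs)


def scatter(rows: Iterable[T], n_outputs: int) -> List[List[T]]:
    """One to many: deal rows round-robin across n_outputs consumers."""
    outputs: List[List[T]] = [[] for _ in range(n_outputs)]
    for i, row in enumerate(rows):
        outputs[i % n_outputs].append(row)
    return outputs


if __name__ == "__main__":
    workers = [[3, 9], [1, 4, 7], [2]]  # each worker's output, sorted
    print(list(gather(workers)))         # [3, 1, 2, 9, 4, 7]
    print(list(gather_sorted(workers)))  # [1, 2, 3, 4, 7, 9]
    print(scatter(range(7), 3))          # [[0, 3, 6], [1, 4], [2, 5]]
```

The sorted variant only ever compares the current head row of each input, which is why it can stream results without buffering whole inputs.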
My preferences for this would be:
1. Gather (but not Gather Motion) e.g. Gather, Gather Sorted
2. Scatter (since Broadcast only makes sense in the context of a distributed query; it sounds weird for an intra-node query)
3. Redistribution - which implies the description of how we spread data across nodes is "Distribution" (or DISTRIBUTED BY)
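Redistribution (many to many) can be sketched the same way, assuming rows are routed by hashing a distribution key, in the spirit of DISTRIBUTED BY. The function `redistribute` and its `key` parameter are hypothetical names for illustration only; the point is that every row with a given key lands in the same output whichever input it came from, so each output could, for example, aggregate its keys independently.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def redistribute(
    inputs: List[Iterable[T]],
    n_outputs: int,
    key: Callable[[T], Hashable],
) -> List[List[T]]:
    """Many to many: send each row from every input to the output selected
    by hashing its distribution key, so equal keys always meet in one output."""
    outputs: List[List[T]] = [[] for _ in range(n_outputs)]
    for stream in inputs:
        for row in stream:
            # hash() is stable within one process (str hashes are salted per
            # process), which is all a single query needs; % keeps it in range.
            outputs[hash(key(row)) % n_outputs].append(row)
    return outputs


if __name__ == "__main__":
    # Two hypothetical workers producing (customer_id, amount) rows.
    workers = [[(1, 10), (2, 5), (3, 7)], [(2, 1), (1, 4), (4, 8)]]
    for i, part in enumerate(redistribute(workers, 3, key=lambda r: r[0])):
        print(i, part)
```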
For 3 we should definitely use Redistribute, since this is what Teradata has been calling it for 30 years, which is where Greenplum got it from.
For 1, Gather makes most sense.
For 2, it could be either Scatter or Distribute. The former works well with Gather, the latter works well with Redistribute.
Sorry for my absence from further review of parallel ops.
--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services