On Wed, Oct 2, 2019 at 10:22 AM James Coleman <jtc331@gmail.com> wrote:
> In all cases I've been starting with:
>
> set enable_hashjoin = off;
> set enable_nestloop = off;
> set max_parallel_workers_per_gather = 4;
> set min_parallel_index_scan_size = 0;
> set min_parallel_table_scan_size = 0;
> set parallel_setup_cost = 0;
> set parallel_tuple_cost = 0;
>
> I've also tried various combinations of random_page_cost,
> cpu_index_tuple_cost, cpu_tuple_cost.
>
> Interestingly I've noticed plans joining two relations that look like:
>
> Limit
> -> Merge Join
> Merge Cond: (t1.pk = t2.pk)
> -> Gather Merge
> Workers Planned: 4
> -> Parallel Index Scan using t_pkey on t t1
> -> Gather Merge
> Workers Planned: 4
> -> Parallel Index Scan using t_pkey on t t2
>
> Where I would have expected a Gather Merge above a parallelized merge
> join. Is that reasonable to expect?
Well, you told the planner that parallel_setup_cost = 0, so starting
workers is free. And you told the planner that parallel_tuple_cost =
0, so shipping tuples from the worker to the leader is also free. So
it is unclear why it should prefer a single Gather Merge over two
Gather Merges: after all, the Gather Merge is free!
If you use give those things some positive cost, even if it's smaller
than the default, you'll probably get a saner-looking plan choice.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company