Re: fix cost subqueryscan wrong parallel cost - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: fix cost subqueryscan wrong parallel cost |
Date | |
Msg-id | 343340.1651255791@sss.pgh.pa.us Whole thread Raw |
In response to | Re: fix cost subqueryscan wrong parallel cost (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: fix cost subqueryscan wrong parallel cost
|
List | pgsql-hackers |
I wrote: > So I think the actual problem here is that we leave the decision > to elide no-op SubqueryScan nodes till setrefs.c. We should detect > that earlier, and when it applies, label the SubqueryScanPath with > the exact cost of its child. Hmm ... actually, while doing that seems like it'd be a good idea, it doesn't have much bearing on the case at hand. I approximated the results of such a change on this test case by just removing the "cpu_tuple_cost" component from cost_subqueryscan: @@ -1420,7 +1420,7 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root, get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost); startup_cost = qpqual_cost.startup; - cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple; + cpu_per_tuple = qpqual_cost.per_tuple; run_cost = cpu_per_tuple * baserel->tuples; /* tlist eval costs are paid per output row, not per tuple scanned */ and what I got was Unique (cost=28144.85..29344.85 rows=120000 width=12) -> Sort (cost=28144.85..28444.85 rows=120000 width=12) Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c -> Append (cost=0.00..1750.00 rows=120000 width=12) -> Subquery Scan on "*SELECT* 1" (cost=0.00..575.00 rows=60000 width=12) -> Gather (cost=0.00..575.00 rows=60000 width=12) Workers Planned: 2 -> Parallel Seq Scan on t (cost=0.00..575.00 rows=25000 width=12) -> Subquery Scan on "*SELECT* 2" (cost=0.00..575.00 rows=60000 width=12) -> Gather (cost=0.00..575.00 rows=60000 width=12) Workers Planned: 2 -> Parallel Seq Scan on t t_1 (cost=0.00..575.00 rows=25000 width=12) The subqueryscans are being costed the way we want now, but it's still going for the append-atop-gathers plan. So I dug a bit deeper, and found that generate_union_paths does also create the gather-atop-append plan, but *it's costed at 1750 units*, exactly the same as the other path. So add_path is just making an arbitrary choice between two paths of identical cost, and it happens to pick this one. (If I don't make the above tweak to cost_subqueryscan, the same thing happens, although the two competing paths now each have cost 2950.) So: why do we come out with a cost of 1750 for the very same plan that in the UNION ALL case is costed at 1400? AFAICS it's because the UNION ALL case thinks that the two inputs of the parallel append produce 25000 rows apiece so the parallel append produces 50000 rows, and it costs the append's overhead on that basis. But in the UNION case, the parallel append sees two inputs that are claimed to return 60000 rows, so the append produces 120000 rows, meaning more append overhead. We can't readily get EXPLAIN to print this tree since it's rejected by add_path, but what I'm seeing (with the above costing hack) is {GATHERPATH :rows 120000 :startup_cost 0.00 :total_cost 1750.00 :subpath {APPENDPATH :parallel_aware true :parallel_safe true :parallel_workers 2 :rows 120000 :startup_cost 0.00 :total_cost 1750.00 :subpaths ( {SUBQUERYSCANPATH :rows 60000 :startup_cost 0.00 :total_cost 575.00 :subpath {PATH :parallel_aware true :parallel_safe true :parallel_workers 2 :rows 25000 :startup_cost 0.00 :total_cost 575.00 } } {SUBQUERYSCANPATH :rows 60000 :startup_cost 0.00 :total_cost 575.00 :subpath {PATH :parallel_aware true :parallel_safe true :parallel_workers 2 :rows 25000 :startup_cost 0.00 :total_cost 575.00 } } ) } } In short, these SubqueryScans are being labeled as producing 60000 rows when their input only produces 25000 rows, which is surely insane. So: even though the SubqueryScan itself isn't parallel-aware, the number of rows it processes has to be de-rated according to the number of workers involved. Perhaps something like the original patch in this thread is indeed correct. But I can't help feeling that this definition of path_rows is mighty unintuitive and error-prone, and that if cost_subqueryscan is wrong then it's likely got lots of company. Maybe we need to take two steps back and rethink the whole approach. regards, tom lane
pgsql-hackers by date: