Re: fix cost subqueryscan wrong parallel cost - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: fix cost subqueryscan wrong parallel cost
Date:
Msg-id: 439901.1651260710@sss.pgh.pa.us
In response to: Re: fix cost subqueryscan wrong parallel cost ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses: Re: fix cost subqueryscan wrong parallel cost
           Re: fix cost subqueryscan wrong parallel cost
List: pgsql-hackers
"David G. Johnston" <david.g.johnston@gmail.com> writes: > The fact that (baserel.rows > path->subpath->rows) here seems like a > straight bug: there are no filters involved in this case but in the > presence of filters baserel->rows should be strictly (<= > path->subpath->rows), right? No, because the subpath's rowcount has been derated to represent the number of rows any one worker is expected to process. Since the SubqueryScan is below the Gather, it has to represent that number of rows too. Like I said, this design is pretty confusing; though I do not have a better alternative to suggest. Anyway, after chewing on this for awhile, it strikes me that the sanest way to proceed is for cost_subqueryscan to compute its rows estimate as the subpath's rows estimate times the selectivity of the subqueryscan's quals (if any). That'd be the natural way to proceed for most sorts of non-bottom-level paths to begin with. I think there are a few reasons why cost_subqueryscan currently does it the way it does: * By analogy to other sorts of relation-scan nodes. But those don't have any subpath that they could consult instead. This analogy is really a bit faulty, since SubqueryScan isn't a relation scan node in any ordinary meaning of that term. * To ensure that all paths for the same relation have the same rowcount estimate (modulo different parameterization or parallelism). But we'll have that anyway because the only possible path type for an unflattened RTE_SUBQUERY rel is a SubqueryScan, so there's no risk of different cost_xxx functions arriving at slightly different estimates. * To avoid redundant computation of the quals' selectivity. This is slightly annoying; but in practice the quals will always be RestrictInfos which will cache the per-qual selectivities, so it's not going to cost us very much to perform that calculation over again. So perhaps we should do it more like the attached, which produces this plan for the UNION case: Unique (cost=28994.85..30194.85 rows=120000 width=12) -> Sort (cost=28994.85..29294.85 rows=120000 width=12) Sort Key: t.a, t.b, t.c -> Gather (cost=0.00..2600.00 rows=120000 width=12) Workers Planned: 2 -> Parallel Append (cost=0.00..2600.00 rows=50000 width=12) -> Parallel Seq Scan on t (cost=0.00..575.00 rows=25000 width=12) -> Parallel Seq Scan on t t_1 (cost=0.00..575.00 rows=25000 width=12) It'd still be a good idea to fix the cost estimation to not charge extra for SubqueryScans that will be elided later, because this Parallel Append's cost should be 1400 not 2600. But as I showed upthread, that won't affect the plan choice for this particular test case. regards, tom lane #text/x-diff; name="change-cost_subqueryscan-rowcount-estimate-2.patch" [change-cost_subqueryscan-rowcount-estimate-2.patch]/home/postgres/zzz