On Fri, 2020-06-19 at 17:12 +0200, Benjamin Coutu wrote:
> please consider the following SQL query:
>
> SELECT * FROM "transactions" WHERE
> "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR
> "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%')
>
> This yields the following plan on Postgres 11:
>
> Seq Scan on transactions (cost=67.21..171458.03 rows=1301316 width=1206)
> Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
> SubPlan 1
> -> Bitmap Heap Scan on accounts (cost=33.36..61.16 rows=46 width=4)
> Recheck Cond: ((name)::text ~~* '%test%'::text)
> -> Bitmap Index Scan on s_accounts (cost=0.00..33.35 rows=46 width=0)
> Index Cond: ((name)::text ~~* '%test%'::text)
> SubPlan 2
> -> Seq Scan on contracts (cost=0.00..5.93 rows=5 width=4)
> Filter: ((name)::text ~~* '%test%'::text)
>
> So the where clause of this query has just two subplans OR-ed together, one is estimated to yield 46 rows and one is
estimatedto yield 5 rows.
> I'd expect the total rows for the seqscan to be estimated at 46 then, following the logic that rows_seqscan =
max(rows_subplan1,rows_subplan2). As you can see, the optimizer estimates a whopping
> 1301316 rows instead.
>
> I am absolutely aware that those are hashed sub plans below a seqscan and that Postgres therefore has to scan all
tuplesof the table. But the problem is that upper nodes (which are excluded from
> this example for simplicity) think they will receive 1301316 rows from the seqscan, when in fact they will probably
onlysee a hand full, which the planner could have (easily?) deduced by taking the
> greater of the two subplan row estimates.
>
> What am I missing, or is this perhaps a shortfall of the planner?
The subplans are executed *fpr each row* found in "transactions",
and the estimate on the subplans is *per execution".
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com