Re: Unclamped row estimates whith OR-ed subplans - Mailing list pgsql-performance

From Laurenz Albe
Subject Re: Unclamped row estimates whith OR-ed subplans
Date
Msg-id 7334b30adf19881431a344be6372d270327739ba.camel@cybertec.at
Whole thread Raw
In response to Unclamped row estimates whith OR-ed subplans  ("Benjamin Coutu" <ben.coutu@zeyos.com>)
Responses Re: Unclamped row estimates whith OR-ed subplans  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-performance
On Fri, 2020-06-19 at 17:12 +0200, Benjamin Coutu wrote:
> please consider the following SQL query:
> 
> SELECT * FROM "transactions" WHERE 
>         "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR
>         "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%')
> 
> This yields the following plan on Postgres 11:
> 
> Seq Scan on transactions  (cost=67.21..171458.03 rows=1301316 width=1206)
>   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
>   SubPlan 1
>     ->  Bitmap Heap Scan on accounts  (cost=33.36..61.16 rows=46 width=4)
>           Recheck Cond: ((name)::text ~~* '%test%'::text)
>           ->  Bitmap Index Scan on s_accounts  (cost=0.00..33.35 rows=46 width=0)
>                 Index Cond: ((name)::text ~~* '%test%'::text)
>   SubPlan 2
>     ->  Seq Scan on contracts  (cost=0.00..5.93 rows=5 width=4)
>           Filter: ((name)::text ~~* '%test%'::text)
> 
> So the where clause of this query has just two subplans OR-ed together, one is estimated to yield 46 rows and one is
estimatedto yield 5 rows.
 
> I'd expect the total rows for the seqscan to be estimated at 46 then, following the logic that rows_seqscan =
max(rows_subplan1,rows_subplan2). As you can see, the optimizer estimates a whopping
 
> 1301316 rows instead.
> 
> I am absolutely aware that those are hashed sub plans below a seqscan and that Postgres therefore has to scan all
tuplesof the table. But the problem is that upper nodes (which are excluded from
 
> this example for simplicity) think they will receive 1301316 rows from the seqscan, when in fact they will probably
onlysee a hand full, which the planner could have (easily?) deduced by taking the
 
> greater of the two subplan row estimates.
> 
> What am I missing, or is this perhaps a shortfall of the planner?

The subplans are executed *fpr each row* found in "transactions",
and the estimate on the subplans is *per execution".

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




pgsql-performance by date:

Previous
From: "Benjamin Coutu"
Date:
Subject: Unclamped row estimates whith OR-ed subplans
Next
From: Tom Lane
Date:
Subject: Re: Unclamped row estimates whith OR-ed subplans