Re: Unclamped row estimates whith OR-ed subplans - Mailing list pgsql-performance

From Tom Lane
Subject Re: Unclamped row estimates whith OR-ed subplans
Date
Msg-id 1666226.1592582687@sss.pgh.pa.us
Whole thread Raw
In response to Unclamped row estimates whith OR-ed subplans  ("Benjamin Coutu" <ben.coutu@zeyos.com>)
List pgsql-performance
"Benjamin Coutu" <ben.coutu@zeyos.com> writes:
> please consider the following SQL query:

> SELECT * FROM "transactions" WHERE
>     "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR
>     "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%')

> This yields the following plan on Postgres 11:

> Seq Scan on transactions  (cost=67.21..171458.03 rows=1301316 width=1206)
>   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
>   SubPlan 1
>     ->  Bitmap Heap Scan on accounts  (cost=33.36..61.16 rows=46 width=4)
>           Recheck Cond: ((name)::text ~~* '%test%'::text)
>           ->  Bitmap Index Scan on s_accounts  (cost=0.00..33.35 rows=46 width=0)
>                 Index Cond: ((name)::text ~~* '%test%'::text)
>   SubPlan 2
>     ->  Seq Scan on contracts  (cost=0.00..5.93 rows=5 width=4)
>           Filter: ((name)::text ~~* '%test%'::text)

> So the where clause of this query has just two subplans OR-ed together, one is estimated to yield 46 rows and one is
estimatedto yield 5 rows. 
> I'd expect the total rows for the seqscan to be estimated at 46 then, following the logic that rows_seqscan =
max(rows_subplan1,rows_subplan2). As you can see, the optimizer estimates a whopping 1301316 rows instead. 

No.  The subplan estimates are for the number of rows produced by one
execution of the subplan, ie the numbers of "accounts" or "contracts"
rows that match those inner WHERE conditions.  This has very little
a-priori relationship to the number of "transactions" rows that will
satisfy the outer WHERE condition.  If we knew that transactions.account
and transactions.contract were unique columns, then we might be able
to say that there shouldn't be more than one outer match per subplan
result row ... but you didn't say that, and it seems unlikely.

(Having said that, I think that the estimates for these cases very
possibly are quite stupid.  But that doesn't mean that 46+5 would
be the right answer.)

            regards, tom lane



pgsql-performance by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: Unclamped row estimates whith OR-ed subplans
Next
From: "David G. Johnston"
Date:
Subject: Re: Unclamped row estimates whith OR-ed subplans