Unclamped row estimates whith OR-ed subplans - Mailing list pgsql-performance

From Benjamin Coutu
Subject Unclamped row estimates whith OR-ed subplans
Date
Msg-id 20200619151232.7F3285FB26@mx.zeyos.com
Whole thread Raw
Responses Re: Unclamped row estimates whith OR-ed subplans  (Laurenz Albe <laurenz.albe@cybertec.at>)
Re: Unclamped row estimates whith OR-ed subplans  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hello,

please consider the following SQL query:

SELECT * FROM "transactions" WHERE
    "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR
    "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%')

This yields the following plan on Postgres 11:

Seq Scan on transactions  (cost=67.21..171458.03 rows=1301316 width=1206)
  Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
  SubPlan 1
    ->  Bitmap Heap Scan on accounts  (cost=33.36..61.16 rows=46 width=4)
          Recheck Cond: ((name)::text ~~* '%test%'::text)
          ->  Bitmap Index Scan on s_accounts  (cost=0.00..33.35 rows=46 width=0)
                Index Cond: ((name)::text ~~* '%test%'::text)
  SubPlan 2
    ->  Seq Scan on contracts  (cost=0.00..5.93 rows=5 width=4)
          Filter: ((name)::text ~~* '%test%'::text)

So the where clause of this query has just two subplans OR-ed together, one is estimated to yield 46 rows and one is
estimatedto yield 5 rows. 
I'd expect the total rows for the seqscan to be estimated at 46 then, following the logic that rows_seqscan =
max(rows_subplan1,rows_subplan2). As you can see, the optimizer estimates a whopping 1301316 rows instead. 

I am absolutely aware that those are hashed sub plans below a seqscan and that Postgres therefore has to scan all
tuplesof the table. But the problem is that upper nodes (which are excluded from this example for simplicity) think
theywill receive 1301316 rows from the seqscan, when in fact they will probably only see a hand full, which the planner
couldhave (easily?) deduced by taking the greater of the two subplan row estimates. 

What am I missing, or is this perhaps a shortfall of the planner?

Thanks,

Ben

--

Bejamin Coutu
ben.coutu@zeyos.com

ZeyOS GmbH & Co. KG
http://www.zeyos.com



pgsql-performance by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: simple query running for ever
Next
From: Laurenz Albe
Date:
Subject: Re: Unclamped row estimates whith OR-ed subplans