Re: Unclamped row estimates whith OR-ed subplans - Mailing list pgsql-performance

From Tom Lane
Subject Re: Unclamped row estimates whith OR-ed subplans
Date
Msg-id 1729591.1592590609@sss.pgh.pa.us
Whole thread Raw
In response to Re: Unclamped row estimates whith OR-ed subplans  ("Benjamin Coutu" <ben.coutu@zeyos.com>)
Responses Re: Unclamped row estimates whith OR-ed subplans  (Michael Lewis <mlewis@entrata.com>)
List pgsql-performance
"Benjamin Coutu" <ben.coutu@zeyos.com> writes:
> I don't want to waste your time but maybe there is room for improvement as both "account" and "contract" are highly
distinctand the individual subplan estimates are quite accurate: 

Yeah, as I said, the estimates you're getting for the OR'd subplans are
pretty stupid.  Once you throw the OR in there, it's not possible to
convert the IN clauses to semi-joins, so they just stay as generic
subplans.  It looks like we have exactly zero intelligence about the
generic case --- unless I'm missing something in clause_selectivity,
you just end up with a default 0.5 selectivity estimate.  So yeah,
there's a lot of room for improvement, whenever anyone finds some
round tuits to work on that.

While you're waiting, you might think about recasting the query to
avoid the OR.  Perhaps you could do a UNION of two scans of the
transactions table?

> Btw, I don't quite understand why the nested loop on contract only is expected to yield 31662 rows, when the
null_fracof field transactions.contract is 1. Shouldn't that indicate zero rows or some kind of default minimum
estimatefor that query? 

That I don't understand.  I get a minimal rowcount estimate for an
all-nulls outer table, as long as I'm using just one IN rather than
an OR:

regression=# create table contracts (id int);
CREATE TABLE
regression=# insert into contracts values(1),(2),(3),(4);
INSERT 0 4
regression=# analyze contracts ;
ANALYZE
regression=# create table transactions (contract int);
CREATE TABLE
regression=# insert into transactions select null from generate_series(1,100000);
INSERT 0 100000
regression=# analyze transactions;
ANALYZE
regression=# explain select * from transactions where contract in (select id from contracts);
                                QUERY PLAN
--------------------------------------------------------------------------
 Hash Semi Join  (cost=1.09..1607.59 rows=1 width=4)
   Hash Cond: (transactions.contract = contracts.id)
   ->  Seq Scan on transactions  (cost=0.00..1344.00 rows=100000 width=4)
   ->  Hash  (cost=1.04..1.04 rows=4 width=4)
         ->  Seq Scan on contracts  (cost=0.00..1.04 rows=4 width=4)
(5 rows)

            regards, tom lane



pgsql-performance by date:

Previous
From: "Benjamin Coutu"
Date:
Subject: Re: Unclamped row estimates whith OR-ed subplans
Next
From: Michael Lewis
Date:
Subject: Re: Unclamped row estimates whith OR-ed subplans