Re: Optimize cardinality estimation when unique keys are fully covered - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Optimize cardinality estimation when unique keys are fully covered
Date
Msg-id 2222808.1763751127@sss.pgh.pa.us
Whole thread Raw
In response to Optimize cardinality estimation when unique keys are fully covered  ("feichanghong" <feichanghong@qq.com>)
List pgsql-hackers
"=?utf-8?B?ZmVpY2hhbmdob25n?=" <feichanghong@qq.com> writes:
> We may consider checking in cardinality estimation whether the restrictlist
> fully covers a unique index. If so, we can directly set the estimated rows
> to 1. The attached patch provides a very early demo of this approach.

I think this is far harder than you believe; a simplistic approach
like this will mainly result in breaking things.  The reason is that
we need to have the same estimate of the size of a join relation
regardless of how it is implemented (and, indeed, that size estimate
is made before we ever consider individual join paths).  Otherwise the
fundamental model of generating different join paths and comparing
them for merit breaks down, either at this join level or higher ones.
But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
t1's rowcount is 1 only applies if the join is implemented as an inner
indexscan.  If we choose some other method, say a hash join based on
a seqscan of t1, having forced that rowcount to 1 would produce
completely false estimates.

Now, what you are proposing would make the EXPLAIN output cosmetically
better, in that instead of

 Nested Loop  (cost=0.43..1.57 rows=32 width=12)
   ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4)
   ->  Index Only Scan using t1_pkey on t1  (cost=0.43..4.76 rows=32 width=8)
         Index Cond: ((a = t2.a) AND (b = 0))

we would get a rowcount estimate of "1" for the parameterized t1 scan.
But the estimate for the output of the nestloop would have to remain
the same.  And the t1 scan is not where the problem is: it already
made the right choice there.  To the extent that this EXPLAIN output
is problematic, it's because if this join is part of a bigger query
then we may make poor choices at upper join levels due to having a bad
estimate of this join's output size.  I don't see how this line of
work can fix that.  (Yes, I see the hack you put into
set_joinrel_size_estimates.  It's a hack not a workable solution,
because it will distort the size estimates in many cases that are
not quite what you have here.)

            regards, tom lane



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Adding basic NUMA awareness
Next
From: Jacob Champion
Date:
Subject: [PATCH] Reorganize pqcomm.h a bit