Estimate of the inner_rows - Mailing list pgsql-performance

From 陈雁飞
Subject Estimate of the inner_rows
Date
Msg-id 408f9706.17a2.191d0b90567.Coremail.postgresql_2016@163.com
Whole thread Raw
Responses Re: Estimate of the inner_rows
List pgsql-performance
Hi
I encounter an query plan problem like as the following. It's contain two nodes which assume the result is 1 and 7, but however, the last result is 7418. And the actual result is just 1, but because of the result is too big, which will affect the following join methods. And I've analyze the reason, but I think we can do bettwer.

postgres=# explain select * from test t1 left join test t2 on t1.b = t2.b and t2.c = 10 where t1.a = 1;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.85..48.16 rows=7418 width=24)
   ->  Index Scan using a_idx on test t1  (cost=0.43..8.45 rows=1 width=12)
         Index Cond: (a = 1)
   ->  Index Scan using b_idx on test t2  (cost=0.43..39.64 rows=7 width=12)
         Index Cond: (b = t1.b)
         Filter: (c = 10)
(6 rows)

postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | integer |           |          |
 c      | integer |           |          |
Indexes:
    "a_idx" btree (a)
    "b_idx" btree (b)

Throughing the source code, I know it how to get this result.
Firstly, the result 7 is assume fron the condition  t1.b = t2.b and t2.c = 10,  in the function clauselist_selectivity_ext compute the selectivity, and the result is:
  t2.b selc    *   t2.c = 10 selc           *  ntuples
(1/134830)  *  (1002795/1201000) *  1201000 = 7

Secondly, when compute the join selec, the compute function is eqjoinsel,and the result function is calc_joinrel_size_estimate
case JOIN_LEFT:
nrows = outer_rows * inner_rows * fkselec * jselec;
if (nrows < outer_rows)
nrows = outer_rows;
nrows *= pselec;
break;
outer_rows is 1, inner_rows is 1002795, which is the result the estimate result of t2.c = 10, while not the 7.
So, through the analyze, I think the reason is the estimate result of inner_rows, now we just consider the condition t2.c = 10, not the condition t1.b = t2,b and t2.c = 10.

pgsql-performance by date:

Previous
From: Rui DeSousa
Date:
Subject: Re: checking for a NULL date in a partitioned table kills performance
Next
From: Tom Lane
Date:
Subject: Re: Estimate of the inner_rows