Re: a wrong index choose when statistics is out of date - Mailing list pgsql-hackers

From Andy Fan
Subject Re: a wrong index choose when statistics is out of date
Date
Msg-id 87msra9vgo.fsf@163.com
Whole thread Raw
In response to Re: a wrong index choose when statistics is out of date  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: a wrong index choose when statistics is out of date
List pgsql-hackers
David Rowley <dgrowleyml@gmail.com> writes:

> On Wed, 6 Mar 2024 at 02:09, Andy Fan <zhihuifan1213@163.com> wrote:
>>     This patch introduces a new attoptions like this:
>>
>>     ALTER TABLE t ALTER COLUMN col set (force_generic=true);
>>
>>     Then selfunc.c realizes this and ignore the special Const value, then
>>     average selectivity is chosen. This fall into the weakness of generic
>>     plan, but this patch doesn't introduce any new weakness and we leave the
>>     decision to user which could resolve some problem. Also this logic only
>>     apply to eqsel since the ineqsel have the get_actual_variable_range
>>     mechanism which is helpful for index choose case at least.
>
> If you don't want the planner to use the statistics for the column why
> not just do the following?

Acutally I didn't want the planner to ignore the statistics totally, I
want the planner to treat the "Const" which probably miss optimizer part
average, which is just like what we did for generic plan for the blow
query.  

prepare s as SELECT * FROM t WHERE a = $1 and b = $2;
explain (costs off) execute s(109, 8);
           QUERY PLAN            
---------------------------------
 Index Scan using t_a_c_idx on t
   Index Cond: (a = 109)
   Filter: (b = 8) 

(3 rows)

custom plan, Wrong index due to we have a bad estimation for a = 109.


set plan_cache_mode to force_generic_plan ;
explain (costs off) execute s(109, 8);
              QUERY PLAN               
---------------------------------------
 Index Scan using t_a_b_idx on t
   Index Cond: ((a = $1) AND (b = $2))   -- Correct index.
(2 rows)

Generic plan - we use the average estimation for the missed optimizer
statistics part and *if the new value is not so different from existing
ones*, we can get a disired result. 

It is true that the "generic" way is not as exactly accurate as the
"custom" way since the later one can use the data in MCV, but that is
the cost we have to pay to make the missed optimizer statistics less
imporant and generic plan has the same issue as well. As for this
aspect, I think the way you proposed probably have a wider use case.

-- 
Best Regards
Andy Fan




pgsql-hackers by date:

Previous
From: Jelte Fennema-Nio
Date:
Subject: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Next
From: Dean Rasheed
Date:
Subject: Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)