Home > mailing lists

Re: a wrong index choose when statistics is out of date - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: a wrong index choose when statistics is out of date
Date	March 5, 2024 05:24:29
Msg-id	87le6xb5j9.fsf@163.com Whole thread Raw
In response to	Re: a wrong index choose when statistics is out of date (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: a wrong index choose when statistics is out of date
List	pgsql-hackers

Tree view

David Rowley <dgrowleyml@gmail.com> writes:

> On Tue, 5 Mar 2024 at 00:37, Andy Fan <zhihuifan1213@163.com> wrote:
>>
>> David Rowley <dgrowleyml@gmail.com> writes:
>> > I don't think it would be right to fudge the costs in any way, but I
>> > think the risk factor for IndexPaths could take into account the
>> > number of unmatched index clauses and increment the risk factor, or
>> > "certainty_factor" as it is currently in my brain-based design. That
>> > way add_path() would be more likely to prefer the index that matches
>> > the most conditions.
>>
>> This is somehow similar with my proposal at [1]?  What do you think
>> about the treat 'col op const' as 'col op $1' for the marked column?
>> This could just resolve a subset of questions in your mind, but the
>> method looks have a solid reason.
>
> Do you mean this?

Yes, it is not cautious enough to say "similar" too quick.

After reading your opinion again, I think what you are trying to do is
adding one more dimension to Path compared with the existing cost and
pathkey information and it would take effects on add_path stage. That is
impressive, and I'm pretty willing to do more testing once the v1 is
done.

I just noted you have expressed your idea about my proposal 1,

> We should do anything like add column options in the meantime. Those
> are hard to remove once added.

I will try it very soon.  and I'm a bit of upset no one care about my
proposal 2 which is the AI method, I see many companies want to
introduce AI to planner even I don't seen any impressive success, but
this user case looks like a candidate. 

>> + /*
>> + * To make the planner more robust to handle some inaccurate statistics
>> + * issue, we will add a extra cost to qpquals so that the less qpquals
>> + * the lower cost it has.
>> + */
>> + cpu_run_cost += 0.01 * list_length(qpquals);
>
> I don't think it's a good idea to add cost penalties like you proposed
> there. This is what I meant by "I don't think it would be right to
> fudge the costs in any way".
>
> If you modify the costs to add some small penalty so that the planner
> is more likely to favour some other plan, what happens if we then
> decide the other plan has some issue and we want to penalise that for
> some other reason? Adding the 2nd penalty might result in the original
> plan choice again. Which one should be penalised more? I think the
> uncertainty needs to be tracked separately.
>
> Fudging the costs like this is also unlikely to play nicely with
> add_path's use of STD_FUZZ_FACTOR.  There'd be an incentive to do
> things like total_cost *= STD_FUZZ_FACTOR; to ensure we get a large
> enough penalty.

I agree and I just misunderstood your proposal yesterday. 

-- 
Best Regards
Andy Fan

pgsql-hackers by date:

From: Shlok Kyal
Date: 05 March 2024, 05:17:14
Subject: Re: speed up a logical replica setup

From: Andrei Lepikhov
Date: 05 March 2024, 05:30:25
Subject: Re: POC, WIP: OR-clause support for indexes

Re: a wrong index choose when statistics is out of date - Mailing list pgsql-hackers

Previous

Next