On 12/9/2024 12:12, David Rowley wrote:
> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Initial problem causes wrong cost_sort estimation. Right now I think
>> about providing cost_sort() the sort clauses instead of (or in addition
>> to) the pathkeys.
>
> I'm not quite sure why the sort clauses matter any more than the
> EquivalenceClass. If the EquivalenceClass defines that all members
> will have the same value for any given row, then, if we had to choose
> any single member to drive the n_distinct estimate from, isn't the
> most accurate distinct estimate from the member with the smallest
> n_distinct estimate? (That assumes the less distinct member has every
> value the more distinct member has, which might not be true)
Thanks for your efforts! Your idea looks more stable and applicable than
my patch.
BTW, it could still give wrong ndistinct estimates if we choose the
sorting operator from the clauses mentioned in the EquivalenceClass.
However, this thread's primary intention is to stabilize query plans, so
I'll try to implement your idea.
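To make sure I read the idea correctly, here is a minimal standalone
sketch of "take the smallest n_distinct among the EC members". It is not
planner code: EcMemberStat and ec_min_ndistinct() are hypothetical names
invented for this illustration, and the convention that a non-positive
value means "no statistics" is an assumption of the sketch.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the per-member statistics of an
 * EquivalenceClass; this is NOT a PostgreSQL struct.
 */
typedef struct EcMemberStat
{
    double      ndistinct;  /* estimated distinct values; <= 0 means unknown */
} EcMemberStat;

/*
 * Return the smallest known ndistinct among the members.  Members without
 * an estimate are skipped; if no member has one, return the caller's
 * fallback value.
 */
double
ec_min_ndistinct(const EcMemberStat *members, size_t nmembers, double fallback)
{
    double      best = -1.0;    /* -1 means "nothing found yet" */

    for (size_t i = 0; i < nmembers; i++)
    {
        double      nd = members[i].ndistinct;

        if (nd <= 0.0)
            continue;           /* no usable estimate for this member */
        if (best < 0.0 || nd < best)
            best = nd;
    }

    return (best < 0.0) ? fallback : best;
}
```

As you noted, the minimum is only accurate when the less distinct member
contains every value of the more distinct one, so the result is still an
estimate.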
The second reason for passing the sort clauses was to distinguish sorts
by cost (see proposal [1]), because that can sometimes save CPU cycles
on comparisons. Having a lot of sort/grouping queries with only sporadic
joins, I see how profitable it can be: text or numeric grouping over a
mostly Cartesian join may be painful without fine-tuned sorting.
[1]
https://www.postgresql.org/message-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf@gmail.com
--
regards, Andrei Lepikhov