> On 10/01/23 10:53, David Rowley wrote:
> On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
> > Do we have any pending items for this patch now?
>
> I'm just wondering if not trying this when the query has a DISTINCT
> clause is a copout. What I wanted to avoid was doing additional
> sorting work for WindowAgg just to have it destroyed by Hash
> Aggregate. I'm now wondering if adding both the original
> slightly-less-sorted path plus the new slightly-more-sorted path then
> if distinct decides to Hash Aggregate then it'll still be able to pick
> the cheapest input path to do that on. Unfortunately, our sort
> costing just does not seem to be advanced enough to know that sorting
> by fewer columns might be cheaper, so adding the additional path is
> likely just going to result in add_path() ditching the old
> slightly-less-sorted path due to the new slightly-more-sorted path
> having better pathkeys. So, we'd probably be wasting our time if we
> added both paths with the current sort costing code.
> Maybe we should try and do this for DISTINCT queries if the
> distinct_pathkeys match the orderby_pathkeys. That seems a little less
> copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
> likely that the ORDER BY might opt to use the Unique path for DISTINCT
> since it'll already have the correct pathkeys. However, if the ORDER
> BY has fewer columns then it might be cheaper to Hash Aggregate and
> then sort all over again, especially so when the DISTINCT removes a
> large proportion of the rows.
>
> Ideally, our sort costing would just be better, but I think that
> raises the bar a little too high to start thinking of making
> improvements to that for this patch.
Let me take a stab at this. Depending on complexity, we can take
a call to address this in current patch or a follow up.
--
Regards,
Ankit Kumar Pandey