Re: Group by reordering optimization - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Group by reordering optimization
Date
Msg-id 20200901210743.lutgvnfzntvhuban@development
Whole thread Raw
In response to Group by reordering optimization  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Group by reordering optimization  (Peter Geoghegan <pg@bowt.ie>)
Re: Group by reordering optimization  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
On Tue, Sep 01, 2020 at 01:15:31PM +0200, Dmitry Dolgov wrote:
>Hi,
>
>Better late than never, to follow up on the original thread [1] I would like to
>continue the discussion with the another version of the patch for group by
>reordering optimization. To remind, it's about reordering of group by clauses
>to do sorting more efficiently. The patch is rebased and modified to address
>(at least partially) the suggestions about making it consider new additional
>paths instead of changing original ones. It is still pretty much
>proof-of-concept version though with many blind spots, but I wanted to start
>kicking it and post at least something, otherwise it will never happen. An
>incremental approach so to say.
>
>In many ways it still contains the original code from Teodor. Changes and notes:
>
>* Instead of changing the order directly, now patch creates another patch with
>  modifier order of clauses. It does so for the normal sort as well as for
>  incremental sort. The whole thing is done in two steps: first it finds a
>  potentially better ordering taking into account number of groups, widths and
>  comparison costs; afterwards this information is used to produce a cost
>  estimation. This is implemented via a separate create_reordered_sort_path to
>  not introduce too many changes, I couldn't find any better place.
>

I haven't tested the patch with any queries, but I agree this seems like
the right approach in general.

I'm a bit worried about how complex the code in planner.c is getting -
the incremental sort patch already made it a bit too complex, and this
is just another step in that direction.  I suppose we should refactor
add_paths_to_grouping_rel() by breaking it into smaller / more readable
pieces ...

>* Function get_func_cost was removed at some point, but unfortunately this
>  patch was implemented before that, so it's still present there.
>
>* For simplicity I've removed support in create_partial_grouping_paths, since
>  they were not covered by the existing tests anyway.
>

Hmmm, OK. I think that's something we'll need to address for the final
patch, but I agree we can add it after improving the costing etc.

>* The costing part is pretty rudimentary and looks only at the first group.
>  It's mostly hand crafted to pass the existing tests.
>

OK, understood.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: v13: show extended stats target in \d
Next
From: Peter Eisentraut
Date:
Subject: Re: Maximum password length