Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | 20200516145609.vm7nrqy7frj4ha6r@development Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Dmitry Dolgov <9erthalion6@gmail.com>) |
Responses |
Re: POC: GROUP BY optimization
Re: POC: GROUP BY optimization |
List | pgsql-hackers |
On Sat, May 16, 2020 at 02:24:31PM +0200, Dmitry Dolgov wrote: >> On Fri, May 15, 2020 at 01:52:20AM +0200, Tomas Vondra wrote: >> >> I wonder if anyone has plans to try again with this optimization in v14 >> cycle? The patches no longer apply thanks to the incremental sort patch, >> but I suppose fixing that should not be extremely hard. >> >> The 2020-07 CF is still a couple weeks away, but it'd be good to know if >> there are any plans to revive this. I'm willing to spend some time on >> reviewing / testing this, etc. > >Yes, if you believe that this patch has potential, I would love to pick >it up again. > I think it's worth another try. I've been reminded of this limitation when working on the incremental sort, which significantly increases the number of orderings that we can sort efficiently. But we can't possibly leverage that unless it matches the GROUP BY ordering. The attached WIP patch somewhat addresses this by trying to reorder the group_pathkeys so allow leveraging sort of existing ordering (even just partial, with incremental sort). >> I've only quickly skimmed the old thread, but IIRC there were two main >> challenges in getting the optimization right: >> >> >> 1) deciding which orderings are interesting / worth additional work >> >> I think we need to consider these orderings, in addition to the one >> specified in GROUP BY: >> >> 1) as specified in ORDER BY (if different from 1) > >What is the idea behind considering this ordering? I'll try to answer this in general, i.e. what I think this patch needs to consider. Hopefully that'll answer why we should look at ORDER BY pathkeys too ... Reordering the pathkeys has both local and global effects, and we need to consider all of that to make the optimization reliable, otherwise we'll inevitably end up with trivial regressions. The local effects are trivial - it's for example reordering the pathkeys to make the explicit sort as cheap as possible. This thread already discussed a number of things to base this on - ndistinct for columns, cost of comparison function, ... In any case, this is something we can decide locally, when building the grouping paths. The global effects are much harder to tackle, because the decision can't be made locally when building the grouping paths. It requires changes both below and above the point where we build grouping paths. An example of a decision we need to make before we even get to building a grouping path is which index paths to build. Currently we only build index paths with "useful" pathkeys, and without tweaking that we'll never even see the index in add_paths_to_grouping_rel(). But there are also decisions that can be made only after we build the grouping paths. For example, we may have both GROUP BY and ORDER BY, and there is no "always correct" way to combine those. In some cases it may be correct to use the same pathkeys, in other cases it's better to use different ones (which will require an extra Sort, with additional cost). So I don't think there will be a single "interesting" grouping pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll need to build grouping paths for all of those, and leave the planner to eventually pick the one giving us the cheapest plan ... A "brute-force" approach would be to generate all possible orderings of group_pathkeys, but that's probably not going to fly - it might easily cause an explosion in number of paths we track, etc. So we'll need to pick/prioritize orderings that are more likely to give us efficient plans, and ORDER BY seems like a good example because it means we won't need an explicit sort. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: