Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | 20201027201509.jfv6sfhrmxhdwlvk@development Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Dmitry Dolgov <9erthalion6@gmail.com>) |
Responses |
Re: POC: GROUP BY optimization
|
List | pgsql-hackers |
On Mon, Oct 26, 2020 at 11:40:40AM +0100, Dmitry Dolgov wrote: >> On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote: >> > Thanks for your interest! FYI there is a new thread about this topic [1] >> > with the next version of the patch and more commentaries (I've created >> > it for visibility purposes, but probably it also created some confusion, >> > sorry for that). >> > >> > Thanks! >> >> I made a very quick look at your updates and noticed that it is intended to >> be simple and some parts of the code are removed as they have little test >> coverage. I'd propose vice versa to increase test coverage to enjoy more >> precise cost calculation and probably partial grouping. >> >> Or maybe it's worth to benchmark both patches and then re-decide what we >> want more to have a more complicated or a simpler version. >> >> Good to know that this feature is not stuck anymore and we have more than >> one proposal. >> Thanks! > >Just to clarify, the patch that I've posted in another thread mentioned >above is not an alternative proposal, but a development of the same >patch I had posted in this thread. As mentioned in [1], reduce of >functionality is an attempt to reduce the scope, and as soon as the base >functionality looks good enough it will be returned back. > I find it hard to follow two similar threads trying to do the same (or very similar) things in different ways. Is there any chance to join forces and produce a single patch series merging the changes? With the "basic" functionality at the beginning, then patches with the more complex stuff. That's the usual way, I think. As I said in my response on the other thread [1], I think constructing additional paths with alternative orderings of pathkeys is the right approach. Otherwise we can't really deal with optimizations above the place where we consider this optimization. That's essentially what I was trying in explain May 16 response [2] when I actually said this: So I don't think there will be a single "interesting" grouping pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll need to build grouping paths for all of those, and leave the planner to eventually pick the one giving us the cheapest plan. I wouldn't go as far as saying the approach in this patch (i.e. picking one particular ordering) is doomed, but it's going to be very hard to make it work reliably. Even if we get the costing *at this node* right, who knows how it'll affect costing of the nodes above us? So if I can suggest something, I'd merge the two patches, adopting the path-based approach. With the very basic functionality/costing in the first patch, and the more advanced stuff in additional patches. Does that make sense? regards [1] https://www.postgresql.org/message-id/20200901210743.lutgvnfzntvhuban%40development [2] https://www.postgresql.org/message-id/20200516145609.vm7nrqy7frj4ha6r%40development -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: