Home > mailing lists

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: POC: GROUP BY optimization
Date	October 27, 2020 20:19:51
Msg-id	20201027201509.jfv6sfhrmxhdwlvk@development Whole thread
In response to	Re: POC: GROUP BY optimization (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses	Re: POC: GROUP BY optimization
List	pgsql-hackers

Tree view

On Mon, Oct 26, 2020 at 11:40:40AM +0100, Dmitry Dolgov wrote:
>> On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote:
>> > Thanks for your interest! FYI there is a new thread about this topic [1]
>> > with the next version of the patch and more commentaries (I've created
>> > it for visibility purposes, but probably it also created some confusion,
>> > sorry for that).
>> >
>> > Thanks!
>>
>> I made a very quick look at your updates and noticed that it is intended to
>> be simple and some parts of the code are removed as they have little test
>> coverage. I'd propose vice versa to increase test coverage to enjoy more
>> precise cost calculation and probably partial grouping.
>>
>> Or maybe it's worth to benchmark both patches and then re-decide what we
>> want more to have a more complicated or a simpler version.
>>
>> Good to know that this feature is not stuck anymore and we have more than
>> one proposal.
>> Thanks!
>
>Just to clarify, the patch that I've posted in another thread mentioned
>above is not an alternative proposal, but a development of the same
>patch I had posted in this thread. As mentioned in [1], reduce of
>functionality is an attempt to reduce the scope, and as soon as the base
>functionality looks good enough it will be returned back.
>

I find it hard to follow two similar threads trying to do the same (or
very similar) things in different ways. Is there any chance to join
forces and produce a single patch series merging the changes? With the
"basic" functionality at the beginning, then patches with the more
complex stuff. That's the usual way, I think.

As I said in my response on the other thread [1], I think constructing
additional paths with alternative orderings of pathkeys is the right
approach. Otherwise we can't really deal with optimizations above the
place where we consider this optimization.

That's essentially what I was trying in explain May 16 response [2]
when I actually said this:

    So I don't think there will be a single "interesting" grouping
    pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys.
    And we'll need to build grouping paths for all of those, and leave
    the planner to eventually pick the one giving us the cheapest plan.

I wouldn't go as far as saying the approach in this patch (i.e. picking
one particular ordering) is doomed, but it's going to be very hard to
make it work reliably. Even if we get the costing *at this node* right,
who knows how it'll affect costing of the nodes above us?

So if I can suggest something, I'd merge the two patches, adopting the
path-based approach. With the very basic functionality/costing in the
first patch, and the more advanced stuff in additional patches.

Does that make sense?

regards

[1] https://www.postgresql.org/message-id/20200901210743.lutgvnfzntvhuban%40development

[2] https://www.postgresql.org/message-id/20200516145609.vm7nrqy7frj4ha6r%40development

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: John Naylor
Date: 27 October 2020, 20:19:47
Subject: Re: cutting down the TODO list thread

From: John Naylor
Date: 27 October 2020, 20:54:24
Subject: Re: cutting down the TODO list thread

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

Previous

Next