Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: POC: GROUP BY optimization
Date
Msg-id 20201029145025.tn3tekboot37nkbf@localhost
Whole thread Raw
In response to Re: POC: GROUP BY optimization  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: POC: GROUP BY optimization
List pgsql-hackers
> On Tue, Oct 27, 2020 at 09:19:51PM +0100, Tomas Vondra wrote:
> On Mon, Oct 26, 2020 at 11:40:40AM +0100, Dmitry Dolgov wrote:
> > > On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote:
> > > > Thanks for your interest! FYI there is a new thread about this topic [1]
> > > > with the next version of the patch and more commentaries (I've created
> > > > it for visibility purposes, but probably it also created some confusion,
> > > > sorry for that).
> > > >
> > > > Thanks!
> > >
> > > I made a very quick look at your updates and noticed that it is intended to
> > > be simple and some parts of the code are removed as they have little test
> > > coverage. I'd propose vice versa to increase test coverage to enjoy more
> > > precise cost calculation and probably partial grouping.
> > >
> > > Or maybe it's worth to benchmark both patches and then re-decide what we
> > > want more to have a more complicated or a simpler version.
> > >
> > > Good to know that this feature is not stuck anymore and we have more than
> > > one proposal.
> > > Thanks!
> >
> > Just to clarify, the patch that I've posted in another thread mentioned
> > above is not an alternative proposal, but a development of the same
> > patch I had posted in this thread. As mentioned in [1], reduce of
> > functionality is an attempt to reduce the scope, and as soon as the base
> > functionality looks good enough it will be returned back.
> >
>
> I find it hard to follow two similar threads trying to do the same (or
> very similar) things in different ways. Is there any chance to join
> forces and produce a single patch series merging the changes? With the
> "basic" functionality at the beginning, then patches with the more
> complex stuff. That's the usual way, I think.
>
> As I said in my response on the other thread [1], I think constructing
> additional paths with alternative orderings of pathkeys is the right
> approach. Otherwise we can't really deal with optimizations above the
> place where we consider this optimization.
>
> That's essentially what I was trying in explain May 16 response [2]
> when I actually said this:
>
>    So I don't think there will be a single "interesting" grouping
>    pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys.
>    And we'll need to build grouping paths for all of those, and leave
>    the planner to eventually pick the one giving us the cheapest plan.
>
> I wouldn't go as far as saying the approach in this patch (i.e. picking
> one particular ordering) is doomed, but it's going to be very hard to
> make it work reliably. Even if we get the costing *at this node* right,
> who knows how it'll affect costing of the nodes above us?
>
> So if I can suggest something, I'd merge the two patches, adopting the
> path-based approach. With the very basic functionality/costing in the
> first patch, and the more advanced stuff in additional patches.
>
> Does that make sense?

Yes, and from what I understand it's already what had happened in the
newer thread [1]. To avoid any confusion, there are no "two patches" at
least from my side, and what I've posted in [1] is the continuation of
this work, but with path-based approach adopted and a bit less
functionality (essentially I've dropped everything what was not covered
with tests in the original patch).

In case if I'm missing something and Pavel's proposal is significantly
different from the original patch (if I understand correctly, at the
moment the latest patch posted here is a rebase and adjusting the old
patch to work with the latest changes in master, right?), then indeed
they could be merged, but please in the newer thread [1].

[1]: https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: libpq compression
Next
From: Konstantin Knizhnik
Date:
Subject: Re: libpq compression