Re: [HACKERS] PoC: Grouped base relation - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: [HACKERS] PoC: Grouped base relation |
Date | |
Msg-id | 24966.1484830702@localhost |
In response to | Re: [HACKERS] PoC: Grouped base relation (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses | Re: [HACKERS] PoC: Grouped base relation |
List | pgsql-hackers |
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

> >> 1. Pushing down aggregates/groups down join tree, so that the number of rows
> >> to be joined decreases. This might be a good optimization to have. However
> >> there are problems in the current patch. Every path built for a relation
> >> (join or base) returns the same result expressed by the relation or its
> >> subset restricted by parameterization or unification. But this patch changes
> >> that. It creates paths which represent grouping in the base relation. I
> >> think, we need a separate relation to represent that result and hold paths
> >> which produce that result. That itself would be a sizable patch.
> >
> > Whether a separate relation (RelOptInfo) should be created for grouped
> > relation is an important design decision indeed. More important than your
> > argument about the same result ("partial path", used to implement parallel
> > nodes actually does not fit this criterion perfectly - it only returns part of
> > the set) is the fact that the data type (target) differs.
> > I even spent some time coding a prototype where separate RelOptInfo is created
> > for the grouped relation but it was much more invasive. In particular, if only
> > some relations are grouped, it's hard to join them with non-grouped ones w/o
> > changing make_rel_from_joinlist and subroutines substantially. (Decision
> > whether the plain or the grouped relation should be involved in joining makes
> > little sense at the leaf level of the join tree.)
> >
> > So I took the approach that resembles the partial paths - separate pathlists
> > within the same RelOptInfo.
>
> Yes, it's hard, but I think without having a separate RelOptInfo the
> design won't be complete. Is there a subset of problem that can be
> solved by using a separate RelOptInfo e.g. pushing aggregates down
> child relations or anything else.

I'm still not convinced that all the fields of RelOptInfo (typically relids)
need to be duplicated. If the current concept should be improved, I'd move all
the grouping-related fields to a separate structure, e.g. GroupPathInfo, and
let RelOptInfo point to it. Similar to ParamPathInfo, which contains
parameterization-specific information, GroupPathInfo would contain the
grouping-specific information: target, row count, width, maybe path lists too.

> >
> >> 2. Try to push down aggregates based on the equivalence classes, where
> >> grouping properties can be transferred from one relation to the other using
> >> EC mechanism.
> >
> > I don't think the EC part should increase the patch complexity a lot. Unless I
> > missed something, it's rather isolated to the part where target of the grouped
> > paths is assembled. And I think it's important even for initial version of the
> > patch.
> >
> >> This seems to require solving the problem of combining aggregates across the
> >> relations. But there might be some usecases which could benefit without
> >> solving this problem.
> >
> > If "combining aggregates ..." refers to joining grouped relations, then I
> > insist on doing this in the initial version of the new feature too. Otherwise
> > it'd only work if exactly one base relation of the query is grouped.
>
> No. "combining aggregates" refers to what aggtransmultifn does. But,
> possibly that problem needs to be solved in the first step itself.

OK. As the discussion goes on, I see that this part could be more useful than
I originally thought. I'll consider it.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
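To make the GroupPathInfo idea above more concrete, here is a rough, standalone sketch of how the grouping-specific fields could hang off RelOptInfo instead of being duplicated in a second RelOptInfo. Everything in it is an assumption for illustration: the types are simplified stand-ins, not the real planner definitions, and the field name grouped and the helper rel_get_group_info are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins; NOT the real PostgreSQL planner types. */
typedef struct PathList
{
	int			npaths;			/* placeholder for a List of Path nodes */
} PathList;

typedef struct PathTarget
{
	int			nexprs;			/* number of output expressions */
} PathTarget;

/*
 * Hypothetical structure holding only the grouping-specific information:
 * target, row count, width and (maybe) the path lists.
 */
typedef struct GroupPathInfo
{
	PathTarget	target;			/* output of the grouped paths */
	double		rows;			/* estimated row count after grouping */
	int			width;			/* estimated average row width */
	PathList	pathlist;		/* grouped paths */
	PathList	partial_pathlist;	/* grouped partial paths */
} GroupPathInfo;

typedef struct RelOptInfo
{
	unsigned long relids;		/* simplified; shared by plain and grouped paths */
	double		rows;			/* row estimate of the plain relation */
	PathTarget	reltarget;		/* output of the plain paths */
	PathList	pathlist;		/* plain paths */
	GroupPathInfo *grouped;		/* NULL if no grouped paths exist (hypothetical) */
} RelOptInfo;

/*
 * Return the relation's grouping info, creating it on first use.
 * Fields such as relids are not copied; they stay in RelOptInfo.
 */
static GroupPathInfo *
rel_get_group_info(RelOptInfo *rel)
{
	assert(rel != NULL);

	if (rel->grouped == NULL)
	{
		rel->grouped = calloc(1, sizeof(GroupPathInfo));
		if (rel->grouped == NULL)
			abort();			/* out of memory */
	}
	return rel->grouped;
}
```

The point of the sketch is only that relids and similar fields exist once, while the plain and the grouped paths each keep their own target and estimates, much like ParamPathInfo keeps the parameterization-specific data.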