Re: [HACKERS] PoC: Grouped base relation - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: [HACKERS] PoC: Grouped base relation |
Date | |
Msg-id | 4410.1484671030@localhost Whole thread Raw |
In response to | Re: [HACKERS] PoC: Grouped base relation (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] PoC: Grouped base relation
|
List | pgsql-hackers |
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > [... snip ]] > > This all works well, as long as the aggregate is "summing" something > across rows. The method doesn't work when aggregation is say > "multiplying" across the rows or "concatenating" across the rows like > array_agg() or string_agg(). They need a different strategy to combine > aggregates across relations. Good point. The common characteristic of these seems to be that thay don't have aggcombinefn defined. > IIUC, we are trying to solve multiple problems here: > 1. Pushing down aggregates/groups down join tree, so that the number of rows > to be joined decreases. This might be a good optimization to have. However > there are problems in the current patch. Every path built for a relation > (join or base) returns the same result expressed by the relation or its > subset restricted by parameterization or unification. But this patch changes > that. It creates paths which represent grouping in the base relation. I > think, we need a separate relation to represent that result and hold paths > which produce that result. That itself would be a sizable patch. Whether a separate relation (RelOptInfo) should be created for grouped relation is an important design decision indeed. More important than your argument about the same result ("partial path", used to implement parallel nodes actually does not fit this criterion perfectly - it only returns part of the set) is the fact that the data type (target) differs. I even spent some time coding a prototype where separate RelOptInfo is created for the grouped relation but it was much more invasive. In particular, if only some relations are grouped, it's hard to join them with non-grouped ones w/o changing make_rel_from_joinlist and subroutines substantially. (Decision whether the plain or the grouped relation should be involved in joining makes little sense at the leaf level of the join tree.) So I took the approach that resembles the partial paths - separate pathlists within the same RelOptInfo. > 2. Try to push down aggregates based on the equivalence classes, where > grouping properties can be transferred from one relation to the other using > EC mechanism. I don't think the EC part should increase the patch complexity a lot. Unless I missed something, it's rather isolated to the part where target of the grouped paths is assembled. And I think it's important even for initial version of the patch. > This seems to require solving the problem of combining aggregates across the > relations. But there might be some usecases which could benefit without > solving this problem. If "combining aggregates ..." refers to joining grouped relations, then I insist on doing this in the initial version of the new feature too. Otherwise it'd only work if exactly one base relation of the query is grouped. > 3. If the relation to which we push the aggregate is an append relation, > push (partial) aggregation/grouping down into the child relations. - We > don't do that right now even for grouping aggregation on a single append > table. Parallel partial aggregation does that, but not exactly per > relation. That may be a sizable project in itself. Even without this piece > the rest of the optimizations proposed by this patch are important. Yes, this can be done in a separate patch. I'll consider it. > 4. Additional goal: push down the aggregation to any relation (join/base) > where it can be computed. I think this can be achieved by adding extra aggregation nodes to the join tree. As I still anticipate more important design changes, this part is not at the top of my TODO list. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
pgsql-hackers by date: