Re: [HACKERS] WIP: Aggregation push-down - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: [HACKERS] WIP: Aggregation push-down |
Date | |
Msg-id | 13114.1517214722@localhost Whole thread Raw |
In response to | Re: [HACKERS] WIP: Aggregation push-down (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] WIP: Aggregation push-down
Re: [HACKERS] WIP: Aggregation push-down |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jan 26, 2018 at 8:04 AM, Antonin Houska <ah@cybertec.at> wrote: > > So one problem is that the grouping expression can be inappropriate for > > partial aggregation even if there's no type change during the > > translation. What I consider typical for this case is that the equality > > operator used to identify groups (SortGroupClause.eqop) can return true even > > if binary (stored) values of the inputs differ. The partial aggregation could > > only take place if we had a special equality operator which distinguishes the > > *binary* values (I don't know yet how to store this operator the catalog --- > > in pg_opclass recors for the hash opclasses?).. > > We don't have an operator that tests for binary equality, but it's > certainly testable from C; see datumIsEqual. I'm not sure how much > this really helps, though. I think it would be safe to build an > initial set of groups based on datumIsEqual comparisons and then > arrange to later merge some of the groups. But that's not something > we ever do in the executor today, so it might require quite a lot of > hacking. Also, it seems like it might really suck in some cases. For > instance, consider something like SELECT scale(one.a), sum(two.a) FROM > one, two WHERE one.a = two.a GROUP BY 1. Doing a Partial Aggregate on > two.a using datumIsEqual semantics, joining to a, and then doing a > Finalize Aggregate looks legal, but the Partial Aggregate may produce > a tremendous number of groups compared to the Finalize Aggregate. In > other words, this technique wouldn't merge any groups that shouldn't > be merged, but it might fail to merge groups that really need to be > merged to get good performance. I don't insist on doing Partial Aggregate in any case. If we wanted to group by the binary value, we'd probably have to enhance statistics accordingly. The important thing is to recognize the special case like this. Rejection of the Partial Aggregate would be the default response. > > Another idea is to allow only such changes that the > > destination type is in the same operator class as the source, and explicitly > > enumerate the "safe opclasses". But that would mean that user cannot define > > new opclasses within which the translation is possible --- not sure this is a > > serious issue. > > Enumerating specific opclasses in the source code is a non-starter -- > Tom Lane would roll over in his grave if he weren't still alive. What > we could perhaps consider doing is adding some mechanism for an > opclass or opfamily to say whether its equality semantics happen to be > exactly datumIsEqual() semantics. That's a little grotty because it > leaves data types for which that isn't true out in the cold, but on > the other hand it seems like it would be useful as a way of optimizing > a bunch of things other than this. Maybe it could also include a way > to specify that the comparator happens to have the semantics as C's > built-in < and > operators, which seems like a case that might also > lend itself to some optimizations. I think of a variant of this: implement an universal function that tests the binary values for equality (besides the actual arguments, caller would have to pass info like typlen, typalign, typbyval for each argument, and cache these for repeated calls) and set pg_proc(oprcode) to 0 wherever this function is sufficient. Thus the problematic cases like numeric, citext, etc. would be detected by their non-zero oprcode. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
pgsql-hackers by date: