Re: [HACKERS] WIP: Aggregation push-down - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: [HACKERS] WIP: Aggregation push-down |
Date | |
Msg-id | 31585.1519833763@localhost Whole thread Raw |
In response to | Re: [HACKERS] WIP: Aggregation push-down (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] WIP: Aggregation push-down
|
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 22, 2017 at 10:43 AM, Antonin Houska <ah@cybertec.at> wrote: > > Michael Paquier <michael.paquier@gmail.com> wrote: > >> On Sat, Nov 4, 2017 at 12:33 AM, Antonin Houska <ah@cybertec.at> wrote: > >> > I'm not about to add any other features now. Implementation of the missing > >> > parts (see the TODO comments in the code) is the next step. But what I'd > >> > appreciate most is a feedback on the design. Thanks. > >> > >> I am getting a conflict after applying patch 5 but this did not get > >> any reviews so moved to next CF with waiting on author as status. > > > > Attached is the next version. > translate_expression_to_rels() looks unsafe. Equivalence members are > known to be equal under the semantics of the relevant operator class, > but that doesn't mean that one can be freely substituted for another. > For example: > > rhaas=# create table one (a numeric); > CREATE TABLE > rhaas=# create table two (a numeric); > CREATE TABLE > rhaas=# insert into one values ('0'); > INSERT 0 1 > rhaas=# insert into two values ('0.00'); > INSERT 0 1 > rhaas=# select one.a, count(*) from one, two where one.a = two.a group by 1; > a | count > ---+------- > 0 | 1 > (1 row) > > rhaas=# select two.a, count(*) from one, two where one.a = two.a group by 1; > a | count > ------+------- > 0.00 | 1 > (1 row) > > There are, admittedly, a large number of data types for which such a > substitution would work just fine. numeric will not, citext will not, > but many others are fine. Regrettably, we have no framework in the > system for identifying equality operators which actually test identity > versus some looser notion of equality. Cross-type operators are a > problem, too; if we have foo.x = bar.y in the query, one might be int4 > and the other int8, for example, but they can still belong to the same > equivalence class. > > Concretely, in your test query "SELECT p.i, avg(c1.v) FROM > agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent = > p.i GROUP BY p.i" you assume that it's OK to do a Partial > HashAggregate over c1.parent rather than p.i. This will be false if, > say, c1.parent is of type citext and p.i is of type text; this will > get grouped together that shouldn't. It will also be false if the > grouping expression is something like GROUP BY length(p.i::text), > because one value could be '0'::numeric and the other '0.00'::numeric. > I can't think of a reason why it would be false if the grouping > expressions are both simple Vars of the same underlying data type, but > I'm a little nervous that I might be wrong even about that case. > Maybe you've handled all of this somehow, but it's not obvious to me > that it has been considered. These problems are still being investigated, see [1] > Another thing I noticed is that the GroupedPathInfo looks a bit like a > stripped-down RelOptInfo, in that for example it has both a pathlist > and a partial_pathlist. I'm inclined to think that we should build new > RelOptInfos instead. As you have it, there are an awful lot of things > that seem to know about the grouped/ungrouped distinction, many of > which are quite low-level functions like add_path(), > add_paths_to_append_rel(), try_nestloop_path(), etc. I think that > some of this would go away if you had separate RelOptInfos instead of > GroupedPathInfo. I've reworked the patch so that separate RelOptInfo is used for grouped relation. The attached patch is only the core part. Neither postgres_fdw changes nor the part that tries to avoid the final aggregation is included here. I'll add these when the patch does not seem to need another major rework. [1] https://www.postgresql.org/message-id/CA+Tgmoa5Pp-DBJg=W8Xj8Czf-32PfxPgxwFPkA6qN2w_wPX8bg@mail.gmail.com -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Attachment
pgsql-hackers by date: