Re: [HACKERS] WIP: Aggregation push-down - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: [HACKERS] WIP: Aggregation push-down
Date
Msg-id 31585.1519833763@localhost
Whole thread Raw
In response to Re: [HACKERS] WIP: Aggregation push-down  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] WIP: Aggregation push-down
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Dec 22, 2017 at 10:43 AM, Antonin Houska <ah@cybertec.at> wrote:
> > Michael Paquier <michael.paquier@gmail.com> wrote:
> >> On Sat, Nov 4, 2017 at 12:33 AM, Antonin Houska <ah@cybertec.at> wrote:
> >> > I'm not about to add any other features now. Implementation of the missing
> >> > parts (see the TODO comments in the code) is the next step. But what I'd
> >> > appreciate most is a feedback on the design. Thanks.
> >>
> >> I am getting a conflict after applying patch 5 but this did not get
> >> any reviews so moved to next CF with waiting on author as status.
> >
> > Attached is the next version.

> translate_expression_to_rels() looks unsafe.  Equivalence members are
> known to be equal under the semantics of the relevant operator class,
> but that doesn't mean that one can be freely substituted for another.
> For example:
>
> rhaas=# create table one (a numeric);
> CREATE TABLE
> rhaas=# create table two (a numeric);
> CREATE TABLE
> rhaas=# insert into one values ('0');
> INSERT 0 1
> rhaas=# insert into two values ('0.00');
> INSERT 0 1
> rhaas=# select one.a, count(*) from one, two where one.a = two.a group by 1;
>  a | count
> ---+-------
>  0 |     1
> (1 row)
>
> rhaas=# select two.a, count(*) from one, two where one.a = two.a group by 1;
>   a   | count
> ------+-------
>  0.00 |     1
> (1 row)
>
> There are, admittedly, a large number of data types for which such a
> substitution would work just fine.  numeric will not, citext will not,
> but many others are fine. Regrettably, we have no framework in the
> system for identifying equality operators which actually test identity
> versus some looser notion of equality.  Cross-type operators are a
> problem, too; if we have foo.x = bar.y in the query, one might be int4
> and the other int8, for example, but they can still belong to the same
> equivalence class.
>
> Concretely, in your test query "SELECT p.i, avg(c1.v) FROM
> agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent =
> p.i GROUP BY p.i" you assume that it's OK to do a Partial
> HashAggregate over c1.parent rather than p.i.  This will be false if,
> say, c1.parent is of type citext and p.i is of type text; this will
> get grouped together that shouldn't.  It will also be false if the
> grouping expression is something like GROUP BY length(p.i::text),
> because one value could be '0'::numeric and the other '0.00'::numeric.
> I can't think of a reason why it would be false if the grouping
> expressions are both simple Vars of the same underlying data type, but
> I'm a little nervous that I might be wrong even about that case.
> Maybe you've handled all of this somehow, but it's not obvious to me
> that it has been considered.

These problems are still being investigated, see [1]

> Another thing I noticed is that the GroupedPathInfo looks a bit like a
> stripped-down RelOptInfo, in that for example it has both a pathlist
> and a partial_pathlist. I'm inclined to think that we should build new
> RelOptInfos instead.  As you have it, there are an awful lot of things
> that seem to know about the grouped/ungrouped distinction, many of
> which are quite low-level functions like add_path(),
> add_paths_to_append_rel(), try_nestloop_path(), etc.  I think that
> some of this would go away if you had separate RelOptInfos instead of
> GroupedPathInfo.

I've reworked the patch so that separate RelOptInfo is used for grouped
relation. The attached patch is only the core part. Neither postgres_fdw
changes nor the part that tries to avoid the final aggregation is included
here. I'll add these when the patch does not seem to need another major rework.

[1] https://www.postgresql.org/message-id/CA+Tgmoa5Pp-DBJg=W8Xj8Czf-32PfxPgxwFPkA6qN2w_wPX8bg@mail.gmail.com

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: server crash in nodeAppend.c
Next
From: Anastasia Lubennikova
Date:
Subject: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist