Home > mailing lists

Re: Push down Aggregates below joins - Mailing list pgsql-hackers

From	Antonin Houska
Subject	Re: Push down Aggregates below joins
Date	August 3, 2018 17:50:11
Msg-id	18751.1533307811@localhost Whole thread Raw
In response to	Re: Push down Aggregates below joins (Antonin Houska <ah@cybertec.at>)
Responses	Re: Push down Aggregates below joins
List	pgsql-hackers

Tree view

Antonin Houska <ah@cybertec.at> wrote:

> I didn't have enough time to separate "your functionality" and can do it when
> I'm back from vacation.

So I've separated the code that does not use the 2-stage replication (and
therefore the feature is not involved in parallel queries).

Note on coding: so far most of the functions to which I added the "grouped"
argument can get the same information from rel->agg_info (it's set iff the
relation is grouped). So we can remove it for this "simple aggregation", but
the next (more generic) part of the patch will have to add some argument
anyway, to indicate whether AGGSPLIT_SIMPLE, AGGSPLIT_INITIAL_SERIAL (or no
aggregation) should be performed.

Besides splitting the code I worked on the determination whether join
duplicates grouping keys or not. Currently it still fails to combine
"uniquekeys" of joined relation in some (important) cases when the information
needed is actually available. I think ECs should be used here, like it is for
pathkeys.

So currently you should comment out this code

    if (!match_uniquekeys_to_group_pathkeys(root, result, target))
        *keys_ok = false;

in pathnode.c:make_uniquekeys_for_join() if you want the patch to at leat
produce interesting EXPLAIN output.

One example:

CREATE TABLE a(i int primary key);
CREATE TABLE b(j int primary key, k int);

SET enable_agg_pushdown TO true;

EXPLAIN
SELECT          j, sum(k)
FROM            a, b
WHERE           i = j
GROUP BY        j

                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=94.75..162.43 rows=2260 width=12)
   Hash Cond: (a.i = b.j)
   ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4)
   ->  Hash  (cost=66.50..66.50 rows=2260 width=12)
         ->  HashAggregate  (cost=43.90..66.50 rows=2260 width=12)
               Group Key: b.j
               ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=8)

However there are cases like this

EXPLAIN
SELECT          i, sum(k)
FROM            a, b
WHERE           i = j
GROUP BY        i

which currently does not work. The reason is that the column b.j which is not
in the GROUP BY clause needs to be in the grouped output of the "b" (grouped)
table output, otherwise the join condition cannot be evaluated.

While separating the code that only uses 1-stage aggregation I removed the
code that adds such extra grouping keys to per-relation AggPath because in
general this is only safe if the final aggregation is performed, and the final
aggregation uses no added columns. However I forgot that grouping keys can be
added in cases like shown above, i.e. the grouping expression b.j is derived
from GROUP BY using equivalence class.

I'll fix this (and various other problems) asap. I believe it's worth to at
least show the current code. I'm curious if it's something we can build on or
if another rework will be needed.

(I'll be off next week.)

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

Attachment

agg_pushdown_simple.patch

pgsql-hackers by date:

From: Alvaro Herrera
Date: 03 August 2018, 17:38:15
Subject: Re: Explain buffers wrong counter with parallel plans

From: Tom Lane
Date: 03 August 2018, 18:21:38
Subject: Re: Fallout from PQhost() semantics changes

Re: Push down Aggregates below joins - Mailing list pgsql-hackers

Attachment

Previous

Next