Re: Push down Aggregates below joins - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: Push down Aggregates below joins |
Date | |
Msg-id | 18751.1533307811@localhost Whole thread Raw |
In response to | Re: Push down Aggregates below joins (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: Push down Aggregates below joins
|
List | pgsql-hackers |
Antonin Houska <ah@cybertec.at> wrote: > I didn't have enough time to separate "your functionality" and can do it when > I'm back from vacation. So I've separated the code that does not use the 2-stage replication (and therefore the feature is not involved in parallel queries). Note on coding: so far most of the functions to which I added the "grouped" argument can get the same information from rel->agg_info (it's set iff the relation is grouped). So we can remove it for this "simple aggregation", but the next (more generic) part of the patch will have to add some argument anyway, to indicate whether AGGSPLIT_SIMPLE, AGGSPLIT_INITIAL_SERIAL (or no aggregation) should be performed. Besides splitting the code I worked on the determination whether join duplicates grouping keys or not. Currently it still fails to combine "uniquekeys" of joined relation in some (important) cases when the information needed is actually available. I think ECs should be used here, like it is for pathkeys. So currently you should comment out this code if (!match_uniquekeys_to_group_pathkeys(root, result, target)) *keys_ok = false; in pathnode.c:make_uniquekeys_for_join() if you want the patch to at leat produce interesting EXPLAIN output. One example: CREATE TABLE a(i int primary key); CREATE TABLE b(j int primary key, k int); SET enable_agg_pushdown TO true; EXPLAIN SELECT j, sum(k) FROM a, b WHERE i = j GROUP BY j QUERY PLAN ----------------------------------------------------------------------- Hash Join (cost=94.75..162.43 rows=2260 width=12) Hash Cond: (a.i = b.j) -> Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) -> Hash (cost=66.50..66.50 rows=2260 width=12) -> HashAggregate (cost=43.90..66.50 rows=2260 width=12) Group Key: b.j -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=8) However there are cases like this EXPLAIN SELECT i, sum(k) FROM a, b WHERE i = j GROUP BY i which currently does not work. The reason is that the column b.j which is not in the GROUP BY clause needs to be in the grouped output of the "b" (grouped) table output, otherwise the join condition cannot be evaluated. While separating the code that only uses 1-stage aggregation I removed the code that adds such extra grouping keys to per-relation AggPath because in general this is only safe if the final aggregation is performed, and the final aggregation uses no added columns. However I forgot that grouping keys can be added in cases like shown above, i.e. the grouping expression b.j is derived from GROUP BY using equivalence class. I'll fix this (and various other problems) asap. I believe it's worth to at least show the current code. I'm curious if it's something we can build on or if another rework will be needed. (I'll be off next week.) -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Attachment
pgsql-hackers by date: