Re: Aggregate Push Down - Performing aggregation on foreign server - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Aggregate Push Down - Performing aggregation on foreign server
Date
Msg-id 27818.1477150767@sss.pgh.pa.us
Whole thread Raw
In response to Re: Aggregate Push Down - Performing aggregation on foreign server  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Aggregate Push Down - Performing aggregation on foreign server  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
List pgsql-hackers
brolga is still not terribly happy with this patch: it's choosing not to
push down the aggregates in one of the queries.  While I failed to
duplicate that result locally, investigation suggests that brolga's result
is perfectly sane; in fact it's not very clear why we're not getting that
from multiple critters, because the plan brolga is choosing is not
inferior to the expected one.

The core of the problem is this subquery:

contrib_regression=# explain verbose select min(13), avg(ft1.c1), sum(ft2.c1) from ft1 right join ft2 on (ft1.c1 =
ft2.c1)where ft1.c1 = 12;                                                                      QUERY PLAN
                                                         

---------------------------------------------------------------------------------------------------------------------------------------------------------Foreign
Scan (cost=108.61..108.64 rows=1 width=44)  Output: (min(13)), (avg(ft1.c1)), (sum(ft2.c1))  Relations: Aggregate on
((public.ft1)INNER JOIN (public.ft2))  Remote SQL: SELECT min(13), avg(r1."C 1"), sum(r2."C 1") FROM ("S 1"."T 1" r1
INNERJOIN "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" = 12)))) 
(4 rows)

If you look at the estimate to just fetch the data, it's:

contrib_regression=# explain verbose select ft1.c1, ft2.c1 from ft1 right join ft2 on (ft1.c1 = ft2.c1) where ft1.c1 =
12;                                                             QUERY PLAN
                

--------------------------------------------------------------------------------------------------------------------------------------Foreign
Scan (cost=100.55..108.62 rows=1 width=8)  Output: ft1.c1, ft2.c1  Relations: (public.ft1) INNER JOIN (public.ft2)
RemoteSQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C
1"= 12)))) 
(4 rows)

Note we're expecting only one row out of the join.  Now the cost of doing
three aggregates on a single row of input is not a lot.  Comparing these
local queries:

regression=# explain select min(13),avg(q1),sum(q2) from int8_tbl where q2=456;                         QUERY PLAN
                     
---------------------------------------------------------------Aggregate  (cost=1.07..1.08 rows=1 width=68)  ->  Seq
Scanon int8_tbl  (cost=0.00..1.06 rows=1 width=16)        Filter: (q2 = 456) 
(3 rows)

regression=# explain select (q1),(q2) from int8_tbl where q2=456;                      QUERY PLAN
 
---------------------------------------------------------Seq Scan on int8_tbl  (cost=0.00..1.06 rows=1 width=16)
Filter:(q2 = 456) 
(2 rows)

we seem to have startup = input cost + .01 and then another .01
for total.  So the estimate to do the above remote scan and then
aggregate locally should have been 108.63 startup and 108.64 total,
give or take.  The estimate for aggregating remotely is a hair better,
but it's not nearly better enough to guarantee that the planner won't
see it as fuzzily the same cost.

In short: the problem with this test case is that it's considering
aggregation over only a single row, which is a situation in which
pushing down the aggregate actually doesn't save us anything, because
we're retrieving one row from the remote either way.  So it's not at all
surprising that we don't get a stable plan choice.  The test query needs
to be adjusted so that the aggregation is done over multiple rows,
allowing fdw_tuple_cost to kick in and provide some daylight between
the cost estimates.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: issue with track_commit_timestamp and server restart
Next
From: Bruce Momjian
Date:
Subject: Reload config instructions