Thread: Is it possible for postgres_fdw to push down queries on co-located tables?
That is, if table `foo` and table `bar` are both tables on the same remote server, then when I do `select * from foo, bar`, can it delegate the whole query on the remote side, rather than fetching rows from both servers one by one and do merging on the local side? For example: ``` foo=> explain select * from foreign_test2, foreign_test where m = id; QUERY PLAN ------------------------------------------------------------------------------------- Merge Join (cost=444.06..590.63 rows=9316 width=72) Merge Cond: (foreign_test2.m = foreign_test.id) -> Sort (cost=222.03..225.44 rows=1365 width=36) Sort Key: foreign_test2.m -> Foreign Scan on foreign_test2 (cost=100.00..150.95 rows=1365 width=36) -> Sort (cost=222.03..225.44 rows=1365 width=36) Sort Key: foreign_test.id -> Foreign Scan on foreign_test (cost=100.00..150.95 rows=1365 width=36) ``` If the planning involving remote tables, could the planner push down the queries once to gain the best efficiency?
Jinhua Luo <luajit.io@gmail.com> writes: > That is, if table `foo` and table `bar` are both tables on the same > remote server, then when I do `select * from foo, bar`, can it > delegate the whole query on the remote side, rather than fetching rows > from both servers one by one and do merging on the local side? Reasonably recent releases can do that. What version are you testing? > foo=> explain select * from foreign_test2, foreign_test where m = id; > QUERY PLAN > ------------------------------------------------------------------------------------- > Merge Join (cost=444.06..590.63 rows=9316 width=72) > Merge Cond: (foreign_test2.m = foreign_test.id) > -> Sort (cost=222.03..225.44 rows=1365 width=36) > Sort Key: foreign_test2.m > -> Foreign Scan on foreign_test2 (cost=100.00..150.95 > rows=1365 width=36) > -> Sort (cost=222.03..225.44 rows=1365 width=36) > Sort Key: foreign_test.id > -> Foreign Scan on foreign_test (cost=100.00..150.95 > rows=1365 width=36) > ``` I don't find this particular example to be very compelling. Taking the amount of data pulled from the foreign server as the main cost factor, the plan as given requires pulling 1365*2 rows, whereas if it were to push down the join, it'd have to retrieve 9316 rows (or so the planner estimates, anyway). So it's quite possible that the planner just rejected the remote join as a net loss. If you think it isn't a net loss, you might want to twiddle the cost parameters for this foreign server. regards, tom lane
I was testing PG10. Sorry, the example is not so proper. I just think even if it's a simple example, e.g. join two co-located tables, the planner should work out to push down it. Can you confirm the postgresql could detect co-located tables on the same foreign server and push down queries on them? Could you give an actual example or point out the relevant source code paths for reference? (Let me clarify the context of this question, if the planner supports co-located push-down, then it's meaningful for manual sharding via partitioning to remote tables, where it's mostly necessary to join two or more co-located parent tables in complex queries. If not, the postgresql instance on which the parent tables are placed (let's say it's a coordinator node) would be likely the bottleneck.) Tom Lane <tgl@sss.pgh.pa.us> 于2018年9月18日周二 下午9:43写道: > > Jinhua Luo <luajit.io@gmail.com> writes: > > That is, if table `foo` and table `bar` are both tables on the same > > remote server, then when I do `select * from foo, bar`, can it > > delegate the whole query on the remote side, rather than fetching rows > > from both servers one by one and do merging on the local side? > > Reasonably recent releases can do that. What version are you testing? > > > foo=> explain select * from foreign_test2, foreign_test where m = id; > > QUERY PLAN > > ------------------------------------------------------------------------------------- > > Merge Join (cost=444.06..590.63 rows=9316 width=72) > > Merge Cond: (foreign_test2.m = foreign_test.id) > > -> Sort (cost=222.03..225.44 rows=1365 width=36) > > Sort Key: foreign_test2.m > > -> Foreign Scan on foreign_test2 (cost=100.00..150.95 > > rows=1365 width=36) > > -> Sort (cost=222.03..225.44 rows=1365 width=36) > > Sort Key: foreign_test.id > > -> Foreign Scan on foreign_test (cost=100.00..150.95 > > rows=1365 width=36) > > ``` > > I don't find this particular example to be very compelling. Taking > the amount of data pulled from the foreign server as the main cost > factor, the plan as given requires pulling 1365*2 rows, whereas if > it were to push down the join, it'd have to retrieve 9316 rows > (or so the planner estimates, anyway). So it's quite possible that > the planner just rejected the remote join as a net loss. If you > think it isn't a net loss, you might want to twiddle the cost > parameters for this foreign server. > > regards, tom lane
Re: Is it possible for postgres_fdw to push down queries onco-located tables?
From
Dilip Kumar
Date:
On Tue, Sep 18, 2018 at 7:50 PM, Jinhua Luo <luajit.io@gmail.com> wrote: > I was testing PG10. > > Sorry, the example is not so proper. I just think even if it's a > simple example, e.g. join two co-located tables, the planner should > work out to push down it. Can you confirm the postgresql could detect > co-located tables on the same foreign server and push down queries on > them? Could you give an actual example or point out the relevant > source code paths for reference? > You can check "postgresGetForeignJoinPaths" function or you can refer commit "e4106b2528727c4b48639c0e12bf2f70a766b910". -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is it possible for postgres_fdw to push down queries on co-locatedtables?
From
Etsuro Fujita
Date:
(2018/09/18 23:20), Jinhua Luo wrote: > Sorry, the example is not so proper. I just think even if it's a > simple example, e.g. join two co-located tables, the planner should > work out to push down it. Can you confirm the postgresql could detect > co-located tables on the same foreign server and push down queries on > them? Could you give an actual example or point out the relevant > source code paths for reference? > > (Let me clarify the context of this question, if the planner supports > co-located push-down, then it's meaningful for manual sharding via > partitioning to remote tables, where it's mostly necessary to join two > or more co-located parent tables in complex queries. If not, the > postgresql instance on which the parent tables are placed (let's say > it's a coordinator node) would be likely the bottleneck.) You might want to check partitionwise join functionality as well, which we have in PG11 [1]. Best regards, Etsuro Fujita [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f49842d1ee31b976c681322f76025d7732e860f3