Thread: Function scan FDW pushdown
Hi. The attached patch allows pushing joins with function RTEs to PostgreSQL data sources. This makes executing queries like this create foreign table f_pgbench_accounts (aid int, bid int, abalance int, filler char(84)) SERVER local_srv OPTIONS (table_name 'pgbench_accounts'); select * from f_pgbench_accounts join unnest(array[1,2,3]) ON unnest = aid; more efficient. with patch: # explain analyze select * from f_pgbench_accounts join unnest(array[1,2,3,4,5,6]) ON unnest = aid; QUERY PLAN ------------------------------------------------------------------------------------------------ Foreign Scan (cost=100.00..116.95 rows=7 width=356) (actual time=2.282..2.287 rows=6 loops=1) Relations: (f_pgbench_accounts) INNER JOIN (FUNCTION RTE unnest) Planning Time: 0.487 ms Execution Time: 3.336 ms without patch: # explain analyze select * from f_pgbench_accounts join unnest(array[1,2,3,4,5,6]) ON unnest = aid; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=100.14..158.76 rows=7 width=356) (actual time=2.263..1268.607 rows=6 loops=1) Hash Cond: (f_pgbench_accounts.aid = unnest.unnest) -> Foreign Scan on f_pgbench_accounts (cost=100.00..157.74 rows=217 width=352) (actual time=2.190..1205.938 rows=100000 loops=1) -> Hash (cost=0.06..0.06 rows=6 width=4) (actual time=0.041..0.043 rows=6 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Function Scan on unnest (cost=0.00..0.06 rows=6 width=4) (actual time=0.025..0.028 rows=6 loops=1) Planning Time: 0.389 ms Execution Time: 1269.627 ms So far I don't know how to visualize actual function expression used in function RTE, as in postgresExplainForeignScan() es->rtable comes from queryDesc->plannedstmt->rtable, and rte->functions is already 0. -- Best regards, Alexander Pyhalov, Postgres Professional
Attachment
Hi Alexander, On Thu, May 20, 2021 at 11:13 PM Alexander Pyhalov <a.pyhalov@postgrespro.ru> wrote: > > Hi. > > The attached patch allows pushing joins with function RTEs to PostgreSQL > data sources. > This makes executing queries like this > > create foreign table f_pgbench_accounts (aid int, bid int, abalance int, > filler char(84)) SERVER local_srv OPTIONS (table_name > 'pgbench_accounts'); > select * from f_pgbench_accounts join unnest(array[1,2,3]) ON unnest = > aid; > It will be good to provide some practical examples where this is useful. > more efficient. > > with patch: > > > So far I don't know how to visualize actual function expression used in > function RTE, as in postgresExplainForeignScan() es->rtable comes from > queryDesc->plannedstmt->rtable, and rte->functions is already 0. The actual function expression will be part of the Remote SQL of ForeignScan node so no need to visualize it separately. The patch will have problems when there are multiple foreign tables all on different servers or use different FDWs. In such a case the function scan's RelOptInfo will get the fpinfo based on the first foreign table the function scan is paired with during join planning. But that may not be the best foreign table to join. We should be able to plan all the possible joins. Current infra to add one fpinfo per RelOptInfo won't help there. We need something better. The patch targets only postgres FDW, how do you see this working with other FDWs? If we come up with the right approach we could use it for 1. pushing down queries with IN () clause 2. joining a small local table with a large foreign table by sending the local table rows down to the foreign server. -- Best Wishes, Ashutosh Bapat
Ashutosh Bapat писал 2021-06-15 16:15: > Hi Alexander, Hi. The current version of the patch is based on asymetric partition-wise join. Currently it is applied after v19-0001-Asymmetric-partitionwise-join.patch from on https://www.postgresql.org/message-id/792d60f4-37bc-e6ad-68ca-c2af5cbb2d9b@postgrespro.ru . >> So far I don't know how to visualize actual function expression used >> in >> function RTE, as in postgresExplainForeignScan() es->rtable comes from >> queryDesc->plannedstmt->rtable, and rte->functions is already 0. > > The actual function expression will be part of the Remote SQL of > ForeignScan node so no need to visualize it separately. We still need to create tuple description for functions in get_tupdesc_for_join_scan_tuples(), so I had to remove setting newrte->functions to NIL in add_rte_to_flat_rtable(). With rte->functions in place, there's no issues for explain. > > The patch will have problems when there are multiple foreign tables > all on different servers or use different FDWs. In such a case the > function scan's RelOptInfo will get the fpinfo based on the first > foreign table the function scan is paired with during join planning. > But that may not be the best foreign table to join. We should be able > to plan all the possible joins. Current infra to add one fpinfo per > RelOptInfo won't help there. We need something better. I suppose attached version of the patch is more mature. > > The patch targets only postgres FDW, how do you see this working with > other FDWs? Not now. We introduce necessary APIs for other FDWs, but implementing TryShippableJoinPaths() doesn't seem straightforward. > > If we come up with the right approach we could use it for 1. pushing > down queries with IN () clause 2. joining a small local table with a > large foreign table by sending the local table rows down to the > foreign server. -- Best regards, Alexander Pyhalov, Postgres Professional