Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins - Mailing list pgsql-general
From | Adam Zegelin |
---|---|
Subject | Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins |
Date | |
Msg-id | 0DF4E77A-8276-44DC-84F9-0B588733484D@relational.io Whole thread Raw |
Responses |
Re: Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins
|
List | pgsql-general |
Hello, I’m in the process of writing a Postgres FDW that can interface with web service endpoints. Certain FDW columns would actas web service parameters, while others would be the output. For example: adam=# select * from bing where query = 'xbox'; query | url | description -------+---------------------------------+--------------------- xbox | http://www.xbox.com/en-AU/index | Xbox Australia is ⋮ | ⋮ | ⋮ For the simple cases, extracting the quals (such as var [query] = const “xbox”) works perfectly. I’d like to join FDW tables with other tables, possibly local or foreign. ex: `select * from search_terms, bing where bing.query = search_terms.term`, where `search_terms` is a local table. Without any parameterised paths, Postgres, as expected, will attempt to perform unfiltered foreign sequence scans followedby a hash join of the two tables. Some service endpoints have no concept of unqualified queries. In the example above,a ‘sequence scan’ of Bing is a not possible. I generate parameterised paths inside the FDW handler function `GetForeignPaths`. I call `create_foreignscan_path` with aset of req_outer relids found by scanning PlannerInfo’s eq_classes, left_join_clauses and right_join_clauses. Bitmapset* outer_relids = NULL; foreach(lc, root->eq_classes) { EquivalenceClass* ec = (EquivalenceClass *) lfirst(lc); if (ec->ec_members->length > 1) outer_relids = bms_union(outer_relids, ec->ec_relids); } foreach(lc, list_union(root->left_join_clauses, root->right_join_clauses)) { RestrictInfo *ri = (RestrictInfo *) lfirst(lc); outer_relids = bms_union(outer_relids, ri->outer_relids); } Bitmapset* req_outer = bms_difference(outer_relids, bms_make_singleton(baserel->relid)); foreignPath = create_foreignscan_path(root, baserel, nbrows, 0, 0, NIL, req_outer, NULL); For simple joins this works. `BeginForeignScan` can access a list of quals with params, and the planner generates an appropriatenested loop join over the foreign table. adam=# explain select * from search_terms, bing where bing.query = search_terms.term; QUERY PLAN ----------------------------------------------------------------------- Nested Loop (cost=0.00..49.30 rows=6550 width=96) -> Seq Scan on search_terms (cost=0.00..23.10 rows=1310 width=32) -> Foreign Scan on bing (cost=0.00..0.00 rows=2 width=64) Filter: (search_terms.term = query) Even a query over two foreign tables works correctly: adam=# create foreign table ft1 (inp text, out text) server test; adam=# create foreign table ft2 (inp text, out text) server test; adam=# explain select * from ft1, ft2 where ft1.inp = ‘hello’ and ft2.inp = ft1.out; QUERY PLAN ---------------------------------------------------------------------- Nested Loop (cost=0.00..500020.00 rows=5000 width=128) -> Foreign Scan on ft1 (cost=0.00..500000.00 rows=1000 width=64) Filter: (inp = 'hello'::text) -> Foreign Scan on ft2 (cost=0.00..0.00 rows=2 width=64) Filter: (ft1."out" = inp) But, on a more complex query consisting of multiple foreign tables the planner generates something a little less desirable: adam=# create foreign table ft3 (inp text, out text) server test; adam=# explain select * from ft1, ft2, ft3 where ft1.inp = 'hello' and ft2.inp = ft1.out and ft3.inp = ft2.out; QUERY PLAN ---------------------------------------------------------------------------------- Nested Loop (cost=500012.50..1000290.00 rows=25000 width=192) -> Hash Join (cost=500012.50..1000190.00 rows=5000 width=128) Hash Cond: (ft1."out" = ft2.inp) -> Foreign Scan on ft1 (cost=0.00..500000.00 rows=1000 width=64) Filter: (inp = 'hello'::text) -> Hash (cost=500000.00..500000.00 rows=1000 width=64) -> Foreign Scan on ft2 (cost=0.00..500000.00 rows=1000 width=64) -> Foreign Scan on ft3 (cost=0.00..0.00 rows=2 width=64) Filter: (ft2."out" = inp) The high total costs are the result of my attempts to coerce the planner to select the parameterised paths and generate filteredforeign scans rather than preferring unfiltered foreign scans. I’ve tried adjusting the query planner tuneables (enable_hashjoin, et al) and the path costs with some degree of success,but often the generated plans will filter the tables in the wrong order -- the output column of table 1 will be filteredby the input column of table 2 -- which is technically correct as the operation should be associative and transitive,but in this case, table 2 must be filtered by the output of table 1, not vice-versa. Is there a way to convince Postgres to always generate nested loop joins without sub merges, hashes or materialisations andin the correct order? I’m thinking that the current method I’m using to generate the foreign scan path doesn’t take into account the required orderand which fields can and cannot be parameterised. I’m not sure how to proceed with this. Thanks, Adam
Attachment
pgsql-general by date: