Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins - Mailing list pgsql-general

From Adam Zegelin
Subject Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins
Date
Msg-id 0DF4E77A-8276-44DC-84F9-0B588733484D@relational.io
Whole thread Raw
Responses Re: Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins
List pgsql-general
Hello,

I’m in the process of writing a Postgres FDW that can interface with web service endpoints. Certain FDW columns would
actas web service parameters, while others would be the output. 

For example:

    adam=# select * from bing where query = 'xbox';
     query |               url               |     description
    -------+---------------------------------+---------------------
     xbox  | http://www.xbox.com/en-AU/index | Xbox Australia is
       ⋮   |                ⋮                |           ⋮

For the simple cases, extracting the quals (such as var [query] = const “xbox”) works perfectly.

I’d like to join FDW tables with other tables, possibly local or foreign.
ex: `select * from search_terms, bing where bing.query = search_terms.term`, where `search_terms` is a local table.

Without any parameterised paths, Postgres, as expected, will attempt to perform unfiltered foreign sequence scans
followedby a hash join of the two tables. Some service endpoints have no concept of unqualified queries. In the example
above,a ‘sequence scan’ of Bing is a not possible. 

I generate parameterised paths inside the FDW handler function `GetForeignPaths`. I call `create_foreignscan_path` with
aset of req_outer relids found by scanning PlannerInfo’s eq_classes, left_join_clauses and right_join_clauses. 

    Bitmapset* outer_relids = NULL;

        foreach(lc, root->eq_classes) {
            EquivalenceClass* ec = (EquivalenceClass *) lfirst(lc);
            if (ec->ec_members->length > 1)
                outer_relids = bms_union(outer_relids, ec->ec_relids);
        }

        foreach(lc, list_union(root->left_join_clauses, root->right_join_clauses)) {
            RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
            outer_relids = bms_union(outer_relids, ri->outer_relids);
        }

        Bitmapset* req_outer = bms_difference(outer_relids,
                                              bms_make_singleton(baserel->relid));

    foreignPath = create_foreignscan_path(root, baserel, nbrows, 0, 0, NIL, req_outer, NULL);


For simple joins this works. `BeginForeignScan` can access a list of quals with params, and the planner generates an
appropriatenested loop join over the foreign table. 

    adam=# explain select * from search_terms, bing where bing.query = search_terms.term;
                                  QUERY PLAN
    -----------------------------------------------------------------------
     Nested Loop  (cost=0.00..49.30 rows=6550 width=96)
       ->  Seq Scan on search_terms  (cost=0.00..23.10 rows=1310 width=32)
       ->  Foreign Scan on bing  (cost=0.00..0.00 rows=2 width=64)
             Filter: (search_terms.term = query)

Even a query over two foreign tables works correctly:

    adam=# create foreign table ft1 (inp text, out text) server test;
    adam=# create foreign table ft2 (inp text, out text) server test;

    adam=# explain select * from ft1, ft2 where ft1.inp = ‘hello’ and ft2.inp = ft1.out;
                                  QUERY PLAN
    ----------------------------------------------------------------------
     Nested Loop  (cost=0.00..500020.00 rows=5000 width=128)
       ->  Foreign Scan on ft1  (cost=0.00..500000.00 rows=1000 width=64)
             Filter: (inp = 'hello'::text)
       ->  Foreign Scan on ft2  (cost=0.00..0.00 rows=2 width=64)
             Filter: (ft1."out" = inp)


But, on a more complex query consisting of multiple foreign tables the planner generates something a little less
desirable:

    adam=# create foreign table ft3 (inp text, out text) server test;

    adam=# explain select * from ft1, ft2, ft3 where ft1.inp = 'hello' and ft2.inp = ft1.out and ft3.inp = ft2.out;
                                        QUERY PLAN
    ----------------------------------------------------------------------------------
     Nested Loop  (cost=500012.50..1000290.00 rows=25000 width=192)
       ->  Hash Join  (cost=500012.50..1000190.00 rows=5000 width=128)
             Hash Cond: (ft1."out" = ft2.inp)
             ->  Foreign Scan on ft1  (cost=0.00..500000.00 rows=1000 width=64)
                   Filter: (inp = 'hello'::text)
             ->  Hash  (cost=500000.00..500000.00 rows=1000 width=64)
                   ->  Foreign Scan on ft2  (cost=0.00..500000.00 rows=1000 width=64)
       ->  Foreign Scan on ft3  (cost=0.00..0.00 rows=2 width=64)
             Filter: (ft2."out" = inp)

The high total costs are the result of my attempts to coerce the planner to select the parameterised paths and generate
filteredforeign scans rather than preferring unfiltered foreign scans. 

I’ve tried adjusting the query planner tuneables (enable_hashjoin, et al) and the path costs with some degree of
success,but often the generated plans will filter the tables in the wrong order -- the output column of table 1 will be
filteredby the input column of table 2 -- which is technically correct as the operation should be associative and
transitive,but in this case, table 2 must be filtered by the output of table 1, not vice-versa. 

Is there a way to convince Postgres to always generate nested loop joins without sub merges, hashes or materialisations
andin the correct order? 

I’m thinking that the current method I’m using to generate the foreign scan path doesn’t take into account the required
orderand which fields can and cannot be parameterised. I’m not sure how to proceed with this. 

Thanks,
Adam
Attachment

pgsql-general by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [HACKERS] Trust intermediate CA for client certificates
Next
From: Craig Ringer
Date:
Subject: Re: [HACKERS] Trust intermediate CA for client certificates