Getting sorted data from foreign server for merge join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Getting sorted data from foreign server for merge join
Date
Msg-id CAFjFpRfeKHiCmwJ72p4=ZvuZRQsaU9tbfyW7vwr-5PPvrCbcQQ@mail.gmail.com
Whole thread Raw
Responses Re: Getting sorted data from foreign server for merge join  (Robert Haas <robertmhaas@gmail.com>)
Re: Getting sorted data from foreign server for merge join  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
Hi All,
PFA patch to get data sorted from the foreign server (postgres_fdw) according to the pathkeys useful for merge join.

For a given base relation (extendable to join when that becomes available in postgres_fdw), the patch tries to find merge joinable clauses. It then adds paths with pathkeys extracted out of these merge joinable clauses. The merge joinable clauses form equivalence classes. The patch searches in root->eq_classes for equivalence members belonging to the given relation. For every such expression it creates a single member pathkey list and corresponding path. The test postgres_fdw.sql has an existing join which uses merge join. With this patch the data is sorted on the foreign server than locally.

While mergejoinable clauses can be obtained from rel->joininfo as well. But rel->joininfo contains other clauses as well and we need extra efforts to remove duplicates if the same expression appears in multiple merge joinable clauses.

Two joining relations can have multiple merge joinable clauses, requiring multi-member pathkeys so that merge join is efficient to the maximum extent. The order in which the expressions appears in pathkeys can change the costs of sorting the data according to the pathkeys, depending upon the expressions and the presence of indexes containing those expressions. Thus ideally we would need to club all the expressions appearing in all the clauses for given two relations and create paths with pathkeys for every order of these expressions.That explodes the number of possible paths. We may restrict the number of paths created by considering only certain orders like sort_inner_and_outer(). In any case, costing such paths increases the planning time which may not be worth it. So, this patch uses a heuristic approach of creating single member pathkeys path for every merge joinable expression.

The pathkeys need to be canonicalised using make_canonical_pathkey(), which is a static function. I have added a TODO and comments in the patch explaining possible ways to avoid "extern"alization of this function.

Comments/suggestions are welcome.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"
Next
From: Craig Ringer
Date:
Subject: Re: Some questions about the array.