Re: Getting sorted data from foreign server for merge join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Getting sorted data from foreign server for merge join
Date
Msg-id CAFjFpRfD7niTunGnhHjJ+mDPssAEeQSdYHwHUso3apUm8p7ONg@mail.gmail.com
Whole thread Raw
In response to Re: Getting sorted data from foreign server for merge join  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Getting sorted data from foreign server for merge join  (Rushabh Lathia <rushabh.lathia@gmail.com>)
List pgsql-hackers


On Thu, Nov 19, 2015 at 7:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> Although I'm usually on the side of marking things as extern whenever
>> we find it convenient, I'm nervous about doing that to
>> make_canonical_pathkey(), because it has side effects.  Searching the
>> list of canonical pathkeys for the one we need is reasonable, but is
>> it really right to ever think that we might create a new one at this
>> stage?  Maybe it is, but if so I'd like to hear a good explanation as
>> to why.
>
> For a foreign table no pathkeys are created before creating paths for
> individual foreign table since there are no indexes. For a regular table,
> the pathkeys get created for useful indexes.

OK.

Could some portion of the second foreach loop inside
get_useful_pathkeys_for_relation be replaced by a call to
eclass_useful_for_merging?  The logic looks very similar.

Yes. I have incorporated this change in the patch attached.
 

More broadly, I'm wondering about the asymmetries between this code
and pathkeys_useful_for_merging().  The latter has a
right_merge_direction() test and a case for non-EC-derivable join
clauses that aren't present in your code.
I wonder if we could just
generate pathkeys speculatively and then test which ones survive
truncate_useless_pathkeys(), or something like that.  This isn't an
enormously complicated patch, but it duplicates more of the logic in
pathkeys.c than I'd like.


pathkeys_useful_for_merging(), truncate_useless_pathkeys() and host of functions in that area are crafted to assess the usefulness of given pathkeys which for regular tables are "speculated" from indexes on that table. Foreign tables do not have indexes and neither we have information about the indexes (if any) on foreign server, thus pathkeys to be assessed are not readily available. Hence we need some other way of "speculating" pathkeys for foreign tables. We can not just generate pathkeys because there are infinite possible expressions on which pathkeys can be built. So, we hunt for the expressions at various places like root_pathkeys, merge joinable clauses etc. The seeming duplication of code is because the places where we are hunting for useful pathkeys in case of foreign table are same as the places used for assessing usefulness for pathkeys in case of regular table. Thus in case of foreign tables, we always generate useful pathkeys, which do not need any further assessment. For regular tables we have set of pathkeys which need to be assessed. Because of this difference the code differs in details.

right_merge_direction() compares whether a given pathkey (readily available or derived from an index) has the same direction as required by root->query_pathkeys or ascending direction. For foreign tables, the pathkeys are themselves created from root->query_pathkeys, thus doesn't require this assessment. The patch creates pathkeys other than those from root->query_pathkeys in the ascending order (like other places in the code eg. select_outer_pathkeys_for_merge()), which is what right_merge_direction() assesses. So again we don't need to call right_merge_direction().
 
non-EC-derivable join is interesting. Thanks for bringing it out. These are mergejoinable clauses in an OUTER join, where columns from inner side can be NULL, and thus not exactly equal. I will incorporate that change in the patch. Again while doing so, we have to just pick up the right or left equivalence class from a given RestrictInfo and don't need to assess it further like pathkeys_useful_for_merging().
 
Added this change in the attached patch.

I'm inclined to think that we shouldn't consider pathkeys that might
be useful for merge-joining unless we're using remote estimates.  It
seems more speculative than pushing down a toplevel sort.

I agree with you but let me elaborate why I agree. The pathkeys we are generating are heauristic in nature and thus may not be useful in the same sense as pathkeys for regular tables. If use_remote_estimate is ON, the planner will spend time in explaining multiple queries, but it will be in better position to cost the usage. If use_remote_estimate is OFF, we will altogether loose chances of merge join, which doesn't look good to me. But a speculative costing done in this case can result in choosing wrong plan similar to the same case as toplevel sort. But then choosing a wrong join has more serious implications than estimating wrong cost for top level join.
 

This patch seems rather light on test cases.


Added tests. Let me know if you have any specific scenario in mind.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

pgsql-hackers by date:

Previous
From: Kouhei Kaigai
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual
Next
From: Tom Lane
Date:
Subject: Re: [BUGS] BUG #13779: Inherited check constraint becomes non-inherited when related column is changed