Thread: Getting sorted data from foreign server for merge join

Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:
Hi All,
PFA patch to get data sorted from the foreign server (postgres_fdw) according to the pathkeys useful for merge join.

For a given base relation (extendable to join when that becomes available in postgres_fdw), the patch tries to find merge joinable clauses. It then adds paths with pathkeys extracted out of these merge joinable clauses. The merge joinable clauses form equivalence classes. The patch searches in root->eq_classes for equivalence members belonging to the given relation. For every such expression it creates a single member pathkey list and corresponding path. The test postgres_fdw.sql has an existing join which uses merge join. With this patch the data is sorted on the foreign server than locally.

While mergejoinable clauses can be obtained from rel->joininfo as well. But rel->joininfo contains other clauses as well and we need extra efforts to remove duplicates if the same expression appears in multiple merge joinable clauses.

Two joining relations can have multiple merge joinable clauses, requiring multi-member pathkeys so that merge join is efficient to the maximum extent. The order in which the expressions appears in pathkeys can change the costs of sorting the data according to the pathkeys, depending upon the expressions and the presence of indexes containing those expressions. Thus ideally we would need to club all the expressions appearing in all the clauses for given two relations and create paths with pathkeys for every order of these expressions.That explodes the number of possible paths. We may restrict the number of paths created by considering only certain orders like sort_inner_and_outer(). In any case, costing such paths increases the planning time which may not be worth it. So, this patch uses a heuristic approach of creating single member pathkeys path for every merge joinable expression.

The pathkeys need to be canonicalised using make_canonical_pathkey(), which is a static function. I have added a TODO and comments in the patch explaining possible ways to avoid "extern"alization of this function.

Comments/suggestions are welcome.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Thu, Nov 5, 2015 at 11:54 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Hi All,
> PFA patch to get data sorted from the foreign server (postgres_fdw)
> according to the pathkeys useful for merge join.
>
> For a given base relation (extendable to join when that becomes available in
> postgres_fdw), the patch tries to find merge joinable clauses. It then adds
> paths with pathkeys extracted out of these merge joinable clauses. The merge
> joinable clauses form equivalence classes. The patch searches in
> root->eq_classes for equivalence members belonging to the given relation.
> For every such expression it creates a single member pathkey list and
> corresponding path. The test postgres_fdw.sql has an existing join which
> uses merge join. With this patch the data is sorted on the foreign server
> than locally.
>
> While mergejoinable clauses can be obtained from rel->joininfo as well. But
> rel->joininfo contains other clauses as well and we need extra efforts to
> remove duplicates if the same expression appears in multiple merge joinable
> clauses.
>
> Two joining relations can have multiple merge joinable clauses, requiring
> multi-member pathkeys so that merge join is efficient to the maximum extent.
> The order in which the expressions appears in pathkeys can change the costs
> of sorting the data according to the pathkeys, depending upon the
> expressions and the presence of indexes containing those expressions. Thus
> ideally we would need to club all the expressions appearing in all the
> clauses for given two relations and create paths with pathkeys for every
> order of these expressions.That explodes the number of possible paths. We
> may restrict the number of paths created by considering only certain orders
> like sort_inner_and_outer(). In any case, costing such paths increases the
> planning time which may not be worth it. So, this patch uses a heuristic
> approach of creating single member pathkeys path for every merge joinable
> expression.
>
> The pathkeys need to be canonicalised using make_canonical_pathkey(), which
> is a static function. I have added a TODO and comments in the patch
> explaining possible ways to avoid "extern"alization of this function.
>
> Comments/suggestions are welcome.

I think this approach is generally reasonable, but I suggested parts
of it, so may be biased.  I would be interested in hearing the
opinions of others.

Random notes:

"possibily" is a typo.

usable_pklist is confusing because it seems like it might be talking
about primary keys rather than pathkeys.  Also, I realize now, looking
at this again, that you're saying "usable" when what I really think
you mean is "useful".  Lots of pathkeys are usable, but only a few of
those are useful.  I suggest renaming usable_pathkeys to
query_pathkeys and usable_pklist to useful_pathkeys.  Similarly, let's
rename generate_pathkeys_for_relation() to
get_useful_pathkeys_for_relation().

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects.  Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage?  Maybe it is, but if so I'd like to hear a good explanation as
to why.
Is the comment "Equivalence classes covering relations other than the
current one are of interest here" missing a "not"?

I don't find this comment illuminating:

+         * In case of child relation, we need to check that the
+         * equivalence class indicates a join to a relation other than
+         * parents, other children and itself (something similar to above).
+         * Otherwise we will end up creating useless paths. The code below is
+         * similar to generate_implied_equalities_for_column(), which might
+         * give a hint.

That basically just says that we have to do it this way because the
other way would be wrong.  But it doesn't say WHY the other way would
be wrong. Then a few lines later, you have another comment which says
the same thing again:

+                /*
+                 * Ignore equivalence members which correspond to children
+                 * or same relation or to parent relations
+                 */

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Getting sorted data from foreign server for merge join

From
Kevin Grittner
Date:
On Friday, November 6, 2015 10:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> I think this approach is generally reasonable, but I suggested
> parts of it, so may be biased.  I would be interested in hearing
> the opinions of others.

Has anyone taken a close look at what happens if the two sides of
the merge join have different implementations of the same collation
name?  Is there anything we should do to defend against the
problem?

We already face the issue of corrupted indexes when we have
different revisions of glibc on a primary and a standby or when the
OS on a server is updated, so this wouldn't be entirely a *new*
problem:

http://www.postgresql.org/message-id/BA6132ED-1F6B-4A0B-AC22-81278F5AB81E@tripadvisor.com

... but it would be a brand-new way to hit it, and we might be able
to spot the problem in a merge join by watching for rows being fed
to either side of the join which are not in order according to the
machine doing the join.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Getting sorted data from foreign server for merge join

From
Corey Huinker
Date:
On Fri, Nov 6, 2015 at 11:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Nov 5, 2015 at 11:54 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Hi All,
> PFA patch to get data sorted from the foreign server (postgres_fdw)
> according to the pathkeys useful for merge join.
>
> For a given base relation (extendable to join when that becomes available in
> postgres_fdw), the patch tries to find merge joinable clauses. It then adds
> paths with pathkeys extracted out of these merge joinable clauses. The merge
> joinable clauses form equivalence classes. The patch searches in
> root->eq_classes for equivalence members belonging to the given relation.
> For every such expression it creates a single member pathkey list and
> corresponding path. The test postgres_fdw.sql has an existing join which
> uses merge join. With this patch the data is sorted on the foreign server
> than locally.
>
> While mergejoinable clauses can be obtained from rel->joininfo as well. But
> rel->joininfo contains other clauses as well and we need extra efforts to
> remove duplicates if the same expression appears in multiple merge joinable
> clauses.
>
> Two joining relations can have multiple merge joinable clauses, requiring
> multi-member pathkeys so that merge join is efficient to the maximum extent.
> The order in which the expressions appears in pathkeys can change the costs
> of sorting the data according to the pathkeys, depending upon the
> expressions and the presence of indexes containing those expressions. Thus
> ideally we would need to club all the expressions appearing in all the
> clauses for given two relations and create paths with pathkeys for every
> order of these expressions.That explodes the number of possible paths. We
> may restrict the number of paths created by considering only certain orders
> like sort_inner_and_outer(). In any case, costing such paths increases the
> planning time which may not be worth it. So, this patch uses a heuristic
> approach of creating single member pathkeys path for every merge joinable
> expression.
>
> The pathkeys need to be canonicalised using make_canonical_pathkey(), which
> is a static function. I have added a TODO and comments in the patch
> explaining possible ways to avoid "extern"alization of this function.
>
> Comments/suggestions are welcome.

I think this approach is generally reasonable, but I suggested parts
of it, so may be biased.  I would be interested in hearing the
opinions of others.

Random notes:

"possibily" is a typo.

usable_pklist is confusing because it seems like it might be talking
about primary keys rather than pathkeys.  Also, I realize now, looking
at this again, that you're saying "usable" when what I really think
you mean is "useful".  Lots of pathkeys are usable, but only a few of
those are useful.  I suggest renaming usable_pathkeys to
query_pathkeys and usable_pklist to useful_pathkeys.  Similarly, let's
rename generate_pathkeys_for_relation() to
get_useful_pathkeys_for_relation().

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects.  Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage?  Maybe it is, but if so I'd like to hear a good explanation as
to why.

 Is the comment "Equivalence classes covering relations other than the
current one are of interest here" missing a "not"?

I don't find this comment illuminating:

+         * In case of child relation, we need to check that the
+         * equivalence class indicates a join to a relation other than
+         * parents, other children and itself (something similar to above).
+         * Otherwise we will end up creating useless paths. The code below is
+         * similar to generate_implied_equalities_for_column(), which might
+         * give a hint.

That basically just says that we have to do it this way because the
other way would be wrong.  But it doesn't say WHY the other way would
be wrong. Then a few lines later, you have another comment which says
the same thing again:

+                /*
+                 * Ignore equivalence members which correspond to children
+                 * or same relation or to parent relations
+                 */

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Sorry to barge in late, but I was wondering if what we've learned with this patch can be applied to the case of asserting a sort order on a query returning from dblink(). 

This is a common use-case for us where one machine will issue a query request to a non-Pg database that happens to speak libpq (Vertica, Redshift, I guess Greenplum would be one too), but for whom foreign data wrappers aren't yet an option, and then join that data to local tables, many of which have indexes matching the sort order that I know is coming from the dblink(), but have no way of telling that to the query planner.

I'm guessing the answer is "no", but I wanted to make you aware of a similar real-world need.

Re: Getting sorted data from foreign server for merge join

From
Greg Stark
Date:
On Fri, Nov 6, 2015 at 4:54 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> PFA patch to get data sorted from the foreign server (postgres_fdw)
> according to the pathkeys useful for merge join.


An idle thought. There are going to be a lot of cases where different
software systems actually disagree about collation rules. I wonder if
it would be valuable to have a node that just checks that each row is
in fact greater than the previous row and throws an error if not. That
can be optional or a parameter of the FDW but it's probably cheap
enough to have enabled by default. It would save a lot of difficult to
heartache since the behaviour if the inputs aren't correctly sorted
will be strangely incorrect join results. Often the results may just
be missing or duplicated rows and that can be easy to miss and lead to
corrupted databases or security problems.

-- 
greg



Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Sat, Nov 7, 2015 at 12:07 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
> Sorry to barge in late, but I was wondering if what we've learned with this
> patch can be applied to the case of asserting a sort order on a query
> returning from dblink().

Nope.

Sorry to the bearer of bad news, but that would be a different and
much harder development project.  Set-returning functions have no way
to indicate anything about the ordering of their return values at
present.  We could invent something, but the work involved wouldn't
have much to do with this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Sat, Nov 7, 2015 at 5:42 PM, Greg Stark <stark@mit.edu> wrote:
> On Fri, Nov 6, 2015 at 4:54 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> PFA patch to get data sorted from the foreign server (postgres_fdw)
>> according to the pathkeys useful for merge join.
>
> An idle thought. There are going to be a lot of cases where different
> software systems actually disagree about collation rules. I wonder if
> it would be valuable to have a node that just checks that each row is
> in fact greater than the previous row and throws an error if not. That
> can be optional or a parameter of the FDW but it's probably cheap
> enough to have enabled by default. It would save a lot of difficult to
> heartache since the behaviour if the inputs aren't correctly sorted
> will be strangely incorrect join results. Often the results may just
> be missing or duplicated rows and that can be easy to miss and lead to
> corrupted databases or security problems.

It's not a bad thought, but it could come up even locally - we've had
more than one situation where indexes have gotten corrupted by
updating glibc.  The new glibc doesn't agree with the old one on what
the collation ordering is, and so the indexes are wrong with respect
to the new glibc version.  If we were going to design something like
this, rather than making it a separate node, I'd be inclined to create
it as a C-callable function that could be invoked anywhere we want to
check that the ordering is valid.  I suspect you're wrong about the
cost, though: I bet it wouldn't be too hard to find cases where it
imposes a really noticeable penalty.

Also, to be clear, I don't think this patch needs to solve that problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Fri, Nov 6, 2015 at 2:03 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Has anyone taken a close look at what happens if the two sides of
> the merge join have different implementations of the same collation
> name?  Is there anything we should do to defend against the
> problem?

The issue of FDWs vs. collations has been thought about to some degree
(see FDW_COLLATE_NONE/SAFE/UNSAFE), but I don't quite understand that
stuff in detail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Getting sorted data from foreign server for merge join

From
Amit Langote
Date:
On 2015/11/10 0:56, Robert Haas wrote:
> On Sat, Nov 7, 2015 at 12:07 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
>> Sorry to barge in late, but I was wondering if what we've learned with this
>> patch can be applied to the case of asserting a sort order on a query
>> returning from dblink().
> 
> Nope.
> 
> Sorry to the bearer of bad news, but that would be a different and
> much harder development project.  Set-returning functions have no way
> to indicate anything about the ordering of their return values at
> present.  We could invent something, but the work involved wouldn't
> have much to do with this patch.

There was a patch (which, it seems, was rejected [1]) to do this sort of
thing.

Thanks,
Amit

[1] https://commitfest.postgresql.org/4/88/




Re: Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:


On Mon, Nov 9, 2015 at 9:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Nov 6, 2015 at 2:03 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Has anyone taken a close look at what happens if the two sides of
> the merge join have different implementations of the same collation
> name?  Is there anything we should do to defend against the
> problem?

The issue of FDWs vs. collations has been thought about to some degree
(see FDW_COLLATE_NONE/SAFE/UNSAFE), but I don't quite understand that
stuff in detail> .
>

collations arising from a foreign table's var are considered to be safer (FDW_COLLATE_SAFE) to push down to the foreign server , since they are either local default collation or are assumed to be same on foreign and local server as per user declaration. The onus of making that sure is on the user who declares particular collation for a foreign table var. As said upthread, different glibc implementations can cause collation ordering mismatch, this patch will be susceptible to the same problem as master/standby problem.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Getting sorted data from foreign server for merge join

From
Craig Ringer
Date:
On 16 November 2015 at 17:47, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

collations arising from a foreign table's var are considered to be safer (FDW_COLLATE_SAFE) to push down to the foreign server , since they are either local default collation or are assumed to be same on foreign and local server as per user declaration. The onus of making that sure is on the user who declares particular collation for a foreign table var. As said upthread, different glibc implementations can cause collation ordering mismatch, this patch will be susceptible to the same problem as master/standby problem.


Yeah. It's less bad than the related problems we already have:

* Upgrading glibc can cause your indexes to no longer match your local collations
* Different glibc versions on master and replica(s) can have the same effect

I don't see a problem with adding another way this same issue can be expressed, since there's no sane way to fix it _properly_ without either versioned collations in glibc or bringing collation onboard into Pg.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:
Thanks Robert for your comments. Please see my replies inlined. Updated patch is attached.

On Fri, Nov 6, 2015 at 10:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this approach is generally reasonable, but I suggested parts
of it, so may be biased.  I would be interested in hearing the
opinions of others.

Random notes:

"possibily" is a typo.

Fixed.
 

usable_pklist is confusing because it seems like it might be talking
about primary keys rather than pathkeys.  Also, I realize now, looking
at this again, that you're saying "usable" when what I really think
you mean is "useful". Lots of pathkeys are usable, but only a few of
those are useful.  I suggest renaming usable_pathkeys to
query_pathkeys

Done.
 
and usable_pklist to useful_pathkeys. 

pathkeys is being used to mean a list of pathkey nodes. What we want here is list of pathkeys so I have renamed it as useful_pathkeys_list, somewhat long but clear.
 
Similarly, let's
rename generate_pathkeys_for_relation() to
get_useful_pathkeys_for_relation().

Done.
 

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects.  Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage?  Maybe it is, but if so I'd like to hear a good explanation as
to why.

For a foreign table no pathkeys are created before creating paths for individual foreign table since there are no indexes. For a regular table, the pathkeys get created for useful indexes. Exception to this is if the expression appears in the ORDER BY clause, the pathkey for the same is created while handling ORDER BY clause. So, we will always need to create pathkey for a foreign table unless the corresponding expression does not appear in the ORDER BY clause. This can be verified by breaking in make_canonical_pathkey() while executing "explain verbose select * from ft1 join lt using (val);" ft1(val int, val2 int) is foreign table and lt (val int, val2 int) is local table. You will hit the first breakpoint with stack trace
Breakpoint 1, make_canonical_pathkey (root=0x231d858, eclass=0x231f030, opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60
(gdb) where
#0  make_canonical_pathkey (root=0x231d858, eclass=0x231f030, opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60
#1  0x00007f6740077e39 in get_useful_pathkeys_for_relation (root=0x231d858, rel=0x231e390) at postgres_fdw.c:663
#2  0x00007f6740077f34 in postgresGetForeignPaths (root=0x231d858, baserel=0x231e390, foreigntableid=16393) at postgres_fdw.c:705
#3  0x00000000007079cf in set_foreign_pathlist (root=0x231d858, rel=0x231e390, rte=0x231c488) at allpaths.c:600
#4  0x0000000000707653 in set_rel_pathlist (root=0x231d858, rel=0x231e390, rti=1, rte=0x231c488) at allpaths.c:395
#5  0x000000000070735f in set_base_rel_pathlists (root=0x231d858) at allpaths.c:277

at this point

(gdb) p root->canon_pathkeys
$1 = (List *) 0x0

so get_useful_pathkeys_for_relation->make_canonical_pathkey() will create the first pathkey.

I have left the corresponding TODO untouched, if anybody else wants to review that part of the code. I will remove it in the final version of the patch.
 

 Is the comment "Equivalence classes covering relations other than the
current one are of interest here" missing a "not"?

The sentence is correct. We need equivalence class covering relations other than the current one, because only such equivalence classes indicate joins between two relations. If an equivalence class covers only a single baserel (has only a single member in ec_relids), it indicates equivalence between columns/expressions of the same table, which can not result in a join. I have changed to comment to be more elaborate.
 

I don't find this comment illuminating:

+         * In case of child relation, we need to check that the
+         * equivalence class indicates a join to a relation other than
+         * parents, other children and itself (something similar to above).
+         * Otherwise we will end up creating useless paths. The code below is
+         * similar to generate_implied_equalities_for_column(), which might
+         * give a hint.

That basically just says that we have to do it this way because the
other way would be wrong.  But it doesn't say WHY the other way would
be wrong.

There's no "other way" which is wrong. What's the other way you are talking about?

Anway, I have updated the comment to be more readable.
 
Then a few lines later, you have another comment which says
the same thing again:

+                /*
+                 * Ignore equivalence members which correspond to children
+                 * or same relation or to parent relations
+                 */


Updated this too.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> Although I'm usually on the side of marking things as extern whenever
>> we find it convenient, I'm nervous about doing that to
>> make_canonical_pathkey(), because it has side effects.  Searching the
>> list of canonical pathkeys for the one we need is reasonable, but is
>> it really right to ever think that we might create a new one at this
>> stage?  Maybe it is, but if so I'd like to hear a good explanation as
>> to why.
>
> For a foreign table no pathkeys are created before creating paths for
> individual foreign table since there are no indexes. For a regular table,
> the pathkeys get created for useful indexes.

OK.

Could some portion of the second foreach loop inside
get_useful_pathkeys_for_relation be replaced by a call to
eclass_useful_for_merging?  The logic looks very similar.

More broadly, I'm wondering about the asymmetries between this code
and pathkeys_useful_for_merging().  The latter has a
right_merge_direction() test and a case for non-EC-derivable join
clauses that aren't present in your code.  I wonder if we could just
generate pathkeys speculatively and then test which ones survive
truncate_useless_pathkeys(), or something like that.  This isn't an
enormously complicated patch, but it duplicates more of the logic in
pathkeys.c than I'd like.

I'm inclined to think that we shouldn't consider pathkeys that might
be useful for merge-joining unless we're using remote estimates.  It
seems more speculative than pushing down a toplevel sort.

This patch seems rather light on test cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:


On Thu, Nov 19, 2015 at 7:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> Although I'm usually on the side of marking things as extern whenever
>> we find it convenient, I'm nervous about doing that to
>> make_canonical_pathkey(), because it has side effects.  Searching the
>> list of canonical pathkeys for the one we need is reasonable, but is
>> it really right to ever think that we might create a new one at this
>> stage?  Maybe it is, but if so I'd like to hear a good explanation as
>> to why.
>
> For a foreign table no pathkeys are created before creating paths for
> individual foreign table since there are no indexes. For a regular table,
> the pathkeys get created for useful indexes.

OK.

Could some portion of the second foreach loop inside
get_useful_pathkeys_for_relation be replaced by a call to
eclass_useful_for_merging?  The logic looks very similar.

Yes. I have incorporated this change in the patch attached.
 

More broadly, I'm wondering about the asymmetries between this code
and pathkeys_useful_for_merging().  The latter has a
right_merge_direction() test and a case for non-EC-derivable join
clauses that aren't present in your code.
I wonder if we could just
generate pathkeys speculatively and then test which ones survive
truncate_useless_pathkeys(), or something like that.  This isn't an
enormously complicated patch, but it duplicates more of the logic in
pathkeys.c than I'd like.


pathkeys_useful_for_merging(), truncate_useless_pathkeys() and host of functions in that area are crafted to assess the usefulness of given pathkeys which for regular tables are "speculated" from indexes on that table. Foreign tables do not have indexes and neither we have information about the indexes (if any) on foreign server, thus pathkeys to be assessed are not readily available. Hence we need some other way of "speculating" pathkeys for foreign tables. We can not just generate pathkeys because there are infinite possible expressions on which pathkeys can be built. So, we hunt for the expressions at various places like root_pathkeys, merge joinable clauses etc. The seeming duplication of code is because the places where we are hunting for useful pathkeys in case of foreign table are same as the places used for assessing usefulness for pathkeys in case of regular table. Thus in case of foreign tables, we always generate useful pathkeys, which do not need any further assessment. For regular tables we have set of pathkeys which need to be assessed. Because of this difference the code differs in details.

right_merge_direction() compares whether a given pathkey (readily available or derived from an index) has the same direction as required by root->query_pathkeys or ascending direction. For foreign tables, the pathkeys are themselves created from root->query_pathkeys, thus doesn't require this assessment. The patch creates pathkeys other than those from root->query_pathkeys in the ascending order (like other places in the code eg. select_outer_pathkeys_for_merge()), which is what right_merge_direction() assesses. So again we don't need to call right_merge_direction().
 
non-EC-derivable join is interesting. Thanks for bringing it out. These are mergejoinable clauses in an OUTER join, where columns from inner side can be NULL, and thus not exactly equal. I will incorporate that change in the patch. Again while doing so, we have to just pick up the right or left equivalence class from a given RestrictInfo and don't need to assess it further like pathkeys_useful_for_merging().
 
Added this change in the attached patch.

I'm inclined to think that we shouldn't consider pathkeys that might
be useful for merge-joining unless we're using remote estimates.  It
seems more speculative than pushing down a toplevel sort.

I agree with you but let me elaborate why I agree. The pathkeys we are generating are heauristic in nature and thus may not be useful in the same sense as pathkeys for regular tables. If use_remote_estimate is ON, the planner will spend time in explaining multiple queries, but it will be in better position to cost the usage. If use_remote_estimate is OFF, we will altogether loose chances of merge join, which doesn't look good to me. But a speculative costing done in this case can result in choosing wrong plan similar to the same case as toplevel sort. But then choosing a wrong join has more serious implications than estimating wrong cost for top level join.
 

This patch seems rather light on test cases.


Added tests. Let me know if you have any specific scenario in mind.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Getting sorted data from foreign server for merge join

From
Rushabh Lathia
Date:
Hi Ashutosh,

I reviewed your latest version of patch and over all the implementation
and other details look good to me.

Here are few cosmetic issues which I found:

1) Patch not getting applied cleanly - white space warning

2)

-    List       *usable_pathkeys = NIL;
+    List        *useful_pathkeys_list = NIL;    /* List of all pathkeys */

Code alignment is not correct with other declared variables.

3)

+        {
+            PathKey    *pathkey;
+            List    *pathkeys;
+
+            pathkey = make_canonical_pathkey(root, cur_ec,
+                                            linitial_oid(cur_ec->ec_opfamilies),
+                                            BTLessStrategyNumber,
+                                            false);
+            pathkeys = list_make1(pathkey);
+            useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+        }

Code alignment need to fix at make_canonical_pathkey().

4)

I don't understand the meaning of following added testcase into postgres_fdw.

+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;

Because, I removed the code changes of the patch and then I run the test
seem like it has nothing to do with the code changes. Above set of test giving
same result with/without patch.

Am I missing something ?

Apart from this I debugged the patch for each scenario (query pathkeys and
pathkeys arising out of the equivalence classes) and so far patch looks good
to me.

Attaching update version of patch by fixing the cosmetic changes.


On Fri, Nov 20, 2015 at 9:14 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:


On Thu, Nov 19, 2015 at 7:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> Although I'm usually on the side of marking things as extern whenever
>> we find it convenient, I'm nervous about doing that to
>> make_canonical_pathkey(), because it has side effects.  Searching the
>> list of canonical pathkeys for the one we need is reasonable, but is
>> it really right to ever think that we might create a new one at this
>> stage?  Maybe it is, but if so I'd like to hear a good explanation as
>> to why.
>
> For a foreign table no pathkeys are created before creating paths for
> individual foreign table since there are no indexes. For a regular table,
> the pathkeys get created for useful indexes.

OK.

Could some portion of the second foreach loop inside
get_useful_pathkeys_for_relation be replaced by a call to
eclass_useful_for_merging?  The logic looks very similar.

Yes. I have incorporated this change in the patch attached.
 

More broadly, I'm wondering about the asymmetries between this code
and pathkeys_useful_for_merging().  The latter has a
right_merge_direction() test and a case for non-EC-derivable join
clauses that aren't present in your code.
I wonder if we could just
generate pathkeys speculatively and then test which ones survive
truncate_useless_pathkeys(), or something like that.  This isn't an
enormously complicated patch, but it duplicates more of the logic in
pathkeys.c than I'd like.


pathkeys_useful_for_merging(), truncate_useless_pathkeys() and host of functions in that area are crafted to assess the usefulness of given pathkeys which for regular tables are "speculated" from indexes on that table. Foreign tables do not have indexes and neither we have information about the indexes (if any) on foreign server, thus pathkeys to be assessed are not readily available. Hence we need some other way of "speculating" pathkeys for foreign tables. We can not just generate pathkeys because there are infinite possible expressions on which pathkeys can be built. So, we hunt for the expressions at various places like root_pathkeys, merge joinable clauses etc. The seeming duplication of code is because the places where we are hunting for useful pathkeys in case of foreign table are same as the places used for assessing usefulness for pathkeys in case of regular table. Thus in case of foreign tables, we always generate useful pathkeys, which do not need any further assessment. For regular tables we have set of pathkeys which need to be assessed. Because of this difference the code differs in details.

right_merge_direction() compares whether a given pathkey (readily available or derived from an index) has the same direction as required by root->query_pathkeys or ascending direction. For foreign tables, the pathkeys are themselves created from root->query_pathkeys, thus doesn't require this assessment. The patch creates pathkeys other than those from root->query_pathkeys in the ascending order (like other places in the code eg. select_outer_pathkeys_for_merge()), which is what right_merge_direction() assesses. So again we don't need to call right_merge_direction().
 
non-EC-derivable join is interesting. Thanks for bringing it out. These are mergejoinable clauses in an OUTER join, where columns from inner side can be NULL, and thus not exactly equal. I will incorporate that change in the patch. Again while doing so, we have to just pick up the right or left equivalence class from a given RestrictInfo and don't need to assess it further like pathkeys_useful_for_merging().
 
Added this change in the attached patch.

I'm inclined to think that we shouldn't consider pathkeys that might
be useful for merge-joining unless we're using remote estimates.  It
seems more speculative than pushing down a toplevel sort.

I agree with you but let me elaborate why I agree. The pathkeys we are generating are heauristic in nature and thus may not be useful in the same sense as pathkeys for regular tables. If use_remote_estimate is ON, the planner will spend time in explaining multiple queries, but it will be in better position to cost the usage. If use_remote_estimate is OFF, we will altogether loose chances of merge join, which doesn't look good to me. But a speculative costing done in this case can result in choosing wrong plan similar to the same case as toplevel sort. But then choosing a wrong join has more serious implications than estimating wrong cost for top level join.
 

This patch seems rather light on test cases.


Added tests. Let me know if you have any specific scenario in mind.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Rushabh Lathia
Attachment

Re: Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:
Thanks Rushabh for your review and comments.

On Thu, Nov 26, 2015 at 5:39 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Hi Ashutosh,

I reviewed your latest version of patch and over all the implementation
and other details look good to me.

Here are few cosmetic issues which I found:

1) Patch not getting applied cleanly - white space warning


Done.
 
2)

-    List       *usable_pathkeys = NIL;
+    List        *useful_pathkeys_list = NIL;    /* List of all pathkeys */

Code alignment is not correct with other declared variables.


Incorporated the change in the patch.

3)

+        {
+            PathKey    *pathkey;
+            List    *pathkeys;
+
+            pathkey = make_canonical_pathkey(root, cur_ec,
+                                            linitial_oid(cur_ec->ec_opfamilies),
+                                            BTLessStrategyNumber,
+                                            false);
+            pathkeys = list_make1(pathkey);
+            useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+        }

Code alignment need to fix at make_canonical_pathkey().

Incorporated the change in the patch.

I have also removed the TODO item in the prologue of this function, since none has objected to externalization of make_canonical_pathkeys till now and it's not expected to be part of the final commit.
 

4)

I don't understand the meaning of following added testcase into postgres_fdw.

+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;

Because, I removed the code changes of the patch and then I run the test
seem like it has nothing to do with the code changes. Above set of test giving
same result with/without patch.

Am I missing something ?

Actually, the output of merge join is always ordered by the pathkeys used for merge join. That routed through LIMIT node remains ordered. So, we actually do not need ORDER BY t1.c1 clause in the above queries. Without that clause, the tests will show difference output with and without patch. I have changed the attached patch accordingly.
 

Apart from this I debugged the patch for each scenario (query pathkeys and
pathkeys arising out of the equivalence classes) and so far patch looks good
to me.


Thanks.
 
Attaching update version of patch by fixing the cosmetic changes.


Attached version of patch contains your changes.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Attachment

Re: Getting sorted data from foreign server for merge join

From
Rushabh Lathia
Date:
Thanks Ashutosh.

Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
looks good to me.


On Fri, Nov 27, 2015 at 3:02 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Thanks Rushabh for your review and comments.

On Thu, Nov 26, 2015 at 5:39 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Hi Ashutosh,

I reviewed your latest version of patch and over all the implementation
and other details look good to me.

Here are few cosmetic issues which I found:

1) Patch not getting applied cleanly - white space warning


Done.
 
2)

-    List       *usable_pathkeys = NIL;
+    List        *useful_pathkeys_list = NIL;    /* List of all pathkeys */

Code alignment is not correct with other declared variables.


Incorporated the change in the patch.

3)

+        {
+            PathKey    *pathkey;
+            List    *pathkeys;
+
+            pathkey = make_canonical_pathkey(root, cur_ec,
+                                            linitial_oid(cur_ec->ec_opfamilies),
+                                            BTLessStrategyNumber,
+                                            false);
+            pathkeys = list_make1(pathkey);
+            useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+        }

Code alignment need to fix at make_canonical_pathkey().

Incorporated the change in the patch.

I have also removed the TODO item in the prologue of this function, since none has objected to externalization of make_canonical_pathkeys till now and it's not expected to be part of the final commit.
 

4)

I don't understand the meaning of following added testcase into postgres_fdw.

+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;

Because, I removed the code changes of the patch and then I run the test
seem like it has nothing to do with the code changes. Above set of test giving
same result with/without patch.

Am I missing something ?

Actually, the output of merge join is always ordered by the pathkeys used for merge join. That routed through LIMIT node remains ordered. So, we actually do not need ORDER BY t1.c1 clause in the above queries. Without that clause, the tests will show difference output with and without patch. I have changed the attached patch accordingly.
 

Apart from this I debugged the patch for each scenario (query pathkeys and
pathkeys arising out of the equivalence classes) and so far patch looks good
to me.


Thanks.
 
Attaching update version of patch by fixing the cosmetic changes.


Attached version of patch contains your changes.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company






--
Rushabh Lathia

Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Thanks Ashutosh.
>
> Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
> looks good to me.

This patch needs a rebase.

It's not going to work to say this is a patch proposed for commit when
it's still got a TODO comment in it that obviously needs to be
changed.   And the formatting of that long comment is pretty weird,
too, and not consistent with other functions in that same file (e.g.
get_remote_estimate, ec_member_matches_foreign, create_cursor).

Aside from that, I think before we commit this, somebody should do
some testing that demonstrates that this is actually a good idea.  Not
as part of the test case set for this patch, but just in general.
Merge joins are typically going to be relevant for large tables, but
the examples in the regression tests are necessarily tiny.  I'd like
to see some sample data and some sample queries that get appreciably
faster with this code.  If we can't find any, we don't need the code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:


On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Thanks Ashutosh.
>
> Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
> looks good to me.

This patch needs a rebase.

Done.
 

It's not going to work to say this is a patch proposed for commit when
it's still got a TODO comment in it that obviously needs to be
changed.   And the formatting of that long comment is pretty weird,
too, and not consistent with other functions in that same file (e.g.
get_remote_estimate, ec_member_matches_foreign, create_cursor).


The TODO was present in v4 but not in v5 and is not present in v6 attached here.. Formatted comment according estimate_path_cost_size(), convert_prep_stmt_params().
 
Aside from that, I think before we commit this, somebody should do
some testing that demonstrates that this is actually a good idea.  Not
as part of the test case set for this patch, but just in general.
Merge joins are typically going to be relevant for large tables, but
the examples in the regression tests are necessarily tiny.  I'd like
to see some sample data and some sample queries that get appreciably
faster with this code.  If we can't find any, we don't need the code.


I tested the patch on my laptop with two types of queries, a join between two foreign tables on different foreign servers (pointing to the same self server) and a join between one foreign and one local table. The foreign tables and servers are created using sort_pd_setup.sql attached. Foreign tables pointed to table with index useful for join clause. Both the joining tables had 10M rows. The execution time of query was measured for 100 runs and average and standard deviation were calculated (using function query_execution_stats() in script sort_pd.sql) and are presented below.

1. Query between foreign tables
SELECT ft1.val, ft2.val FROM ft1 join ft2 on (ft1.val = ft2.val)

Plan and timings without patch
EXPLAIN (VERBOSE, ANALYSE) :query ;
                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=508510.02..1129945.94 rows=999995 width=8) (actual time=33803.826..82416.342 rows=10000000 loops=1)
   Output: ft1.val, ft2.val
   Hash Cond: (ft1.val = ft2.val)
   ->  Foreign Scan on public.ft1  (cost=100.00..344347.31 rows=9999977 width=4) (actual time=0.624..28531.803 rows=10000000 loops=1)
         Output: ft1.val
         Remote SQL: SELECT val FROM public.lt
   ->  Hash  (cost=344347.31..344347.31 rows=9999977 width=4) (actual time=33258.025..33258.025 rows=10000000 loops=1)
         Output: ft2.val
         Buckets: 131072  Batches: 256  Memory Usage: 2400kB
         ->  Foreign Scan on public.ft2  (cost=100.00..344347.31 rows=9999977 width=4) (actual time=22.171..28134.970 rows=10000000 loops=1)
               Output: ft2.val
               Remote SQL: SELECT val FROM public.lt
 Planning time: 33.155 ms
 Execution time: 82914.607 ms
(14 rows)

 avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
  78750.95487 | 2911.51825687913 |    74314.886 |    89358.464

Plan and timing with patch
EXPLAIN (VERBOSE, ANALYSE) :query ;
                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=200.86..1183070.86 rows=10000000 width=8) (actual time=1.776..73140.219 rows=10000000 loops=1)
   Output: ft1.val, ft2.val
   Merge Cond: (ft1.val = ft2.val)
   ->  Foreign Scan on public.ft1  (cost=100.43..504035.43 rows=10000000 width=4) (actual time=0.937..30422.457 rows=10000000 loops=1)
         Output: ft1.val, ft1.val2
         Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
   ->  Materialize  (cost=100.43..529035.43 rows=10000000 width=4) (actual time=0.826..33448.822 rows=10000000 loops=1)
         Output: ft2.val, ft2.val2
         ->  Foreign Scan on public.ft2  (cost=100.43..504035.43 rows=10000000 width=4) (actual time=0.818..31035.362 rows=10000000 loops=1)
               Output: ft2.val, ft2.val2
               Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
 Planning time: 163.161 ms
 Execution time: 73654.106 ms
(13 rows)

 avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
  71881.15916 | 819.091605498189 |    70197.312 |    74653.314

It can be observed that the with the patch, merge join strategy is used instead of hash join and the execution time reduces by approx 9%. A desired effect is that the deviation in the execution time has reduced heavily (almost by 75%).

2. Join between local and foreign table

Without patch the plan and timings are
EXPLAIN (VERBOSE, ANALYSE) :query ;
                                                              QUERY PLAN                                                             
--------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=308410.66..1019846.69 rows=9999970 width=8) (actual time=7674.681..47767.136 rows=10000000 loops=1)
   Output: lt.val, ft1.val
   Hash Cond: (ft1.val = lt.val)
   ->  Foreign Scan on public.ft1  (cost=100.00..344347.55 rows=9999985 width=4) (actual time=0.506..26679.980 rows=10000000 loops=1)
         Output: ft1.val
         Remote SQL: SELECT val FROM public.lt
   ->  Hash  (cost=144247.85..144247.85 rows=9999985 width=4) (actual time=7667.598..7667.598 rows=10000000 loops=1)
         Output: lt.val
         Buckets: 131072  Batches: 256  Memory Usage: 2400kB
         ->  Seq Scan on public.lt  (cost=0.00..144247.85 rows=9999985 width=4) (actual time=0.018..2959.111 rows=10000000 loops=1)
               Output: lt.val
 Planning time: 8.668 ms
 Execution time: 48209.365 ms
(13 rows)

SELECT avg_exe_time, std_dev_exe_time, min_exe_time, max_exe_time
        FROM query_execution_stats(:'query', :num_samples);
 avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
  47246.46956 | 2579.42041949119 |    43603.411 |    56096.759

With the patch the plan and timings are
EXPLAIN (VERBOSE, ANALYSE) :query ;
                                                                     QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=155.01..957924.85 rows=9999970 width=8) (actual time=0.592..45125.356 rows=10000000 loops=1)
   Output: lt.val, ft1.val
   Merge Cond: (ft1.val = lt.val)
   ->  Foreign Scan on public.ft1  (cost=100.43..504038.91 rows=9999985 width=4) (actual time=0.551..30526.048 rows=10000000 loops=1)
         Output: ft1.val, ft1.val2
         Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
   ->  Index Only Scan using i_lt_val on public.lt  (cost=0.43..303939.21 rows=9999985 width=4) (actual time=0.032..6192.406 rows=10000000 loops=1)
         Output: lt.val
         Heap Fetches: 10000000
 Planning time: 9.043 ms
 Execution time: 45666.023 ms
(11 rows)


 avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
  42803.36105 | 166.874491432755 |    42321.314 |    43316.902

Again observe that with the patch, merge join is used instead of hash join and timing reduces by approx 9%. Again the deviation in execution reduces heavily (almost by 75%). There is increase in planning time with the patch owing to firing EXPLAIN on the foreign server.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Thu, Dec 17, 2015 at 3:32 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com>
>> wrote:
>> > Thanks Ashutosh.
>> >
>> > Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
>> > looks good to me.
>>
>> This patch needs a rebase.
>
> Done.

Thanks.

>> It's not going to work to say this is a patch proposed for commit when
>> it's still got a TODO comment in it that obviously needs to be
>> changed.   And the formatting of that long comment is pretty weird,
>> too, and not consistent with other functions in that same file (e.g.
>> get_remote_estimate, ec_member_matches_foreign, create_cursor).
>>
>
> The TODO was present in v4 but not in v5 and is not present in v6 attached
> here.. Formatted comment according estimate_path_cost_size(),
> convert_prep_stmt_params().

Hrm, I must have been looking at the wrong version somehow.  Sorry about that.

>> Aside from that, I think before we commit this, somebody should do
>> some testing that demonstrates that this is actually a good idea.  Not
>> as part of the test case set for this patch, but just in general.
>> Merge joins are typically going to be relevant for large tables, but
>> the examples in the regression tests are necessarily tiny.  I'd like
>> to see some sample data and some sample queries that get appreciably
>> faster with this code.  If we can't find any, we don't need the code.
>>
>
> I tested the patch on my laptop with two types of queries, a join between
> two foreign tables on different foreign servers (pointing to the same self
> server) and a join between one foreign and one local table. The foreign
> tables and servers are created using sort_pd_setup.sql attached. Foreign
> tables pointed to table with index useful for join clause. Both the joining
> tables had 10M rows. The execution time of query was measured for 100 runs
> and average and standard deviation were calculated (using function
> query_execution_stats() in script sort_pd.sql) and are presented below.

OK, cool.

I went over this patch in some detail today and did a lot of cosmetic
cleanup.  The results are attached.  I'm fairly happy with this
version, but let me know what you think.  Of course, feedback from
others is more than welcome also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Getting sorted data from foreign server for merge join

From
Rushabh Lathia
Date:


On Sat, Dec 19, 2015 at 2:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 17, 2015 at 3:32 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com>
>> wrote:
>> > Thanks Ashutosh.
>> >
>> > Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
>> > looks good to me.
>>
>> This patch needs a rebase.
>
> Done.

Thanks.

>> It's not going to work to say this is a patch proposed for commit when
>> it's still got a TODO comment in it that obviously needs to be
>> changed.   And the formatting of that long comment is pretty weird,
>> too, and not consistent with other functions in that same file (e.g.
>> get_remote_estimate, ec_member_matches_foreign, create_cursor).
>>
>
> The TODO was present in v4 but not in v5 and is not present in v6 attached
> here.. Formatted comment according estimate_path_cost_size(),
> convert_prep_stmt_params().

Hrm, I must have been looking at the wrong version somehow.  Sorry about that.

>> Aside from that, I think before we commit this, somebody should do
>> some testing that demonstrates that this is actually a good idea.  Not
>> as part of the test case set for this patch, but just in general.
>> Merge joins are typically going to be relevant for large tables, but
>> the examples in the regression tests are necessarily tiny.  I'd like
>> to see some sample data and some sample queries that get appreciably
>> faster with this code.  If we can't find any, we don't need the code.
>>
>
> I tested the patch on my laptop with two types of queries, a join between
> two foreign tables on different foreign servers (pointing to the same self
> server) and a join between one foreign and one local table. The foreign
> tables and servers are created using sort_pd_setup.sql attached. Foreign
> tables pointed to table with index useful for join clause. Both the joining
> tables had 10M rows. The execution time of query was measured for 100 runs
> and average and standard deviation were calculated (using function
> query_execution_stats() in script sort_pd.sql) and are presented below.

OK, cool.

I went over this patch in some detail today and did a lot of cosmetic
cleanup.  The results are attached.  I'm fairly happy with this
version, but let me know what you think.  Of course, feedback from
others is more than welcome also.


Had a quick look at the patch changes and also performed basic
sanity check. Patch looks good to me.

Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Rushabh Lathia

Re: Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:
I went over this patch in some detail today and did a lot of cosmetic
cleanup.  The results are attached.  I'm fairly happy with this
version, but let me know what you think.  Of course, feedback from
others is more than welcome also.


Attached patch with some cosmetic changes (listed here for your quick reference)
1. , was replaced with ; in comment "inner join, expressions in the " at one place, which is correct, but missed other place.
2. The comment "First, consider whether any each active EC is potentially" should use either "any" or "each". I have reworded it as "First, consider whether any of the active ECs is potentially ...". Or we can use "First, find all of the active ECs which are potentially ....".
3. "having the remote side due the sort generally won't be any worse ..." - instead of "due" we should use "do"?
4. Added static prototype of function get_useful_ecs_for_relation().
5. The comment "Extract unique EC for query, if any, so we don't consider it again." is too crisp. Phrase "Unique EC for query" is confusing; EC can not be associated with a query per say and EC's are always unique because of canonicalisation. May be we should reword it as "Extract single EC for ordering of query, if any, so we don't consider it again." Is that cryptic as well?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Mon, Dec 21, 2015 at 6:34 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> I went over this patch in some detail today and did a lot of cosmetic
>> cleanup.  The results are attached.  I'm fairly happy with this
>> version, but let me know what you think.  Of course, feedback from
>> others is more than welcome also.
>>
>
> Attached patch with some cosmetic changes (listed here for your quick
> reference)
> 1. , was replaced with ; in comment "inner join, expressions in the " at one
> place, which is correct, but missed other place.
> 2. The comment "First, consider whether any each active EC is potentially"
> should use either "any" or "each". I have reworded it as "First, consider
> whether any of the active ECs is potentially ...". Or we can use "First,
> find all of the active ECs which are potentially ....".
> 3. "having the remote side due the sort generally won't be any worse ..." -
> instead of "due" we should use "do"?
> 4. Added static prototype of function get_useful_ecs_for_relation().
> 5. The comment "Extract unique EC for query, if any, so we don't consider it
> again." is too crisp. Phrase "Unique EC for query" is confusing; EC can not
> be associated with a query per say and EC's are always unique because of
> canonicalisation. May be we should reword it as "Extract single EC for
> ordering of query, if any, so we don't consider it again." Is that cryptic
> as well?

Thanks.  I committed this version with one small tweak.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:
<div dir="ltr">Thanks.<br /></div><div class="gmail_extra"><br /><div class="gmail_quote">On Wed, Dec 23, 2015 at 12:24
AM,Robert Haas <span dir="ltr"><<a href="mailto:robertmhaas@gmail.com"
target="_blank">robertmhaas@gmail.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><span class="">On Mon, Dec 21, 2015 at 6:34 AM, Ashutosh Bapat<br />
<<ahref="mailto:ashutosh.bapat@enterprisedb.com">ashutosh.bapat@enterprisedb.com</a>> wrote:<br /> >> I
wentover this patch in some detail today and did a lot of cosmetic<br /> >> cleanup.  The results are attached. 
I'mfairly happy with this<br /> >> version, but let me know what you think.  Of course, feedback from<br />
>>others is more than welcome also.<br /> >><br /> ><br /> > Attached patch with some cosmetic
changes(listed here for your quick<br /> > reference)<br /> > 1. , was replaced with ; in comment "inner join,
expressionsin the " at one<br /> > place, which is correct, but missed other place.<br /> > 2. The comment
"First,consider whether any each active EC is potentially"<br /> > should use either "any" or "each". I have
rewordedit as "First, consider<br /> > whether any of the active ECs is potentially ...". Or we can use "First,<br
/>> find all of the active ECs which are potentially ....".<br /> > 3. "having the remote side due the sort
generallywon't be any worse ..." -<br /> > instead of "due" we should use "do"?<br /> > 4. Added static prototype
offunction get_useful_ecs_for_relation().<br /> > 5. The comment "Extract unique EC for query, if any, so we don't
considerit<br /> > again." is too crisp. Phrase "Unique EC for query" is confusing; EC can not<br /> > be
associatedwith a query per say and EC's are always unique because of<br /> > canonicalisation. May be we should
rewordit as "Extract single EC for<br /> > ordering of query, if any, so we don't consider it again." Is that
cryptic<br/> > as well?<br /><br /></span>Thanks.  I committed this version with one small tweak.<br /><div
class="HOEnZb"><divclass="h5"><br /> --<br /> Robert Haas<br /> EnterpriseDB: <a href="http://www.enterprisedb.com"
rel="noreferrer"target="_blank">http://www.enterprisedb.com</a><br /> The Enterprise PostgreSQL Company<br
/></div></div></blockquote></div><br/><br clear="all" /><br />-- <br /><div class="gmail_signature"><div dir="ltr">Best
Wishes,<br/>Ashutosh Bapat<br />EnterpriseDB Corporation<br />The Postgres Database Company<br /></div></div></div> 

Re: Getting sorted data from foreign server for merge join

From
Ashutosh Bapat
Date:
Hi,
In get_useful_ecs_for_relation(), while checking whether to use left or right argument of a mergejoinable operator, the arguments to bms_is_subset() are passed in reverse order. bms_is_subset() checks whether the first argument in subset of the second, but in this function the subset to be checked is passed as the second argument. Because of this following query when run in contrib_regression database after "make installcheck" in contrib/postgres_fdw trips assertion Assert(bms_is_subset(relids, restrictinfo->left_ec->ec_relids));

EXPLAIN (COSTS false, VERBOSE)
    SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;

PFA patch to fix it.

Reason why it was not caught earlier: this code is excercised when expressions in left/right join clauses are considered. For mergejoinable clauses, the left and right side are not merged into a single EC but appear as separate ECs. All tests in the postgres_fdw.sql that excercised this code involved only two relations, thus ec_relids and relids had only a single member and bms_is_subset() returned true. But in the above query the left/right EC has relids of t2 and t3, which caused bms_is_subset() to return false and thus trip the assertion.

I have added above query and the output to the tests. The output of EXPLAIN shows that ORDER BY clause is pushed down for ft2 but not ft1. This is because ft2 has use_remote_estimate true and ft1 has that false. So, we push down ORDER BY corresponding merge join for ft2 but not for ft1.

On Wed, Dec 23, 2015 at 12:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 21, 2015 at 6:34 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> I went over this patch in some detail today and did a lot of cosmetic
>> cleanup.  The results are attached.  I'm fairly happy with this
>> version, but let me know what you think.  Of course, feedback from
>> others is more than welcome also.
>>
>
> Attached patch with some cosmetic changes (listed here for your quick
> reference)
> 1. , was replaced with ; in comment "inner join, expressions in the " at one
> place, which is correct, but missed other place.
> 2. The comment "First, consider whether any each active EC is potentially"
> should use either "any" or "each". I have reworded it as "First, consider
> whether any of the active ECs is potentially ...". Or we can use "First,
> find all of the active ECs which are potentially ....".
> 3. "having the remote side due the sort generally won't be any worse ..." -
> instead of "due" we should use "do"?
> 4. Added static prototype of function get_useful_ecs_for_relation().
> 5. The comment "Extract unique EC for query, if any, so we don't consider it
> again." is too crisp. Phrase "Unique EC for query" is confusing; EC can not
> be associated with a query per say and EC's are always unique because of
> canonicalisation. May be we should reword it as "Extract single EC for
> ordering of query, if any, so we don't consider it again." Is that cryptic
> as well?

Thanks.  I committed this version with one small tweak.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Getting sorted data from foreign server for merge join

From
Robert Haas
Date:
On Thu, Jan 7, 2016 at 4:05 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> In get_useful_ecs_for_relation(), while checking whether to use left or
> right argument of a mergejoinable operator, the arguments to bms_is_subset()
> are passed in reverse order. bms_is_subset() checks whether the first
> argument in subset of the second, but in this function the subset to be
> checked is passed as the second argument. Because of this following query
> when run in contrib_regression database after "make installcheck" in
> contrib/postgres_fdw trips assertion Assert(bms_is_subset(relids,
> restrictinfo->left_ec->ec_relids));
>
> EXPLAIN (COSTS false, VERBOSE)
>     SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on
> (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
>
> PFA patch to fix it.

The test case failed for me, possibly because of Tom's upper planner
pathification, but the substantive part of the fix looks right to me,
so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company