Re: Getting sorted data from foreign server for merge join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Getting sorted data from foreign server for merge join
Date
Msg-id CAFjFpRfiU43-w+pKf2SPsE6Z+iSi570ruUM9WSKtJy6Fb9HyHg@mail.gmail.com
Whole thread Raw
In response to Re: Getting sorted data from foreign server for merge join  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Getting sorted data from foreign server for merge join  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Thanks Robert for your comments. Please see my replies inlined. Updated patch is attached.

On Fri, Nov 6, 2015 at 10:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this approach is generally reasonable, but I suggested parts
of it, so may be biased.  I would be interested in hearing the
opinions of others.

Random notes:

"possibily" is a typo.

Fixed.
 

usable_pklist is confusing because it seems like it might be talking
about primary keys rather than pathkeys.  Also, I realize now, looking
at this again, that you're saying "usable" when what I really think
you mean is "useful". Lots of pathkeys are usable, but only a few of
those are useful.  I suggest renaming usable_pathkeys to
query_pathkeys

Done.
 
and usable_pklist to useful_pathkeys. 

pathkeys is being used to mean a list of pathkey nodes. What we want here is list of pathkeys so I have renamed it as useful_pathkeys_list, somewhat long but clear.
 
Similarly, let's
rename generate_pathkeys_for_relation() to
get_useful_pathkeys_for_relation().

Done.
 

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects.  Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage?  Maybe it is, but if so I'd like to hear a good explanation as
to why.

For a foreign table no pathkeys are created before creating paths for individual foreign table since there are no indexes. For a regular table, the pathkeys get created for useful indexes. Exception to this is if the expression appears in the ORDER BY clause, the pathkey for the same is created while handling ORDER BY clause. So, we will always need to create pathkey for a foreign table unless the corresponding expression does not appear in the ORDER BY clause. This can be verified by breaking in make_canonical_pathkey() while executing "explain verbose select * from ft1 join lt using (val);" ft1(val int, val2 int) is foreign table and lt (val int, val2 int) is local table. You will hit the first breakpoint with stack trace
Breakpoint 1, make_canonical_pathkey (root=0x231d858, eclass=0x231f030, opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60
(gdb) where
#0  make_canonical_pathkey (root=0x231d858, eclass=0x231f030, opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60
#1  0x00007f6740077e39 in get_useful_pathkeys_for_relation (root=0x231d858, rel=0x231e390) at postgres_fdw.c:663
#2  0x00007f6740077f34 in postgresGetForeignPaths (root=0x231d858, baserel=0x231e390, foreigntableid=16393) at postgres_fdw.c:705
#3  0x00000000007079cf in set_foreign_pathlist (root=0x231d858, rel=0x231e390, rte=0x231c488) at allpaths.c:600
#4  0x0000000000707653 in set_rel_pathlist (root=0x231d858, rel=0x231e390, rti=1, rte=0x231c488) at allpaths.c:395
#5  0x000000000070735f in set_base_rel_pathlists (root=0x231d858) at allpaths.c:277

at this point

(gdb) p root->canon_pathkeys
$1 = (List *) 0x0

so get_useful_pathkeys_for_relation->make_canonical_pathkey() will create the first pathkey.

I have left the corresponding TODO untouched, if anybody else wants to review that part of the code. I will remove it in the final version of the patch.
 

 Is the comment "Equivalence classes covering relations other than the
current one are of interest here" missing a "not"?

The sentence is correct. We need equivalence class covering relations other than the current one, because only such equivalence classes indicate joins between two relations. If an equivalence class covers only a single baserel (has only a single member in ec_relids), it indicates equivalence between columns/expressions of the same table, which can not result in a join. I have changed to comment to be more elaborate.
 

I don't find this comment illuminating:

+         * In case of child relation, we need to check that the
+         * equivalence class indicates a join to a relation other than
+         * parents, other children and itself (something similar to above).
+         * Otherwise we will end up creating useless paths. The code below is
+         * similar to generate_implied_equalities_for_column(), which might
+         * give a hint.

That basically just says that we have to do it this way because the
other way would be wrong.  But it doesn't say WHY the other way would
be wrong.

There's no "other way" which is wrong. What's the other way you are talking about?

Anway, I have updated the comment to be more readable.
 
Then a few lines later, you have another comment which says
the same thing again:

+                /*
+                 * Ignore equivalence members which correspond to children
+                 * or same relation or to parent relations
+                 */


Updated this too.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Next
From: Amit Kapila
Date:
Subject: Re: Speed up Clog Access by increasing CLOG buffers