Thread: Pushing down sorted joins
Hi All,
Now that we have join pushdown support in postgres_fdw, we can leverage the sort pushdown mechanism for base relations to work for pushed down joins as well. PFA patch for the same.
The code to find useful set of pathkeys and then generate paths for each list of pathkeys is moved into a function which is called for base relations and join relations, while creating respective paths. The useful pathkeys are same as the base relation i.e. root->query_pathkeys and pathkeys useful for merge join as discussed in [1].
I measured performance of pushing down sort for merge joins for query
SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val = ft2.val)) on (lt1.val = ft1.val) where ft1, ft2 are foreign tables, join between which gets pushed down to the foreign server and lt is the local table.
SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val = ft2.val)) on (lt1.val = ft1.val) where ft1, ft2 are foreign tables, join between which gets pushed down to the foreign server and lt is the local table.
Without the patch servers prefers local merge join between foreign tables followed by merge join with local table by getting the data sorted from the foreign server. But with the patch, it pushes down the foreign join and also gets the data sorted for local merge join. The times measured over 10 runs of query with and without patch are
With patch
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
60310.0369 | 251.075471210925 | 59895.064 | 60746.496
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
60310.0369 | 251.075471210925 | 59895.064 | 60746.496
Without patch
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
86396.6001 | 254.30988131848 | 85906.606 | 86742.311
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
86396.6001 | 254.30988131848 | 85906.606 | 86742.311
With the patch the execution time of the query reduces by 30%.
The scripts to setup and run query and outputs of running query with and without patch are attached.
[1] http://www.postgresql.org/message-id/CAFjFpRfeKHiCmwJ72p4=ZvuZRQsaU9tbfyW7vwr-5PPvrCbcQQ@mail.gmail.com
--
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
Rushabh pointed out that declarations of helper functions get_useful_ecs_for_relation and get_useful_pathkeys_for_relation() are part of FDW routines declarations rather than helper function declaration. Since those functions are related to this patch, the attached patch moves those declaration in their right place.
On Wed, Feb 17, 2016 at 5:37 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Hi All,Now that we have join pushdown support in postgres_fdw, we can leverage the sort pushdown mechanism for base relations to work for pushed down joins as well. PFA patch for the same.The code to find useful set of pathkeys and then generate paths for each list of pathkeys is moved into a function which is called for base relations and join relations, while creating respective paths. The useful pathkeys are same as the base relation i.e. root->query_pathkeys and pathkeys useful for merge join as discussed in [1].I measured performance of pushing down sort for merge joins for query
SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val = ft2.val)) on (lt1.val = ft1.val) where ft1, ft2 are foreign tables, join between which gets pushed down to the foreign server and lt is the local table.Without the patch servers prefers local merge join between foreign tables followed by merge join with local table by getting the data sorted from the foreign server. But with the patch, it pushes down the foreign join and also gets the data sorted for local merge join. The times measured over 10 runs of query with and without patch areWith patch
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
60310.0369 | 251.075471210925 | 59895.064 | 60746.496Without patch
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
86396.6001 | 254.30988131848 | 85906.606 | 86742.311With the patch the execution time of the query reduces by 30%.The scripts to setup and run query and outputs of running query with and without patch are attached.[1] http://www.postgresql.org/message-id/CAFjFpRfeKHiCmwJ72p4=ZvuZRQsaU9tbfyW7vwr-5PPvrCbcQQ@mail.gmail.com
--Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
On Tue, Feb 23, 2016 at 7:48 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > Rushabh pointed out that declarations of helper functions > get_useful_ecs_for_relation and get_useful_pathkeys_for_relation() are part > of FDW routines declarations rather than helper function declaration. Since > those functions are related to this patch, the attached patch moves those > declaration in their right place. This patch needs to be rebased. + /* + * TODO: we should worry about EPQ path but should that path have + * pathkeys? I guess, that's not really important since it's just + * going to evaluate the join from whole-row references stuffed in the + * corresponding EPQ slots, for which the order doesn't matter. + */ The pathkeys for the EPQ path don't matter. It'll only be called to recheck one single row, and there's only one order in which you can return one row. - if (bms_equal(em->em_relids, rel->relids)) + if (bms_is_subset(em->em_relids, rel->relids)) Why do we need this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
This patch needs to be rebased.
Done.
+ /*
+ * TODO: we should worry about EPQ path but should
that path have
+ * pathkeys? I guess, that's not really important
since it's just
+ * going to evaluate the join from whole-row
references stuffed in the
+ * corresponding EPQ slots, for which the order doesn't matter.
+ */
The pathkeys for the EPQ path don't matter. It'll only be called to
recheck one single row, and there's only one order in which you can
return one row.
Right. Removed the TODO
- if (bms_equal(em->em_relids, rel->relids))
+ if (bms_is_subset(em->em_relids, rel->relids))
Why do we need this?
The function find_em_expr_for_rel() find an equivalence member expression that has all its Vars come from the given relation. It's not necessary that it will have Vars from relids that are covered by the given relations. E.g. in query SELECT A.c1, B.c2 FROM A join B ON ... ORDER BY A.c3, there will be a single equivalence member A.c3 in the pathkeys and em_relids will indicate only A. Hence instead of equal, (which used to be OK for single relation join push-down) we have to use subset operation. We want an equivalence members whose relids are subset of relids contained by given relation.
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
On Wed, Mar 9, 2016 at 2:23 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > [ new patch ] This looks OK to me. Committed! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 9, 2016 at 9:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
--
On Wed, Mar 9, 2016 at 2:23 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> [ new patch ]
This looks OK to me. Committed!
Thanks.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company