Thread: Push down more full joins in postgres_fdw

Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
Hi,

The postgres_fdw join pushdown in 9.6 is great, but it can't handle full
joins on relations with restrictions.  The reason for that is,
postgres_fdw can't support deparsing subqueries when creating a remote
join query.  So, by adding the deparsing logic to it, I removed that
limitation.  Attached is a patch for that, which is based on the patch I
posted before [1].

Also, by the deparsing logic, I improved the handling of PHVs so that
PHVs are evaluated remotely if it's safe, as discussed in [2].  Here is
an example from the regression test, which pushes down multiple levels
of PHVs to the remote:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT ss.*, ft2.c1 FROM ft2 LEFT JOIN (SELECT 13, q.a, ft2.c1 FROM
(SELECT 13 FROM ft1 WHERE c1 = 13) q(a) RIGHT JOIN ft2 ON (q.a = ft2.c1)
WHERE ft2.\
c1 BETWEEN 10 AND 15) ss(f1, f2, f3) ON (ft2.c1 = ss.f1) WHERE ft2.c1
BETWEEN 10 AND 15;
               \
                            QUERY PLAN             \


-------------------------------------------------------------------------------------------------------------------------------------------------------\

-------------------------------------------------------------------------------------------------------------------------------------------------------\
---------------------------------------------------------------
   Foreign Scan
     Output: (13), (13), ft2_1.c1, ft2.c1
     Relations: (public.ft2) LEFT JOIN ((public.ft2) LEFT JOIN (public.ft1))
     Remote SQL: SELECT r1."C 1", ss2.c1, ss2.c2, ss2.c3 FROM ("S 1"."T
1" r1 LEFT JOIN (SELECT r5."C 1", 13, ss1.c1 FROM ("S 1"."T 1" r5 LEFT
JOIN (SELE\
CT 13 FROM "S 1"."T 1" WHERE (("C 1" = 13))) ss1(c1) ON (((13 = r5."C
1")))) WHERE ((r5."C 1" >= 10)) AND ((r5."C 1" <= 15))) ss2(c1, c2, c3)
ON (((r1.\
"C 1" = 13)))) WHERE ((r1."C 1" >= 10)) AND ((r1."C 1" <= 15))
(4 rows)

Also, in the same way as the PHV handling, I improved the handling of
whole-row reference (and system columns other than ctid), as proposed in
[3].  We don't need the ""CASE WHEN ... IS NOT NULL THEN ROW(...) END"
conditions any more, as shown in the below example:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
               \
                          QUERY PLAN             \


-------------------------------------------------------------------------------------------------------------------------------------------------------\

-------------------------------------------------------------------------------------------------------------------------------------------------------\
-----------------------------------------------------------
   Limit
     Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
     ->  Foreign Scan
           Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
           Remote SQL: SELECT ss1.c1, ss1.c2, ss1.c3, ss1.c4, ss2.c1
FROM ((SELECT ctid, ROW("C 1", c2, c3, c4, c5, c6, c7, c8), "C 1", c3
FROM "S 1"."T \
1") ss1(c1, c2, c3, c4) INNER JOIN (SELECT ROW("C 1", c2, c3, c4, c5,
c6, c7, c8), "C 1" FROM "S 1"."T 1") ss2(c1, c2) ON (TRUE)) WHERE
((ss1.c3 = ss2.\
c2)) ORDER BY ss1.c4 ASC NULLS LAST, ss1.c3 ASC NULLS LAST
(6 rows)

I'll add this to the next CF.  Comments are welcome!

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5710D7E2.7010302%40lab.ntt.co.jp
[2]
https://www.postgresql.org/message-id/b4549406-909f-7d15-dc34-499835a8f0b3%40lab.ntt.co.jp
[3]
https://www.postgresql.org/message-id/a1fa1c4c-bf96-8ea5-cff5-85b927298e73%40lab.ntt.co.jp

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
Thanks Fujita-san for working on this. I took a quick look at the patch. Here are some quick comments.

1. deparsePlaceHolderVar looks odd - each of the deparse* function is named as deparse + <name of the parser node the string would parse into>. PlaceHolderVar is not a parser node, so no string is going to be parsed as a PlaceHolderVar. May be you want to name it as deparseExerFromPlaceholderVar or something like that.

2. You will need to check phlevelsup member while assessing whether a PHV is safe to push down.

3. I think registerAlias stuff should happen really at the time of creating paths and should be stored in fpinfo. Without that it will be computed every time we deparse the query. This means every time we try to EXPLAIN the query at various levels in the join tree and once for the query to be sent to the foreign server.

4. The changes related to retrieved_attrs look unrelated to the patch. Either your patch should use the current method of handling retrieved_attrs or there should be a separate patch for retrieved_attrs changes. May be you want to take a look at the discussion in join pushdown thread as to why we assume retrieved_attrs to be non-NIL always.

5. The blocks related to inner and outer relations in deparseFromExprForRel() look same. We should probably separate that code out into a function and call it at two places.

6.
!     if (is_placeholder)
!         errcontext("placeholder expression at position %d in select list",
!                    errpos->cur_attno);
A user wouldn't know what a placeholder expression is, there is no such term in the documentation. We have to device a better way to provide an error context for an expression in general.




On Fri, Aug 19, 2016 at 2:50 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
Hi,

The postgres_fdw join pushdown in 9.6 is great, but it can't handle full joins on relations with restrictions.  The reason for that is, postgres_fdw can't support deparsing subqueries when creating a remote join query.  So, by adding the deparsing logic to it, I removed that limitation.  Attached is a patch for that, which is based on the patch I posted before [1].

Also, by the deparsing logic, I improved the handling of PHVs so that PHVs are evaluated remotely if it's safe, as discussed in [2].  Here is an example from the regression test, which pushes down multiple levels of PHVs to the remote:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT ss.*, ft2.c1 FROM ft2 LEFT JOIN (SELECT 13, q.a, ft2.c1 FROM (SELECT 13 FROM ft1 WHERE c1 = 13) q(a) RIGHT JOIN ft2 ON (q.a = ft2.c1) WHERE ft2.\
c1 BETWEEN 10 AND 15) ss(f1, f2, f3) ON (ft2.c1 = ss.f1) WHERE ft2.c1 BETWEEN 10 AND 15;
              \
                           QUERY PLAN             \

-------------------------------------------------------------------------------------------------------------------------------------------------------\
-------------------------------------------------------------------------------------------------------------------------------------------------------\
---------------------------------------------------------------
  Foreign Scan
    Output: (13), (13), ft2_1.c1, ft2.c1
    Relations: (public.ft2) LEFT JOIN ((public.ft2) LEFT JOIN (public.ft1))
    Remote SQL: SELECT r1."C 1", ss2.c1, ss2.c2, ss2.c3 FROM ("S 1"."T 1" r1 LEFT JOIN (SELECT r5."C 1", 13, ss1.c1 FROM ("S 1"."T 1" r5 LEFT JOIN (SELE\
CT 13 FROM "S 1"."T 1" WHERE (("C 1" = 13))) ss1(c1) ON (((13 = r5."C 1")))) WHERE ((r5."C 1" >= 10)) AND ((r5."C 1" <= 15))) ss2(c1, c2, c3) ON (((r1.\
"C 1" = 13)))) WHERE ((r1."C 1" >= 10)) AND ((r1."C 1" <= 15))
(4 rows)

Also, in the same way as the PHV handling, I improved the handling of whole-row reference (and system columns other than ctid), as proposed in [3].  We don't need the ""CASE WHEN ... IS NOT NULL THEN ROW(...) END" conditions any more, as shown in the below example:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
              \
                         QUERY PLAN             \

-------------------------------------------------------------------------------------------------------------------------------------------------------\
-------------------------------------------------------------------------------------------------------------------------------------------------------\
-----------------------------------------------------------
  Limit
    Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
    ->  Foreign Scan
          Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
          Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
          Remote SQL: SELECT ss1.c1, ss1.c2, ss1.c3, ss1.c4, ss2.c1 FROM ((SELECT ctid, ROW("C 1", c2, c3, c4, c5, c6, c7, c8), "C 1", c3 FROM "S 1"."T \
1") ss1(c1, c2, c3, c4) INNER JOIN (SELECT ROW("C 1", c2, c3, c4, c5, c6, c7, c8), "C 1" FROM "S 1"."T 1") ss2(c1, c2) ON (TRUE)) WHERE ((ss1.c3 = ss2.\
c2)) ORDER BY ss1.c4 ASC NULLS LAST, ss1.c3 ASC NULLS LAST
(6 rows)

I'll add this to the next CF.  Comments are welcome!

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5710D7E2.7010302%40lab.ntt.co.jp
[2] https://www.postgresql.org/message-id/b4549406-909f-7d15-dc34-499835a8f0b3%40lab.ntt.co.jp
[3] https://www.postgresql.org/message-id/a1fa1c4c-bf96-8ea5-cff5-85b927298e73%40lab.ntt.co.jp


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
Hi Ashutosh,

On 2016/08/22 15:49, Ashutosh Bapat wrote:
> 1. deparsePlaceHolderVar looks odd - each of the deparse* function is
> named as deparse + <name of the parser node the string would parse
> into>. PlaceHolderVar is not a parser node, so no string is going to be
> parsed as a PlaceHolderVar. May be you want to name it as
> deparseExerFromPlaceholderVar or something like that.

The name "deparseExerFromPlaceholderVar" seems long to me.  How about
"deparsePlaceHolderExpr"?

> 2. You will need to check phlevelsup member while assessing whether a
> PHV is safe to push down.

Good catch!  In addition to that, I added another check that the eval_at
set for the PHV should be included in the relids set for the foreign
relation.  I think that would make the shippability check more robust.

> 3. I think registerAlias stuff should happen really at the time of
> creating paths and should be stored in fpinfo. Without that it will be
> computed every time we deparse the query. This means every time we try
> to EXPLAIN the query at various levels in the join tree and once for the
> query to be sent to the foreign server.

Hmm.  I think the overhead in calculating aliases would be negligible
compared to the overhead in explaining each remote query for costing or
sending the remote query for execution.  So, I created aliases in the
same way as remote params created and stored into params_list in
deparse_expr_cxt.  I'm not sure it's worth complicating the code.

> 4. The changes related to retrieved_attrs look unrelated to the patch.
> Either your patch should use the current method of handling
> retrieved_attrs or there should be a separate patch for retrieved_attrs
> changes. May be you want to take a look at the discussion in join
> pushdown thread as to why we assume retrieved_attrs to be non-NIL always.

OK, I removed those changes from the patch.

> 5. The blocks related to inner and outer relations in
> deparseFromExprForRel() look same. We should probably separate that code
> out into a function and call it at two places.

Done.

> 6.
> !     if (is_placeholder)
> !         errcontext("placeholder expression at position %d in select list",
> !                    errpos->cur_attno);
> A user wouldn't know what a placeholder expression is, there is no such
> term in the documentation. We have to device a better way to provide an
> error context for an expression in general.

Though I proposed that, I don't think that it's that important to let
users know that the expression is from a PHV.  How about just saying
"expression", not "placeholder expression", so that we have the message
"expression at position %d in select list" in the context?

Attached is an updated version of the patch.

Other changes:

* Add a bit more regression test
* Revise code/comments
* Cleanups

Thanks for the comments!

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:


On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
Hi Ashutosh,

On 2016/08/22 15:49, Ashutosh Bapat wrote:
1. deparsePlaceHolderVar looks odd - each of the deparse* function is
named as deparse + <name of the parser node the string would parse
into>. PlaceHolderVar is not a parser node, so no string is going to be
parsed as a PlaceHolderVar. May be you want to name it as
deparseExerFromPlaceholderVar or something like that.

The name "deparseExerFromPlaceholderVar" seems long to me.  How about "deparsePlaceHolderExpr"?

There isn't any node with name PlaceHolderExpr.


2. You will need to check phlevelsup member while assessing whether a
PHV is safe to push down.

Good catch!  In addition to that, I added another check that the eval_at set for the PHV should be included in the relids set for the foreign relation.  I think that would make the shippability check more robust.

Thanks.
 

3. I think registerAlias stuff should happen really at the time of
creating paths and should be stored in fpinfo. Without that it will be
computed every time we deparse the query. This means every time we try
to EXPLAIN the query at various levels in the join tree and once for the
query to be sent to the foreign server.

Hmm.  I think the overhead in calculating aliases would be negligible compared to the overhead in explaining each remote query for costing or sending the remote query for execution.  So, I created aliases in the same way as remote params created and stored into params_list in deparse_expr_cxt.  I'm not sure it's worth complicating the code.

We defer remote parameter creation till deparse time since the the parameter numbers are dependent upon the sequence in which we deparse the query. Creating them at the time of path creation and storing them in fpinfo is not possible because a. those present in the joining relations will conflict with each other and need some kind of serialization at the time of deparsing b. those defer for differently parameterized paths or paths with different pathkeys. We don't defer it because it's very light on performance.

That's not true with the alias information. As long as we detect which relations need subqueries, their RTIs are enough to create unique aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have alias r123 without conflicting with any other alias.

However minimum overhead it might have, we will deparse the query every time we create a path involving that kind of relation i.e. for different pathkeys, different parameterization and different joins in which that relation participates in. This number can be significant. We want to avoid this overhead if we can.
 

5. The blocks related to inner and outer relations in
deparseFromExprForRel() look same. We should probably separate that code
out into a function and call it at two places.

Done.

Thanks. I see you have created function deparseOperandRelation() for the same. I guess, this should be renamed as deparseRangeTblRef() since it will create RangeTblRef node when parsed back.
 

6.
!     if (is_placeholder)
!         errcontext("placeholder expression at position %d in select list",
!                    errpos->cur_attno);
A user wouldn't know what a placeholder expression is, there is no such
term in the documentation. We have to device a better way to provide an
error context for an expression in general.

Though I proposed that, I don't think that it's that important to let users know that the expression is from a PHV.  How about just saying "expression", not "placeholder expression", so that we have the message "expression at position %d in select list" in the context?

Hmm, that's better than placeholder expression, but not as explanatory as it should be since we won't be printing the "select list" in the error context and user won't have a clue as to what error context is about. But I don't have any good suggestions right now. May be we should print the whole expression? But that can be very long, I don't know.

This patch tries to do two things at a time 1. support join pushdown for FULL join when the joining relations have remote conditions 2. better support for fetching placeholder vars, whole row references and some system columns. To make reviews easy, I think we should split the patch into two 1. supporting subqueries to be deparsed with support for one of the above (I will suggest FULL join support) 2. support for the other. 

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/06 22:07, Ashutosh Bapat wrote:
> On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

>     On 2016/08/22 15:49, Ashutosh Bapat wrote:
>         1. deparsePlaceHolderVar looks odd - each of the deparse*
>         function is
>         named as deparse + <name of the parser node the string would parse
>         into>. PlaceHolderVar is not a parser node, so no string is
>         going to be
>         parsed as a PlaceHolderVar. May be you want to name it as
>         deparseExerFromPlaceholderVar or something like that.

>     The name "deparseExerFromPlaceholderVar" seems long to me.  How
>     about "deparsePlaceHolderExpr"?

> There isn't any node with name PlaceHolderExpr.

I'll rename it to "deparseExerInPlaceholderVar", then.

>         3. I think registerAlias stuff should happen really at the time of
>         creating paths and should be stored in fpinfo. Without that it
>         will be
>         computed every time we deparse the query. This means every time
>         we try
>         to EXPLAIN the query at various levels in the join tree and once
>         for the
>         query to be sent to the foreign server.

>     Hmm.  I think the overhead in calculating aliases would be
>     negligible compared to the overhead in explaining each remote query
>     for costing or sending the remote query for execution.  So, I
>     created aliases in the same way as remote params created and stored
>     into params_list in deparse_expr_cxt.  I'm not sure it's worth
>     complicating the code.

> We defer remote parameter creation till deparse time since the the
> parameter numbers are dependent upon the sequence in which we deparse
> the query. Creating them at the time of path creation and storing them
> in fpinfo is not possible because a. those present in the joining
> relations will conflict with each other and need some kind of
> serialization at the time of deparsing b. those defer for differently
> parameterized paths or paths with different pathkeys. We don't defer it
> because it's very light on performance.
>
> That's not true with the alias information. As long as we detect which
> relations need subqueries, their RTIs are enough to create unique
> aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
> can have alias r123 without conflicting with any other alias.

Hmm.  But another concern about the approach of generating an subselect 
alias for a path, if needed, at the path creation time would be that the 
path might be rejected by add_path, which would result in totally 
wasting cycles for generating the subselect alias for the path.

> However minimum overhead it might have, we will deparse the query every
> time we create a path involving that kind of relation i.e. for different
> pathkeys, different parameterization and different joins in which that
> relation participates in. This number can be significant. We want to
> avoid this overhead if we can.

Exactly.  As I said above, I think the overhead would be negligible 
compared to the overhead in explaining each remote query for costing or 
the overhead in sending the final remote query for execution, though.

>         5. The blocks related to inner and outer relations in
>         deparseFromExprForRel() look same. We should probably separate
>         that code
>         out into a function and call it at two places.

>     Done.

> I see you have created function deparseOperandRelation() for the
> same. I guess, this should be renamed as deparseRangeTblRef() since it
> will create RangeTblRef node when parsed back.

OK, if there no opinions of others, I'll rename it to the name proposed 
by you, "deparseRangeTblRef".

>         6.
>         !     if (is_placeholder)
>         !         errcontext("placeholder expression at position %d in
>         select list",
>         !                    errpos->cur_attno);
>         A user wouldn't know what a placeholder expression is, there is
>         no such
>         term in the documentation. We have to device a better way to
>         provide an
>         error context for an expression in general.

>     Though I proposed that, I don't think that it's that important to
>     let users know that the expression is from a PHV.  How about just
>     saying "expression", not "placeholder expression", so that we have
>     the message "expression at position %d in select list" in the context?

> Hmm, that's better than placeholder expression, but not as explanatory
> as it should be since we won't be printing the "select list" in the
> error context and user won't have a clue as to what error context is
> about.

I don't think so.  Consider an example of the conversion error message, 
which is from the regression test:

SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND 
ft1.c1 = 1;
ERROR:  invalid input syntax for integer: "foo"
CONTEXT:  whole-row reference to foreign table "ft1"

As shown in the example, the error message is displayed under a remote 
query for execution.  So, ISTM it's reasonable to print something like 
"expression at position %d in select list" in the context if an 
expression in a PHV.

> But I don't have any good suggestions right now. May be we should
> print the whole expression? But that can be very long, I don't know.

ISTM that it's a bit too expensive to print the whole expression in the 
error context.

> This patch tries to do two things at a time 1. support join pushdown for
> FULL join when the joining relations have remote conditions 2. better
> support for fetching placeholder vars, whole row references and some
> system columns. To make reviews easy, I think we should split the patch
> into two 1. supporting subqueries to be deparsed with support for one of
> the above (I will suggest FULL join support) 2. support for the other.

OK, will try.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/08 19:51, Etsuro Fujita wrote:
> On 2016/09/06 22:07, Ashutosh Bapat wrote:
>> This patch tries to do two things at a time 1. support join pushdown for
>> FULL join when the joining relations have remote conditions 2. better
>> support for fetching placeholder vars, whole row references and some
>> system columns. To make reviews easy, I think we should split the patch
>> into two 1. supporting subqueries to be deparsed with support for one of
>> the above (I will suggest FULL join support) 2. support for the other.

> OK, will try.

I extracted #1 from the patch.  Attached is a patch for that.  If that
patch is reasonable, I'll create a patch for #2 on top of it.

Comments welcome!

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/09 21:35, Etsuro Fujita wrote:
> On 2016/09/08 19:51, Etsuro Fujita wrote:
>> On 2016/09/06 22:07, Ashutosh Bapat wrote:
>>> This patch tries to do two things at a time 1. support join pushdown for
>>> FULL join when the joining relations have remote conditions 2. better
>>> support for fetching placeholder vars, whole row references and some
>>> system columns. To make reviews easy, I think we should split the patch
>>> into two 1. supporting subqueries to be deparsed with support for one of
>>> the above (I will suggest FULL join support) 2. support for the other.

>> OK, will try.

> I extracted #1 from the patch.  Attached is a patch for that.  If that
> patch is reasonable, I'll create a patch for #2 on top of it.

Attached is a patch for #2.  In that patch I fixed some bugs and added a
bit more comments.  For testing, please apply the patch for #1 first.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:





        3. I think registerAlias stuff should happen really at the time of
        creating paths and should be stored in fpinfo. Without that it
        will be
        computed every time we deparse the query. This means every time
        we try
        to EXPLAIN the query at various levels in the join tree and once
        for the
        query to be sent to the foreign server.

    Hmm.  I think the overhead in calculating aliases would be
    negligible compared to the overhead in explaining each remote query
    for costing or sending the remote query for execution.  So, I
    created aliases in the same way as remote params created and stored
    into params_list in deparse_expr_cxt.  I'm not sure it's worth
    complicating the code.

We defer remote parameter creation till deparse time since the the
parameter numbers are dependent upon the sequence in which we deparse
the query. Creating them at the time of path creation and storing them
in fpinfo is not possible because a. those present in the joining
relations will conflict with each other and need some kind of
serialization at the time of deparsing b. those defer for differently
parameterized paths or paths with different pathkeys. We don't defer it
because it's very light on performance.

That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique
aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
can have alias r123 without conflicting with any other alias.

Hmm.  But another concern about the approach of generating an subselect alias for a path, if needed, at the path creation time would be that the path might be rejected by add_path, which would result in totally wasting cycles for generating the subselect alias for the path.

A path may get rejected but the relation is not rejected. The alias applies to a relation and its targetlist, which will remain same for all paths created for that relation, esp when it's a base relation or join. So, AFAIU that work never gets wasted. Also, for costing paths with use_remote_estimate, we deparse the query, which builds the alias information again and again for very path. That's much worse than building it once for a given relation.
 

However minimum overhead it might have, we will deparse the query every
time we create a path involving that kind of relation i.e. for different
pathkeys, different parameterization and different joins in which that
relation participates in. This number can be significant. We want to
avoid this overhead if we can.

Exactly.  As I said above, I think the overhead would be negligible compared to the overhead in explaining each remote query for costing or the overhead in sending the final remote query for execution, though.

It won't remain minimal as the number of paths created increases, increasing the number of times a query is deparsed. We deparse query every time time we cost a path for a relation with use_remote_estimates true. As we try to push down more and more stuff, we will create more paths and deparse the query more time.

Also, that makes the interface better. Right now, in your patch, you have changed the order of deparsing in the existing code, so that the aliases are registered while deparsing FROM clause and before any Var nodes are deparsed. If we create aliases at the time of path creation, only once in GetForeignJoinPaths or GetForeignPaths as appropriate, that would require less code churn and would save some CPU cycles as well.
 

        6.
        !     if (is_placeholder)
        !         errcontext("placeholder expression at position %d in
        select list",
        !                    errpos->cur_attno);
        A user wouldn't know what a placeholder expression is, there is
        no such
        term in the documentation. We have to device a better way to
        provide an
        error context for an expression in general.

    Though I proposed that, I don't think that it's that important to
    let users know that the expression is from a PHV.  How about just
    saying "expression", not "placeholder expression", so that we have
    the message "expression at position %d in select list" in the context?

Hmm, that's better than placeholder expression, but not as explanatory
as it should be since we won't be printing the "select list" in the
error context and user won't have a clue as to what error context is
about.

I don't think so.  Consider an example of the conversion error message, which is from the regression test:

SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1;
ERROR:  invalid input syntax for integer: "foo"
CONTEXT:  whole-row reference to foreign table "ft1"

As shown in the example, the error message is displayed under a remote query for execution.  So, ISTM it's reasonable to print something like "expression at position %d in select list" in the context if an expression in a PHV.

I missed it. Sorry. Looks ok.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Push down more full joins in postgres_fdw

From
Robert Haas
Date:
On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> That's not true with the alias information. As long as we detect which
> relations need subqueries, their RTIs are enough to create unique aliases
> e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
> alias r123 without conflicting with any other alias.

What if RTI 123 is also used in the query?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Tue, Sep 13, 2016 at 10:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> That's not true with the alias information. As long as we detect which
>> relations need subqueries, their RTIs are enough to create unique aliases
>> e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
>> alias r123 without conflicting with any other alias.
>
> What if RTI 123 is also used in the query?

Good catch. I actually meant some combination of 1, 2 and 3, which is
unique for a join between r1, r2 and r3.  How about r1_2_3 or
r1_r2_r3?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Robert Haas
Date:
On Tue, Sep 13, 2016 at 11:38 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Tue, Sep 13, 2016 at 10:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> That's not true with the alias information. As long as we detect which
>>> relations need subqueries, their RTIs are enough to create unique aliases
>>> e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
>>> alias r123 without conflicting with any other alias.
>>
>> What if RTI 123 is also used in the query?
>
> Good catch. I actually meant some combination of 1, 2 and 3, which is
> unique for a join between r1, r2 and r3.  How about r1_2_3 or
> r1_r2_r3?

Sure, something like that can work, but if you have a big enough join
the identifier might get too long.  I'm not sure why it wouldn't work
to just use the lowest RTI involved in the join, though; the others
won't appear anywhere else at that query level.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 13, 2016 at 11:38 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Tue, Sep 13, 2016 at 10:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>> That's not true with the alias information. As long as we detect which
>>>> relations need subqueries, their RTIs are enough to create unique aliases
>>>> e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
>>>> alias r123 without conflicting with any other alias.
>>>
>>> What if RTI 123 is also used in the query?
>>
>> Good catch. I actually meant some combination of 1, 2 and 3, which is
>> unique for a join between r1, r2 and r3.  How about r1_2_3 or
>> r1_r2_r3?
>
> Sure, something like that can work, but if you have a big enough join
> the identifier might get too long.  I'm not sure why it wouldn't work
> to just use the lowest RTI involved in the join, though; the others
> won't appear anywhere else at that query level.
>

Yes, that will work too and is much more preferred than long r1_2_3.
The idea is to come with a unique alias name from RTIs involved in
that relation. Thinking loudly, r1_2_3 is more descriptive to debug
issues. It tells that the subquery it refers to covers RTIs 1, 2 and
3. That information may be quite helpful to understand the remote SQL.
r1 on the other hand can refer to relation with RTI 1 or a join
relation where least RTI is 1. That can be solved by using s<RTI> for
subquery and r<RTI> for a bare relation. But it still doesn't tell us
which all relations are involved.

But since the subquery is part of remote SQL and we have to take a
look at it to find out meaning of individual columns, that benefit may
be smaller compared to the convenience of smaller alias. So +1 for
using the smallest RTI to indicate a subquery.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/13 14:17, Ashutosh Bapat wrote:

>     But another concern about the approach of generating an
>     subselect alias for a path, if needed, at the path creation time
>     would be that the path might be rejected by add_path, which would
>     result in totally wasting cycles for generating the subselect alias
>     for the path.

> A path may get rejected but the relation is not rejected. The alias
> applies to a relation and its targetlist, which will remain same for all
> paths created for that relation, esp when it's a base relation or join.
> So, AFAIU that work never gets wasted.

I think there is no guarantee that add_path won't reject foreign join 
paths.  The possibility would be very low, though.

>         However minimum overhead it might have, we will deparse the
>         query every
>         time we create a path involving that kind of relation i.e. for
>         different
>         pathkeys, different parameterization and different joins in
>         which that
>         relation participates in. This number can be significant. We want to
>         avoid this overhead if we can.

>     Exactly.  As I said above, I think the overhead would be negligible
>     compared to the overhead in explaining each remote query for costing
>     or the overhead in sending the final remote query for execution, though.

> It won't remain minimal as the number of paths created increases,
> increasing the number of times a query is deparsed. We deparse query
> every time time we cost a path for a relation with use_remote_estimates
> true. As we try to push down more and more stuff, we will create more
> paths and deparse the query more time.

> Also, that makes the interface better. Right now, in your patch, you
> have changed the order of deparsing in the existing code, so that the
> aliases are registered while deparsing FROM clause and before any Var
> nodes are deparsed. If we create aliases at the time of path creation,
> only once in GetForeignJoinPaths or GetForeignPaths as appropriate, that
> would require less code churn and would save some CPU cycles as well.

Agreed.  Will fix.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/15 15:29, Ashutosh Bapat wrote:
> On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:

>> I'm not sure why it wouldn't work
>> to just use the lowest RTI involved in the join, though; the others
>> won't appear anywhere else at that query level.

> So +1 for
> using the smallest RTI to indicate a subquery.

+1 for the general idea.

ISTM that the use of the same RTI for subqueries in multi-levels in a 
remote SQL makes the SQL a bit difficult to read.  How about using the 
position of the join rel in join_rel_list, (more precisely, the position 
plus list_length(root->parse->rtable)), instead?

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/09/15 15:29, Ashutosh Bapat wrote:
>>
>> On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>
>
>>> I'm not sure why it wouldn't work
>>> to just use the lowest RTI involved in the join, though; the others
>>> won't appear anywhere else at that query level.
>
>
>> So +1 for
>> using the smallest RTI to indicate a subquery.
>
>
> +1 for the general idea.
>
> ISTM that the use of the same RTI for subqueries in multi-levels in a remote
> SQL makes the SQL a bit difficult to read.  How about using the position of
> the join rel in join_rel_list, (more precisely, the position plus
> list_length(root->parse->rtable)), instead?
>
We switch to hash table to maintain the join RelOptInfos when the
number of joins grows larger, where the position won't make much
sense. We might differentiate between a base relation alias and
subquery alias by using different prefixes like "r" and "s"  resp.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/26 18:06, Ashutosh Bapat wrote:
> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:

>> ISTM that the use of the same RTI for subqueries in multi-levels in a remote
>> SQL makes the SQL a bit difficult to read.  How about using the position of
>> the join rel in join_rel_list, (more precisely, the position plus
>> list_length(root->parse->rtable)), instead?

> We switch to hash table to maintain the join RelOptInfos when the
> number of joins grows larger, where the position won't make much
> sense.

That's right, but we still store the joinrel into join_rel_list after 
creating that hash table.  That hash table is just for fast lookup.  See 
build_join_rel.

> We might differentiate between a base relation alias and
> subquery alias by using different prefixes like "r" and "s"  resp.

Hmm.  My concern about that would the recursive use of "s" with the same 
RTI as subquery aliases for different level subqueries in a single 
remote SQL.  For example, if those subqueries include a base rel with 
RTI=1, then "s1" would be used again and again within that SQL.  That 
would be logically correct, but seems confusing to me.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/09/26 18:06, Ashutosh Bapat wrote:
>>
>> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>
>
>>> ISTM that the use of the same RTI for subqueries in multi-levels in a
>>> remote
>>> SQL makes the SQL a bit difficult to read.  How about using the position
>>> of
>>> the join rel in join_rel_list, (more precisely, the position plus
>>> list_length(root->parse->rtable)), instead?
>
>
>> We switch to hash table to maintain the join RelOptInfos when the
>> number of joins grows larger, where the position won't make much
>> sense.
>
>
> That's right, but we still store the joinrel into join_rel_list after
> creating that hash table.  That hash table is just for fast lookup.  See
> build_join_rel.

Sorry, I wasn't clear in my reply. As the list grows, it will take
longer to locate the RelOptInfo of the given join. Doing that for
creating an alias seems an overkill.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/26 20:20, Ashutosh Bapat wrote:
> On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/09/26 18:06, Ashutosh Bapat wrote:
>>> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp> wrote:

>>>> ISTM that the use of the same RTI for subqueries in multi-levels in a
>>>> remote
>>>> SQL makes the SQL a bit difficult to read.  How about using the position
>>>> of
>>>> the join rel in join_rel_list, (more precisely, the position plus
>>>> list_length(root->parse->rtable)), instead?

>>> We switch to hash table to maintain the join RelOptInfos when the
>>> number of joins grows larger, where the position won't make much
>>> sense.

>> That's right, but we still store the joinrel into join_rel_list after
>> creating that hash table.

> As the list grows, it will take
> longer to locate the RelOptInfo of the given join. Doing that for
> creating an alias seems an overkill.

The join rel is appended to the end of the list, so I was thinking to 
get the position info by list_length during postgresGetForeignJoinPaths.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Tue, Sep 27, 2016 at 8:48 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/09/26 20:20, Ashutosh Bapat wrote:
>>
>> On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>
>>> On 2016/09/26 18:06, Ashutosh Bapat wrote:
>>>>
>>>> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
>>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>
>
>>>>> ISTM that the use of the same RTI for subqueries in multi-levels in a
>>>>> remote
>>>>> SQL makes the SQL a bit difficult to read.  How about using the
>>>>> position
>>>>> of
>>>>> the join rel in join_rel_list, (more precisely, the position plus
>>>>> list_length(root->parse->rtable)), instead?
>
>
>>>> We switch to hash table to maintain the join RelOptInfos when the
>>>> number of joins grows larger, where the position won't make much
>>>> sense.
>
>
>>> That's right, but we still store the joinrel into join_rel_list after
>>> creating that hash table.
>
>
>> As the list grows, it will take
>> longer to locate the RelOptInfo of the given join. Doing that for
>> creating an alias seems an overkill.
>
>
> The join rel is appended to the end of the list, so I was thinking to get
> the position info by list_length during postgresGetForeignJoinPaths.

That's true only when the paths are being added to a newly created
joinrel. But that's not true always. We may add paths with different
joining order to an existing joinrel, in which case list_length would
not give its position. Am I missing something?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/27 13:33, Ashutosh Bapat wrote:

I wrote:
>>>>>> ISTM that the use of the same RTI for subqueries in multi-levels in a
>>>>>> remote
>>>>>> SQL makes the SQL a bit difficult to read.  How about using the
>>>>>> position
>>>>>> of
>>>>>> the join rel in join_rel_list, (more precisely, the position plus
>>>>>> list_length(root->parse->rtable)), instead?

I wrote:
>> The join rel is appended to the end of the list, so I was thinking to get
>> the position info by list_length during postgresGetForeignJoinPaths.

> That's true only when the paths are being added to a newly created
> joinrel. But that's not true always. We may add paths with different
> joining order to an existing joinrel, in which case list_length would
> not give its position. Am I missing something?

I think you are right, but postgresGetForeignJoinPaths only allows us to 
add a foreign join path to a newly created joinrel.  The reason is 
because that routine skips all its work after the first call for that 
joinrel, by checking to see if joinrel->fdw_private is not NULL.  So, I 
think it's reasonable to get the position by list_length in that routine.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/26 16:30, Etsuro Fujita wrote:
> On 2016/09/13 14:17, Ashutosh Bapat wrote:

>> It won't remain minimal as the number of paths created increases,
>> increasing the number of times a query is deparsed. We deparse query
>> every time time we cost a path for a relation with use_remote_estimates
>> true. As we try to push down more and more stuff, we will create more
>> paths and deparse the query more time.

>> Also, that makes the interface better. Right now, in your patch, you
>> have changed the order of deparsing in the existing code, so that the
>> aliases are registered while deparsing FROM clause and before any Var
>> nodes are deparsed. If we create aliases at the time of path creation,
>> only once in GetForeignJoinPaths or GetForeignPaths as appropriate, that
>> would require less code churn and would save some CPU cycles as well.

> Agreed.  Will fix.

Done.  Attached is an updated version of the patch.

I didn't create aliases at anytime.  Instead, I added a logic to get
info about the alias to a given expression from reltarget->exprs for
relations in a given join tree.  See isSubqueryExpr and
getSubselectAliasInfo.

As proposed by you, the patch differentiates between a base relation
alias and a subquery alias by using different prefixes "r" and "s",
respectively.  Also, subquery aliases are indexed by RTI for baserels
and the position in join_rel_list + the length of rtable for joinrels,
as proposed upthread.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/28 18:35, Etsuro Fujita wrote:
> Attached is an updated version of the patch.

I found a minor bug in that patch; the relation_index added to
PgFdwRelationInfo was defined as Index, but I used the modifier %d to
print that.  So, I changed the data type to int.  Also, I added a bit
more comments.  Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/09/29 21:12, Etsuro Fujita wrote:
> Please find attached an updated version of the patch.

Here is a new version (V6) of the patch, which is basically the same as
the previous patch, but I slightly modified that patch.  Another patch I
am attaching is an updated patch for pushing down PHVs to the remote,
which is created on top of the V6 patch.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
Review for postgres-fdw-more-full-join-pushdown-v6 patch.

The patch applies cleanly and regression is clean (make check in
regress directory and that in postgres_fdw).

Here are some comments.
1. Because of the following code change, for a joinrel we might end up using
attrs_used, which will be garbage for a join rel. The assumption that tlist can
not be NIL for a join relation is wrong. Even for a join relation, tlist can be
NULL.  Consider query 'explain verbose select count(*) from ft1, ft2' Since
count(*) doesn't need any columns, the tlist is NIL. With the patch the server
crashes for this query.
-    if (foreignrel->reloptkind == RELOPT_JOINREL)
+    /*
+     * Note: tlist for a base relation might be non-NIL.  For example, if the
+     * base relation is an operand of a foreign join performing a full outer
+     * join and has non-NIL remote_conds, the base relation will be deparsed
+     * as a subquery, so the tlist for the base relation could be non-NIL.
+     */
+    if (tlist != NIL)    {
-        /* For a join relation use the input tlist */

We can not decide whether to use deparseExplicitTargetList or
deparse it from attrs_used based on the tlist. SELECT lists, whether it's
topmost SELECT or a subquery (even on a base relation), for deparsing a
JOIN query need to use deparseExplicitTargetList.

2. The code in deparseRangeTblRef() dealing with subqueries, is very similar to
deparseSelectStmtForRel(). The only thing deparseRangeTblRef() does not require
is the appending ORDER BY, which is determined by existence of pathkeys
argument. Probably we should reuse deparseSelectStmtForRel(), instead of
duplicating the code. This will also make the current changes to
deparseSelectSql unnecessary.

3. Why do we need following change? The elog() just few lines above says that
we expect only Var nodes. Why then we use deparseExpr() instead of
deparseVar()?        if (i > 0)            appendStringInfoString(buf, ", ");
-        deparseVar(var, context);
+        deparseExpr((Expr *) var, context);
        *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);

And I get the answer for that question, somewhere down in the patch
+    /*
+     * If the given expression is an output column of a subquery-in-FROM,
+     * deparse the alias to the expression instead.
+     */
+    if (IsA(node, Var))
+    {
+        int            tabno;
+        int            colno;
+
+        if (isSubqueryExpr(node, context->foreignrel, &tabno, &colno))
+        {
+            appendStringInfo(context->buf, "%s%d.%s%d",
+                             SS_TAB_ALIAS_PREFIX, tabno,
+                             SS_COL_ALIAS_PREFIX, colno);
+            return;
+        }
+    }
+

Functionally, this code belongs to deparseVar() and not deparseExpr(). Like all
other functions which handle Expr nodes, deparseExpr() is expected to be a
dispatcher to the node specific function.

4. This comment is good to have there, but is unrelated to this patch. May be a
separate patch?
+    /* Don't generate bad syntax if no columns */

5. Changed comment doesn't say anything different from the original comment. It
may have been good to have it this way to start with, but changing it now
doesn't bring anything new. You seem to have just merged prologue of the
function with a comment in that function. I think, this change is unnecessary.
- * The function constructs ... JOIN ... ON ... for join relation. For a base
- * relation it just returns schema-qualified tablename, with the appropriate
- * alias if so requested.
+ * For a join relation the clause of the following form is appended to buf:
+ * ((outer relation) <join type> (inner relation) ON (joinclauses))
+ * For a base relation the function just adds the schema-qualified tablename,
+ * with the appropriate alias if so requested.
+    /* Don't generate bad syntax if no columns */

6. Change may be good but unrelated to this patch. May be material for a
separate patch. There are few such changes in this function. While these
changes may be good by themselves, they distract reviewer from the goal of the
patch.
-            appendStringInfo(buf, "(");
+            appendStringInfoChar(buf, '(');

7. I don't understand why you need to change this function so much. Take for
example the following change.
-        RelOptInfo *rel_o = fpinfo->outerrel;
-        RelOptInfo *rel_i = fpinfo->innerrel;
-        StringInfoData join_sql_o;
-        StringInfoData join_sql_i;
+        /* Begin the FROM clause entry */
+        appendStringInfoChar(buf, '(');
        /* Deparse outer relation */
-        initStringInfo(&join_sql_o);
-        deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list);
+        deparseRangeTblRef(buf, root,
+                           fpinfo->outerrel,
+                           fpinfo->make_outerrel_subquery,
+                           params_list);        /* Deparse outer relation */
-        initStringInfo(&join_sql_o);
-        deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list);
+        deparseRangeTblRef(buf, root,
+                           fpinfo->outerrel,
+                           fpinfo->make_outerrel_subquery,
+                           params_list);

It removes few variables, instead directly accesses the members from fpinfo.
Also, deparses the individual relations in the provided buffer directly, rather
than in separate buffers in the original code. Instead of this, all you had to
do was replace a call
-        deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list);
with
+        deparseRangeTblRef(&join_sql_o, root, rel_o,
fpinfo->make_outerrel_subquery, params_list)
Similarly for the inner relation. Again, the changes you have done might have
been good, if done in the original code, but doing those in this patch just
creates distractions and increases the size of the patch.

Similarly changes
-        /* Append join clause; (TRUE) if no join clause */
+        /* Append join conditions */

+        {
+            /* No join conditions; add "(TRUE)" */            appendStringInfoString(buf, "(TRUE)");
+        }
+        appendStringInfoString(buf, " ON ");

-        /* End the FROM clause entry. */
-        appendStringInfo(buf, ")");


+        /* End the FROM clause entry */
+        appendStringInfoChar(buf, ')');

8. Why have you changed the name of variable here?
-        if (use_alias)
+        if (add_rel_alias)

9. In deparseRangeTblRef(), the targetlist for the subquery is obtained by
calling build_tlist_to_deparse(), but appendSubselectAlias() constructs the
column aliases based on foreignrel->reltarget->exprs. Those two are usually
same, but there is no code which guarantees that. Instead, pass tlist to
appendSubselectAlias() or simply pass the number of entries in that list
(list_length(tlist), which is all it needs. OR use foreignrel->reltarget->exprs
directly instead of calling build_tlist_to_deparse(). Similar problem exists
with getSubselectAliasInfo().

10. Deparsing a query in the form of a subquery makes it hard-to-read. The
original code aimed at avoiding a subquery. Also, this change has created many
expected output changes, which again seem unnecessary. In fact, having the
pushable join clauses of an inner join in ON clause, which is closer to JOIN
clause, is better than having them farther in the WHERE clause.
-    /*
-     * For an inner join, all restrictions can be treated alike. Treating the
-     * pushed down conditions as join conditions allows a top level full outer
-     * join to be deparsed without requiring subqueries.
-     */
-    if (jointype == JOIN_INNER)
-    {
-        Assert(!fpinfo->joinclauses);
-        fpinfo->joinclauses = fpinfo->remote_conds;
-        fpinfo->remote_conds = NIL;
-    }
-

11. I have reworded following comment and restructured the code that follows it
in the attached patch.
+    /*
+     * Set the subquery information.  If the relation performs a full outer
+     * join and if the input relations have non-NIL remote_conds, the input
+     * relations need to be deparsed as a subquery.
+     */
The code esp. the if .. else .. block followed by another one, made it a bit
unreadable. Hence I restructured it so that it's readable and doesn't require
variable "relids" from earlier code, which was being reused. Let me know if
this change looks good.

12. The code below deletes the condition, which is understandable.
-            if (fpinfo_i->remote_conds || fpinfo_o->remote_conds)
But why does it add a new unrelated condition here? What the comment claims,
looks like an existing bug, unrelated to the patch. Can you please give an
example? If that it's an existing bug, it should be fixed as a separate patch.
I don't understand the relation of this code with what the patch is doing.
+            /*
+             * We can't do anything here, and if there are any non-Vars in the
+             * outerrel/innerrel's reltarget, give up pushing down this join,
+             * because the deparsing logic can't support such a case currently.
+             */
+            if (reltarget_has_non_vars(outerrel))
+                return false;
+            if (reltarget_has_non_vars(innerrel))                return false;

13. The comment below is missing the main purpose i.e. creating a a unique
alias, in case the relation gets converted into a subquery. Lowest or highest
relid will create a unique alias at given level of join and that would be more
future proof. If we decide to consider paths for every join order, following
comment will no more be true.
+    /*
+     * Set the relation index.  This is defined as the position of this
+     * joinrel in the join_rel_list list plus the length of the rtable list.
+     * Note that since this joinrel is at the end of the list when we are
+     * called, we can get the position by list_length.
+     */
+    fpinfo->relation_index =
+        list_length(root->parse->rtable) + list_length(root->join_rel_list);

14. The function name talks about non-vars but the Assert few lines below
asserts that every node there is a Var. Need better naming.
+reltarget_has_non_vars(RelOptInfo *foreignrel)
+{
+    ListCell   *lc;
+
+    foreach(lc, foreignrel->reltarget->exprs)
+    {
+        Var           *var = (Var *) lfirst(lc);
+
+        Assert(IsA(var, Var));
And also an explanation for the claim
+ * Note: currently deparseExplicitTargetList can't properly handle such Vars.

15. While deparsing every Var, we are descending the RelOptInfo hierarchy.
Instead of that, we should probably gather all the alias information once and
store it in context as an array indexed by relid. While deparsing a Var we can
get the targetlist and alias for a given relation by using var->varno as index.
And then search for given Var node in the targetlist to deparse the column name
by its position in the targetlist. For the relations that are not converted
into subqueries, this array will not hold any information and the Var will be
deparsed as it's being done today.

Testing
-------
1. The code is changing deparser but doesn't have testcases
testing impact of the code. For example, we should have testcases with remote
query deparsed as nested subqueries or nested subqueries within subqueries and
so on May be testcases where a relation deeper in the RelOptInfo hierarchy has
conditions but it's immediate upper relation does not have those.

2. The only testcase added by this patch, copies an existing case and adds a
whole row reference to one of the relations being joined. Instead we could add
that whole-row reference to the existing testcase itself.

On Thu, Oct 13, 2016 at 4:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/09/29 21:12, Etsuro Fujita wrote:
>>
>> Please find attached an updated version of the patch.
>
>
> Here is a new version (V6) of the patch, which is basically the same as the
> previous patch, but I slightly modified that patch.  Another patch I am
> attaching is an updated patch for pushing down PHVs to the remote, which is
> created on top of the V6 patch.
>
> Best regards,
> Etsuro Fujita



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
> 11. I have reworded following comment and restructured the code that follows it
> in the attached patch.
> +    /*
> +     * Set the subquery information.  If the relation performs a full outer
> +     * join and if the input relations have non-NIL remote_conds, the input
> +     * relations need to be deparsed as a subquery.
> +     */
> The code esp. the if .. else .. block followed by another one, made it a bit
> unreadable. Hence I restructured it so that it's readable and doesn't require
> variable "relids" from earlier code, which was being reused. Let me know if
> this change looks good.
>
Sorry, forgot to attach the patch. Here it is.


--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/10/22 17:19, Ashutosh Bapat wrote:
> Review for postgres-fdw-more-full-join-pushdown-v6 patch.
>
> The patch applies cleanly and regression is clean (make check in
> regress directory and that in postgres_fdw).

Thanks for the review!

> Here are some comments.
> 1. Because of the following code change, for a joinrel we might end up using
> attrs_used, which will be garbage for a join rel. The assumption that tlist can
> not be NIL for a join relation is wrong. Even for a join relation, tlist can be
> NULL.  Consider query 'explain verbose select count(*) from ft1, ft2' Since
> count(*) doesn't need any columns, the tlist is NIL. With the patch the server
> crashes for this query.
> -    if (foreignrel->reloptkind == RELOPT_JOINREL)
> +    /*
> +     * Note: tlist for a base relation might be non-NIL.  For example, if the
> +     * base relation is an operand of a foreign join performing a full outer
> +     * join and has non-NIL remote_conds, the base relation will be deparsed
> +     * as a subquery, so the tlist for the base relation could be non-NIL.
> +     */
> +    if (tlist != NIL)
>      {
> -        /* For a join relation use the input tlist */
>
> We can not decide whether to use deparseExplicitTargetList or
> deparse it from attrs_used based on the tlist. SELECT lists, whether it's
> topmost SELECT or a subquery (even on a base relation), for deparsing a
> JOIN query need to use deparseExplicitTargetList.

Will fix.

> 2. The code in deparseRangeTblRef() dealing with subqueries, is very similar to
> deparseSelectStmtForRel(). The only thing deparseRangeTblRef() does not require
> is the appending ORDER BY, which is determined by existence of pathkeys
> argument. Probably we should reuse deparseSelectStmtForRel(), instead of
> duplicating the code. This will also make the current changes to
> deparseSelectSql unnecessary.

Will do.

> 3. Why do we need following change? The elog() just few lines above says that
> we expect only Var nodes. Why then we use deparseExpr() instead of
> deparseVar()?
>          if (i > 0)
>              appendStringInfoString(buf, ", ");
> -        deparseVar(var, context);
> +        deparseExpr((Expr *) var, context);
>
>          *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
>
> And I get the answer for that question, somewhere down in the patch
> +    /*
> +     * If the given expression is an output column of a subquery-in-FROM,
> +     * deparse the alias to the expression instead.
> +     */
> +    if (IsA(node, Var))
> +    {
> +        int            tabno;
> +        int            colno;
> +
> +        if (isSubqueryExpr(node, context->foreignrel, &tabno, &colno))
> +        {
> +            appendStringInfo(context->buf, "%s%d.%s%d",
> +                             SS_TAB_ALIAS_PREFIX, tabno,
> +                             SS_COL_ALIAS_PREFIX, colno);
> +            return;
> +        }
> +    }
> +
>
> Functionally, this code belongs to deparseVar() and not deparseExpr(). Like all
> other functions which handle Expr nodes, deparseExpr() is expected to be a
> dispatcher to the node specific function.

OK, will change that way.

> 4. This comment is good to have there, but is unrelated to this patch. May be a
> separate patch?
> +    /* Don't generate bad syntax if no columns */
>
> 5. Changed comment doesn't say anything different from the original comment. It
> may have been good to have it this way to start with, but changing it now
> doesn't bring anything new. You seem to have just merged prologue of the
> function with a comment in that function. I think, this change is unnecessary.
> - * The function constructs ... JOIN ... ON ... for join relation. For a base
> - * relation it just returns schema-qualified tablename, with the appropriate
> - * alias if so requested.
> + * For a join relation the clause of the following form is appended to buf:
> + * ((outer relation) <join type> (inner relation) ON (joinclauses))
> + * For a base relation the function just adds the schema-qualified tablename,
> + * with the appropriate alias if so requested.
> +    /* Don't generate bad syntax if no columns */
>
> 6. Change may be good but unrelated to this patch. May be material for a
> separate patch. There are few such changes in this function. While these
> changes may be good by themselves, they distract reviewer from the goal of the
> patch.
> -            appendStringInfo(buf, "(");
> +            appendStringInfoChar(buf, '(');

OK on #4, #5, and #6, Will remove all the changes.  I will leave those 
for separate patches.

> 7. I don't understand why you need to change this function so much. Take for
> example the following change.
> -        RelOptInfo *rel_o = fpinfo->outerrel;
> -        RelOptInfo *rel_i = fpinfo->innerrel;
> -        StringInfoData join_sql_o;
> -        StringInfoData join_sql_i;
> +        /* Begin the FROM clause entry */
> +        appendStringInfoChar(buf, '(');
>
>          /* Deparse outer relation */
> -        initStringInfo(&join_sql_o);
> -        deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list);
> +        deparseRangeTblRef(buf, root,
> +                           fpinfo->outerrel,
> +                           fpinfo->make_outerrel_subquery,
> +                           params_list);
>          /* Deparse outer relation */
> -        initStringInfo(&join_sql_o);
> -        deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list);
> +        deparseRangeTblRef(buf, root,
> +                           fpinfo->outerrel,
> +                           fpinfo->make_outerrel_subquery,
> +                           params_list);
>
> It removes few variables, instead directly accesses the members from fpinfo.
> Also, deparses the individual relations in the provided buffer directly, rather
> than in separate buffers in the original code. Instead of this, all you had to
> do was replace a call
> -        deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list);
> with
> +        deparseRangeTblRef(&join_sql_o, root, rel_o,
> fpinfo->make_outerrel_subquery, params_list)
> Similarly for the inner relation. Again, the changes you have done might have
> been good, if done in the original code, but doing those in this patch just
> creates distractions and increases the size of the patch.

I did that for efficiency because deparsing the relation directly into 
the given buffer would save cycles, but I agree that that is another 
issue.  Will remove that change, and leave that for another patch.

> 8. Why have you changed the name of variable here?
> -        if (use_alias)
> +        if (add_rel_alias)

Sorry, I forgot to remove this change from the patch.  Will fix.

> 9. In deparseRangeTblRef(), the targetlist for the subquery is obtained by
> calling build_tlist_to_deparse(), but appendSubselectAlias() constructs the
> column aliases based on foreignrel->reltarget->exprs. Those two are usually
> same, but there is no code which guarantees that. Instead, pass tlist to
> appendSubselectAlias() or simply pass the number of entries in that list
> (list_length(tlist), which is all it needs. OR use foreignrel->reltarget->exprs
> directly instead of calling build_tlist_to_deparse(). Similar problem exists
> with getSubselectAliasInfo().

Actually, the patch guarantees that since in that case (1) 
foreignrel->reltarget->exprs doesn't contain any PHVs (note that passing 
the following check in foreign_join_ok implies that 
foreignrel->reltarget->exprs of the input rels for a given foreign join 
doesn't contain any PHVs and (2) we have fpinfo->local_conds == NIL, so 
build_tlist_to_deparse() would produce a tlist equivalent to the 
foreignrel->reltarget->exprs.  But as you proposed, to make the code 
easier to understand, I will change to use foreignrel->reltarget->exprs 
directly.
    /*     * deparseExplicitTargetList() isn't smart enough to handle 
anything other     * than a Var.  In particular, if there's some PlaceHolderVar that 
would     * need to be evaluated within this join tree (because there's an upper     * reference to a quantity that may
goto NULL as a result of an outer     * join), then we can't try to push the join down because we'll 
 
fail when     * we get to deparseExplicitTargetList().  However, a 
PlaceHolderVar that     * needs to be evaluated *at the top* of this join tree is OK, 
because we     * can do that locally after fetching the results from the remote side.     */    relids =
joinrel->relids;   foreach(lc, root->placeholder_list)    {        PlaceHolderInfo *phinfo = lfirst(lc);
 
        if (bms_is_subset(phinfo->ph_eval_at, relids) &&            bms_nonempty_difference(relids,
phinfo->ph_eval_at))           return false;    }
 

> 10. Deparsing a query in the form of a subquery makes it hard-to-read. The
> original code aimed at avoiding a subquery. Also, this change has created many
> expected output changes, which again seem unnecessary. In fact, having the
> pushable join clauses of an inner join in ON clause, which is closer to JOIN
> clause, is better than having them farther in the WHERE clause.
> -    /*
> -     * For an inner join, all restrictions can be treated alike. Treating the
> -     * pushed down conditions as join conditions allows a top level full outer
> -     * join to be deparsed without requiring subqueries.
> -     */
> -    if (jointype == JOIN_INNER)
> -    {
> -        Assert(!fpinfo->joinclauses);
> -        fpinfo->joinclauses = fpinfo->remote_conds;
> -        fpinfo->remote_conds = NIL;
> -    }

I added this change for another patch that I proposed for extending the 
DML pushdown in 9.6 so that it can perform an update/delete on a join 
remotely.  To create a remote query for that, I think we need to pull up 
inner join conditions mentioning the target relation in the JOIN/ON 
clauses into the WHERE clause.  But I have to admit that's unrelated to 
this patch, so I will leave that as-is.

> 11. I have reworded following comment and restructured the code that follows it
> in the attached patch.
> +    /*
> +     * Set the subquery information.  If the relation performs a full outer
> +     * join and if the input relations have non-NIL remote_conds, the input
> +     * relations need to be deparsed as a subquery.
> +     */
> The code esp. the if .. else .. block followed by another one, made it a bit
> unreadable. Hence I restructured it so that it's readable and doesn't require
> variable "relids" from earlier code, which was being reused. Let me know if
> this change looks good.

That's a good idea.  Thanks!

> 12. The code below deletes the condition, which is understandable.
> -            if (fpinfo_i->remote_conds || fpinfo_o->remote_conds)
> But why does it add a new unrelated condition here? What the comment claims,
> looks like an existing bug, unrelated to the patch. Can you please give an
> example? If that it's an existing bug, it should be fixed as a separate patch.
> I don't understand the relation of this code with what the patch is doing.
> +            /*
> +             * We can't do anything here, and if there are any non-Vars in the
> +             * outerrel/innerrel's reltarget, give up pushing down this join,
> +             * because the deparsing logic can't support such a case currently.
> +             */
> +            if (reltarget_has_non_vars(outerrel))
> +                return false;
> +            if (reltarget_has_non_vars(innerrel))
>                  return false;

I thought the current deparsing logic wouldn't work well if the target 
list of a given subquery contained eg, system columns other than ctid 
and oid, but I noticed that I was wrong; it can, so we don't need this 
restriction.  Will remove.  (I don't think there is any bug.)

> 13. The comment below is missing the main purpose i.e. creating a a unique
> alias, in case the relation gets converted into a subquery. Lowest or highest
> relid will create a unique alias at given level of join and that would be more
> future proof. If we decide to consider paths for every join order, following
> comment will no more be true.
> +    /*
> +     * Set the relation index.  This is defined as the position of this
> +     * joinrel in the join_rel_list list plus the length of the rtable list.
> +     * Note that since this joinrel is at the end of the list when we are
> +     * called, we can get the position by list_length.
> +     */
> +    fpinfo->relation_index =
> +        list_length(root->parse->rtable) + list_length(root->join_rel_list);

I don't agree on that point.  As I said before, the approach using the 
lowest/highest relid would make a remote query having nested subqueries 
difficult to read.  And even if we decided to consider paths for every 
join order, we could define the relation_index safely, by modifying 
postgresGetForeignJoinPaths so that it (1) sets the relation_index the 
proposed way at the first time it is called and (2) avoids setting the 
relation_index after the first call, by checking to see 
fpinfo->relation_index > 0.

> 14. The function name talks about non-vars but the Assert few lines below
> asserts that every node there is a Var. Need better naming.
> +reltarget_has_non_vars(RelOptInfo *foreignrel)
> +{
> +    ListCell   *lc;
> +
> +    foreach(lc, foreignrel->reltarget->exprs)
> +    {
> +        Var           *var = (Var *) lfirst(lc);
> +
> +        Assert(IsA(var, Var));
> And also an explanation for the claim
> + * Note: currently deparseExplicitTargetList can't properly handle such Vars.

Will remove this function for the reason described in #12.

> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy.

Right.  I think that not only avoids creating redundant data to find the 
alias of a given Var node but also would be able to find it in optimal cost.

> Instead of that, we should probably gather all the alias information once and
> store it in context as an array indexed by relid. While deparsing a Var we can
> get the targetlist and alias for a given relation by using var->varno as index.
> And then search for given Var node in the targetlist to deparse the column name
> by its position in the targetlist. For the relations that are not converted
> into subqueries, this array will not hold any information and the Var will be
> deparsed as it's being done today.

Sorry, I don't understand this part.  How does that work when creating a 
remote query having nested subqueries?  Is that able to search for a 
given Var node efficiently?

> Testing
> -------
> 1. The code is changing deparser but doesn't have testcases
> testing impact of the code. For example, we should have testcases with remote
> query deparsed as nested subqueries or nested subqueries within subqueries and
> so on May be testcases where a relation deeper in the RelOptInfo hierarchy has
> conditions but it's immediate upper relation does not have those.

Will do.

> 2. The only testcase added by this patch, copies an existing case and adds a
> whole row reference to one of the relations being joined. Instead we could add
> that whole-row reference to the existing testcase itself.

Will do.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
>> 13. The comment below is missing the main purpose i.e. creating a a unique
>> alias, in case the relation gets converted into a subquery. Lowest or
>> highest
>> relid will create a unique alias at given level of join and that would be
>> more
>> future proof. If we decide to consider paths for every join order,
>> following
>> comment will no more be true.
>> +    /*
>> +     * Set the relation index.  This is defined as the position of this
>> +     * joinrel in the join_rel_list list plus the length of the rtable
>> list.
>> +     * Note that since this joinrel is at the end of the list when we are
>> +     * called, we can get the position by list_length.
>> +     */
>> +    fpinfo->relation_index =
>> +        list_length(root->parse->rtable) +
>> list_length(root->join_rel_list);
>
>
> I don't agree on that point.  As I said before, the approach using the
> lowest/highest relid would make a remote query having nested subqueries
> difficult to read.  And even if we decided to consider paths for every join
> order, we could define the relation_index safely, by modifying
> postgresGetForeignJoinPaths so that it (1) sets the relation_index the
> proposed way at the first time it is called and (2) avoids setting the
> relation_index after the first call, by checking to see
> fpinfo->relation_index > 0.

OK. Although, the alias which contains a number, which apparently
doesn't show any relation with the relation (no pun intended :)) being
deparsed, won't make much sense in the deparsed query. But I am fine
leaving this to the committer's judgement. May be you want to add an
assert here making sure that relation_index is set only once.
Something like Assert(fpinfo->relation_index == 0) just before setting
it.

>
>> 14. The function name talks about non-vars but the Assert few lines below
>> asserts that every node there is a Var. Need better naming.
>> +reltarget_has_non_vars(RelOptInfo *foreignrel)
>> +{
>> +    ListCell   *lc;
>> +
>> +    foreach(lc, foreignrel->reltarget->exprs)
>> +    {
>> +        Var           *var = (Var *) lfirst(lc);
>> +
>> +        Assert(IsA(var, Var));
>> And also an explanation for the claim
>> + * Note: currently deparseExplicitTargetList can't properly handle such
>> Vars.
>
>
> Will remove this function for the reason described in #12.
>
>> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy.
>
>
> Right.  I think that not only avoids creating redundant data to find the
> alias of a given Var node but also would be able to find it in optimal cost.
>
>> Instead of that, we should probably gather all the alias information once
>> and
>> store it in context as an array indexed by relid. While deparsing a Var we
>> can
>> get the targetlist and alias for a given relation by using var->varno as
>> index.
>> And then search for given Var node in the targetlist to deparse the column
>> name
>> by its position in the targetlist. For the relations that are not
>> converted
>> into subqueries, this array will not hold any information and the Var will
>> be
>> deparsed as it's being done today.
>
>
> Sorry, I don't understand this part.  How does that work when creating a
> remote query having nested subqueries?  Is that able to search for a given
> Var node efficiently?

For every relation that is deparsed as a subquery, we will create a
separate context. Each such context will have an array described
above. This array will contain the targetlist and aliases for all the
relations, covered by that relation, that are required to be deparsed
as subqueries. These arrays bubble up to topmost relation that is not
required to be deparsed as subquery. When a relation is required to be
deparsed as a subquery, its immediate upper relation sets the alias
and targetlist chosen for that relation at all the indexes
corresponding to all the base relations covered by that relation. For
example, let's assume that a relation (1, 2, 3) is required to be
deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
(thus the other joining relation being (4,5)). While deparsing for
relation (1,2,3,4,5), the context will contain a 5 element array, with
positions 1, 2, 3 filled by same targetlist and alias whereas
positions 4 and 5 will not be filled as those relations are not
deparsed as subqueries. Let's assume in relation (1, 2, 3), (1, 3) in
turn requires subquery but (2) does not. Thus the context created
while deparsing (1, 2, 3) will have a 3 element array with positions 1
and 3 containing the same targetlist and alias, where as position 2
will be empty. When deparsing a Var node with varno = N and varattno =
m, if the nth position in the array in the context is empty, that Var
node will be deparsed as rN.<column name>. But if that position is has
alias sZ, then we search for that Var node in the targetlist and if
it's found at kth position in the targetlist, we will deparse it as
sZ.ck. The search in the targetlist can be performed using
tlist_member, and then fetching the position by TargetEntry::resno.
This does not require any recursion and thus saves stack space and
some CPU cycles required for recursion. I guess, the arrays need to be
computed only once for any relation when the query for that relation
is deparsed the first time.

Thanks for considering other suggestions.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/10/25 18:58, Ashutosh Bapat wrote:

You wrote:
>>> 13. The comment below is missing the main purpose i.e. creating a a unique
>>> alias, in case the relation gets converted into a subquery. Lowest or
>>> highest
>>> relid will create a unique alias at given level of join and that would be
>>> more
>>> future proof. If we decide to consider paths for every join order,
>>> following
>>> comment will no more be true.
>>> +    /*
>>> +     * Set the relation index.  This is defined as the position of this
>>> +     * joinrel in the join_rel_list list plus the length of the rtable
>>> list.
>>> +     * Note that since this joinrel is at the end of the list when we are
>>> +     * called, we can get the position by list_length.
>>> +     */
>>> +    fpinfo->relation_index =
>>> +        list_length(root->parse->rtable) +
>>> list_length(root->join_rel_list);

I wrote:
>> I don't agree on that point.  As I said before, the approach using the
>> lowest/highest relid would make a remote query having nested subqueries
>> difficult to read.  And even if we decided to consider paths for every join
>> order, we could define the relation_index safely, by modifying
>> postgresGetForeignJoinPaths so that it (1) sets the relation_index the
>> proposed way at the first time it is called and (2) avoids setting the
>> relation_index after the first call, by checking to see
>> fpinfo->relation_index > 0.

> OK. Although, the alias which contains a number, which apparently
> doesn't show any relation with the relation (no pun intended :)) being
> deparsed, won't make much sense in the deparsed query. But I am fine
> leaving this to the committer's judgement.

Fine with me.

> May be you want to add an
> assert here making sure that relation_index is set only once.
> Something like Assert(fpinfo->relation_index == 0) just before setting
> it.

Will do.

>>> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy.

>> Right.  I think that not only avoids creating redundant data to find the
>> alias of a given Var node but also would be able to find it in optimal cost.

>>> Instead of that, we should probably gather all the alias information once
>>> and
>>> store it in context as an array indexed by relid. While deparsing a Var we
>>> can
>>> get the targetlist and alias for a given relation by using var->varno as
>>> index.
>>> And then search for given Var node in the targetlist to deparse the column
>>> name
>>> by its position in the targetlist. For the relations that are not
>>> converted
>>> into subqueries, this array will not hold any information and the Var will
>>> be
>>> deparsed as it's being done today.

>> Sorry, I don't understand this part.  How does that work when creating a
>> remote query having nested subqueries?  Is that able to search for a given
>> Var node efficiently?

> For every relation that is deparsed as a subquery, we will create a
> separate context. Each such context will have an array described
> above. This array will contain the targetlist and aliases for all the
> relations, covered by that relation, that are required to be deparsed
> as subqueries. These arrays bubble up to topmost relation that is not
> required to be deparsed as subquery. When a relation is required to be
> deparsed as a subquery, its immediate upper relation sets the alias
> and targetlist chosen for that relation at all the indexes
> corresponding to all the base relations covered by that relation.

Thanks for the explanation!

> For
> example, let's assume that a relation (1, 2, 3) is required to be
> deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
> (thus the other joining relation being (4,5)). While deparsing for
> relation (1,2,3,4,5), the context will contain a 5 element array, with
> positions 1, 2, 3 filled by same targetlist and alias whereas
> positions 4 and 5 will not be filled as those relations are not
> deparsed as subqueries.

Sorry, I don't understand this.  In that case, the immediate upper 
relation (1, 2, 3, 4, 5) would need to fill the targetlist and aliases 
for *the join relation (1, 2, 3) somewhere*, not the targetlist and 
aliases for each of the component relations 1, 2, and 3, because the 
join relation is deparsed as a subquery.  Maybe I'm missing something, 
though.

> Let's assume in relation (1, 2, 3), (1, 3) in
> turn requires subquery but (2) does not. Thus the context created
> while deparsing (1, 2, 3) will have a 3 element array with positions 1
> and 3 containing the same targetlist and alias, where as position 2
> will be empty.

> When deparsing a Var node with varno = N and varattno =
> m, if the nth position in the array in the context is empty, that Var
> node will be deparsed as rN.<column name>.

What happens when deparsing eg, a Var with varno = 2 at the topmost 
relation (1, 2, 3, 4, 5)?  The second position of the array is empty, 
but the join relation (1, 2, 3) is deparsed as a subquery, so the Var 
should be deparsed as an alias of an output column of the subquery at 
the topmost relation, I think.

> But if that position is has
> alias sZ, then we search for that Var node in the targetlist and if
> it's found at kth position in the targetlist, we will deparse it as
> sZ.ck. The search in the targetlist can be performed using
> tlist_member, and then fetching the position by TargetEntry::resno.

> This does not require any recursion and thus saves stack space and
> some CPU cycles required for recursion.

Is that true?

> some CPU cycles required for recursion. I guess, the arrays need to be
> computed only once for any relation when the query for that relation
> is deparsed the first time.

Does this algorithm extend to the case where we consider paths for every 
join order?

> Thanks for considering other suggestions.

Your suggestions are really helpful to improve the patch.  Thanks!

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
>
>> For every relation that is deparsed as a subquery, we will create a
>> separate context. Each such context will have an array described
>> above. This array will contain the targetlist and aliases for all the
>> relations, covered by that relation, that are required to be deparsed
>> as subqueries. These arrays bubble up to topmost relation that is not
>> required to be deparsed as subquery. When a relation is required to be
>> deparsed as a subquery, its immediate upper relation sets the alias
>> and targetlist chosen for that relation at all the indexes
>> corresponding to all the base relations covered by that relation.
>
>
> Thanks for the explanation!
>
>> For
>> example, let's assume that a relation (1, 2, 3) is required to be
>> deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
>> (thus the other joining relation being (4,5)). While deparsing for
>> relation (1,2,3,4,5), the context will contain a 5 element array, with
>> positions 1, 2, 3 filled by same targetlist and alias whereas
>> positions 4 and 5 will not be filled as those relations are not
>> deparsed as subqueries.
>
>
> Sorry, I don't understand this.  In that case, the immediate upper relation
> (1, 2, 3, 4, 5) would need to fill the targetlist and aliases for *the join
> relation (1, 2, 3) somewhere*, not the targetlist and aliases for each of
> the component relations 1, 2, and 3, because the join relation is deparsed
> as a subquery.  Maybe I'm missing something, though.

The description above does not specify "targetlist and alias" for each
of (1, 2, 3). The array in the context will have positions 1, 2, 3
filled with *same* alias and targetlist which is derived from relation
(1, 2, 3).

>
>> Let's assume in relation (1, 2, 3), (1, 3) in
>> turn requires subquery but (2) does not. Thus the context created
>> while deparsing (1, 2, 3) will have a 3 element array with positions 1
>> and 3 containing the same targetlist and alias, where as position 2
>> will be empty.
>
>
>> When deparsing a Var node with varno = N and varattno =
>> m, if the nth position in the array in the context is empty, that Var
>> node will be deparsed as rN.<column name>.
>
>
> What happens when deparsing eg, a Var with varno = 2 at the topmost relation
> (1, 2, 3, 4, 5)?  The second position of the array is empty, but the join
> relation (1, 2, 3) is deparsed as a subquery, so the Var should be deparsed
> as an alias of an output column of the subquery at the topmost relation, I
> think.

position 2 will not be empty, it will be filled by the alias and
targetlist derived from relation (1, 2, 3).

>
>> But if that position is has
>> alias sZ, then we search for that Var node in the targetlist and if
>> it's found at kth position in the targetlist, we will deparse it as
>> sZ.ck. The search in the targetlist can be performed using
>> tlist_member, and then fetching the position by TargetEntry::resno.
>
>
>> This does not require any recursion and thus saves stack space and
>> some CPU cycles required for recursion.
>
>
> Is that true?

Yes, unless you explain why is that false.

>
>> some CPU cycles required for recursion. I guess, the arrays need to be
>> computed only once for any relation when the query for that relation
>> is deparsed the first time.
>
>
> Does this algorithm extend to the case where we consider paths for every
> join order?

Yes, if we store the information about which of relations need
subquery and which don't for every join order.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/10/26 16:11, Ashutosh Bapat wrote:

You wrote:
>>> For
>>> example, let's assume that a relation (1, 2, 3) is required to be
>>> deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
>>> (thus the other joining relation being (4,5)). While deparsing for
>>> relation (1,2,3,4,5), the context will contain a 5 element array, with
>>> positions 1, 2, 3 filled by same targetlist and alias whereas
>>> positions 4 and 5 will not be filled as those relations are not
>>> deparsed as subqueries.

I wrote:
>> Sorry, I don't understand this.  In that case, the immediate upper relation
>> (1, 2, 3, 4, 5) would need to fill the targetlist and aliases for *the join
>> relation (1, 2, 3) somewhere*, not the targetlist and aliases for each of
>> the component relations 1, 2, and 3, because the join relation is deparsed
>> as a subquery.  Maybe I'm missing something, though.

> The description above does not specify "targetlist and alias" for each
> of (1, 2, 3). The array in the context will have positions 1, 2, 3
> filled with *same* alias and targetlist which is derived from relation
> (1, 2, 3).

OK

>>> Let's assume in relation (1, 2, 3), (1, 3) in
>>> turn requires subquery but (2) does not. Thus the context created
>>> while deparsing (1, 2, 3) will have a 3 element array with positions 1
>>> and 3 containing the same targetlist and alias, where as position 2
>>> will be empty.

>>> When deparsing a Var node with varno = N and varattno =
>>> m, if the nth position in the array in the context is empty, that Var
>>> node will be deparsed as rN.<column name>.

>> What happens when deparsing eg, a Var with varno = 2 at the topmost relation
>> (1, 2, 3, 4, 5)?  The second position of the array is empty, but the join
>> relation (1, 2, 3) is deparsed as a subquery, so the Var should be deparsed
>> as an alias of an output column of the subquery at the topmost relation, I
>> think.

> position 2 will not be empty, it will be filled by the alias and
> targetlist derived from relation (1, 2, 3).

OK

>>> But if that position is has
>>> alias sZ, then we search for that Var node in the targetlist and if
>>> it's found at kth position in the targetlist, we will deparse it as
>>> sZ.ck. The search in the targetlist can be performed using
>>> tlist_member, and then fetching the position by TargetEntry::resno.

>>> This does not require any recursion and thus saves stack space and
>>> some CPU cycles required for recursion.

>> Is that true?

> Yes, unless you explain why is that false.

OK, that would be true, I think.

>>> I guess, the arrays need to be
>>> computed only once for any relation when the query for that relation
>>> is deparsed the first time.

>> Does this algorithm extend to the case where we consider paths for every
>> join order?

> Yes, if we store the information about which of relations need
> subquery and which don't for every join order.

Hmm.  Sorry, I'm not so excited about this proposal.  I think (1) that 
is solving a problem that hasn't been proven to be a problem, (2) that 
would complicate the deparser logic, and (3) the cost of creating this 
array for each relation by the bottom-up method while deparsing a remote 
query would be not small (especially when the query is large), so that 
might need more cycles for deparsing the query than what I proposed when 
use_remote_estimate=false.  So, I'd like to go with what I proposed, at 
least as the first cut.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
>>>> I guess, the arrays need to be
>>>> computed only once for any relation when the query for that relation
>>>> is deparsed the first time.
>
>
>>> Does this algorithm extend to the case where we consider paths for every
>>> join order?
>
>
>> Yes, if we store the information about which of relations need
>> subquery and which don't for every join order.
>
>
> Hmm.  Sorry, I'm not so excited about this proposal.  I think (1) that is
> solving a problem that hasn't been proven to be a problem, (2) that would
> complicate the deparser logic, and (3) the cost of creating this array for
> each relation by the bottom-up method while deparsing a remote query would
> be not small (especially when the query is large), so that might need more
> cycles for deparsing the query than what I proposed when
> use_remote_estimate=false.  So, I'd like to go with what I proposed, at
> least as the first cut.

The arrays will need to computed only when there is at least one
relation required to be deparsed as subquery. Every relation above the
relation which is converted into subquery requires the array. Each
array will be N * size of pointer long where N is the largest relid
covered by that relation. N will vary across the RelOptInfo hierarchy.
All that array holds is targetlist pointer which are required to
computed anyway. So, the amount of memory is bounded by N * size of
pointer * (2N - 1), which is way lesser than what we use in other
places.

Your patch calls isSubqueryExpr() recursively for every Var in the
targetlist, which can be many for foreign tables with many columns.
For every such Var it may need to reach upto the relation which is
converted into subquery, which can as bad as reaching every base
relation. So, it looks like the number of recursive calls to
isSubqueryExpr() is bounded by V * N (i.e. worst case depth of the
RelOptInfo tree), which can be quite costly. If use_remote_estimates
is true, we do this for every intermediate relation and for every path
created. That isn't very performance efficient.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/10/26 18:25, Ashutosh Bapat wrote:

> Your patch calls isSubqueryExpr() recursively for every Var in the
> targetlist, which can be many for foreign tables with many columns.
> For every such Var it may need to reach upto the relation which is
> converted into subquery, which can as bad as reaching every base
> relation. So, it looks like the number of recursive calls to
> isSubqueryExpr() is bounded by V * N (i.e. worst case depth of the
> RelOptInfo tree), which can be quite costly. If use_remote_estimates
> is true, we do this for every intermediate relation and for every path
> created. That isn't very performance efficient.

In practice, the search cost would be negligible compared to the cost of 
explaining/executing the query.

My concern about your proposal is: it might not be worth complicating 
the code to solve a problem that is actually not a problem in practice.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Wed, Oct 26, 2016 at 3:35 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/10/26 18:25, Ashutosh Bapat wrote:
>
>> Your patch calls isSubqueryExpr() recursively for every Var in the
>> targetlist, which can be many for foreign tables with many columns.
>> For every such Var it may need to reach upto the relation which is
>> converted into subquery, which can as bad as reaching every base
>> relation. So, it looks like the number of recursive calls to
>> isSubqueryExpr() is bounded by V * N (i.e. worst case depth of the
>> RelOptInfo tree), which can be quite costly. If use_remote_estimates
>> is true, we do this for every intermediate relation and for every path
>> created. That isn't very performance efficient.
>
>
> In practice, the search cost would be negligible compared to the cost of
> explaining/executing the query.
>
> My concern about your proposal is: it might not be worth complicating the
> code to solve a problem that is actually not a problem in practice.
>

To me the current code looks complicated esp. because of the recursion
involved and usage of out parameters to isSubqueryExpr(). My
suggestion goes inline with the current method of deparsing a Var.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/10/26 19:53, Ashutosh Bapat wrote:
> On Wed, Oct 26, 2016 at 3:35 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:

>> In practice, the search cost would be negligible compared to the cost of
>> explaining/executing the query.
>>
>> My concern about your proposal is: it might not be worth complicating the
>> code to solve a problem that is actually not a problem in practice.

> To me the current code looks complicated esp. because of the recursion
> involved and usage of out parameters to isSubqueryExpr().

I don't think so.  isSubqueryExpr is prety small, written in less than 
50 lines, and the code looks rather simple to me.

> My
> suggestion goes inline with the current method of deparsing a Var.

Yeah, I think your approach makes it easy to search for the alias to a 
given Var from the array you proposed.  I think the complexity of your 
approach would be in extra work for building and maintaining the array 
while deparsing the query.  I think that would probably need more 
invasive and much larger changes to the existing code than what I proposed.

I think this issue is optional, so I'd like to propose to leave this for 
the committer's judge.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Thu, Oct 27, 2016 at 12:46 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/10/26 19:53, Ashutosh Bapat wrote:
>>
>> On Wed, Oct 26, 2016 at 3:35 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>
>
>>> In practice, the search cost would be negligible compared to the cost of
>>> explaining/executing the query.
>>>
>>> My concern about your proposal is: it might not be worth complicating the
>>> code to solve a problem that is actually not a problem in practice.
>
>
>> To me the current code looks complicated esp. because of the recursion
>> involved and usage of out parameters to isSubqueryExpr().
>
>
> I don't think so.  isSubqueryExpr is prety small, written in less than 50
> lines, and the code looks rather simple to me.
>
>> My
>> suggestion goes inline with the current method of deparsing a Var.
>
>
> Yeah, I think your approach makes it easy to search for the alias to a given
> Var from the array you proposed.  I think the complexity of your approach
> would be in extra work for building and maintaining the array while
> deparsing the query.  I think that would probably need more invasive and
> much larger changes to the existing code than what I proposed.
>
> I think this issue is optional, so I'd like to propose to leave this for the
> committer's judge.
>

Fine with me.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/10/27 18:06, Ashutosh Bapat wrote:
> On Thu, Oct 27, 2016 at 12:46 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/10/26 19:53, Ashutosh Bapat wrote:

>>> On Wed, Oct 26, 2016 at 3:35 PM, Etsuro Fujita
>>>> My concern about your proposal is: it might not be worth complicating the
>>>> code to solve a problem that is actually not a problem in practice.

>>> To me the current code looks complicated esp. because of the recursion
>>> involved and usage of out parameters to isSubqueryExpr().

>> I don't think so.  isSubqueryExpr is prety small, written in less than 50
>> lines, and the code looks rather simple to me.

>>> My
>>> suggestion goes inline with the current method of deparsing a Var.

>> Yeah, I think your approach makes it easy to search for the alias to a given
>> Var from the array you proposed.  I think the complexity of your approach
>> would be in extra work for building and maintaining the array while
>> deparsing the query.  I think that would probably need more invasive and
>> much larger changes to the existing code than what I proposed.
>>
>> I think this issue is optional, so I'd like to propose to leave this for the
>> committer's judge.

> Fine with me.

OK

Here is the updated version, which includes the restructuring you
proposed.  Other than the above issue and the alias issue we discussed,
I addressed all your comments except one on testing; I tried to add test
cases where the remote query is deparsed as nested subqueries, but I
couldn't because IIUC, reduce_outer_joins reduced full joins to inner
joins or left joins.  So, I added two test cases: (1) the joining
relations are both base relations (actually, we already have that) and
(2) one of the joining relations is a base relation and the other is a
join relation.  I rebased the patch to HEAD, so I added a test case with
aggregate pushdown also.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/10/27 20:41, Etsuro Fujita wrote:
> Here is the updated version, which includes the restructuring you
> proposed.  Other than the above issue and the alias issue we discussed,
> I addressed all your comments except one on testing; I tried to add test
> cases where the remote query is deparsed as nested subqueries, but I
> couldn't because IIUC, reduce_outer_joins reduced full joins to inner
> joins or left joins.  So, I added two test cases: (1) the joining
> relations are both base relations (actually, we already have that) and
> (2) one of the joining relations is a base relation and the other is a
> join relation.  I rebased the patch to HEAD, so I added a test case with
> aggregate pushdown also.

I've updated another patch for evaluating PHVs remotely.  Attached is an
updated version of the patch, which is created on top of the patch for
supporting deparsing subqueries.  You can find test cases where a remote
join query is deparsed as nested subqueries.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
> Here is the updated version, which includes the restructuring you proposed.
> Other than the above issue and the alias issue we discussed, I addressed all
> your comments except one on testing; I tried to add test cases where the
> remote query is deparsed as nested subqueries, but I couldn't because IIUC,
> reduce_outer_joins reduced full joins to inner joins or left joins.

No always. It will do so in some cases as explained in the prologue of
reduce_outer_joins()
* More generally, an outer join can be reduced in strength if there is a* strict qual above it in the qual tree that
constrainsa Var from the* nullable side of the join to be non-null.  (For FULL joins this applies* to each side
separately.)

>  So, I
> added two test cases: (1) the joining relations are both base relations
> (actually, we already have that) and (2) one of the joining relations is a
> base relation and the other is a join relation.  I rebased the patch to
> HEAD, so I added a test case with aggregate pushdown also.
>

This patch again has a lot of unrelated changes, esp. in
deparseSelectStmtForRel(). What was earlier part of deparseSelectSql()
and deparseFromExpr() is now flattened in deparseSelectStmtForRel(),
which seems unnecessary. Or there are changes like

-deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
-                       List *tlist, List *remote_conds, List *pathkeys,
+deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
+                       RelOptInfo *foreignrel, List *tlist,
+                       List *remote_conds, List *pathkeys,

which arise because rel has been renamed as foreignrel. The patch will
work even without such renaming.

I think such refactoring, should be done in a separate patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/03 18:52, Ashutosh Bapat wrote:
>> Here is the updated version, which includes the restructuring you proposed.
>> Other than the above issue and the alias issue we discussed, I addressed all
>> your comments except one on testing; I tried to add test cases where the
>> remote query is deparsed as nested subqueries, but I couldn't because IIUC,
>> reduce_outer_joins reduced full joins to inner joins or left joins.

> No always. It will do so in some cases as explained in the prologue of
> reduce_outer_joins()
>
>  * More generally, an outer join can be reduced in strength if there is a
>  * strict qual above it in the qual tree that constrains a Var from the
>  * nullable side of the join to be non-null.  (For FULL joins this applies
>  * to each side separately.)

Hmm.  Would it be necessary for this patch to include test cases for 
nested subqueries?  As mentioned in a previous email, those test cases 
can be added by the split patch that allow PHVs to be evaluated on the 
remote side.

> This patch again has a lot of unrelated changes, esp. in
> deparseSelectStmtForRel(). What was earlier part of deparseSelectSql()
> and deparseFromExpr() is now flattened in deparseSelectStmtForRel(),
> which seems unnecessary.

IIUC, I think this change was done to address your comment (see the 
comment #2 in [1]).  Am I missing something?

> Or there are changes like
>
> -deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
> -                       List *tlist, List *remote_conds, List *pathkeys,
> +deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
> +                       RelOptInfo *foreignrel, List *tlist,
> +                       List *remote_conds, List *pathkeys,
>
> which arise because rel has been renamed as foreignrel. The patch will
> work even without such renaming.

I did that because we have a Relation named rel (the same name!) within 
that function, to execute deparseTargetList in a base-relation case. 
Another reason for that is because (1) that would match with other 
function definitions in deparse.c and (2) that would be consistent with 
the existing function definition for that function in postgres_fdw.h.

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAFjFpRfwoSsJr9418b2jA7g0nwagjZSWhPQPUmK9M6z5XSOAqQ%40mail.gmail.com





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Fri, Nov 4, 2016 at 9:19 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/11/03 18:52, Ashutosh Bapat wrote:
>>>
>>> Here is the updated version, which includes the restructuring you
>>> proposed.
>>> Other than the above issue and the alias issue we discussed, I addressed
>>> all
>>> your comments except one on testing; I tried to add test cases where the
>>> remote query is deparsed as nested subqueries, but I couldn't because
>>> IIUC,
>>> reduce_outer_joins reduced full joins to inner joins or left joins.
>
>
>> No always. It will do so in some cases as explained in the prologue of
>> reduce_outer_joins()
>>
>>  * More generally, an outer join can be reduced in strength if there is a
>>  * strict qual above it in the qual tree that constrains a Var from the
>>  * nullable side of the join to be non-null.  (For FULL joins this applies
>>  * to each side separately.)
>
>
> Hmm.  Would it be necessary for this patch to include test cases for nested
> subqueries?  As mentioned in a previous email, those test cases can be added
> by the split patch that allow PHVs to be evaluated on the remote side.

A patch should have testcases testing the functionality added. This
patch adds functionality to deparse nested subqueries, so there should
be testcase to test it. If we can not come up with a testcase, then
it's very much possible that the code related to that functionality is
not needed. PHV is a separate patch and we do not know whether it will
be committed or when it will be committed after this patch is
committed. It's better to have self-sufficient patch.

>
>> This patch again has a lot of unrelated changes, esp. in
>> deparseSelectStmtForRel(). What was earlier part of deparseSelectSql()
>> and deparseFromExpr() is now flattened in deparseSelectStmtForRel(),
>> which seems unnecessary.
>
>
> IIUC, I think this change was done to address your comment (see the comment
> #2 in [1]).  Am I missing something?

There is some misunderstanding here. That comment basically says,
deparseRangeTblRef() duplicates code in deparseSelectStmtForRel(), so
we should either remove deparseRangeTblRef() altogether or should keep
it to minimal avoiding duplication of code. What might have confused
you is the last sentence in that comment "This will also make the
current changes to deparseSelectSql unnecessary." By current changes I
meant changes to deparseSelectSql() in your patch, not the one that's
in the repository. I don't think we should flatten
deparseSelectStmtForRel() in this patch.

>
>> Or there are changes like
>>
>> -deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo
>> *rel,
>> -                       List *tlist, List *remote_conds, List *pathkeys,
>> +deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
>> +                       RelOptInfo *foreignrel, List *tlist,
>> +                       List *remote_conds, List *pathkeys,
>>
>> which arise because rel has been renamed as foreignrel. The patch will
>> work even without such renaming.
>
>
> I did that because we have a Relation named rel (the same name!) within that
> function, to execute deparseTargetList in a base-relation case.

That's because you have flattened deparseSelectStmtForRel(). If we
don't flatten it out, the variable name conflict doesn't arise.

> Another
> reason for that is because (1) that would match with other function
> definitions in deparse.c and (2) that would be consistent with the existing
> function definition for that function in postgres_fdw.h.
>

That would be a separate patch, I guess.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/04 13:08, Ashutosh Bapat wrote:
> On Fri, Nov 4, 2016 at 9:19 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/11/03 18:52, Ashutosh Bapat wrote:

I wrote:
>>>> Here is the updated version,

>>>> Other than the above issue and the alias issue we discussed, I addressed
>>>> all
>>>> your comments except one on testing; I tried to add test cases where the
>>>> remote query is deparsed as nested subqueries, but I couldn't because
>>>> IIUC,
>>>> reduce_outer_joins reduced full joins to inner joins or left joins.

>>> No always. It will do so in some cases as explained in the prologue of
>>> reduce_outer_joins()
>>>
>>>  * More generally, an outer join can be reduced in strength if there is a
>>>  * strict qual above it in the qual tree that constrains a Var from the
>>>  * nullable side of the join to be non-null.  (For FULL joins this applies
>>>  * to each side separately.)

Right.

>> Would it be necessary for this patch to include test cases for nested
>> subqueries?

> A patch should have testcases testing the functionality added. This
> patch adds functionality to deparse nested subqueries, so there should
> be testcase to test it.

OK, I added such a test case.

>>> This patch again has a lot of unrelated changes, esp. in
>>> deparseSelectStmtForRel(). What was earlier part of deparseSelectSql()
>>> and deparseFromExpr() is now flattened in deparseSelectStmtForRel(),
>>> which seems unnecessary.

>> IIUC, I think this change was done to address your comment (see the comment
>> #2 in [1]).  Am I missing something?

> There is some misunderstanding here. That comment basically says,
> deparseRangeTblRef() duplicates code in deparseSelectStmtForRel(), so
> we should either remove deparseRangeTblRef() altogether or should keep
> it to minimal avoiding duplication of code. What might have confused
> you is the last sentence in that comment "This will also make the
> current changes to deparseSelectSql unnecessary." By current changes I
> meant changes to deparseSelectSql() in your patch, not the one that's
> in the repository. I don't think we should flatten
> deparseSelectStmtForRel() in this patch.

I noticed that I misunderstood your words.  Sorry about that.  I agree
with you, so I removed that change from the patch.

Attached is an updated version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/04 19:55, Etsuro Fujita wrote:
> Attached is an updated version of the patch.

I noticed that I have included an unrelated regression test in the
patch.  Attached is a patch with the test removed.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/07 11:24, Etsuro Fujita wrote:
> On 2016/11/04 19:55, Etsuro Fujita wrote:
>> Attached is an updated version of the patch.

> I noticed that I have included an unrelated regression test in the
> patch.  Attached is a patch with the test removed.

I noticed that I inadvertently removed some changes from that patch, so
I fixed that.  Please find attached an updated version of the patch.
I'm also attaching an updated version of another patch for evaluating
PHVs remotely, which has been created on top of that patch.  Changes to
the latter: I revised comments and added a bit more regression tests.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Mon, Nov 7, 2016 at 5:50 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/11/07 11:24, Etsuro Fujita wrote:
>>
>> On 2016/11/04 19:55, Etsuro Fujita wrote:
>>>
>>> Attached is an updated version of the patch.
>
>
>> I noticed that I have included an unrelated regression test in the
>> patch.  Attached is a patch with the test removed.
>
>
> I noticed that I inadvertently removed some changes from that patch, so I
> fixed that.  Please find attached an updated version of the patch. I'm also
> attaching an updated version of another patch for evaluating PHVs remotely,
> which has been created on top of that patch.  Changes to the latter: I
> revised comments and added a bit more regression tests.

The patch looks in good shape now. Here are some comments. I have also
made several changes to comments correcting grammar, typos, style and
at few places logic. Let me know if the patch looks good.

I guess, below code
+   if (!fpinfo->subquery_rels)
+       return false;
can be changed to
    if (!bms_is_member(node->varno, fpinfo->subquery_rels))
        return false;
Also the return values from the recursive calls to isSubqueryExpr() can be
returned as is. I have included this change in the patch.

deparse.c seems to be using capitalized names for function which
actually deparse something and an non-capitalized form for helper
functions. From that perspective attached patch renames isSubqueryExpr
as is_subquery_var() and getSubselectAliasInfo() as
get_alias_id_for_var(). Actually both these functions accept a Var
node but somehow their names refer to expr.

This patch is using make_tlist_from_pathtarget() to create tlist to be
deparsed but search in RelOptInfo::reltarget::exprs for a given Var.
As long as the relations deparsed do not carry expressions, this might
work, but it will certainly break once we start deparsing relations
with expressions since the upper most query's tlist contains only
Vars. Instead, we should probably, create tlist and save it in fpinfo
and use it later for searching (tlist_member()?). Possibly use using
build_tlist_to_deparse(), to create the tlist similar so that
targetlist list creation logic is same for all the relations being
deparsed. I haven't included this change in the patch.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/11 19:22, Ashutosh Bapat wrote:
> The patch looks in good shape now.

Thanks for the review!

> The patch looks in good shape now. Here are some comments. I have also
> made several changes to comments correcting grammar, typos, style and
> at few places logic. Let me know if the patch looks good.

OK, will look into that.

> I guess, below code
> +   if (!fpinfo->subquery_rels)
> +       return false;
> can be changed to
>     if (!bms_is_member(node->varno, fpinfo->subquery_rels))
>         return false;
> Also the return values from the recursive calls to isSubqueryExpr() can be
> returned as is. I have included this change in the patch.

Will look into that too.

> deparse.c seems to be using capitalized names for function which
> actually deparse something and an non-capitalized form for helper
> functions.

That's not true.  There is a function named classifyConditions().  The 
function naming in deparse.c is a bit arbitrary.

> From that perspective attached patch renames isSubqueryExpr
> as is_subquery_var() and getSubselectAliasInfo() as
> get_alias_id_for_var(). Actually both these functions accept a Var
> node but somehow their names refer to expr.

The reason why I named that function isSubqueryExpr is that I think that 
function would be soon extended so as to handle PHVs.  See another patch 
for evaluating PHVs remotely.

> This patch is using make_tlist_from_pathtarget() to create tlist to be
> deparsed but search in RelOptInfo::reltarget::exprs for a given Var.
> As long as the relations deparsed do not carry expressions, this might
> work, but it will certainly break once we start deparsing relations
> with expressions since the upper most query's tlist contains only
> Vars. Instead, we should probably, create tlist and save it in fpinfo
> and use it later for searching (tlist_member()?). Possibly use using
> build_tlist_to_deparse(), to create the tlist similar so that
> targetlist list creation logic is same for all the relations being
> deparsed. I haven't included this change in the patch.

Seems like a good idea.  Will revise.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/11 20:50, Etsuro Fujita wrote:
> On 2016/11/11 19:22, Ashutosh Bapat wrote:
>> The patch looks in good shape now. Here are some comments. I have also
>> made several changes to comments correcting grammar, typos, style and
>> at few places logic. Let me know if the patch looks good.

> OK, will look into that.

Done.  +1 for the changes you made, except for few things; (1) You added
the following comments to deparseSelectSql:

+       /*
+        * For a non-base relation, use the input tlist. If a base
relation is
+        * being deparsed as a subquery, use input tlist, if the caller
has passed
+        * one. The column aliases of the subquery are crafted based on
the input
+        * tlist. If tlist is NIL, there will not be any expression
referring to
+        * any of the columns of the base relation being deparsed. Hence
it doesn't
+        * matter whether the base relation is being deparsed as
subquery or not.
+        */

The last sentence seems confusing to me.  My understanding is: if a base
relation has tlist=NIL, then the base relation *must* be deparsed as
ordinary, not as a subquery.  Maybe I'm missing something, though.  (I
left that as-is, but I think we need to reword that to be more clear, at
least.)

(2) You added the following comments to deparseRangeTblRef:

 > +  * If make_subquery is true, deparse the relation as a subquery.
Otherwise,
 > +  * deparse it as relation name with alias.

The second sentence seems confusing to me, too, because it looks like
the relation being deparsed is assumed to be a base relation, but the
relation can be a join relation, which might join base relations, lower
join relations, and/or lower subqueries.  So, I modified the sentence a bit.

(3) I don't think we need this in isSubqueryExpr, so I removed it from
the patch:

+    /* Keep compiler happy. */
+    return false;

Also, I revised comments you added a little bit.

>> I guess, below code
>> +   if (!fpinfo->subquery_rels)
>> +       return false;
>> can be changed to
>>     if (!bms_is_member(node->varno, fpinfo->subquery_rels))
>>         return false;
>> Also the return values from the recursive calls to isSubqueryExpr()
>> can be
>> returned as is. I have included this change in the patch.

> Will look into that too.

Done.  That's a good idea!

>> deparse.c seems to be using capitalized names for function which
>> actually deparse something and an non-capitalized form for helper
>> functions.

>> From that perspective attached patch renames isSubqueryExpr
>> as is_subquery_var() and getSubselectAliasInfo() as
>> get_alias_id_for_var(). Actually both these functions accept a Var
>> node but somehow their names refer to expr.

OK, I changed isSubqueryExpr to is_subquery_expr; I kept to refer to
expr because I think we would soon extend that function so that it can
handle PHVs, as I said upthread.  For getSubselectAliasInfo, I changed
the name to get_subselect_alias_id, because (1) the word "alias" seems
general and (2) the "for_var" part would make the name a bit long.

>> This patch is using make_tlist_from_pathtarget() to create tlist to be
>> deparsed but search in RelOptInfo::reltarget::exprs for a given Var.
>> As long as the relations deparsed do not carry expressions, this might
>> work, but it will certainly break once we start deparsing relations
>> with expressions since the upper most query's tlist contains only
>> Vars. Instead, we should probably, create tlist and save it in fpinfo
>> and use it later for searching (tlist_member()?). Possibly use using
>> build_tlist_to_deparse(), to create the tlist similar so that
>> targetlist list creation logic is same for all the relations being
>> deparsed. I haven't included this change in the patch.

> Seems like a good idea.  Will revise.

Done.  I modified the patch as proposed; create the tlist by
build_tlist_to_deparse in foreign_join_ok, if needed, and search the
tlist by tlist_member.  I also added a new member "tlist" to
PgFdwRelationInfo to save the tlist created in foreign_join_ok.

Another idea on the "tlist" member would be to save a tlist created for
EXPLAIN into that member in estimate_patch_cost_size, so that we don't
need to generate the tlist again in postgresGetForeignPlan, when
use_remote_estimate=true.  But I'd like to leave that for another patch.

Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
Thanks.

> except for few things; (1) You added the
> following comments to deparseSelectSql:
>
> +       /*
> +        * For a non-base relation, use the input tlist. If a base relation
> is
> +        * being deparsed as a subquery, use input tlist, if the caller has
> passed
> +        * one. The column aliases of the subquery are crafted based on the
> input
> +        * tlist. If tlist is NIL, there will not be any expression
> referring to
> +        * any of the columns of the base relation being deparsed. Hence it
> doesn't
> +        * matter whether the base relation is being deparsed as subquery or
> not.
> +        */
>
> The last sentence seems confusing to me.  My understanding is: if a base
> relation has tlist=NIL, then the base relation *must* be deparsed as
> ordinary, not as a subquery.  Maybe I'm missing something, though.  (I left
> that as-is, but I think we need to reword that to be more clear, at least.)
>

Hmm, I agree. I think the comment should just say, "Use tlist to
create the SELECT clause if one has been provided. For a base relation
with tlist = NIL, check attrs_needed information.". Does that sound
good?

> (2) You added the following comments to deparseRangeTblRef:
>
>> +  * If make_subquery is true, deparse the relation as a subquery.
>> Otherwise,
>> +  * deparse it as relation name with alias.
>
> The second sentence seems confusing to me, too, because it looks like the
> relation being deparsed is assumed to be a base relation, but the relation
> can be a join relation, which might join base relations, lower join
> relations, and/or lower subqueries.  So, I modified the sentence a bit.
>

OK.

> (3) I don't think we need this in isSubqueryExpr, so I removed it from the
> patch:
>
> +       /* Keep compiler happy. */
> +       return false;
>

Doesn't that cause compiler warning, saying "non-void function
returning nothing" or something like that. Actually, the "if
(bms_is_member(node->varno, outerrel->relids))" ends with a "return"
always. Hence we don't need to encapsulate the code in "else" block in
else { }. It could be taken outside.


>
>
> OK, I changed isSubqueryExpr to is_subquery_expr; I kept to refer to expr
> because I think we would soon extend that function so that it can handle
> PHVs, as I said upthread.  For getSubselectAliasInfo, I changed the name to
> get_subselect_alias_id, because (1) the word "alias" seems general and (2)
> the "for_var" part would make the name a bit long.

is_subquery_expr(Var *node -- that looks odd. Either it should
is_subquery_var(Var * ... OR it should be is_subquery_expr(Expr * ...
. I would prefer the first one, since that's what current patch is
doing. When we introduce PHVs, we may change it, if required.


>
>
> Done.  I modified the patch as proposed; create the tlist by
> build_tlist_to_deparse in foreign_join_ok, if needed, and search the tlist
> by tlist_member.  I also added a new member "tlist" to PgFdwRelationInfo to
> save the tlist created in foreign_join_ok.
>

Instead of adding a new member, you might want to reuse grouped_tlist
by renaming it.

> Another idea on the "tlist" member would be to save a tlist created for
> EXPLAIN into that member in estimate_patch_cost_size, so that we don't need
> to generate the tlist again in postgresGetForeignPlan, when
> use_remote_estimate=true.  But I'd like to leave that for another patch.

I think that will happen automatically, while deparsing, whether for
EXPLAIN or for actual execution.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/16 18:14, Ashutosh Bapat wrote:
>> (1) You added the
>> following comments to deparseSelectSql:
>>
>> +       /*
>> +        * For a non-base relation, use the input tlist. If a base relation
>> is
>> +        * being deparsed as a subquery, use input tlist, if the caller has
>> passed
>> +        * one. The column aliases of the subquery are crafted based on the
>> input
>> +        * tlist. If tlist is NIL, there will not be any expression
>> referring to
>> +        * any of the columns of the base relation being deparsed. Hence it
>> doesn't
>> +        * matter whether the base relation is being deparsed as subquery or
>> not.
>> +        */
>>
>> The last sentence seems confusing to me.  My understanding is: if a base
>> relation has tlist=NIL, then the base relation *must* be deparsed as
>> ordinary, not as a subquery.  Maybe I'm missing something, though.  (I left
>> that as-is, but I think we need to reword that to be more clear, at least.)

> Hmm, I agree. I think the comment should just say, "Use tlist to
> create the SELECT clause if one has been provided. For a base relation
> with tlist = NIL, check attrs_needed information.". Does that sound
> good?

I don't think that is 100% correct, because (1) tlist can be NIL for a
join relation, you pointed out upthread, but we need to use
deparseExplicitTargetList, so the first sentence is not completely
correct, and (2) we need to check attrs_needed information not only for
a baserel but for an otherrel, so the second sentence is not completely
correct, either.  How about this, instead?:

     /*
      * For a join relation or an upper relation, use
deparseExplicitTargetList.
      * Likewise, for a base relation that is being deparsed as a
subquery, in
      * which case the caller would have passed tlist that is non-NIL,
use that
      * function.  Otherwise, use deparseTargetList.
      */

>> (3) I don't think we need this in isSubqueryExpr, so I removed it from the
>> patch:
>>
>> +       /* Keep compiler happy. */
>> +       return false;

> Doesn't that cause compiler warning, saying "non-void function
> returning nothing" or something like that. Actually, the "if
> (bms_is_member(node->varno, outerrel->relids))" ends with a "return"
> always. Hence we don't need to encapsulate the code in "else" block in
> else { }. It could be taken outside.

Yeah, I think so too, but I like the "if () { } else { }" coding.  That
coding can be found in other places in core, eg,
operator_same_subexprs_lookup.

>> OK, I changed isSubqueryExpr to is_subquery_expr; I kept to refer to expr
>> because I think we would soon extend that function so that it can handle
>> PHVs, as I said upthread.  For getSubselectAliasInfo, I changed the name to
>> get_subselect_alias_id, because (1) the word "alias" seems general and (2)
>> the "for_var" part would make the name a bit long.

> is_subquery_expr(Var *node -- that looks odd. Either it should
> is_subquery_var(Var * ... OR it should be is_subquery_expr(Expr * ...
> . I would prefer the first one, since that's what current patch is
> doing. When we introduce PHVs, we may change it, if required.

OK, I used is_subquery_var().

>> Done.  I modified the patch as proposed; create the tlist by
>> build_tlist_to_deparse in foreign_join_ok, if needed, and search the tlist
>> by tlist_member.  I also added a new member "tlist" to PgFdwRelationInfo to
>> save the tlist created in foreign_join_ok.

> Instead of adding a new member, you might want to reuse grouped_tlist
> by renaming it.

Done.

>> Another idea on the "tlist" member would be to save a tlist created for
>> EXPLAIN into that member in estimate_patch_cost_size, so that we don't need
>> to generate the tlist again in postgresGetForeignPlan, when
>> use_remote_estimate=true.  But I'd like to leave that for another patch.

> I think that will happen automatically, while deparsing, whether for
> EXPLAIN or for actual execution.

Really?  Anyway, I'd like to leave that as-is.

Please find attached a new version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
>     /*
>      * For a join relation or an upper relation, use
> deparseExplicitTargetList.
>      * Likewise, for a base relation that is being deparsed as a subquery,
> in
>      * which case the caller would have passed tlist that is non-NIL, use
> that
>      * function.  Otherwise, use deparseTargetList.
>      */

This looks correct. I have modified it to make it simple in the given
patch. But, I think we shouldn't refer to a function e.g.
deparseExplicitTargetlist() in the comment. Instead we should describe
the intent e.g. "deparse SELECT clause from the given targetlist" or
"deparse SELECT clause from attr_needed".

>
>>> (3) I don't think we need this in isSubqueryExpr, so I removed it from
>>> the
>>> patch:
>>>
>>> +       /* Keep compiler happy. */
>>> +       return false;
>
>
>> Doesn't that cause compiler warning, saying "non-void function
>> returning nothing" or something like that. Actually, the "if
>> (bms_is_member(node->varno, outerrel->relids))" ends with a "return"
>> always. Hence we don't need to encapsulate the code in "else" block in
>> else { }. It could be taken outside.
>
>
> Yeah, I think so too, but I like the "if () { } else { }" coding.  That
> coding can be found in other places in core, eg,
> operator_same_subexprs_lookup.

OK.


>
>>> Done.  I modified the patch as proposed; create the tlist by
>>> build_tlist_to_deparse in foreign_join_ok, if needed, and search the
>>> tlist
>>> by tlist_member.  I also added a new member "tlist" to PgFdwRelationInfo
>>> to
>>> save the tlist created in foreign_join_ok.
>
>
>> Instead of adding a new member, you might want to reuse grouped_tlist
>> by renaming it.
>
>
> Done.

Right now, we are calculating tlist whether or not the ForeignPath
emerges as the cheapest path. Instead we should calculate tlist, the
first time we need it and then add it to the fpinfo.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
Also another point

I guess, this note doesn't add much value in the given context.
Probably we should remove it.
+            * Note: the tlist would have one-to-one correspondence with the
+            * joining relation's reltarget->exprs because (1) the above test
+            * on PHVs guarantees that the reltarget->exprs doesn't contain
+            * any PHVs and (2) the joining relation's local_conds is NIL.
+            * This allows us to search the targetlist entry matching a given
+            * Var node from the tlist in get_subselect_alias_id.

On Fri, Nov 18, 2016 at 5:10 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>>
>>     /*
>>      * For a join relation or an upper relation, use
>> deparseExplicitTargetList.
>>      * Likewise, for a base relation that is being deparsed as a subquery,
>> in
>>      * which case the caller would have passed tlist that is non-NIL, use
>> that
>>      * function.  Otherwise, use deparseTargetList.
>>      */
>
> This looks correct. I have modified it to make it simple in the given
> patch. But, I think we shouldn't refer to a function e.g.
> deparseExplicitTargetlist() in the comment. Instead we should describe
> the intent e.g. "deparse SELECT clause from the given targetlist" or
> "deparse SELECT clause from attr_needed".
>
>>
>>>> (3) I don't think we need this in isSubqueryExpr, so I removed it from
>>>> the
>>>> patch:
>>>>
>>>> +       /* Keep compiler happy. */
>>>> +       return false;
>>
>>
>>> Doesn't that cause compiler warning, saying "non-void function
>>> returning nothing" or something like that. Actually, the "if
>>> (bms_is_member(node->varno, outerrel->relids))" ends with a "return"
>>> always. Hence we don't need to encapsulate the code in "else" block in
>>> else { }. It could be taken outside.
>>
>>
>> Yeah, I think so too, but I like the "if () { } else { }" coding.  That
>> coding can be found in other places in core, eg,
>> operator_same_subexprs_lookup.
>
> OK.
>
>
>>
>>>> Done.  I modified the patch as proposed; create the tlist by
>>>> build_tlist_to_deparse in foreign_join_ok, if needed, and search the
>>>> tlist
>>>> by tlist_member.  I also added a new member "tlist" to PgFdwRelationInfo
>>>> to
>>>> save the tlist created in foreign_join_ok.
>>
>>
>>> Instead of adding a new member, you might want to reuse grouped_tlist
>>> by renaming it.
>>
>>
>> Done.
>
> Right now, we are calculating tlist whether or not the ForeignPath
> emerges as the cheapest path. Instead we should calculate tlist, the
> first time we need it and then add it to the fpinfo.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/19 0:16, Ashutosh Bapat wrote:
> Also another point
>
> I guess, this note doesn't add much value in the given context.
> Probably we should remove it.
> +            * Note: the tlist would have one-to-one correspondence with the
> +            * joining relation's reltarget->exprs because (1) the above test
> +            * on PHVs guarantees that the reltarget->exprs doesn't contain
> +            * any PHVs and (2) the joining relation's local_conds is NIL.
> +            * This allows us to search the targetlist entry matching a given
> +            * Var node from the tlist in get_subselect_alias_id.

OK, I removed.

> On Fri, Nov 18, 2016 at 5:10 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:

I wrote:
>>>     /*
>>>      * For a join relation or an upper relation, use
>>> deparseExplicitTargetList.
>>>      * Likewise, for a base relation that is being deparsed as a subquery,
>>> in
>>>      * which case the caller would have passed tlist that is non-NIL, use
>>> that
>>>      * function.  Otherwise, use deparseTargetList.
>>>      */

>> This looks correct. I have modified it to make it simple in the given
>> patch. But, I think we shouldn't refer to a function e.g.
>> deparseExplicitTargetlist() in the comment. Instead we should describe
>> the intent e.g. "deparse SELECT clause from the given targetlist" or
>> "deparse SELECT clause from attr_needed".

My taste would be to refer to those functions, because ISTM that makes
the explanation more simple and direct.  So, I'd like to leave that for
the committer's judge.

I wrote:
>>>>> Done.  I modified the patch as proposed; create the tlist by
>>>>> build_tlist_to_deparse in foreign_join_ok, if needed, and search the
>>>>> tlist
>>>>> by tlist_member.  I also added a new member "tlist" to PgFdwRelationInfo
>>>>> to
>>>>> save the tlist created in foreign_join_ok.

You wrote:
>>>> Instead of adding a new member, you might want to reuse grouped_tlist
>>>> by renaming it.

>>> Done.

>> Right now, we are calculating tlist whether or not the ForeignPath
>> emerges as the cheapest path.

Yeah, I modified the patch so, as I thought that would be consistent
with the aggregate pushdown patch.

>> Instead we should calculate tlist, the
>> first time we need it and then add it to the fpinfo.

Having said that, I agree on that point.  I'd like to propose (1) adding
a new member to fpinfo, to store a list of output Vars from the
subquery, and (2) creating a tlist from it in deparseRangeTblRef, then,
which would allow us to calculate the tlist only when we need it.  The
member added to fpinfo would be also useful to address the comments on
the DML/UPDATE pushdown patch.  See the attached patch in [1].  I named
the member a bit differently in that patch, though.

You modified the comments I added to deparseLockingClause into this:

         /*
+        * Ignore relation if it appears in a lower subquery. Locking clause
+        * for such a relation is included in the subquery.
+        */

I don't think the second sentence is 100% correct because a locking
clause isn't always required for such a relation, so I modified the
sentence a bit.

Best regards,
Etsuro Fujita

[1]
https://www.postgresql.org/message-id/38245b84-fabf-0899-1b24-8f94cdc5900c%40lab.ntt.co.jp

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
> Yeah, I modified the patch so, as I thought that would be consistent with
> the aggregate pushdown patch.

aggregate pushdown needs the tlist to judge the shippability of
targetlist. For joins that's not required, so we should defer, if we
can.

>
>>> Instead we should calculate tlist, the
>>> first time we need it and then add it to the fpinfo.
>
>
> Having said that, I agree on that point.  I'd like to propose (1) adding a
> new member to fpinfo, to store a list of output Vars from the subquery, and
> (2) creating a tlist from it in deparseRangeTblRef, then, which would allow
> us to calculate the tlist only when we need it.  The member added to fpinfo
> would be also useful to address the comments on the DML/UPDATE pushdown
> patch.  See the attached patch in [1].  I named the member a bit differently
> in that patch, though.

Again the list of Vars will be wasted if we don't choose that path for
final planning. So, I don't see the point of adding list of Vars
there. It looks like we are replacing one list with the other when
none of those are useful, if the path doesn't get chosen for the final
plan. If you think that the member is useful for DML/UDPATE pushdown,
you may want to add it in the other patch.

>
> You modified the comments I added to deparseLockingClause into this:
>
>         /*
> +        * Ignore relation if it appears in a lower subquery. Locking clause
> +        * for such a relation is included in the subquery.
> +        */
>
> I don't think the second sentence is 100% correct because a locking clause
> isn't always required for such a relation, so I modified the sentence a bit.
>

I guess, "if required" part was implicit in construct "such a
relation". Your version seems to make it explicit. I am fine with it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/21 22:02, Ashutosh Bapat wrote:

You wrote:
>>>> Instead we should calculate tlist, the
>>>> first time we need it and then add it to the fpinfo.

I wrote:
>> Having said that, I agree on that point.  I'd like to propose (1) adding a
>> new member to fpinfo, to store a list of output Vars from the subquery, and
>> (2) creating a tlist from it in deparseRangeTblRef, then, which would allow
>> us to calculate the tlist only when we need it.  The member added to fpinfo
>> would be also useful to address the comments on the DML/UPDATE pushdown
>> patch.  See the attached patch in [1].  I named the member a bit differently
>> in that patch, though.

> Again the list of Vars will be wasted if we don't choose that path for
> final planning. So, I don't see the point of adding list of Vars
> there.

> If you think that the member is useful for DML/UDPATE pushdown,
> you may want to add it in the other patch.

OK, I'd like to propose referencing to foreignrel->reltarget->exprs
directly in deparseRangeTblRef and get_subselect_alias_id, then, which
is the same as what I proposed in the first version of the patch, but
I'd also like to change deparseRangeTblRef to use add_to_flat_tlist, not
make_tlist_from_pathtarget, to create a tlist of the subquery, as you
proposed.

>> You modified the comments I added to deparseLockingClause into this:
>>
>>         /*
>> +        * Ignore relation if it appears in a lower subquery. Locking clause
>> +        * for such a relation is included in the subquery.
>> +        */
>>
>> I don't think the second sentence is 100% correct because a locking clause
>> isn't always required for such a relation, so I modified the sentence a bit.

> I guess, "if required" part was implicit in construct "such a
> relation". Your version seems to make it explicit. I am fine with it.

OK, let's leave that for the committer's judge.

Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
The comments should explain why is the assertion true.
+        /* Shouldn't be NIL */
+        Assert(tlist != NIL);
+        /* Should be same length */
+        Assert(list_length(tlist) ==
list_length(foreignrel->reltarget->exprs));


>
> OK, I'd like to propose referencing to foreignrel->reltarget->exprs directly
> in deparseRangeTblRef and get_subselect_alias_id, then, which is the same as
> what I proposed in the first version of the patch, but I'd also like to
> change deparseRangeTblRef to use add_to_flat_tlist, not
> make_tlist_from_pathtarget, to create a tlist of the subquery, as you
> proposed.

There is a already a function to build targetlist for a given relation
build_tlist_to_deparse(), which does the same thing as this code for a join or
base relation and when there are no local conditions. Why don't we use that
function instead of duplicating that logic? If tomorrow, the logic to build the
targetlist changes, we will need to duplicate those changes. We should avoid
that.
+        /* Build a tlist from the subquery. */
+        tlist = add_to_flat_tlist(tlist, foreignrel->reltarget->exprs);

The comment below says "get the aliases", but what the function really returns
is the identifiers used for creating aliases. Please correct the comment.
+/*
+ * Get the relation and column alias for a given Var node, which belongs to
+ * input foreignrel. They are returned in *tabno and *colno respectively.
+ */

We discussed that we have to deparse and search from the same targetlist. And
that the targetlist should be saved in fpinfo, the first time it gets created.
But the patch seems to be searching in foreignrel->reltarget->exprs and
deparsing from the tlist returned by add_to_flat_tlist(tlist,
foreignrel->reltarget->exprs).
+    foreach(lc, foreignrel->reltarget->exprs)
+    {
+        if (equal(lfirst(lc), (Node *) node))
+        {
+            *colno = i;
+            return;
+        }
+        i++;
+    }
I guess, the reason why you are doing it this way, is SELECT clause for the
outermost query gets deparsed before FROM clause. For later we call
deparseRangeTblRef(), which builds the tlist. So, while deparsing SELECT
clause, we do not have tlist to build from. In that case, I guess, we have to
build the tlist in get_subselect_alias_id() if it's not available and stick it
in fpinfo. Subsequent calls to get_subselect_alias_id() should find tlist
there. Then in deparseRangeTblRef() assert that there's a tlist in fpinfo
and use it to build the SELECT clause of subquery. That way, we don't build
tlist unless it's needed and also use the same tlist for all searches. Please
use tlist_member() to search into the tlist.

The name get_subselect_alias_id() seems to suggest that the function returns
identifier for subselect alias, which isn't correct. It returns the alias
identifiers for deparsing a Var node. But I guess, we have left this to the
committer's judgement.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/22 18:28, Ashutosh Bapat wrote:
> The comments should explain why is the assertion true.
> +        /* Shouldn't be NIL */
> +        Assert(tlist != NIL);
> +        /* Should be same length */
> +        Assert(list_length(tlist) ==
> list_length(foreignrel->reltarget->exprs));

Will revise.

>> OK, I'd like to propose referencing to foreignrel->reltarget->exprs directly
>> in deparseRangeTblRef and get_subselect_alias_id, then, which is the same as
>> what I proposed in the first version of the patch, but I'd also like to
>> change deparseRangeTblRef to use add_to_flat_tlist, not
>> make_tlist_from_pathtarget, to create a tlist of the subquery, as you
>> proposed.

> There is a already a function to build targetlist for a given relation
> build_tlist_to_deparse(), which does the same thing as this code for a join or
> base relation and when there are no local conditions. Why don't we use that
> function instead of duplicating that logic? If tomorrow, the logic to build the
> targetlist changes, we will need to duplicate those changes. We should avoid
> that.
> +        /* Build a tlist from the subquery. */
> +        tlist = add_to_flat_tlist(tlist, foreignrel->reltarget->exprs);

Will modify the patch to use build_tlist_to_deparse.  Actually, the 
early versions of the patch used that function, but it looks like I 
changed not to use that function, as I misunderstood about your comments 
on this part at some point.  Sorry for that.

> The comment below says "get the aliases", but what the function really returns
> is the identifiers used for creating aliases. Please correct the comment.
> +/*
> + * Get the relation and column alias for a given Var node, which belongs to
> + * input foreignrel. They are returned in *tabno and *colno respectively.
> + */

Actually, this was rewritten into the above by *you*.  The original 
comment I added was:

+ /*
+  * Get info about the subselect alias to given expression.
+  *
+  * The subselect table and column numbers are returned to *tabno and 
*colno,
+  * respectively.
+  */

I'd like to change the comment into something like the original one.

> We discussed that we have to deparse and search from the same targetlist. And
> that the targetlist should be saved in fpinfo, the first time it gets created.
> But the patch seems to be searching in foreignrel->reltarget->exprs and
> deparsing from the tlist returned by add_to_flat_tlist(tlist,
> foreignrel->reltarget->exprs).
> +    foreach(lc, foreignrel->reltarget->exprs)
> +    {
> +        if (equal(lfirst(lc), (Node *) node))
> +        {
> +            *colno = i;
> +            return;
> +        }
> +        i++;
> +    }
> I guess, the reason why you are doing it this way, is SELECT clause for the
> outermost query gets deparsed before FROM clause. For later we call
> deparseRangeTblRef(), which builds the tlist. So, while deparsing SELECT
> clause, we do not have tlist to build from.

That's right.

> In that case, I guess, we have to
> build the tlist in get_subselect_alias_id() if it's not available and stick it
> in fpinfo. Subsequent calls to get_subselect_alias_id() should find tlist
> there. Then in deparseRangeTblRef() assert that there's a tlist in fpinfo
> and use it to build the SELECT clause of subquery. That way, we don't build
> tlist unless it's needed and also use the same tlist for all searches. Please
> use tlist_member() to search into the tlist.

I don't think that's a good idea because that would make the code 
unnecessarily complicated and inefficient.  I think the direct search 
into the foreignrel->reltarget->exprs shown above would be better 
because that's more simple and efficient than what you proposed.  Note 
that since (1) the foreignrel->reltarget->exprs doesn't contain any PHVs 
and (2) the local_conds is empty, the tlist created for the subquery by 
build_tlist_to_deparse is guaranteed to have one-to-one correspondence 
with the foreignrel->reltarget->exprs, so the direct search works well.

> The name get_subselect_alias_id() seems to suggest that the function returns
> identifier for subselect alias, which isn't correct. It returns the alias
> identifiers for deparsing a Var node. But I guess, we have left this to the
> committer's judgement.

Fine with me.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
>
>> There is a already a function to build targetlist for a given relation
>> build_tlist_to_deparse(), which does the same thing as this code for a
>> join or
>> base relation and when there are no local conditions. Why don't we use
>> that
>> function instead of duplicating that logic? If tomorrow, the logic to
>> build the
>> targetlist changes, we will need to duplicate those changes. We should
>> avoid
>> that.
>> +        /* Build a tlist from the subquery. */
>> +        tlist = add_to_flat_tlist(tlist, foreignrel->reltarget->exprs);
>
>
> Will modify the patch to use build_tlist_to_deparse.  Actually, the early
> versions of the patch used that function, but it looks like I changed not to
> use that function, as I misunderstood about your comments on this part at
> some point.  Sorry for that.
>
>> The comment below says "get the aliases", but what the function really
>> returns
>> is the identifiers used for creating aliases. Please correct the comment.
>> +/*
>> + * Get the relation and column alias for a given Var node, which belongs
>> to
>> + * input foreignrel. They are returned in *tabno and *colno respectively.
>> + */
>
>
> Actually, this was rewritten into the above by *you*.  The original comment
> I added was:
>
> + /*
> +  * Get info about the subselect alias to given expression.
> +  *
> +  * The subselect table and column numbers are returned to *tabno and
> *colno,
> +  * respectively.
> +  */
>
> I'd like to change the comment into something like the original one.

Sorry. I think the current version is better than previous one. The
term "subselect alias" is confusing in the previous version. In the
current version, "Get the relation and column alias for a given Var
node," we need to add word "identifiers" like "Get the relation and
column identifiers for a given Var node".

>
>> We discussed that we have to deparse and search from the same targetlist.
>> And
>> that the targetlist should be saved in fpinfo, the first time it gets
>> created.
>> But the patch seems to be searching in foreignrel->reltarget->exprs and
>> deparsing from the tlist returned by add_to_flat_tlist(tlist,
>> foreignrel->reltarget->exprs).
>> +    foreach(lc, foreignrel->reltarget->exprs)
>> +    {
>> +        if (equal(lfirst(lc), (Node *) node))
>> +        {
>> +            *colno = i;
>> +            return;
>> +        }
>> +        i++;
>> +    }
>> I guess, the reason why you are doing it this way, is SELECT clause for
>> the
>> outermost query gets deparsed before FROM clause. For later we call
>> deparseRangeTblRef(), which builds the tlist. So, while deparsing SELECT
>> clause, we do not have tlist to build from.
>
>
> That's right.
>
>> In that case, I guess, we have to
>> build the tlist in get_subselect_alias_id() if it's not available and
>> stick it
>> in fpinfo. Subsequent calls to get_subselect_alias_id() should find tlist
>> there. Then in deparseRangeTblRef() assert that there's a tlist in fpinfo
>> and use it to build the SELECT clause of subquery. That way, we don't
>> build
>> tlist unless it's needed and also use the same tlist for all searches.
>> Please
>> use tlist_member() to search into the tlist.
>
>
> I don't think that's a good idea because that would make the code
> unnecessarily complicated and inefficient.  I think the direct search into
> the foreignrel->reltarget->exprs shown above would be better because that's
> more simple and efficient than what you proposed.  Note that since (1) the
> foreignrel->reltarget->exprs doesn't contain any PHVs and (2) the
> local_conds is empty, the tlist created for the subquery by
> build_tlist_to_deparse is guaranteed to have one-to-one correspondence with
> the foreignrel->reltarget->exprs, so the direct search works well.

If we deparse from and search into different objects, that's not very
maintainable code. Changes to any one of them require changes to the
other, thus creating avenues for bugs. Even if slighly expensive, we
should search into and deparse from the same targetlist. I think I
have explained this before.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/23 0:30, Ashutosh Bapat wrote:

You wrote:
>>> The comment below says "get the aliases", but what the function really
>>> returns
>>> is the identifiers used for creating aliases. Please correct the comment.
>>> +/*
>>> + * Get the relation and column alias for a given Var node, which belongs
>>> to
>>> + * input foreignrel. They are returned in *tabno and *colno respectively.
>>> + */

I wrote:
>> Actually, this was rewritten into the above by *you*.  The original comment
>> I added was:
>>
>> + /*
>> +  * Get info about the subselect alias to given expression.
>> +  *
>> +  * The subselect table and column numbers are returned to *tabno and
>> *colno,
>> +  * respectively.
>> +  */
>>
>> I'd like to change the comment into something like the original one.

> Sorry. I think the current version is better than previous one. The
> term "subselect alias" is confusing in the previous version. In the
> current version, "Get the relation and column alias for a given Var
> node," we need to add word "identifiers" like "Get the relation and
> column identifiers for a given Var node".

OK, but one thing I'm concerned about is the term "relation alias" seems 
a bit confusing because we already used the term for the alias of the 
form "rN".  To avoid that, how about saying "table alias", not "relation 
alias"?  (in which case, the comment would be something like "Get the 
table and column identifiers for a given Var node".)

>>> We discussed that we have to deparse and search from the same targetlist.
>>> And
>>> that the targetlist should be saved in fpinfo, the first time it gets
>>> created.
>>> But the patch seems to be searching in foreignrel->reltarget->exprs and
>>> deparsing from the tlist returned by add_to_flat_tlist(tlist,
>>> foreignrel->reltarget->exprs).
>>> +    foreach(lc, foreignrel->reltarget->exprs)
>>> +    {
>>> +        if (equal(lfirst(lc), (Node *) node))
>>> +        {
>>> +            *colno = i;
>>> +            return;
>>> +        }
>>> +        i++;
>>> +    }
>>> I guess, the reason why you are doing it this way, is SELECT clause for
>>> the
>>> outermost query gets deparsed before FROM clause. For later we call
>>> deparseRangeTblRef(), which builds the tlist. So, while deparsing SELECT
>>> clause, we do not have tlist to build from.

>> That's right.

>>> In that case, I guess, we have to
>>> build the tlist in get_subselect_alias_id() if it's not available and
>>> stick it
>>> in fpinfo. Subsequent calls to get_subselect_alias_id() should find tlist
>>> there. Then in deparseRangeTblRef() assert that there's a tlist in fpinfo
>>> and use it to build the SELECT clause of subquery. That way, we don't
>>> build
>>> tlist unless it's needed and also use the same tlist for all searches.
>>> Please
>>> use tlist_member() to search into the tlist.

>> I don't think that's a good idea because that would make the code
>> unnecessarily complicated and inefficient.  I think the direct search into
>> the foreignrel->reltarget->exprs shown above would be better because that's
>> more simple and efficient than what you proposed.  Note that since (1) the
>> foreignrel->reltarget->exprs doesn't contain any PHVs and (2) the
>> local_conds is empty, the tlist created for the subquery by
>> build_tlist_to_deparse is guaranteed to have one-to-one correspondence with
>> the foreignrel->reltarget->exprs, so the direct search works well.

> If we deparse from and search into different objects, that's not very
> maintainable code. Changes to any one of them require changes to the
> other, thus creating avenues for bugs.  Even if slighly expensive, we
> should search into and deparse from the same targetlist.

I don't think so; in the current version, we essentially deparse from 
and search into the same object, ie, foreignrel->reltarget->exprs, since 
the tlist created by build_tlist_to_deparse is build from the 
foreignrel->reltarget->exprs.  Also, since it is just created for 
deparsing the relation as a subquery in deparseRangeTblRef and isn't 
stored into fpinfo or anywhere alse, we would need no concern about 
creating such avenues.  IMO I think adding comments would be enough for 
this.  Anyway, I think this is an optional issue, so I'd like to leave 
this for the committer's judge.

> I think I
> have explained this before.

My apologies for having misunderstood your words.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
>> Sorry. I think the current version is better than previous one. The
>> term "subselect alias" is confusing in the previous version. In the
>> current version, "Get the relation and column alias for a given Var
>> node," we need to add word "identifiers" like "Get the relation and
>> column identifiers for a given Var node".
>
>
> OK, but one thing I'm concerned about is the term "relation alias" seems a
> bit confusing because we already used the term for the alias of the form
> "rN".  To avoid that, how about saying "table alias", not "relation alias"?
> (in which case, the comment would be something like "Get the table and
> column identifiers for a given Var node".)

table will be misleading as subquery can represent a join and
corresponding alias would represent the join. Relation is better term.


>
>
>> If we deparse from and search into different objects, that's not very
>> maintainable code. Changes to any one of them require changes to the
>> other, thus creating avenues for bugs.  Even if slighly expensive, we
>> should search into and deparse from the same targetlist.
>
>
> I don't think so; in the current version, we essentially deparse from and
> search into the same object, ie, foreignrel->reltarget->exprs, since the
> tlist created by build_tlist_to_deparse is build from the
> foreignrel->reltarget->exprs.  Also, since it is just created for deparsing
> the relation as a subquery in deparseRangeTblRef and isn't stored into
> fpinfo or anywhere alse, we would need no concern about creating such
> avenues.  IMO I think adding comments would be enough for this.

build_tlist_to_depase() calls pull_var_nodes() before creating the
tlist, whereas the code that searches does not do that. Code-wise
those are not the same things.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/24 16:46, Ashutosh Bapat wrote:
>>> Sorry. I think the current version is better than previous one. The
>>> term "subselect alias" is confusing in the previous version. In the
>>> current version, "Get the relation and column alias for a given Var
>>> node," we need to add word "identifiers" like "Get the relation and
>>> column identifiers for a given Var node".

I wrote:
>> OK, but one thing I'm concerned about is the term "relation alias" seems a
>> bit confusing because we already used the term for the alias of the form
>> "rN".  To avoid that, how about saying "table alias", not "relation alias"?
>> (in which case, the comment would be something like "Get the table and
>> column identifiers for a given Var node".)

> table will be misleading as subquery can represent a join and
> corresponding alias would represent the join. Relation is better term.

But the documentation uses the word "table" for a join relation.  See 
7.2.1.2. Table and Column Aliases:
https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-TABLE-ALIASES

>> I don't think so; in the current version, we essentially deparse from and
>> search into the same object, ie, foreignrel->reltarget->exprs, since the
>> tlist created by build_tlist_to_deparse is build from the
>> foreignrel->reltarget->exprs.  Also, since it is just created for deparsing
>> the relation as a subquery in deparseRangeTblRef and isn't stored into
>> fpinfo or anywhere alse, we would need no concern about creating such
>> avenues.  IMO I think adding comments would be enough for this.

> build_tlist_to_depase() calls pull_var_nodes() before creating the
> tlist, whereas the code that searches does not do that. Code-wise
> those are not the same things.

You missed the point; the foreignrel->reltarget->exprs doesn't contain 
any PHVs, so the tlist created by build_tlist_to_depase will be 
guaranteed to be one-to-one with the foreignrel->reltarget->exprs.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Thu, Nov 24, 2016 at 1:27 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/11/24 16:46, Ashutosh Bapat wrote:
>>>>
>>>> Sorry. I think the current version is better than previous one. The
>>>> term "subselect alias" is confusing in the previous version. In the
>>>> current version, "Get the relation and column alias for a given Var
>>>> node," we need to add word "identifiers" like "Get the relation and
>>>> column identifiers for a given Var node".
>
>
> I wrote:
>>>
>>> OK, but one thing I'm concerned about is the term "relation alias" seems
>>> a
>>> bit confusing because we already used the term for the alias of the form
>>> "rN".  To avoid that, how about saying "table alias", not "relation
>>> alias"?
>>> (in which case, the comment would be something like "Get the table and
>>> column identifiers for a given Var node".)
>
>
>> table will be misleading as subquery can represent a join and
>> corresponding alias would represent the join. Relation is better term.
>
>
> But the documentation uses the word "table" for a join relation.  See
> 7.2.1.2. Table and Column Aliases:
> https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-TABLE-ALIASES

The definition there says "A temporary name can be given to tables and
complex table references to be used for references to the derived
table in the rest of the query. This is called a table alias.", please
note the usage "complex table references". Within the code, we do not
refer to a relation as table. So, the comment in this code should not
refer to a relation as table either.

>
>>> I don't think so; in the current version, we essentially deparse from and
>>> search into the same object, ie, foreignrel->reltarget->exprs, since the
>>> tlist created by build_tlist_to_deparse is build from the
>>> foreignrel->reltarget->exprs.  Also, since it is just created for
>>> deparsing
>>> the relation as a subquery in deparseRangeTblRef and isn't stored into
>>> fpinfo or anywhere alse, we would need no concern about creating such
>>> avenues.  IMO I think adding comments would be enough for this.
>
>
>> build_tlist_to_depase() calls pull_var_nodes() before creating the
>> tlist, whereas the code that searches does not do that. Code-wise
>> those are not the same things.
>
>
> You missed the point; the foreignrel->reltarget->exprs doesn't contain any
> PHVs, so the tlist created by build_tlist_to_depase will be guaranteed to be
> one-to-one with the foreignrel->reltarget->exprs.

It's guaranteed now, but can not be forever. This means anybody who
supports PHV tomorrow, will have to "remember" to update this code as
well. If s/he misses it, a bug will be introduced. That's the avenue
for bug, I am talking about.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/24 17:39, Ashutosh Bapat wrote:
> On Thu, Nov 24, 2016 at 1:27 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/11/24 16:46, Ashutosh Bapat wrote:
>>> table will be misleading as subquery can represent a join and
>>> corresponding alias would represent the join. Relation is better term.

>> But the documentation uses the word "table" for a join relation.  See
>> 7.2.1.2. Table and Column Aliases:
>> https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-TABLE-ALIASES

> The definition there says "A temporary name can be given to tables and
> complex table references to be used for references to the derived
> table in the rest of the query. This is called a table alias.", please
> note the usage "complex table references". Within the code, we do not
> refer to a relation as table. So, the comment in this code should not
> refer to a relation as table either.

OK, will keep using the term "relation".

I wrote:
>>>> I don't think so; in the current version, we essentially deparse from and
>>>> search into the same object, ie, foreignrel->reltarget->exprs, since the
>>>> tlist created by build_tlist_to_deparse is build from the
>>>> foreignrel->reltarget->exprs.  Also, since it is just created for
>>>> deparsing
>>>> the relation as a subquery in deparseRangeTblRef and isn't stored into
>>>> fpinfo or anywhere alse, we would need no concern about creating such
>>>> avenues.  IMO I think adding comments would be enough for this.

>>> build_tlist_to_depase() calls pull_var_nodes() before creating the
>>> tlist, whereas the code that searches does not do that. Code-wise
>>> those are not the same things.

>> You missed the point; the foreignrel->reltarget->exprs doesn't contain any
>> PHVs, so the tlist created by build_tlist_to_depase will be guaranteed to be
>> one-to-one with the foreignrel->reltarget->exprs.

> It's guaranteed now, but can not be forever. This means anybody who
> supports PHV tomorrow, will have to "remember" to update this code as
> well. If s/he misses it, a bug will be introduced. That's the avenue
> for bug, I am talking about.

It *can* be guaranteed.  See another patch for supporting evaluating 
PHVs remotely.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
>
>>>> build_tlist_to_depase() calls pull_var_nodes() before creating the
>>>> tlist, whereas the code that searches does not do that. Code-wise
>>>> those are not the same things.
>
>
>>> You missed the point; the foreignrel->reltarget->exprs doesn't contain
>>> any
>>> PHVs, so the tlist created by build_tlist_to_depase will be guaranteed to
>>> be
>>> one-to-one with the foreignrel->reltarget->exprs.
>
>
>> It's guaranteed now, but can not be forever. This means anybody who
>> supports PHV tomorrow, will have to "remember" to update this code as
>> well. If s/he misses it, a bug will be introduced. That's the avenue
>> for bug, I am talking about.
>
>
> It *can* be guaranteed.  See another patch for supporting evaluating PHVs
> remotely.

We can't base assumptions in this patch on something in the other
patch, esp. when that's not even reviewed once. PHV is just one case,
subqueries involving aggregates is other and there are others. But
that's not really the point. The point is
add_to_flat_tlist(pull_var_clause(rel->reltarget->exprs) can not be
proved to be same as rel->reltarget->exprs in general. So, we should
base our code on an assumption that can not be proved.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/24 18:20, Ashutosh Bapat wrote:

I wrote:
>>>> You missed the point; the foreignrel->reltarget->exprs doesn't contain
>>>> any
>>>> PHVs, so the tlist created by build_tlist_to_depase will be guaranteed to
>>>> be
>>>> one-to-one with the foreignrel->reltarget->exprs.

You wrote:
>>> It's guaranteed now, but can not be forever. This means anybody who
>>> supports PHV tomorrow, will have to "remember" to update this code as
>>> well. If s/he misses it, a bug will be introduced. That's the avenue
>>> for bug, I am talking about.

I wrote:
>> It *can* be guaranteed.  See another patch for supporting evaluating PHVs
>> remotely.

> We can't base assumptions in this patch on something in the other
> patch, esp. when that's not even reviewed once. PHV is just one case,
> subqueries involving aggregates is other and there are others.

Let's discuss this together with the patch for PHVs.  That was one of 
the reasons why I had merged the two patches into one.  I'd like to 
leave enhancements such as subqueries involving aggregates for future 
work, though.

> But
> that's not really the point. The point is
> add_to_flat_tlist(pull_var_clause(rel->reltarget->exprs) can not be
> proved to be same as rel->reltarget->exprs in general.

Really?  As mentioned in comments for RelOptInfo in relation.h shown 
below, the rel->reltarget->exprs for a base/join relation only contains 
Vars and PHVs, so the two things would be proved to be one-to-one. 
(Note: we don't need to care about the appendrel-child-relation case 
because appendrel child relations wouldn't appear in the main join tree.)
 *      reltarget - Default Path output tlist for this rel; normally 
contains *                  Var and PlaceHolderVar nodes for the values we need to *                  output from this
relation.*                  List is in no particular order, but all rels of an *                  appendrel set must
usecorresponding orders. *                  NOTE: in an appendrel child relation, may contain *
arbitraryexpressions pulled up from a subquery!
 

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/22 18:28, Ashutosh Bapat wrote:
> The comments should explain why is the assertion true.
> +        /* Shouldn't be NIL */
> +        Assert(tlist != NIL);

I noticed that I was wrong; in the Assertion the tlist can be empty.  An
example for such a case is:

SELECT 1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL
JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t2 ON (TRUE);

In this example any columns of the relations ft4 and ft5 wouldn't be
needed for the join or the final output, so both the tlists for the
relations created in deparseRangeTblRef would be empty.  Attached is an
updated version fixing this bug.  Changes are:

* I removed make_outerrel_subquery/make_innerrel_subquery from fpinfo
and added a new member "is_subquery_rel" to fpinfo, as a flag on whether
to deparse the relation as a subquery.
* I modified deparseRangeTblRef, deparseSelectSql, and is_subquery_var
to see the is_subquery_rel, not
make_outerrel_subquery/make_innerrel_subquery.
* I modified appendSubselectAlias to handle the case where ncols = 0
properly.
* I renamed subquery_rels in fpinfo to lower_subquery_rels, to make it
easy to distinguish this from is_subquery_rel clearly.
* I added regression tests for that.

The rest of the patch is the same as the previous version, but:

> +        /* Should be same length */
> +        Assert(list_length(tlist) ==
> list_length(foreignrel->reltarget->exprs));

I added comments explaining the reason.

> The name get_subselect_alias_id() seems to suggest that the function returns
> identifier for subselect alias, which isn't correct. It returns the alias
> identifiers for deparsing a Var node. But I guess, we have left this to the
> committer's judgement.

I agree that the name isn't good, so I changed it to
get_relation_column_alias_ids().  Let me know if it may be better.

I also revised code and comments a bit, just for consistency.

I will update another patch for PHVs on top of the attached one.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/11/24 21:45, Etsuro Fujita wrote:
> On 2016/11/22 18:28, Ashutosh Bapat wrote:
>> The comments should explain why is the assertion true.
>> +        /* Shouldn't be NIL */
>> +        Assert(tlist != NIL);

> I noticed that I was wrong; in the Assertion the tlist can be empty.  An
> example for such a case is:
>
> SELECT 1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL
> JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t2 ON (TRUE);
>
> In this example any columns of the relations ft4 and ft5 wouldn't be
> needed for the join or the final output, so both the tlists for the
> relations created in deparseRangeTblRef would be empty.  Attached is an
> updated version fixing this bug.  Changes are:
>
> * I removed make_outerrel_subquery/make_innerrel_subquery from fpinfo
> and added a new member "is_subquery_rel" to fpinfo, as a flag on whether
> to deparse the relation as a subquery.
> * I modified deparseRangeTblRef, deparseSelectSql, and is_subquery_var
> to see the is_subquery_rel, not
> make_outerrel_subquery/make_innerrel_subquery.
> * I modified appendSubselectAlias to handle the case where ncols = 0
> properly.
> * I renamed subquery_rels in fpinfo to lower_subquery_rels, to make it
> easy to distinguish this from is_subquery_rel clearly.
> * I added regression tests for that.

I noticed that the above changes are not adequate; the updated patch
breaks EXPLAIN outputs for EvalPlanQual subplans of foreign joins in
which foreign tables are full joined and in the remote join query the
foreign tables are deparsed as subqueries.  Consider:

postgres=# explain verbose select * from test, ((select * from ft1 where
b between 1 and 3) t1 full join (select * from ft2 where b between 1 and
3) t2
  on (t1.a = t2.a)) ss for update of test;


QUERY PLA
N

-------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------------
  LockRows  (cost=100.00..287.33 rows=6780 width=94)
    Output: test.a, test.b, ft1.a, ft1.b, ft2.a, ft2.b, test.ctid,
ft1.*, ft2.*
    ->  Nested Loop  (cost=100.00..219.53 rows=6780 width=94)
          Output: test.a, test.b, ft1.a, ft1.b, ft2.a, ft2.b, test.ctid,
ft1.*, ft2.*
          ->  Seq Scan on public.test  (cost=0.00..32.60 rows=2260 width=14)
                Output: test.a, test.b, test.ctid
          ->  Materialize  (cost=100.00..102.19 rows=3 width=80)
                Output: ft1.a, ft1.b, ft1.*, ft2.a, ft2.b, ft2.*
                ->  Foreign Scan  (cost=100.00..102.17 rows=3 width=80)
                      Output: ft1.a, ft1.b, ft1.*, ft2.a, ft2.b, ft2.*
                      Relations: (public.ft1) FULL JOIN (public.ft2)
                      Remote SQL: SELECT s5.c1, s5.c2, s5.c3, s6.c1,
s6.c2, s6.c3 FROM ((SELECT a, b, ROW(a, b) FROM public.t1 WHERE ((b >=
1)) AND ((b
<= 3))) s5(c1, c2, c3) FULL JOIN (SELECT a, b, ROW(a, b) FROM public.t2
WHERE ((b >= 1)) AND ((b <= 3))) s6(c1, c2, c3) ON (((s5.c1 = s6.c1))))
                      ->  Hash Full Join  (cost=201.14..202.29 rows=3
width=80)
                            Output: ft1.a, ft1.b, ft1.*, ft2.a, ft2.b, ft2.*
                            Hash Cond: (ft1.a = ft2.a)
                            ->  Foreign Scan on public.ft1
(cost=100.00..101.11 rows=3 width=40)
                                  Output: ft1.a, ft1.b, ft1.*
                                  Remote SQL: SELECT NULL FROM public.t1
WHERE ((b >= 1)) AND ((b <= 3))
                            ->  Hash  (cost=101.11..101.11 rows=3 width=40)
                                  Output: ft2.a, ft2.b, ft2.*
                                  ->  Foreign Scan on public.ft2
(cost=100.00..101.11 rows=3 width=40)
                                        Output: ft2.a, ft2.b, ft2.*
                                        Remote SQL: SELECT NULL FROM
public.t2 WHERE ((b >= 1)) AND ((b <= 3))
(23 rows)

The SELECT clauses of the remote queries for the Foreign Scans in the
EPQ subplan are deparsed incorrectly into "SELECT NULL".  The reason for
that is that since the foreign tables' fpinfo->is_subquery_rel has been
set true at the time those clauses are deparsed (due to that the foreign
tables have been deparsed as subqueries in the main remote join query),
deparseSelectSql calls deparseExplicitTargetList with an empty tlist, to
construct the clauses, so it deparses the tlist into "NULL".  This is
harmless because we don't use the remote queries for the Foreign Scans
during an EPQ recheck, but that would be confusing for users, so I
modified the patch a bit further to make the EXPLAIN output as-is.
Attached is a new version of the patch.

> I also revised code and comments a bit, just for consistency.

Also, I renamed appendSubselectAlias to appendSubqueryAlias, because we
use the term "subquery" for other places.

> I will update another patch for PHVs on top of the attached one.

Done.  I also revised some code and comments.  I'm also attaching the
updated version of the patch for PHVs.

Best regards,
Etsuro Fujita

Attachment

Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Mon, Nov 28, 2016 at 3:52 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/11/24 21:45, Etsuro Fujita wrote:
>>
>> On 2016/11/22 18:28, Ashutosh Bapat wrote:
>>>
>>> The comments should explain why is the assertion true.
>>> +        /* Shouldn't be NIL */
>>> +        Assert(tlist != NIL);
>
>
>> I noticed that I was wrong; in the Assertion the tlist can be empty.  An
>> example for such a case is:
>>
>> SELECT 1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL
>> JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t2 ON (TRUE);
>>
>> In this example any columns of the relations ft4 and ft5 wouldn't be
>> needed for the join or the final output, so both the tlists for the
>> relations created in deparseRangeTblRef would be empty.  Attached is an
>> updated version fixing this bug.  Changes are:
>>
>> * I removed make_outerrel_subquery/make_innerrel_subquery from fpinfo
>> and added a new member "is_subquery_rel" to fpinfo, as a flag on whether
>> to deparse the relation as a subquery.

Replacing make_outerrel_subquery/make_innerrel_subquery with is_subquery_rel
seems to be a bad idea. Whether a relation needs to be deparsed as a subquery
or not depends upon the relation which joins it with another relation. Take for
example a case when a join ABC can pull up the conditions on AB, but ABD can
not. In this case we will mark that AB in ABD needs to be deparsed as a
subquery but will not mark so in ABC. So, if we choose to join ABCD as (ABC)D
we don't need AB to be deparsed as a subquery. If we choose to join ABCD as
(ABD)C, we will need AB to deparsed as a subquery. Regression doesn't have a
testcase like that hence the code seems to be working.

Some more comments
1. The output of the following query
+SELECT 1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL
JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t2 ON (TRUE);
produces 21 rows all containing a "1". That's not quite good use of expected
output space, given that all that the test tests is the empty
targetlists are deparsed
correctly. You might want to either just test EXPLAIN output or make "between 50
and 60" condition more stringent to reduce the number of rows output.

2. Got lost here. I guess, all you need to say is "test deparsing FOR UPDATE
clause with subqueries.". Anyway, the sentence is very hard to read and needs
simplification.
+-- d. test deparsing the remote queries for simple foreign table scans in
+-- an EPQ subplan of a foreign join in which the foreign tables are full
+-- joined and in the remote join query the foreign tables are deparsed as
+-- subqueries

3. Why is foreignrel variable changed to rel?
-extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
-                        RelOptInfo *foreignrel, List *tlist,
-                        List *remote_conds, List *pathkeys,
-                        List **retrieved_attrs, List **params_list);
+extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo
*root, RelOptInfo *rel,
+                        List *tlist, List *remote_conds, List *pathkeys,
+                        bool is_subquery, List **retrieved_attrs,
+                        List **params_list);

4. I am still not happy with this change
+        /*
+         * Since (1) the expressions in foreignrel's reltarget doesn't contain
+         * any PHVs and (2) foreignrel's local_conds is empty, the tlist
+         * created by build_tlist_to_deparse must be one-to-one with the
+         * expressions.
+         */
+        Assert(list_length(tlist) ==
list_length(foreignrel->reltarget->exprs));
the assertion only checks that the number of elements in both the lists are
same but does not check whether those lists are same i.e. they contain the same
elements in the same order. This equality is crucial to deparsing logic. If
somehow build_tlist_to_deparse() breaks that assumption in future, we have no
way to detect it, unless a regression test fails.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Robert Haas
Date:
On Mon, Dec 5, 2016 at 6:20 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> 4. I am still not happy with this change
> +        /*
> +         * Since (1) the expressions in foreignrel's reltarget doesn't contain
> +         * any PHVs and (2) foreignrel's local_conds is empty, the tlist
> +         * created by build_tlist_to_deparse must be one-to-one with the
> +         * expressions.
> +         */
> +        Assert(list_length(tlist) ==
> list_length(foreignrel->reltarget->exprs));
> the assertion only checks that the number of elements in both the lists are
> same but does not check whether those lists are same i.e. they contain the same
> elements in the same order. This equality is crucial to deparsing logic. If
> somehow build_tlist_to_deparse() breaks that assumption in future, we have no
> way to detect it, unless a regression test fails.

If there's an easy way to do a more exact comparison, great.  But if
we can't get an awesome Assert(), a helpful Assert() is still better
than a kick in the head.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/12/07 5:27, Robert Haas wrote:
> On Mon, Dec 5, 2016 at 6:20 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> 4. I am still not happy with this change
>> +        /*
>> +         * Since (1) the expressions in foreignrel's reltarget doesn't contain
>> +         * any PHVs and (2) foreignrel's local_conds is empty, the tlist
>> +         * created by build_tlist_to_deparse must be one-to-one with the
>> +         * expressions.
>> +         */
>> +        Assert(list_length(tlist) ==
>> list_length(foreignrel->reltarget->exprs));
>> the assertion only checks that the number of elements in both the lists are
>> same but does not check whether those lists are same i.e. they contain the same
>> elements in the same order. This equality is crucial to deparsing logic. If
>> somehow build_tlist_to_deparse() breaks that assumption in future, we have no
>> way to detect it, unless a regression test fails.

> If there's an easy way to do a more exact comparison, great.

Ashutosh proposed this to do the comparison:

On 2016/11/22 18:28, Ashutosh Bapat wrote:
> I guess, the reason why you are doing it this way, is SELECT clause for the
> outermost query gets deparsed before FROM clause. For later we call
> deparseRangeTblRef(), which builds the tlist. So, while deparsing SELECT
> clause, we do not have tlist to build from. In that case, I guess, we have to
> build the tlist in get_subselect_alias_id() if it's not available and stick it
> in fpinfo. Subsequent calls to get_subselect_alias_id() should find tlist
> there. Then in deparseRangeTblRef() assert that there's a tlist in fpinfo
> and use it to build the SELECT clause of subquery. That way, we don't build
> tlist unless it's needed and also use the same tlist for all searches. Please
> use tlist_member() to search into the tlist.

This would probably work, but seems to me a bit complicated.  Instead, 
I'd like to propose that we build the tlist for each relation being 
deparsed as a subquery in a given join tree, right before deparsing the 
SELECT clause in deparseSelectStmtForRel, if is_subquery is false and 
lower_subquery_rels isn't NULL, and store the tlist into the relation's 
fpinfo.  That would allow us to build the tlist only when we need it, 
and to use tlist_member for the exact comparison.  I think it would be 
much easier to implement that.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Wed, Dec 7, 2016 at 1:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 5, 2016 at 6:20 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> 4. I am still not happy with this change
>> +        /*
>> +         * Since (1) the expressions in foreignrel's reltarget doesn't contain
>> +         * any PHVs and (2) foreignrel's local_conds is empty, the tlist
>> +         * created by build_tlist_to_deparse must be one-to-one with the
>> +         * expressions.
>> +         */
>> +        Assert(list_length(tlist) ==
>> list_length(foreignrel->reltarget->exprs));
>> the assertion only checks that the number of elements in both the lists are
>> same but does not check whether those lists are same i.e. they contain the same
>> elements in the same order. This equality is crucial to deparsing logic. If
>> somehow build_tlist_to_deparse() breaks that assumption in future, we have no
>> way to detect it, unless a regression test fails.
>
> If there's an easy way to do a more exact comparison, great.  But if
> we can't get an awesome Assert(), a helpful Assert() is still better
> than a kick in the head.

The assert is not a problem in itself, but the reason we have to add
the assert. The problem is explained in [1], point #9.

[1] https://www.postgresql.org/message-id/CAFjFpRfwoSsJr9418b2jA7g0nwagjZSWhPQPUmK9M6z5XSOAqQ%40mail.gmail.com


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
>
> Ashutosh proposed this to do the comparison:
>
> On 2016/11/22 18:28, Ashutosh Bapat wrote:
>>
>> I guess, the reason why you are doing it this way, is SELECT clause for
>> the
>> outermost query gets deparsed before FROM clause. For later we call
>> deparseRangeTblRef(), which builds the tlist. So, while deparsing SELECT
>> clause, we do not have tlist to build from. In that case, I guess, we have
>> to
>> build the tlist in get_subselect_alias_id() if it's not available and
>> stick it
>> in fpinfo. Subsequent calls to get_subselect_alias_id() should find tlist
>> there. Then in deparseRangeTblRef() assert that there's a tlist in fpinfo
>> and use it to build the SELECT clause of subquery. That way, we don't
>> build
>> tlist unless it's needed and also use the same tlist for all searches.
>> Please
>> use tlist_member() to search into the tlist.
>
>
> This would probably work, but seems to me a bit complicated.  Instead, I'd
> like to propose that we build the tlist for each relation being deparsed as
> a subquery in a given join tree, right before deparsing the SELECT clause in
> deparseSelectStmtForRel, if is_subquery is false and lower_subquery_rels
> isn't NULL, and store the tlist into the relation's fpinfo.  That would
> allow us to build the tlist only when we need it, and to use tlist_member
> for the exact comparison.  I think it would be much easier to implement
> that.
>

IIRC, this is inline with my original proposal of creating tlists
before deparsing anything. Along-with that I also proposed to bubble
this array of tlist up the join hierarchy to avoid recursion [1] point
#15, further elaborated in [2]

[1] https://www.postgresql.org/message-id/ad449b25-66ee-4c06-568c-0eb2e1bef9f9%40lab.ntt.co.jp
[2] https://www.postgresql.org/message-id/CAFjFpRcn7%3DDNOXm-PJ_jVDqAmghKVf6tApQC%2B_RgMZ8tOPExcA%40mail.gmail.com

We seem to be moving towards that solution, but as discussed we have
left it to committer's judgement.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/12/05 20:20, Ashutosh Bapat wrote:
> On Mon, Nov 28, 2016 at 3:52 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/11/24 21:45, Etsuro Fujita wrote:
>>> * I removed make_outerrel_subquery/make_innerrel_subquery from fpinfo
>>> and added a new member "is_subquery_rel" to fpinfo, as a flag on whether
>>> to deparse the relation as a subquery.

> Replacing make_outerrel_subquery/make_innerrel_subquery with is_subquery_rel
> seems to be a bad idea. Whether a relation needs to be deparsed as a subquery
> or not depends upon the relation which joins it with another relation. Take for
> example a case when a join ABC can pull up the conditions on AB, but ABD can
> not. In this case we will mark that AB in ABD needs to be deparsed as a
> subquery but will not mark so in ABC. So, if we choose to join ABCD as (ABC)D
> we don't need AB to be deparsed as a subquery. If we choose to join ABCD as
> (ABD)C, we will need AB to deparsed as a subquery.

This is not true; since (1) currently, a relation needs to be deparsed 
as a subquery only when the relation is full-joined with any other 
relation, and (2) the join ordering for full joins is preserved, we 
wouldn't have such an example.  (You might think we would have that when 
extending the patch to handle PHVs, but the answer would be no, because 
the patch for PHVs would determine whether a relation emitting PHVs 
needs to be deparsed as a subquery, depending on the relation itself. 
See the patch for PHVs.)  I like is_subquery_rel more than 
make_outerrel_subquery/make_innerrel_subquery, because it reduces the 
number of members in fpinfo.  But the choice would be arbitrary, so I 
wouldn't object to going back to 
make_outerrel_subquery/make_innerrel_subquery.

> Some more comments
> 1. The output of the following query
> +SELECT 1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL
> JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t2 ON (TRUE);
> produces 21 rows all containing a "1". That's not quite good use of expected
> output space, given that all that the test tests is the empty
> targetlists are deparsed
> correctly. You might want to either just test EXPLAIN output or make "between 50
> and 60" condition more stringent to reduce the number of rows output.

Will do.

> 2. Got lost here. I guess, all you need to say is "test deparsing FOR UPDATE
> clause with subqueries.". Anyway, the sentence is very hard to read and needs
> simplification.
> +-- d. test deparsing the remote queries for simple foreign table scans in
> +-- an EPQ subplan of a foreign join in which the foreign tables are full
> +-- joined and in the remote join query the foreign tables are deparsed as
> +-- subqueries

Will do.

> 3. Why is foreignrel variable changed to rel?
> -extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
> -                        RelOptInfo *foreignrel, List *tlist,
> -                        List *remote_conds, List *pathkeys,
> -                        List **retrieved_attrs, List **params_list);
> +extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo
> *root, RelOptInfo *rel,
> +                        List *tlist, List *remote_conds, List *pathkeys,
> +                        bool is_subquery, List **retrieved_attrs,
> +                        List **params_list);

I changed the name that way to match with the function definition in 
deparse.c.

Thanks for the comments!

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/12/07 15:39, Ashutosh Bapat wrote:

>> On 2016/11/22 18:28, Ashutosh Bapat wrote:
>>>
>>> I guess, the reason why you are doing it this way, is SELECT clause for
>>> the
>>> outermost query gets deparsed before FROM clause. For later we call
>>> deparseRangeTblRef(), which builds the tlist. So, while deparsing SELECT
>>> clause, we do not have tlist to build from. In that case, I guess, we have
>>> to
>>> build the tlist in get_subselect_alias_id() if it's not available and
>>> stick it
>>> in fpinfo. Subsequent calls to get_subselect_alias_id() should find tlist
>>> there. Then in deparseRangeTblRef() assert that there's a tlist in fpinfo
>>> and use it to build the SELECT clause of subquery. That way, we don't
>>> build
>>> tlist unless it's needed and also use the same tlist for all searches.
>>> Please
>>> use tlist_member() to search into the tlist.

I wrote:
>> This would probably work, but seems to me a bit complicated.  Instead, I'd
>> like to propose that we build the tlist for each relation being deparsed as
>> a subquery in a given join tree, right before deparsing the SELECT clause in
>> deparseSelectStmtForRel, if is_subquery is false and lower_subquery_rels
>> isn't NULL, and store the tlist into the relation's fpinfo.  That would
>> allow us to build the tlist only when we need it, and to use tlist_member
>> for the exact comparison.  I think it would be much easier to implement
>> that.

> IIRC, this is inline with my original proposal of creating tlists
> before deparsing anything. Along-with that I also proposed to bubble
> this array of tlist up the join hierarchy to avoid recursion [1] point
> #15, further elaborated in [2]
>
> [1] https://www.postgresql.org/message-id/ad449b25-66ee-4c06-568c-0eb2e1bef9f9%40lab.ntt.co.jp
> [2] https://www.postgresql.org/message-id/CAFjFpRcn7%3DDNOXm-PJ_jVDqAmghKVf6tApQC%2B_RgMZ8tOPExcA%40mail.gmail.com
>
> We seem to be moving towards that solution, but as discussed we have
> left it to committer's judgement.

My proposal here would be a bit different from what you proposed; right 
before deparseSelectSql in deparseSelectStmtForRel, build the tlists for 
relations present in the given jointree that will be deparsed as 
subqueries, by (1) traversing the given jointree and (2) applying 
build_tlist_to_deparse to each relation to be deparsed as a subquery and 
saving the result in that relation's fpinfo.  I think that would be 
implemented easily, and the overhead would be small.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Wed, Dec 7, 2016 at 4:51 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/12/05 20:20, Ashutosh Bapat wrote:
>>
>> On Mon, Nov 28, 2016 at 3:52 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>
>>> On 2016/11/24 21:45, Etsuro Fujita wrote:
>>>>
>>>> * I removed make_outerrel_subquery/make_innerrel_subquery from fpinfo
>>>> and added a new member "is_subquery_rel" to fpinfo, as a flag on whether
>>>> to deparse the relation as a subquery.
>
>
>> Replacing make_outerrel_subquery/make_innerrel_subquery with
>> is_subquery_rel
>> seems to be a bad idea. Whether a relation needs to be deparsed as a
>> subquery
>> or not depends upon the relation which joins it with another relation.
>> Take for
>> example a case when a join ABC can pull up the conditions on AB, but ABD
>> can
>> not. In this case we will mark that AB in ABD needs to be deparsed as a
>> subquery but will not mark so in ABC. So, if we choose to join ABCD as
>> (ABC)D
>> we don't need AB to be deparsed as a subquery. If we choose to join ABCD
>> as
>> (ABD)C, we will need AB to deparsed as a subquery.
>
>
> This is not true; since (1) currently, a relation needs to be deparsed as a
> subquery only when the relation is full-joined with any other relation, and
> (2) the join ordering for full joins is preserved, we wouldn't have such an
> example.  (You might think we would have that when extending the patch to
> handle PHVs, but the answer would be no, because the patch for PHVs would
> determine whether a relation emitting PHVs needs to be deparsed as a
> subquery, depending on the relation itself. See the patch for PHVs.)  I like
> is_subquery_rel more than make_outerrel_subquery/make_innerrel_subquery,
> because it reduces the number of members in fpinfo.  But the choice would be
> arbitrary, so I wouldn't object to going back to
> make_outerrel_subquery/make_innerrel_subquery.

I think you are right. And even if we were to deparse a relation as
subquery, when it shouldn't be, we will only loose a syntactic
optimization. So, it shouldn't result in a bug.


>> 3. Why is foreignrel variable changed to rel?
>> -extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
>> -                        RelOptInfo *foreignrel, List *tlist,
>> -                        List *remote_conds, List *pathkeys,
>> -                        List **retrieved_attrs, List **params_list);
>> +extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo
>> *root, RelOptInfo *rel,
>> +                        List *tlist, List *remote_conds, List *pathkeys,
>> +                        bool is_subquery, List **retrieved_attrs,
>> +                        List **params_list);
>
>
> I changed the name that way to match with the function definition in
> deparse.c.
>

That's something good to do but it's unrelated to this patch. I have
repeated this line several times in this thread :)

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/12/07 21:11, Ashutosh Bapat wrote:
> On Wed, Dec 7, 2016 at 4:51 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/12/05 20:20, Ashutosh Bapat wrote:

>>> 3. Why is foreignrel variable changed to rel?
>>> -extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
>>> -                        RelOptInfo *foreignrel, List *tlist,
>>> -                        List *remote_conds, List *pathkeys,
>>> -                        List **retrieved_attrs, List **params_list);
>>> +extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo
>>> *root, RelOptInfo *rel,
>>> +                        List *tlist, List *remote_conds, List *pathkeys,
>>> +                        bool is_subquery, List **retrieved_attrs,
>>> +                        List **params_list);

>> I changed the name that way to match with the function definition in
>> deparse.c.

> That's something good to do but it's unrelated to this patch. I have
> repeated this line several times in this thread :)

OK.

Best regards,
Etsuro Fujita





Re: Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/12/07 20:23, Etsuro Fujita wrote:
> On 2016/12/07 15:39, Ashutosh Bapat wrote:

>>> On 2016/11/22 18:28, Ashutosh Bapat wrote:
>>>>
>>>> I guess, the reason why you are doing it this way, is SELECT clause for
>>>> the
>>>> outermost query gets deparsed before FROM clause. For later we call
>>>> deparseRangeTblRef(), which builds the tlist. So, while deparsing
>>>> SELECT
>>>> clause, we do not have tlist to build from. In that case, I guess,
>>>> we have
>>>> to
>>>> build the tlist in get_subselect_alias_id() if it's not available and
>>>> stick it
>>>> in fpinfo. Subsequent calls to get_subselect_alias_id() should find
>>>> tlist
>>>> there. Then in deparseRangeTblRef() assert that there's a tlist in
>>>> fpinfo
>>>> and use it to build the SELECT clause of subquery. That way, we don't
>>>> build
>>>> tlist unless it's needed and also use the same tlist for all searches.
>>>> Please
>>>> use tlist_member() to search into the tlist.

> I wrote:
>>> This would probably work, but seems to me a bit complicated.
>>> Instead, I'd
>>> like to propose that we build the tlist for each relation being
>>> deparsed as
>>> a subquery in a given join tree, right before deparsing the SELECT
>>> clause in
>>> deparseSelectStmtForRel, if is_subquery is false and lower_subquery_rels
>>> isn't NULL, and store the tlist into the relation's fpinfo.  That would
>>> allow us to build the tlist only when we need it, and to use
>>> tlist_member
>>> for the exact comparison.  I think it would be much easier to implement
>>> that.

>> IIRC, this is inline with my original proposal of creating tlists
>> before deparsing anything. Along-with that I also proposed to bubble
>> this array of tlist up the join hierarchy to avoid recursion [1] point
>> #15, further elaborated in [2]
>>
>> [1]
>> https://www.postgresql.org/message-id/ad449b25-66ee-4c06-568c-0eb2e1bef9f9%40lab.ntt.co.jp
>>
>> [2]
>> https://www.postgresql.org/message-id/CAFjFpRcn7%3DDNOXm-PJ_jVDqAmghKVf6tApQC%2B_RgMZ8tOPExcA%40mail.gmail.com

> My proposal here would be a bit different from what you proposed; right
> before deparseSelectSql in deparseSelectStmtForRel, build the tlists for
> relations present in the given jointree that will be deparsed as
> subqueries, by (1) traversing the given jointree and (2) applying
> build_tlist_to_deparse to each relation to be deparsed as a subquery and
> saving the result in that relation's fpinfo.  I think that would be
> implemented easily, and the overhead would be small.

I implemented that to address your concern:
* deparseRangeTblRef uses the tlist created as above, to build the
SELECT clause of the subquery.  (You said "Then in deparseRangeTblRef()
assert that there's a tlist in fpinfo", but the tlist may be empty, so I
didn't add any assertion to that function.)
* get_relation_column_alias_ids uses tlist_member with the tlist.

I also addressed the comments #1, #2 and #3 in [1].  For #1, I added
"LIMIT 10" to the query.  Attached is the new version of the patch.

Other changes:
* As discussed before, renamed grouped_tlist in fpinfo to "tlist" and
used it to store the tlist created as above.
* Also, renamed SS_REL_ALIAS_PREFIX to SUBQUERY_REL_ALIAS_PREFIX
(Likewise for SS_COL_ALIAS_PREFIX).
* Relocated some functions.
* Revised comments a little bit.

Best regards,
Etsuro Fujita

[1]
https://www.postgresql.org/message-id/CAFjFpRfU4-gxqZ8ahoKM1ZtDJEThe3K8Lb_6beRKa8mmP%3Dv%3DfA%40mail.gmail.com

Attachment

Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2016/12/08 21:08, Etsuro Fujita wrote:
> On 2016/12/07 20:23, Etsuro Fujita wrote:
>> My proposal here would be a bit different from what you proposed; right
>> before deparseSelectSql in deparseSelectStmtForRel, build the tlists for
>> relations present in the given jointree that will be deparsed as
>> subqueries, by (1) traversing the given jointree and (2) applying
>> build_tlist_to_deparse to each relation to be deparsed as a subquery and
>> saving the result in that relation's fpinfo.  I think that would be
>> implemented easily, and the overhead would be small.

> I implemented that to address your concern:
> * deparseRangeTblRef uses the tlist created as above, to build the
> SELECT clause of the subquery.  (You said "Then in deparseRangeTblRef()
> assert that there's a tlist in fpinfo", but the tlist may be empty, so I
> didn't add any assertion to that function.)
> * get_relation_column_alias_ids uses tlist_member with the tlist.
>
> I also addressed the comments #1, #2 and #3 in [1].  For #1, I added
> "LIMIT 10" to the query.  Attached is the new version of the patch.
>
> Other changes:
> * As discussed before, renamed grouped_tlist in fpinfo to "tlist" and
> used it to store the tlist created as above.
> * Also, renamed SS_REL_ALIAS_PREFIX to SUBQUERY_REL_ALIAS_PREFIX
> (Likewise for SS_COL_ALIAS_PREFIX).
> * Relocated some functions.
> * Revised comments a little bit.

I updated the patch a bit further: simplified the function name 
(s/build_subquery_rel_tlists/build_subquery_tlists/), and revised 
comments a little bit.  Attached is an updated version 
(postgres-fdw-subquery-support-v14.patch).  And I rebased another patch 
for PHVs against that patch, which is also attached 
(postgres-fdw-phv-pushdown-v14.patch).

Best regards,
Etsuro Fujita

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
>
> I updated the patch a bit further: simplified the function name
> (s/build_subquery_rel_tlists/build_subquery_tlists/), and revised comments a
> little bit.  Attached is an updated version
> (postgres-fdw-subquery-support-v14.patch).  And I rebased another patch for
> PHVs against that patch, which is also attached
> (postgres-fdw-phv-pushdown-v14.patch).
>
Few comments

In build_subquery_tlists(), why don't we handle base relations?
+   if (foreignrel->reloptkind != RELOPT_JOINREL)
+       return;

Also, in this function, if fpinfo->tlist is already set, why do we want to
build it again?

In build_tlist_to_deparse(), if fpinfo->tlist for the given relation is set, we
should just return it rather than constructing it again.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2017/01/03 17:28, Ashutosh Bapat wrote:

I wrote:
>> I updated the patch a bit further: simplified the function name
>> (s/build_subquery_rel_tlists/build_subquery_tlists/), and revised comments a
>> little bit.  Attached is an updated version
>> (postgres-fdw-subquery-support-v14.patch).

> Few comments

Thanks for the comments!

> In build_subquery_tlists(), why don't we handle base relations?
> +   if (foreignrel->reloptkind != RELOPT_JOINREL)
> +       return;

The reason for that is we don't need to handle the baserel cases; the 
tlist for a base relation, if needed, would be created while recursing 
into a join relation that joins the base relation to other base/join 
relation.

> Also, in this function, if fpinfo->tlist is already set, why do we want to
> build it again?

When this function gets called, fpinfo->tlist isn't set for any base or 
join relation that needs to build the tlist, so we always need to build 
it for each such relation.

> In build_tlist_to_deparse(), if fpinfo->tlist for the given relation is set, we
> should just return it rather than constructing it again.

In that function we wouldn't have such cases for base or join relations 
needing the tlist.

Best regards,
Etsuro Fujita





Re: [HACKERS] Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Thu, Jan 5, 2017 at 5:14 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2017/01/03 17:28, Ashutosh Bapat wrote:
>
> I wrote:
>>>
>>> I updated the patch a bit further: simplified the function name
>>> (s/build_subquery_rel_tlists/build_subquery_tlists/), and revised
>>> comments a
>>> little bit.  Attached is an updated version
>>> (postgres-fdw-subquery-support-v14.patch).
>
>
>> Few comments
>
>
> Thanks for the comments!
>
>> In build_subquery_tlists(), why don't we handle base relations?
>> +   if (foreignrel->reloptkind != RELOPT_JOINREL)
>> +       return;
>
>
> The reason for that is we don't need to handle the baserel cases; the tlist
> for a base relation, if needed, would be created while recursing into a join
> relation that joins the base relation to other base/join relation.

Right. Sorry, I misunderstood the code. May be a comment would help.

>
>> Also, in this function, if fpinfo->tlist is already set, why do we want to
>> build it again?
>
>
> When this function gets called, fpinfo->tlist isn't set for any base or join
> relation that needs to build the tlist, so we always need to build it for
> each such relation.

IIUC, for a relation with use_remote_estimates we will deparse the
query twice and will build the targetlist twice.

>
>> In build_tlist_to_deparse(), if fpinfo->tlist for the given relation is
>> set, we
>> should just return it rather than constructing it again.
>
>
> In that function we wouldn't have such cases for base or join relations
> needing the tlist.

Same explanation as above.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2017/01/05 21:11, Ashutosh Bapat wrote:
> On Thu, Jan 5, 2017 at 5:14 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2017/01/03 17:28, Ashutosh Bapat wrote:
>>> In build_subquery_tlists(), why don't we handle base relations?
>>> +   if (foreignrel->reloptkind != RELOPT_JOINREL)
>>> +       return;

>> The reason for that is we don't need to handle the baserel cases; the tlist
>> for a base relation, if needed, would be created while recursing into a join
>> relation that joins the base relation to other base/join relation.

> Right. Sorry, I misunderstood the code. May be a comment would help.

Will add the comment.

>>> Also, in this function, if fpinfo->tlist is already set, why do we want to
>>> build it again?

>> When this function gets called, fpinfo->tlist isn't set for any base or join
>> relation that needs to build the tlist, so we always need to build it for
>> each such relation.

> IIUC, for a relation with use_remote_estimates we will deparse the
> query twice and will build the targetlist twice.

That's right.  We could avoid the duplicate work the way you proposed, 
but I was thinking to leave that for another patch.  Should we do that 
in this patch?

>>> In build_tlist_to_deparse(), if fpinfo->tlist for the given relation is
>>> set, we
>>> should just return it rather than constructing it again.

>> In that function we wouldn't have such cases for base or join relations
>> needing the tlist.

> Same explanation as above.

Will revise if it's better to do that in this patch.

Best regards,
Etsuro Fujita





Re: [HACKERS] Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Thu, Jan 5, 2017 at 5:51 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2017/01/05 21:11, Ashutosh Bapat wrote:
>>
>> On Thu, Jan 5, 2017 at 5:14 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>
>>> On 2017/01/03 17:28, Ashutosh Bapat wrote:
>>>>
>>>> In build_subquery_tlists(), why don't we handle base relations?
>>>> +   if (foreignrel->reloptkind != RELOPT_JOINREL)
>>>> +       return;
>
>
>>> The reason for that is we don't need to handle the baserel cases; the
>>> tlist
>>> for a base relation, if needed, would be created while recursing into a
>>> join
>>> relation that joins the base relation to other base/join relation.
>
>
>> Right. Sorry, I misunderstood the code. May be a comment would help.
>
>
> Will add the comment.
>
>>>> Also, in this function, if fpinfo->tlist is already set, why do we want
>>>> to
>>>> build it again?
>
>
>>> When this function gets called, fpinfo->tlist isn't set for any base or
>>> join
>>> relation that needs to build the tlist, so we always need to build it for
>>> each such relation.
>
>
>> IIUC, for a relation with use_remote_estimates we will deparse the
>> query twice and will build the targetlist twice.
>
>
> That's right.  We could avoid the duplicate work the way you proposed, but I
> was thinking to leave that for another patch.  Should we do that in this
> patch?

If you are agree that the change is needed, it's better to do it in
this patch itself if we can, instead of a one liner patch.
>
>>>> In build_tlist_to_deparse(), if fpinfo->tlist for the given relation is
>>>> set, we
>>>> should just return it rather than constructing it again.
>
>
>>> In that function we wouldn't have such cases for base or join relations
>>> needing the tlist.
>
>
>> Same explanation as above.
>
>
> Will revise if it's better to do that in this patch.

Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2017/01/05 21:38, Ashutosh Bapat wrote:
> On Thu, Jan 5, 2017 at 5:51 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2017/01/05 21:11, Ashutosh Bapat wrote:

>>>> On 2017/01/03 17:28, Ashutosh Bapat wrote:
>>>>> Also, in this function, if fpinfo->tlist is already set, why do we want
>>>>> to
>>>>> build it again?

>>> IIUC, for a relation with use_remote_estimates we will deparse the
>>> query twice and will build the targetlist twice.

>> That's right.  We could avoid the duplicate work the way you proposed, but I
>> was thinking to leave that for another patch.  Should we do that in this
>> patch?

> If you are agree that the change is needed, it's better to do it in
> this patch itself if we can, instead of a one liner patch.

While working on this, I noticed a weird case.  Consider:

postgres=# explain verbose select 1 from ft1 left join ft2 on (ft1.a = 
ft2.a) inner join test on (true);                                           QUERY PLAN
------------------------------------------------------------------------------------------------- Nested Loop
(cost=100.00..103.06rows=1 width=4)   Output: 1   ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=0)
Relations:(public.ft1) LEFT JOIN (public.ft2)         Remote SQL: SELECT NULL FROM (public.t1 r1 LEFT JOIN public.t2 
 
r2 ON (((r1.a = r2.a))))   ->  Seq Scan on public.test  (cost=0.00..1.01 rows=1 width=0)         Output: test.a,
test.b
(7 rows)

In this case the fpinfo->tlist of the foreign join is NIL, so whether or 
not the tlist is already built cannot be discriminated by the 
fpinfo->tlist.  We might need another flag to show that the tlist has 
been built already.  Since this is an existing issue and we would need 
to give careful thought to this, so I'd like to leave this for another 
patch.

Best regards,
Etsuro Fujita





Re: [HACKERS] Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Wed, Jan 11, 2017 at 5:45 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2017/01/05 21:38, Ashutosh Bapat wrote:
>>
>> On Thu, Jan 5, 2017 at 5:51 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>
>>> On 2017/01/05 21:11, Ashutosh Bapat wrote:
>
>
>>>>> On 2017/01/03 17:28, Ashutosh Bapat wrote:
>>>>>>
>>>>>> Also, in this function, if fpinfo->tlist is already set, why do we
>>>>>> want
>>>>>> to
>>>>>> build it again?
>
>
>>>> IIUC, for a relation with use_remote_estimates we will deparse the
>>>> query twice and will build the targetlist twice.
>
>
>>> That's right.  We could avoid the duplicate work the way you proposed,
>>> but I
>>> was thinking to leave that for another patch.  Should we do that in this
>>> patch?
>
>
>> If you are agree that the change is needed, it's better to do it in
>> this patch itself if we can, instead of a one liner patch.
>
>
> While working on this, I noticed a weird case.  Consider:
>
> postgres=# explain verbose select 1 from ft1 left join ft2 on (ft1.a =
> ft2.a) inner join test on (true);
>                                            QUERY PLAN
> -------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=100.00..103.06 rows=1 width=4)
>    Output: 1
>    ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=0)
>          Relations: (public.ft1) LEFT JOIN (public.ft2)
>          Remote SQL: SELECT NULL FROM (public.t1 r1 LEFT JOIN public.t2 r2
> ON (((r1.a = r2.a))))
>    ->  Seq Scan on public.test  (cost=0.00..1.01 rows=1 width=0)
>          Output: test.a, test.b
> (7 rows)
>
> In this case the fpinfo->tlist of the foreign join is NIL, so whether or not
> the tlist is already built cannot be discriminated by the fpinfo->tlist.  We
> might need another flag to show that the tlist has been built already.
> Since this is an existing issue and we would need to give careful thought to
> this, so I'd like to leave this for another patch.

I think in that case, relation's targetlist will also be NIL or
contain no Var node. It wouldn't be expensive to build it again and
again. That's better than maintaining a new flag. This is just a
suggestion, for an additional check, you might want to check
rel->reltarget->exprs for NIL. But I think we don't need it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2017/01/12 18:25, Ashutosh Bapat wrote:
> On Wed, Jan 11, 2017 at 5:45 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:

>>>> On 2017/01/05 21:11, Ashutosh Bapat wrote:
>>>>> IIUC, for a relation with use_remote_estimates we will deparse the
>>>>> query twice and will build the targetlist twice.

>> While working on this, I noticed a weird case.  Consider:
>>
>> postgres=# explain verbose select 1 from ft1 left join ft2 on (ft1.a =
>> ft2.a) inner join test on (true);
>>                                            QUERY PLAN
>> -------------------------------------------------------------------------------------------------
>>  Nested Loop  (cost=100.00..103.06 rows=1 width=4)
>>    Output: 1
>>    ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=0)
>>          Relations: (public.ft1) LEFT JOIN (public.ft2)
>>          Remote SQL: SELECT NULL FROM (public.t1 r1 LEFT JOIN public.t2 r2
>> ON (((r1.a = r2.a))))
>>    ->  Seq Scan on public.test  (cost=0.00..1.01 rows=1 width=0)
>>          Output: test.a, test.b
>> (7 rows)
>>
>> In this case the fpinfo->tlist of the foreign join is NIL, so whether or not
>> the tlist is already built cannot be discriminated by the fpinfo->tlist.  We
>> might need another flag to show that the tlist has been built already.
>> Since this is an existing issue and we would need to give careful thought to
>> this, so I'd like to leave this for another patch.

> I think in that case, relation's targetlist will also be NIL or
> contain no Var node. It wouldn't be expensive to build it again and
> again. That's better than maintaining a new flag.

I think that's ugly.  A more clean way I'm thinking is: (1) in 
postgresGetForeignJoinPaths(), create a tlist by 
build_tlist_to_deparse() and save it in fpinfo->tlist before 
estimate_path_cost_size() if use_remote_estimates=true, (2) in 
estimate_path_cost_size(), use the fpinfo->tlist if 
use_remote_estimates=true, and (3) in postgresGetForeignPlan(), use the 
fpinfo->tlist as the fdw_scan_tlist if use_remote_estimates=true, 
otherwise create a tlist as the fdw_scan_tlist by 
build_tlist_to_deparse(), like the attached.

What do you think about that?

Another change is: I simplified build_tlist_to_deparse() a bit and put 
that in postgres_fdw.c because that is used only in postgres_fdw.c.

I still think we should discuss this separately because this is an 
existing issue and that makes it easy to review the patch.  If the 
attached is the right way to go, I'll update the join-pushdown patch on 
top of the patch.

Best regards,
Etsuro Fujita

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Fri, Jan 13, 2017 at 12:30 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2017/01/12 18:25, Ashutosh Bapat wrote:
>>
>> On Wed, Jan 11, 2017 at 5:45 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>
>
>>>>> On 2017/01/05 21:11, Ashutosh Bapat wrote:
>>>>>>
>>>>>> IIUC, for a relation with use_remote_estimates we will deparse the
>>>>>> query twice and will build the targetlist twice.
>
>
>>> While working on this, I noticed a weird case.  Consider:
>>>
>>> postgres=# explain verbose select 1 from ft1 left join ft2 on (ft1.a =
>>> ft2.a) inner join test on (true);
>>>                                            QUERY PLAN
>>>
>>> -------------------------------------------------------------------------------------------------
>>>  Nested Loop  (cost=100.00..103.06 rows=1 width=4)
>>>    Output: 1
>>>    ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=0)
>>>          Relations: (public.ft1) LEFT JOIN (public.ft2)
>>>          Remote SQL: SELECT NULL FROM (public.t1 r1 LEFT JOIN public.t2
>>> r2
>>> ON (((r1.a = r2.a))))
>>>    ->  Seq Scan on public.test  (cost=0.00..1.01 rows=1 width=0)
>>>          Output: test.a, test.b
>>> (7 rows)
>>>
>>> In this case the fpinfo->tlist of the foreign join is NIL, so whether or
>>> not
>>> the tlist is already built cannot be discriminated by the fpinfo->tlist.
>>> We
>>> might need another flag to show that the tlist has been built already.
>>> Since this is an existing issue and we would need to give careful thought
>>> to
>>> this, so I'd like to leave this for another patch.
>
>
>> I think in that case, relation's targetlist will also be NIL or
>> contain no Var node. It wouldn't be expensive to build it again and
>> again. That's better than maintaining a new flag.
>
>
> I think that's ugly.  A more clean way I'm thinking is: (1) in
> postgresGetForeignJoinPaths(), create a tlist by build_tlist_to_deparse()
> and save it in fpinfo->tlist before estimate_path_cost_size() if
> use_remote_estimates=true, (2) in estimate_path_cost_size(), use the
> fpinfo->tlist if use_remote_estimates=true, and (3) in
> postgresGetForeignPlan(), use the fpinfo->tlist as the fdw_scan_tlist if
> use_remote_estimates=true, otherwise create a tlist as the fdw_scan_tlist by
> build_tlist_to_deparse(), like the attached.
>
> What do you think about that?
>
> Another change is: I simplified build_tlist_to_deparse() a bit and put that
> in postgres_fdw.c because that is used only in postgres_fdw.c.
>
> I still think we should discuss this separately because this is an existing
> issue and that makes it easy to review the patch.  If the attached is the
> right way to go, I'll update the join-pushdown patch on top of the patch.

I don't think it's right to assume that the targetlist will be
available only when use_remote_estimate is true; for grouping it's
certainly not.

But I don't see this discussion getting anywhere. I will leave it to
the committer's judgement. I think we should pick up your patch on
27th December, update the comment per your mail on 5th Jan. I will
review it once and list down the things left to committer's judgement.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2017/01/27 20:04, Ashutosh Bapat wrote:
> On Fri, Jan 13, 2017 at 12:30 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> A more clean way I'm thinking is: (1) in
>> postgresGetForeignJoinPaths(), create a tlist by build_tlist_to_deparse()
>> and save it in fpinfo->tlist before estimate_path_cost_size() if
>> use_remote_estimates=true, (2) in estimate_path_cost_size(), use the
>> fpinfo->tlist if use_remote_estimates=true, and (3) in
>> postgresGetForeignPlan(), use the fpinfo->tlist as the fdw_scan_tlist if
>> use_remote_estimates=true, otherwise create a tlist as the fdw_scan_tlist by
>> build_tlist_to_deparse(), like the attached.
>>
>> What do you think about that?
>>
>> Another change is: I simplified build_tlist_to_deparse() a bit and put that
>> in postgres_fdw.c because that is used only in postgres_fdw.c.
>>
>> I still think we should discuss this separately because this is an existing
>> issue and that makes it easy to review the patch.  If the attached is the
>> right way to go, I'll update the join-pushdown patch on top of the patch.

> I don't think it's right to assume that the targetlist will be
> available only when use_remote_estimate is true; for grouping it's
> certainly not.

My explanation was not enough.  Sorry about that.  My proposal described 
above was for join relations, not upper relations.  We build the 
targetlist of an upper relation during postgresGetForeignUpperPaths, so 
for grouping I think we should use that targetlist in 
estimate_path_cost_size and postgresGetForeignPlan.  The patch was 
created that way.

> But I don't see this discussion getting anywhere. I will leave it to
> the committer's judgement.

I'm fine with that.

> I think we should pick up your patch on
> 27th December, update the comment per your mail on 5th Jan. I will
> review it once and list down the things left to committer's judgement.

Sorry, I started thinking we went in the wrong direction.  I added to 
deparseSelectStmtForRel build_subquery_tlists, which creates a tlist for 
each subquery present in a given join tree prior to deparsing a whole 
remote query.  But that's nothing but an overhead; we need to create a 
tlist for the top-level query because we use it as fdw_scan_tlist, but 
not for subqueries, and we need to create retrieved_attrs for the 
top-level query while deparsing the targetlist in 
deparseExplicitTargetList, but not for subqueries.  So, we should need 
some work to avoid such a useless overhead.

Best regards,
Etsuro Fujita





Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2017/01/27 21:25, Etsuro Fujita wrote:
> On 2017/01/27 20:04, Ashutosh Bapat wrote:
>> I think we should pick up your patch on
>> 27th December, update the comment per your mail on 5th Jan. I will
>> review it once and list down the things left to committer's judgement.

> Sorry, I started thinking we went in the wrong direction.  I added to
> deparseSelectStmtForRel build_subquery_tlists, which creates a tlist for
> each subquery present in a given join tree prior to deparsing a whole
> remote query.  But that's nothing but an overhead; we need to create a
> tlist for the top-level query because we use it as fdw_scan_tlist, but
> not for subqueries, and we need to create retrieved_attrs for the
> top-level query while deparsing the targetlist in
> deparseExplicitTargetList, but not for subqueries.  So, we should need
> some work to avoid such a useless overhead.

I think we can avoid the useless overhead by adding a new function, 
deparseSubqueryTargetList, that deparses expressions in the given 
relation's reltarget, not the tlist, as a SELECT clause of the subquery 
representing the given relation.  That would also allow us to make the 
1-to-1 relationship between the subquery output columns and their 
aliases more explicit, which was your original comment.  Please find 
attached the new version.  (The patch doesn't need the patch to avoid 
duplicate construction of the tlist, discussed upthread.)

Other changes:
* I went back to make_outerrel_subquery and make_innerrel_subquery, 
which are flags to indicate whether to deparse the input relations as 
subqueries.  is_subquery_rel would work well for handling the cases of 
full joins with restrictions on the input relations, but I noticed that 
that wouldn't work well when extending to handle the cases where we 
deparse the input relations as subqueries to evaluate PHVs remotely.
* Since appendSubqueryAlias in the previous patch is pretty small, I 
included the code into deparseRangeTblRef.

Best regards,
Etsuro Fujita

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Push down more full joins in postgres_fdw

From
Ashutosh Bapat
Date:
On Mon, Jan 30, 2017 at 5:00 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2017/01/27 21:25, Etsuro Fujita wrote:
>>
>> On 2017/01/27 20:04, Ashutosh Bapat wrote:
>>>
>>> I think we should pick up your patch on
>>> 27th December, update the comment per your mail on 5th Jan. I will
>>> review it once and list down the things left to committer's judgement.
>
>
>> Sorry, I started thinking we went in the wrong direction.  I added to
>> deparseSelectStmtForRel build_subquery_tlists, which creates a tlist for
>> each subquery present in a given join tree prior to deparsing a whole
>> remote query.  But that's nothing but an overhead; we need to create a
>> tlist for the top-level query because we use it as fdw_scan_tlist, but
>> not for subqueries, and we need to create retrieved_attrs for the
>> top-level query while deparsing the targetlist in
>> deparseExplicitTargetList, but not for subqueries.  So, we should need
>> some work to avoid such a useless overhead.
>
>
> I think we can avoid the useless overhead by adding a new function,
> deparseSubqueryTargetList, that deparses expressions in the given relation's
> reltarget, not the tlist, as a SELECT clause of the subquery representing
> the given relation.  That would also allow us to make the 1-to-1
> relationship between the subquery output columns and their aliases more
> explicit, which was your original comment.  Please find attached the new
> version.  (The patch doesn't need the patch to avoid duplicate construction
> of the tlist, discussed upthread.)

I have not looked at the patch, but the reason we use a tlist instead
of list of expressions is because fdw_scan_tlist is expected in that
form and we don't want two different representations one to deparse
SELECT and one to interpret results from the foreign server. What you
describe above seems to introduce exactly that hazard.

>
> Other changes:
> * I went back to make_outerrel_subquery and make_innerrel_subquery, which
> are flags to indicate whether to deparse the input relations as subqueries.
> is_subquery_rel would work well for handling the cases of full joins with
> restrictions on the input relations, but I noticed that that wouldn't work
> well when extending to handle the cases where we deparse the input relations
> as subqueries to evaluate PHVs remotely.

I had objected to using a single variable instead of two previously
and you had argued against it in [1]. There you had mentioned that PHV
doesn't need two variables, but now you are taking the other stand,
without any apparent reason. Can you please clarify it?

[1] https://www.postgresql.org/message-id/1eb58ee4-8ffa-7c40-1229-c8973e6498ea@lab.ntt.co.jp

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2017/01/30 21:05, Ashutosh Bapat wrote:
> On Mon, Jan 30, 2017 at 5:00 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2017/01/27 21:25, Etsuro Fujita wrote:
>>> Sorry, I started thinking we went in the wrong direction.  I added to
>>> deparseSelectStmtForRel build_subquery_tlists, which creates a tlist for
>>> each subquery present in a given join tree prior to deparsing a whole
>>> remote query.  But that's nothing but an overhead; we need to create a
>>> tlist for the top-level query because we use it as fdw_scan_tlist, but
>>> not for subqueries, and we need to create retrieved_attrs for the
>>> top-level query while deparsing the targetlist in
>>> deparseExplicitTargetList, but not for subqueries.  So, we should need
>>> some work to avoid such a useless overhead.

>> I think we can avoid the useless overhead by adding a new function,
>> deparseSubqueryTargetList, that deparses expressions in the given relation's
>> reltarget, not the tlist, as a SELECT clause of the subquery representing
>> the given relation.  That would also allow us to make the 1-to-1
>> relationship between the subquery output columns and their aliases more
>> explicit, which was your original comment.  Please find attached the new
>> version.  (The patch doesn't need the patch to avoid duplicate construction
>> of the tlist, discussed upthread.)

> I have not looked at the patch, but the reason we use a tlist instead
> of list of expressions is because fdw_scan_tlist is expected in that
> form

Actually, we wouldn't need that to deparse a remote join query; a list 
of expressions would be enough.

> and we don't want two different representations one to deparse
> SELECT and one to interpret results from the foreign server. What you
> describe above seems to introduce exactly that hazard.

I agree with you to some extent, BUT:
* I don't think it's a good idea to create a tlist for each base/join 
relation that is deparsed as a subquery, to just avoid that hazard.  As 
I said above, that's nothing but an overhead.
* I think we would need to have two different representations for at 
least base relations; we use fpinfo->attrs_used to deparse a simple 
foreign table scan query for a base relation, but when deparsing the 
base relation as a subquery, we would need to use the list of 
expressions in the base relation's reltarget, to deparse a SELECT clause 
of the subquery, because we need to deparse a whole-row reference to the 
base relation as ROW, not all the user columns expanded, as shown in 
this extracted from the regression tests in the patch:

+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 
50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 
between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 
and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) 
ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
+                                               QUERY PLAN 

+ 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+    ->  Nested Loop
+          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+          ->  Foreign Scan
+                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+                Relations: (public.ft4) FULL JOIN (public.ft5)
+                Remote SQL: SELECT s8.c1, s8.c2, s9.c1, s9.c2 FROM 
((SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND 
((c1 <= 60))) s8(c1, c2) FULL JOIN (SELECT c1, ROW(c1, c2, c3) FROM "S 
1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s9(c1, c2) ON (((s8.c1 = 
s9.c1)))) WHERE (((s8.c1 IS NULL) OR (s8.c1 IS NOT NULL))) ORDER BY 
s8.c1 ASC NULLS LAST, s9.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+                      Hash Cond: (ft4.c1 = ft5.c1)
+                      Filter: ((ft4.c1 IS NULL) OR (ft4.c1 IS NOT NULL))
+                      ->  Foreign Scan on public.ft4
+                            Output: ft4.c1, ft4.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 
3" WHERE ((c1 >= 50)) AND ((c1 <= 60))
+                      ->  Hash
+                            Output: ft5.c1, ft5.*
+                            ->  Foreign Scan on public.ft5
+                                  Output: ft5.c1, ft5.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 
1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (25 rows)

>> Other changes:
>> * I went back to make_outerrel_subquery and make_innerrel_subquery, which
>> are flags to indicate whether to deparse the input relations as subqueries.
>> is_subquery_rel would work well for handling the cases of full joins with
>> restrictions on the input relations, but I noticed that that wouldn't work
>> well when extending to handle the cases where we deparse the input relations
>> as subqueries to evaluate PHVs remotely.

> I had objected to using a single variable instead of two previously
> and you had argued against it in [1]. There you had mentioned that PHV
> doesn't need two variables, but now you are taking the other stand,
> without any apparent reason. Can you please clarify it?

Sorry, I missed some cases.  Consider the join {A, B, C} satisfying the 
identity 3 in optimizer/README, ie,
    (A leftjoin B on (Pab)) leftjoin C on (Pbc)    = A leftjoin (B leftjoin C on (Pbc)) on (Pab)

where predicate Pbc fails for all-null B rows.  Assume that B has a PHV 
in the relation's reltarget.  While considering 2-way joins, 
foreign_join_ok would determine to deparse B as a subquery and hence set 
the B's is_subquery_rel to true for the join relation {A, B}.  However, 
if the planner selects the join order {A leftjoin (B leftjoin C on 
(Pbc)) on (Pab)} as the cheapest path for the join {A, B, C}, we would 
deparse B as a subquery according to the is_subquery_rel flag while 
creating the FROM clause entry for the join relation {B, C}.  That 
wouldn't look good.  That would be logically correct, though.  To avoid 
this, I'd like to go back to the two variables previously proposed.

Best regards,
Etsuro Fujita





Re: [HACKERS] Push down more full joins in postgres_fdw

From
Michael Paquier
Date:
On Mon, Jan 30, 2017 at 8:30 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Other changes:
> * I went back to make_outerrel_subquery and make_innerrel_subquery, which
> are flags to indicate whether to deparse the input relations as subqueries.
> is_subquery_rel would work well for handling the cases of full joins with
> restrictions on the input relations, but I noticed that that wouldn't work
> well when extending to handle the cases where we deparse the input relations
> as subqueries to evaluate PHVs remotely.
> * Since appendSubqueryAlias in the previous patch is pretty small, I
> included the code into deparseRangeTblRef.

The patch is very fresh, so moved to CF 2017-03.
-- 
Michael



Re: [HACKERS] Push down more full joins in postgres_fdw

From
David Steele
Date:
On 1/30/17 6:30 AM, Etsuro Fujita wrote:
> On 2017/01/27 21:25, Etsuro Fujita wrote:
>> On 2017/01/27 20:04, Ashutosh Bapat wrote:
>>> I think we should pick up your patch on
>>> 27th December, update the comment per your mail on 5th Jan. I will
>>> review it once and list down the things left to committer's judgement.
> 
>> Sorry, I started thinking we went in the wrong direction.  I added to
>> deparseSelectStmtForRel build_subquery_tlists, which creates a tlist for
>> each subquery present in a given join tree prior to deparsing a whole
>> remote query.  But that's nothing but an overhead; we need to create a
>> tlist for the top-level query because we use it as fdw_scan_tlist, but
>> not for subqueries, and we need to create retrieved_attrs for the
>> top-level query while deparsing the targetlist in
>> deparseExplicitTargetList, but not for subqueries.  So, we should need
>> some work to avoid such a useless overhead.
> 
> I think we can avoid the useless overhead by adding a new function,
> deparseSubqueryTargetList, that deparses expressions in the given
> relation's reltarget, not the tlist, as a SELECT clause of the subquery
> representing the given relation.  That would also allow us to make the
> 1-to-1 relationship between the subquery output columns and their
> aliases more explicit, which was your original comment.  Please find
> attached the new version.  (The patch doesn't need the patch to avoid
> duplicate construction of the tlist, discussed upthread.)
> 
> Other changes:
> * I went back to make_outerrel_subquery and make_innerrel_subquery,
> which are flags to indicate whether to deparse the input relations as
> subqueries.  is_subquery_rel would work well for handling the cases of
> full joins with restrictions on the input relations, but I noticed that
> that wouldn't work well when extending to handle the cases where we
> deparse the input relations as subqueries to evaluate PHVs remotely.
> * Since appendSubqueryAlias in the previous patch is pretty small, I
> included the code into deparseRangeTblRef.

This patch does not apply cleanly at cccbdde:

$ patch -p1 < ../other/postgres-fdw-subquery-support-v15.patch
patching file contrib/postgres_fdw/deparse.c
Hunk #11 succeeded at 1371 (offset -3 lines).
Hunk #12 succeeded at 1419 (offset -3 lines).
Hunk #13 succeeded at 1486 (offset -3 lines).
Hunk #14 succeeded at 2186 (offset -3 lines).
Hunk #15 succeeded at 3082 (offset -3 lines).
patching file contrib/postgres_fdw/expected/postgres_fdw.out
patching file contrib/postgres_fdw/postgres_fdw.c
Hunk #1 succeeded at 669 (offset 1 line).
Hunk #2 succeeded at 1245 (offset -1 lines).
Hunk #3 succeeded at 2557 (offset -1 lines).
Hunk #4 succeeded at 4157 (offset 3 lines).
Hunk #5 succeeded at 4183 (offset 3 lines).
Hunk #6 succeeded at 4212 (offset 3 lines).
Hunk #7 succeeded at 4315 (offset 3 lines).
patching file contrib/postgres_fdw/postgres_fdw.h
patching file contrib/postgres_fdw/sql/postgres_fdw.sql

Since these are just offsets I'll leave the patch as "Needs review" but
an updated patch would be appreciated.

Thanks,
-- 
-David
david@pgmasters.net



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Robert Haas
Date:
On Thu, Mar 16, 2017 at 11:46 AM, David Steele <david@pgmasters.net> wrote:
> $ patch -p1 < ../other/postgres-fdw-subquery-support-v15.patch
> patching file contrib/postgres_fdw/deparse.c
> Hunk #11 succeeded at 1371 (offset -3 lines).
> Hunk #12 succeeded at 1419 (offset -3 lines).
> Hunk #13 succeeded at 1486 (offset -3 lines).
> Hunk #14 succeeded at 2186 (offset -3 lines).
> Hunk #15 succeeded at 3082 (offset -3 lines).
> patching file contrib/postgres_fdw/expected/postgres_fdw.out
> patching file contrib/postgres_fdw/postgres_fdw.c
> Hunk #1 succeeded at 669 (offset 1 line).
> Hunk #2 succeeded at 1245 (offset -1 lines).
> Hunk #3 succeeded at 2557 (offset -1 lines).
> Hunk #4 succeeded at 4157 (offset 3 lines).
> Hunk #5 succeeded at 4183 (offset 3 lines).
> Hunk #6 succeeded at 4212 (offset 3 lines).
> Hunk #7 succeeded at 4315 (offset 3 lines).
> patching file contrib/postgres_fdw/postgres_fdw.h
> patching file contrib/postgres_fdw/sql/postgres_fdw.sql
>
> Since these are just offsets I'll leave the patch as "Needs review" but
> an updated patch would be appreciated.

I don't think that's really needed.  Offsets don't hurt anything.
Even fuzz is OK.  As long as the hunks are applying, I think it's
fine.

Incidentally, I'm reading through this one now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Robert Haas
Date:
On Thu, Mar 16, 2017 at 12:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 16, 2017 at 11:46 AM, David Steele <david@pgmasters.net> wrote:
>> $ patch -p1 < ../other/postgres-fdw-subquery-support-v15.patch
>> patching file contrib/postgres_fdw/deparse.c
>> Hunk #11 succeeded at 1371 (offset -3 lines).
>> Hunk #12 succeeded at 1419 (offset -3 lines).
>> Hunk #13 succeeded at 1486 (offset -3 lines).
>> Hunk #14 succeeded at 2186 (offset -3 lines).
>> Hunk #15 succeeded at 3082 (offset -3 lines).
>> patching file contrib/postgres_fdw/expected/postgres_fdw.out
>> patching file contrib/postgres_fdw/postgres_fdw.c
>> Hunk #1 succeeded at 669 (offset 1 line).
>> Hunk #2 succeeded at 1245 (offset -1 lines).
>> Hunk #3 succeeded at 2557 (offset -1 lines).
>> Hunk #4 succeeded at 4157 (offset 3 lines).
>> Hunk #5 succeeded at 4183 (offset 3 lines).
>> Hunk #6 succeeded at 4212 (offset 3 lines).
>> Hunk #7 succeeded at 4315 (offset 3 lines).
>> patching file contrib/postgres_fdw/postgres_fdw.h
>> patching file contrib/postgres_fdw/sql/postgres_fdw.sql
>>
>> Since these are just offsets I'll leave the patch as "Needs review" but
>> an updated patch would be appreciated.
>
> I don't think that's really needed.  Offsets don't hurt anything.
> Even fuzz is OK.  As long as the hunks are applying, I think it's
> fine.
>
> Incidentally, I'm reading through this one now.

And ... I don't see anything to complain about, so, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Push down more full joins in postgres_fdw

From
Etsuro Fujita
Date:
On 2017/03/17 2:35, Robert Haas wrote:
> And ... I don't see anything to complain about, so, committed.

Thanks for committing, Robert!  Thanks for reviewing, Ashutosh and David!

Best regards,
Etsuro Fujita