Thread: Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

From
Rajkumar Raghuwanshi
Date:
<div dir="ltr"><span style="font-size:small">Hi,</span><font face="arial, helvetica, sans-serif"><br /><span
style="font-size:small"><br/>I am testing postgres_fdw join pushdown feature for PostgreSQL 9.6 DB, and I observed
belowissue.</span></font><u><br /><br />Observation:</u> Inner join and full outer join combination on a table
generatingwrong result.<br /><div class="gmail_extra"><div class="gmail_quote"><div
style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px"><br/>SELECT * FROM lt;<br /> c1 <br />----<br />  1<br
/> 2<br />(2 rows)<br /><br />SELECT * FROM ft;<br /> c1 <br />----<br />  1<br />  2<br />(2 rows)<br /><br />\d+
ft<br/>                             Foreign table "public.ft"<br /> Column |  Type   | Modifiers | FDW Options |
Storage| Stats target | Description <br
/>--------+---------+-----------+-------------+---------+--------------+-------------<br/> c1     | integer |          
|            | plain   |              | <br />Server: link_server<br />FDW Options: (table_name 'lt')<br /><br
/>--innerjoin and full outer join on local tables<br />SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 =
t2.c1)FULL JOIN lt t3 ON (t2.c1 = t3.c1);<br /> c1 | c1 | c1 <br />----+----+----<br />  1 |  1 |  1<br />  2 |  2 | 
2<br/>(2 rows)<br /><br />--inner join and full outer join on corresponding foreign tables<br />SELECT
t1.c1,t2.c1,t3.c1FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1) FULL JOIN ft t3 ON (t2.c1 = t3.c1);<br /> c1 | c1 | c1
<br/>----+----+----<br />  1 |  1 |  1<br />  1 |  2 |   <br />  2 |  1 |   <br />  2 |  2 |  2<br />(4 rows)<br
/><spanstyle="font-size:13px"><br /></span><span style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px">Thanks
&Regards,</span><br />Rajkumar Raghuwanshi<br />QMG, EnterpriseDB Corporation<br /></div><blockquote
class="gmail_quote"style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div
dir="ltr"><divclass="gmail_quote"><div dir="ltr"><div class="gmail_quote"><div
dir="ltr"></div></div></div></div></div></blockquote></div><br/></div></div> 
On 2016/03/28 18:17, Rajkumar Raghuwanshi wrote:
> I am testing postgres_fdw join pushdown feature for PostgreSQL 9.6 DB,
> and I observed below issue._
>
> Observation:_ Inner join and full outer join combination on a table
> generating wrong result.
>
> SELECT * FROM lt;
>   c1
> ----
>    1
>    2
> (2 rows)
>
> SELECT * FROM ft;
>   c1
> ----
>    1
>    2
> (2 rows)
>
> \d+ ft
>                               Foreign table "public.ft"
>   Column |  Type   | Modifiers | FDW Options | Storage | Stats target |
> Description
> --------+---------+-----------+-------------+---------+--------------+-------------
>   c1     | integer |           |             | plain   |              |
> Server: link_server
> FDW Options: (table_name 'lt')
>
> --inner join and full outer join on local tables
> SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
> FULL JOIN lt t3 ON (t2.c1 = t3.c1);
>   c1 | c1 | c1
> ----+----+----
>    1 |  1 |  1
>    2 |  2 |  2
> (2 rows)
>
> --inner join and full outer join on corresponding foreign tables
> SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
> FULL JOIN ft t3 ON (t2.c1 = t3.c1);
>   c1 | c1 | c1
> ----+----+----
>    1 |  1 |  1
>    1 |  2 |
>    2 |  1 |
>    2 |  2 |  2
> (4 rows)

I think the reason for that is in foreign_join_ok.  This in that function:
    switch (jointype)    {        case JOIN_INNER:            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,

list_copy(fpinfo_i->remote_conds));            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,

list_copy(fpinfo_o->remote_conds));            break;
        case JOIN_LEFT:            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,

list_copy(fpinfo_i->remote_conds));            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,

list_copy(fpinfo_o->remote_conds));            break;
        case JOIN_RIGHT:            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,

list_copy(fpinfo_o->remote_conds));            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,

list_copy(fpinfo_i->remote_conds));            break;
        case JOIN_FULL:            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,

list_copy(fpinfo_i->remote_conds));            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,

list_copy(fpinfo_o->remote_conds));            break;
        default:            /* Should not happen, we have just check this above */            elog(ERROR, "unsupported
jointype %d", jointype);    }
 

wrongly pulls up remote_conds from joining relations in the FULL JOIN 
case.  I think we should not pull up such conditions in the FULL JOIN case.

Best regards,
Etsuro Fujita






Observation:_ Inner join and full outer join combination on a table

generating wrong result.

SELECT * FROM lt;
  c1
----
   1
   2
(2 rows)

SELECT * FROM ft;
  c1
----
   1
   2
(2 rows)

\d+ ft
                              Foreign table "public.ft"
  Column |  Type   | Modifiers | FDW Options | Storage | Stats target |
Description
--------+---------+-----------+-------------+---------+--------------+-------------
  c1     | integer |           |             | plain   |              |
Server: link_server
FDW Options: (table_name 'lt')

--inner join and full outer join on local tables
SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
FULL JOIN lt t3 ON (t2.c1 = t3.c1);
  c1 | c1 | c1
----+----+----
   1 |  1 |  1
   2 |  2 |  2
(2 rows)

--inner join and full outer join on corresponding foreign tables
SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
FULL JOIN ft t3 ON (t2.c1 = t3.c1);
  c1 | c1 | c1
----+----+----
   1 |  1 |  1
   1 |  2 |
   2 |  1 |
   2 |  2 |  2
(4 rows)

Thanks Rajkumar for the detailed report.
 

I think the reason for that is in foreign_join_ok.  This in that function:

wrongly pulls up remote_conds from joining relations in the FULL JOIN case.  I think we should not pull up such conditions in the FULL JOIN case.


Right. For a full outer join, since each joining relation acts as outer for the other, we can not pull up the quals to either join clauses or other clauses. So, in such a case, we will need to encapsulate the joining relation with conditions into a subquery. Unfortunately, the current deparse logic does not handle this encapsulation. Adding that functionality so close to the feature freeze might be risky given the amount of code changes required.

PFA patch with a quick fix. A full outer join with either of the joining relations having WHERE conditions (or other clauses) is not pushed down. In the particular case that was reported, the bug triggered because of the way conditions are handled for an inner join. For an inner join, all the conditions in ON as well as WHERE clause are treated like they are part of WHERE clause. This allows pushing down a join even if there are unpushable join clauses. But the pushable conditions can be put back into the ON clause. This avoids using subqueries while deparsing.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
Thanks Ashutosh for the patch. I have applied and tested it. Now getting proper result for reported issue.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Tue, Mar 29, 2016 at 7:50 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

Observation:_ Inner join and full outer join combination on a table

generating wrong result.

SELECT * FROM lt;
  c1
----
   1
   2
(2 rows)

SELECT * FROM ft;
  c1
----
   1
   2
(2 rows)

\d+ ft
                              Foreign table "public.ft"
  Column |  Type   | Modifiers | FDW Options | Storage | Stats target |
Description
--------+---------+-----------+-------------+---------+--------------+-------------
  c1     | integer |           |             | plain   |              |
Server: link_server
FDW Options: (table_name 'lt')

--inner join and full outer join on local tables
SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
FULL JOIN lt t3 ON (t2.c1 = t3.c1);
  c1 | c1 | c1
----+----+----
   1 |  1 |  1
   2 |  2 |  2
(2 rows)

--inner join and full outer join on corresponding foreign tables
SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
FULL JOIN ft t3 ON (t2.c1 = t3.c1);
  c1 | c1 | c1
----+----+----
   1 |  1 |  1
   1 |  2 |
   2 |  1 |
   2 |  2 |  2
(4 rows)

Thanks Rajkumar for the detailed report.
 

I think the reason for that is in foreign_join_ok.  This in that function:

wrongly pulls up remote_conds from joining relations in the FULL JOIN case.  I think we should not pull up such conditions in the FULL JOIN case.


Right. For a full outer join, since each joining relation acts as outer for the other, we can not pull up the quals to either join clauses or other clauses. So, in such a case, we will need to encapsulate the joining relation with conditions into a subquery. Unfortunately, the current deparse logic does not handle this encapsulation. Adding that functionality so close to the feature freeze might be risky given the amount of code changes required.

PFA patch with a quick fix. A full outer join with either of the joining relations having WHERE conditions (or other clauses) is not pushed down. In the particular case that was reported, the bug triggered because of the way conditions are handled for an inner join. For an inner join, all the conditions in ON as well as WHERE clause are treated like they are part of WHERE clause. This allows pushing down a join even if there are unpushable join clauses. But the pushable conditions can be put back into the ON clause. This avoids using subqueries while deparsing.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

On 2016/03/29 23:20, Ashutosh Bapat wrote:
>     I think the reason for that is in foreign_join_ok.  This in that
>     function:
>
>     wrongly pulls up remote_conds from joining relations in the FULL
>     JOIN case.  I think we should not pull up such conditions in the
>     FULL JOIN case.

> Right. For a full outer join, since each joining relation acts as outer
> for the other, we can not pull up the quals to either join clauses or
> other clauses. So, in such a case, we will need to encapsulate the
> joining relation with conditions into a subquery. Unfortunately, the
> current deparse logic does not handle this encapsulation. Adding that
> functionality so close to the feature freeze might be risky given the
> amount of code changes required.
>
> PFA patch with a quick fix. A full outer join with either of the joining
> relations having WHERE conditions (or other clauses) is not pushed down.
> In the particular case that was reported, the bug triggered because of
> the way conditions are handled for an inner join. For an inner join, all
> the conditions in ON as well as WHERE clause are treated like they are
> part of WHERE clause. This allows pushing down a join even if there are
> unpushable join clauses. But the pushable conditions can be put back
> into the ON clause. This avoids using subqueries while deparsing.

I'm not sure that is a good idea.  As you mentioned, we could support 
FULL JOIN fully, by encapsulating a joining relation with conditions 
into a subquery.  And ISTM that it is relatively easy to do that, by 
borrowing some ideas from Hanada-san's original join pushdown patch.  If 
it's OK, I'll create a patch for that in a few days.

Sorry for speaking up late.

Best regards,
Etsuro Fujita







On Thu, Apr 14, 2016 at 8:42 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/03/29 23:20, Ashutosh Bapat wrote:
    I think the reason for that is in foreign_join_ok.  This in that
    function:

    wrongly pulls up remote_conds from joining relations in the FULL
    JOIN case.  I think we should not pull up such conditions in the
    FULL JOIN case.

Right. For a full outer join, since each joining relation acts as outer
for the other, we can not pull up the quals to either join clauses or
other clauses. So, in such a case, we will need to encapsulate the
joining relation with conditions into a subquery. Unfortunately, the
current deparse logic does not handle this encapsulation. Adding that
functionality so close to the feature freeze might be risky given the
amount of code changes required.

PFA patch with a quick fix. A full outer join with either of the joining
relations having WHERE conditions (or other clauses) is not pushed down.
In the particular case that was reported, the bug triggered because of
the way conditions are handled for an inner join. For an inner join, all
the conditions in ON as well as WHERE clause are treated like they are
part of WHERE clause. This allows pushing down a join even if there are
unpushable join clauses. But the pushable conditions can be put back
into the ON clause. This avoids using subqueries while deparsing.

I'm not sure that is a good idea.  As you mentioned, we could support FULL JOIN fully, by encapsulating a joining relation with conditions into a subquery.  And ISTM that it is relatively easy to do that, by borrowing some ideas from Hanada-san's original join pushdown patch.  If it's OK, I'll create a patch for that in a few days.

In his patch the deparsing targetlist and conditions required that the FROM clause entries were ready with the columns from base relations and joins realiased. That's no more true. We deparse every Var node as <relation alias>.<column name> where relation alias is nothing but rN; N being index of RangeTblEntry. So, Hanada-san's method to deparse recursively can not be applied as such now.

Here's what needs to be done:
When we identify that certain relation (base or join) needs a subquery to be deparsed (because the join relation above it could not pull the quals up), we remember it in the upper join relation. Before deparsing 1. we walk the join tree and collect targetlists of all such relations, 2. associate column aliases with those targetlists (save the column alises in resname?) and craft a relation alias  3. associate the relations alias, column aliases and targetlists with the base relations involved in such relations (may be creating a list similar to rtable). While deparsing a Var node, we check if corresponding base relation is itself or part of a relation deparsed as a subquery. If it is then we lookup that Var in the targetlist associated with the base relation and use corresponding relation and column alias for deparsing it. Otherwise, deparse it as <relation alias>.<column name> usually.

That looks like a big code change to go after feature freeze. So, we will leave it for 9.7, unless RMT allows us to introduce that change.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 2016/04/14 15:20, Ashutosh Bapat wrote:
> On Thu, Apr 14, 2016 at 8:42 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

>     As you mentioned, we could
>     support FULL JOIN fully, by encapsulating a joining relation with
>     conditions into a subquery.  And ISTM that it is relatively easy to
>     do that, by borrowing some ideas from Hanada-san's original join
>     pushdown patch.  If it's OK, I'll create a patch for that in a few days.

> In his patch the deparsing targetlist and conditions required that the
> FROM clause entries were ready with the columns from base relations and
> joins realiased. That's no more true. We deparse every Var node as
> <relation alias>.<column name> where relation alias is nothing but rN; N
> being index of RangeTblEntry. So, Hanada-san's method to deparse
> recursively can not be applied as such now.

I think so, too.  I don't think his ideas could be applied as is.

> Here's what needs to be done:
> When we identify that certain relation (base or join) needs a subquery
> to be deparsed (because the join relation above it could not pull the
> quals up), we remember it in the upper join relation. Before deparsing
> 1. we walk the join tree and collect targetlists of all such relations,
> 2. associate column aliases with those targetlists (save the column
> alises in resname?) and craft a relation alias  3. associate the
> relations alias, column aliases and targetlists with the base relations
> involved in such relations (may be creating a list similar to rtable).
> While deparsing a Var node, we check if corresponding base relation is
> itself or part of a relation deparsed as a subquery. If it is then we
> lookup that Var in the targetlist associated with the base relation and
> use corresponding relation and column alias for deparsing it. Otherwise,
> deparse it as <relation alias>.<column name> usually.

Good to know.  That is what I have in mind, except for the way of
collecting subqueries' columns and associating those columns with
relation and column aliases, which I think can be done more easily.
Please find attached a WIP patch.  That patch works well at least for
queries in your patch.  Maybe I'm missing something, though.

> That looks like a big code change to go after feature freeze. So, we
> will leave it for 9.7, unless RMT allows us to introduce that change.

OK

Best regards,
Etsuro Fujita

Attachment
On Tue, Mar 29, 2016 at 10:20 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> I think the reason for that is in foreign_join_ok.  This in that function:
>>
>> wrongly pulls up remote_conds from joining relations in the FULL JOIN
>> case.  I think we should not pull up such conditions in the FULL JOIN case.
>>
>
> Right. For a full outer join, since each joining relation acts as outer for
> the other, we can not pull up the quals to either join clauses or other
> clauses. So, in such a case, we will need to encapsulate the joining
> relation with conditions into a subquery. Unfortunately, the current deparse
> logic does not handle this encapsulation. Adding that functionality so close
> to the feature freeze might be risky given the amount of code changes
> required.
>
> PFA patch with a quick fix. A full outer join with either of the joining
> relations having WHERE conditions (or other clauses) is not pushed down. In
> the particular case that was reported, the bug triggered because of the way
> conditions are handled for an inner join. For an inner join, all the
> conditions in ON as well as WHERE clause are treated like they are part of
> WHERE clause. This allows pushing down a join even if there are unpushable
> join clauses. But the pushable conditions can be put back into the ON
> clause. This avoids using subqueries while deparsing.

Committed.

I think we should introduce subquery-based deparsing for 9.7, but I
agree it's better not to do it now.  I think we should try to handle
SEMI and ANTI joins that way, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company