Thread: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.

Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.

From
Rajkumar Raghuwanshi
Date:
Hi,

I am getting "ERROR:  unexpected expression in subquery output" and "ERROR:  variable not found in subplan target lists" errors, for "FOR UPDATE" with postgres_fdw. (when set enable_partition_wise_join to true);

Attached patch have queries which are throwing mentioned error on running make check in contrib/postgres_fdw folder.

An independent test case to reproduce this is given below.

SET enable_partition_wise_join TO true;

CREATE EXTENSION postgres_fdw;
CREATE SERVER fdw_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'postgres', port '5432', use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER fdw_server;

CREATE TABLE pt1 ( c1 int NOT NULL, c2 int NOT NULL, c3 text)PARTITION BY RANGE(c1);
CREATE TABLE pt1p1 (like pt1);
CREATE TABLE pt1p2 (like pt1);

CREATE TABLE pt2 (c1 int NOT NULL, c2 int NOT NULL, c3 text) PARTITION BY RANGE(c1);
CREATE TABLE pt2p1 (like pt2);
CREATE TABLE pt2p2 (like pt2);

INSERT INTO pt1p1 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(-100, -1) id;
INSERT INTO pt1p2 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(1, 99) id;
INSERT INTO pt2p1 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(-100, -1) id;
INSERT INTO pt2p2 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(1, 99) id;

CREATE FOREIGN TABLE ft1p1 PARTITION OF pt1 FOR VALUES FROM (MINVALUE) TO (0) SERVER fdw_server OPTIONS (table_name 'pt1p1');
CREATE FOREIGN TABLE ft1p2 PARTITION OF pt1 FOR VALUES FROM (0) TO (MAXVALUE) SERVER fdw_server OPTIONS (table_name 'pt1p2');
CREATE FOREIGN TABLE ft2p1 PARTITION OF pt2 FOR VALUES FROM (MINVALUE) TO (0) SERVER fdw_server OPTIONS (table_name 'pt2p1');
CREATE FOREIGN TABLE ft2p2 PARTITION OF pt2 FOR VALUES FROM (0) TO (MAXVALUE) SERVER fdw_server OPTIONS (table_name 'pt2p2');

ANALYZE pt1;
ANALYZE pt2;
ANALYZE ft1p1;
ANALYZE ft1p2;
ANALYZE ft2p1;
ANALYZE ft2p2;

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1  WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1  WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM pt2 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
ERROR:  unexpected expression in subquery output

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1) t2 FULL JOIN (SELECT c1 FROM pt1) t3 ON (t2.c1 = t3.c1)) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
ERROR:  variable not found in subplan target lists

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Attachment
On Mon, Feb 12, 2018 at 6:00 PM, Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
> Hi,
>
> I am getting "ERROR:  unexpected expression in subquery output" and "ERROR:
> variable not found in subplan target lists" errors, for "FOR UPDATE" with
> postgres_fdw. (when set enable_partition_wise_join to true);
>
> Attached patch have queries which are throwing mentioned error on running
> make check in contrib/postgres_fdw folder.
>

Thanks a lot for the patch. I can apply it and reproduce the error.

Here's my analysis of the bug.

The node for which this error comes is a ConvertRowtypeExpr node with
Var::varattno = 0 under it. Whole row reference of the parent is converted to
ConvertRowtypeExpr with whole row of child as an argument. When partition-wise
join is used, targetlist of child-joins contain such ConvertRowtypeExpr when
the parent-join's targetlist has whole-row references of joining
partitioned tables.

When we deparse the targetlist of join pushed down by postgres FDW,
build_tlist_to_deparse() pulls only Var nodes nodes from the join's targetlist.
So it pulls Var reprensenting a whole-row reference of a child from a
ConvertRowtypeExpr, when building targetlist to be deparsed for a child-join
with whole-row references. This targetlist is then saved as fdw_scan_tlist in
ForeignScanPlan.

This causes two problems shown by the two queries below


>
> EXPLAIN (VERBOSE, COSTS OFF)
> SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1  WHERE c1 = 50) t1 INNER
> JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1  WHERE c1 between 50 and
> 60) t2 FULL JOIN (SELECT c1 FROM pt2 WHERE c1 between 50 and 60) t3 ON
> (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE)
> ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
> ERROR:  unexpected expression in subquery output

get_relation_column_alias_ids() uses foreignrel's targetlist
(foreignrel->reltarget->exprs) as it is to locate given node to be deparsed.
If the joining relation corresponding to ConvertRowtypeExpr is deparsed as a
subquery, this function is called with whole-row reference node (Var node with
varattno = 0).  But the relation's targetlist doesn't contain its whole-row
reference directly but has it embedded in ConvertRowtypeExpr. So, the function
doesn't find the given node and throws error.


>
> EXPLAIN (VERBOSE, COSTS OFF)
> SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1) t1 INNER JOIN (SELECT
> t2.c1, t3.c1 FROM (SELECT c1 FROM pt1) t2 FULL JOIN (SELECT c1 FROM pt1) t3
> ON (t2.c1 = t3.c1)) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE
> OF t1;
> ERROR:  variable not found in subplan target lists

When there is possibility of EvalPlanQual being called, we construct local
join plan matching the pushed down foreign join. In postgresGetForeignPlan()
after we have built the local join plan, the topmost plan node's targetlist is
changed to fdw_scan_tlist to match the output of the ForeignScan node. As
explained above, this targetlist contains a bare reference to whole-row
reference of a child relation if the child-join's targetlist contains a
ConvertRowtypeExpr. When changing the topmost plan node's targetlist, we do not
modify the targetlists of its left and right tree nodes. The left/right plan
involving corresponding child relation will have ConvertRowtypeExpr expression
in its targetlist, but not whole-row reference directly. When the topmost local
join plan node's targetlist is processed by set_plan_ref(), it throws error
"variable not found in subplan target lists" since it doesn't find bare
whole-row reference of the child relation in subplan's targetlists.

The problem can be solved in two ways:

1. Push down ConvertRowtypeExpr and include it in the pushed down targetlist.
This would solve both the problems described above. Both set_plan_ref() and
get_relation_column_alias_ids() will find ConvertRowtypeExpr, they are looking
for and won't throw an error.

This requires two parts
a. In build_tlist_to_deparse(), instead of pulling
Var node from ConvertRowtypeExpr, we pull whole ConvertRowtypeExpr and include
it in the targetlist being deparsed which is also used as to set
fdw_scan_tlist. In order to pull any ConvertRowtypeExpr's in the local quals,
which may be hidden in the expression tree, we will add two more options to
flags viz. PVC_INCLUDE_CONVERTROWTYPEEXPR and PVC_RECURSE_CONVERTROWTYPEEXPR.
Unlike the other PVC_* options, which do not default to a value, we may want to
default to PVC_RECURSE_CONVERTROWTYPEEXPR if nothing is specified. That will
avoid, possibly updating every pull_var_clause call with
PVC_RECURSE_CONVERTROWTYPEEXPR.
b. deparse ConvertRowtypeExpr
For this we need to get the conversion map between the parent and child. We
then deparse ConvertRowtypeExpr as a ROW() with the attributes of child
rearranged per the conversion map. A multi-level partitioned table will have
nested ConvertRowtypeExpr. To deparse such expressions, we need to find the
conversion map between the topmost parent and the child, by ignoring any
intermediate parents.

2. Modify the local join plan entirely to contain whole row reference of
child relations instead of ConvertRowtypeExpr and deparse a ConvertRowtypeExpr
as a whole-row reference in a subquery.
Again we need two part solution:
a. For this we need to write a walker which walks the plan tree distributing the
Vars in the topmost targetlist to the left and right subtrees. Thus we replace
ConvertRowtypeExpr with corresponding whole-row references in the whole plan
tree.
b. When get_relation_column_alias_ids() encounters a
ConvertRowtypeExpr, it pulls
out the embedded whole row reference and returns the corresponding column id.
deparseExpr() calls deparseVar() by pulling out the embedded whole-row
reference Var when it encouters ConvertRowtypeExpr.

Comments, thoughts any other solutions?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/02/13 21:51), Ashutosh Bapat wrote:
> Here's my analysis of the bug.
>
> The node for which this error comes is a ConvertRowtypeExpr node with
> Var::varattno = 0 under it. Whole row reference of the parent is converted to
> ConvertRowtypeExpr with whole row of child as an argument. When partition-wise
> join is used, targetlist of child-joins contain such ConvertRowtypeExpr when
> the parent-join's targetlist has whole-row references of joining
> partitioned tables.
>
> When we deparse the targetlist of join pushed down by postgres FDW,
> build_tlist_to_deparse() pulls only Var nodes nodes from the join's targetlist.
> So it pulls Var reprensenting a whole-row reference of a child from a
> ConvertRowtypeExpr, when building targetlist to be deparsed for a child-join
> with whole-row references. This targetlist is then saved as fdw_scan_tlist in
> ForeignScanPlan.
>
> This causes two problems shown by the two queries below

>> EXPLAIN (VERBOSE, COSTS OFF)
>> SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1  WHERE c1 = 50) t1 INNER
>> JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1  WHERE c1 between 50 and
>> 60) t2 FULL JOIN (SELECT c1 FROM pt2 WHERE c1 between 50 and 60) t3 ON
>> (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE)
>> ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
>> ERROR:  unexpected expression in subquery output
>
> get_relation_column_alias_ids() uses foreignrel's targetlist
> (foreignrel->reltarget->exprs) as it is to locate given node to be deparsed.
> If the joining relation corresponding to ConvertRowtypeExpr is deparsed as a
> subquery, this function is called with whole-row reference node (Var node with
> varattno = 0).  But the relation's targetlist doesn't contain its whole-row
> reference directly but has it embedded in ConvertRowtypeExpr. So, the function
> doesn't find the given node and throws error.

>> EXPLAIN (VERBOSE, COSTS OFF)
>> SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1) t1 INNER JOIN (SELECT
>> t2.c1, t3.c1 FROM (SELECT c1 FROM pt1) t2 FULL JOIN (SELECT c1 FROM pt1) t3
>> ON (t2.c1 = t3.c1)) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE
>> OF t1;
>> ERROR:  variable not found in subplan target lists
>
> When there is possibility of EvalPlanQual being called, we construct local
> join plan matching the pushed down foreign join. In postgresGetForeignPlan()
> after we have built the local join plan, the topmost plan node's targetlist is
> changed to fdw_scan_tlist to match the output of the ForeignScan node. As
> explained above, this targetlist contains a bare reference to whole-row
> reference of a child relation if the child-join's targetlist contains a
> ConvertRowtypeExpr. When changing the topmost plan node's targetlist, we do not
> modify the targetlists of its left and right tree nodes. The left/right plan
> involving corresponding child relation will have ConvertRowtypeExpr expression
> in its targetlist, but not whole-row reference directly. When the topmost local
> join plan node's targetlist is processed by set_plan_ref(), it throws error
> "variable not found in subplan target lists" since it doesn't find bare
> whole-row reference of the child relation in subplan's targetlists.

Thanks for the analysis!

> The problem can be solved in two ways:
>
> 1. Push down ConvertRowtypeExpr and include it in the pushed down targetlist.
> This would solve both the problems described above. Both set_plan_ref() and
> get_relation_column_alias_ids() will find ConvertRowtypeExpr, they are looking
> for and won't throw an error.
>
> This requires two parts
> a. In build_tlist_to_deparse(), instead of pulling
> Var node from ConvertRowtypeExpr, we pull whole ConvertRowtypeExpr and include
> it in the targetlist being deparsed which is also used as to set
> fdw_scan_tlist. In order to pull any ConvertRowtypeExpr's in the local quals,
> which may be hidden in the expression tree, we will add two more options to
> flags viz. PVC_INCLUDE_CONVERTROWTYPEEXPR and PVC_RECURSE_CONVERTROWTYPEEXPR.
> Unlike the other PVC_* options, which do not default to a value, we may want to
> default to PVC_RECURSE_CONVERTROWTYPEEXPR if nothing is specified. That will
> avoid, possibly updating every pull_var_clause call with
> PVC_RECURSE_CONVERTROWTYPEEXPR.
> b. deparse ConvertRowtypeExpr
> For this we need to get the conversion map between the parent and child. We
> then deparse ConvertRowtypeExpr as a ROW() with the attributes of child
> rearranged per the conversion map. A multi-level partitioned table will have
> nested ConvertRowtypeExpr. To deparse such expressions, we need to find the
> conversion map between the topmost parent and the child, by ignoring any
> intermediate parents.
>
> 2. Modify the local join plan entirely to contain whole row reference of
> child relations instead of ConvertRowtypeExpr and deparse a ConvertRowtypeExpr
> as a whole-row reference in a subquery.
> Again we need two part solution:
> a. For this we need to write a walker which walks the plan tree distributing the
> Vars in the topmost targetlist to the left and right subtrees. Thus we replace
> ConvertRowtypeExpr with corresponding whole-row references in the whole plan
> tree.
> b. When get_relation_column_alias_ids() encounters a
> ConvertRowtypeExpr, it pulls
> out the embedded whole row reference and returns the corresponding column id.
> deparseExpr() calls deparseVar() by pulling out the embedded whole-row
> reference Var when it encouters ConvertRowtypeExpr.

I'd vote for #1, but ISTM that that's more like a feature, not a fix. 
Pushing down ConvertRowtypeExprs to the remote seems to me to be the 
same problem as pushing down PHVs to the remote, which is definitely a 
feature development.  I think a fix for this would be to just give up 
pushing down a child join in foreign_join_ok or somewhere else if the 
reltarget of the child-join relation contains any ConvertRowtypeExprs.

Best regards,
Etsuro Fujita


On Wed, Feb 14, 2018 at 2:50 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
> I'd vote for #1, but ISTM that that's more like a feature, not a fix.
> Pushing down ConvertRowtypeExprs to the remote seems to me to be the same
> problem as pushing down PHVs to the remote, which is definitely a feature
> development.  I think a fix for this would be to just give up pushing down a
> child join in foreign_join_ok or somewhere else if the reltarget of the
> child-join relation contains any ConvertRowtypeExprs.

That means we won't be able to push down any scans under a DML on a
partitioned table. That will be too restrictive.

I think the case of PHV and ConvertRowtypeExprs are different. The the
former we need a subquery to push PHVs, whereas to deparse
ConvertRowtypeExpr on the nullable side we can use the same strategy
as whole-row expression deparsing.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


On Tue, Feb 13, 2018 at 6:21 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
> 1. Push down ConvertRowtypeExpr and include it in the pushed down targetlist.
> This would solve both the problems described above. Both set_plan_ref() and
> get_relation_column_alias_ids() will find ConvertRowtypeExpr, they are looking
> for and won't throw an error.
>
> This requires two parts
> a. In build_tlist_to_deparse(), instead of pulling
> Var node from ConvertRowtypeExpr, we pull whole ConvertRowtypeExpr and include
> it in the targetlist being deparsed which is also used as to set
> fdw_scan_tlist. In order to pull any ConvertRowtypeExpr's in the local quals,
> which may be hidden in the expression tree, we will add two more options to
> flags viz. PVC_INCLUDE_CONVERTROWTYPEEXPR and PVC_RECURSE_CONVERTROWTYPEEXPR.
> Unlike the other PVC_* options, which do not default to a value, we may want to
> default to PVC_RECURSE_CONVERTROWTYPEEXPR if nothing is specified. That will
> avoid, possibly updating every pull_var_clause call with
> PVC_RECURSE_CONVERTROWTYPEEXPR.
> b. deparse ConvertRowtypeExpr
> For this we need to get the conversion map between the parent and child. We
> then deparse ConvertRowtypeExpr as a ROW() with the attributes of child
> rearranged per the conversion map. A multi-level partitioned table will have
> nested ConvertRowtypeExpr. To deparse such expressions, we need to find the
> conversion map between the topmost parent and the child, by ignoring any
> intermediate parents.
>
>

Here's patchset implementing this solution.

0001 adds PVC_*_CONVERTROWTYPEEXPR to pull_var_clause() and adjusts its callers.

0002 fixes a similar bug for regular partitioned tables. The patch has
testcase. The commit message explains the bug in more detail.

0003 has postgres_fdw fixes as outlined above with tests.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/02/20 18:13), Ashutosh Bapat wrote:
> Here's patchset implementing this solution.
>
> 0001 adds PVC_*_CONVERTROWTYPEEXPR to pull_var_clause() and adjusts its callers.
>
> 0002 fixes a similar bug for regular partitioned tables. The patch has
> testcase. The commit message explains the bug in more detail.
>
> 0003 has postgres_fdw fixes as outlined above with tests.

The patch 0003 doesn't applies to the latest HEAD any more, so please 
update the patch.  I think this should be an open item for PG11.  My 
apologizes for not having reviewed the patch in the last commitfest.

Best regards,
Etsuro Fujita


On Tue, Apr 17, 2018 at 2:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/02/20 18:13), Ashutosh Bapat wrote:
>>
>> Here's patchset implementing this solution.
>>
>> 0001 adds PVC_*_CONVERTROWTYPEEXPR to pull_var_clause() and adjusts its
>> callers.
>>
>> 0002 fixes a similar bug for regular partitioned tables. The patch has
>> testcase. The commit message explains the bug in more detail.
>>
>> 0003 has postgres_fdw fixes as outlined above with tests.
>
>
> The patch 0003 doesn't applies to the latest HEAD any more, so please update
> the patch.  I think this should be an open item for PG11.  My apologizes for
> not having reviewed the patch in the last commitfest.

Here's updated patch-set.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/04/17 18:43), Ashutosh Bapat wrote:
> On Tue, Apr 17, 2018 at 2:05 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/02/20 18:13), Ashutosh Bapat wrote:
>>>
>>> Here's patchset implementing this solution.
>>>
>>> 0001 adds PVC_*_CONVERTROWTYPEEXPR to pull_var_clause() and adjusts its
>>> callers.
>>>
>>> 0002 fixes a similar bug for regular partitioned tables. The patch has
>>> testcase. The commit message explains the bug in more detail.
>>>
>>> 0003 has postgres_fdw fixes as outlined above with tests.
>>
>>
>> The patch 0003 doesn't applies to the latest HEAD any more, so please update
>> the patch.  I think this should be an open item for PG11.  My apologizes for
>> not having reviewed the patch in the last commitfest.
>
> Here's updated patch-set.

Thanks for updating the patch set!  Will review.

Best regards,
Etsuro Fujita


(2018/04/17 19:00), Etsuro Fujita wrote:
> (2018/04/17 18:43), Ashutosh Bapat wrote:
>> Here's updated patch-set.
>
> Will review.

I started reviewing this.

o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:

+   else if (IsA(node, ConvertRowtypeExpr))
+   {
+#ifdef USE_ASSERT_CHECKING
+       ConvertRowtypeExpr *cre = castNode(ConvertRowtypeExpr, node);
+       Var        *var;
+
+       /*
+        * ConvertRowtypeExprs only result when parent's whole-row 
reference is
+        * translated for a child using adjust_appendrel_attrs(). That 
function
+        * does not handle any upper level Var references.
+        */
+       while (IsA(cre->arg, ConvertRowtypeExpr))
+           cre = castNode(ConvertRowtypeExpr, cre->arg);
+       var = castNode(Var, cre->arg);
+       Assert (var->varlevelsup == 0);
+#endif /* USE_ASSERT_CHECKING */

Isn't it better to throw ERROR as in other cases in pull_vars_clause()? 
  Another thing I noticed about this patch is this:

postgres=# create table prt1 (a int, b int, c varchar) partition by 
range (a);
postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO 
(250);
postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) 
TO (500)
;
postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM0000') from 
generate
_series(0, 499) i where i % 2 = 0;
postgres=# analyze prt1;
postgres=# create table prt2 (a int, b int, c varchar) partition by 
range (b);
postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO 
(250);
postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) 
TO (500)
;
postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM0000') from 
generate
_series(0, 499) i where i % 3 = 0;
postgres=# analyze prt2;

postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text = 
t2::text a
nd t1.a = t2.b;
ERROR:  ConvertRowtypeExpr found where not expected

To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to 
pull_vars_clause() in distribute_qual_to_rels() and 
generate_base_implied_equalities_no_const() as well.

That's all I have for now.  Will continue the review.

Best regards,
Etsuro Fujita


On Tue, Apr 24, 2018 at 4:49 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/04/17 19:00), Etsuro Fujita wrote:
>>
>> (2018/04/17 18:43), Ashutosh Bapat wrote:
>>>
>>> Here's updated patch-set.
>>
>>
>> Will review.
>
>
> I started reviewing this.
>
> o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:
>
> +   else if (IsA(node, ConvertRowtypeExpr))
> +   {
> +#ifdef USE_ASSERT_CHECKING
> +       ConvertRowtypeExpr *cre = castNode(ConvertRowtypeExpr, node);
> +       Var        *var;
> +
> +       /*
> +        * ConvertRowtypeExprs only result when parent's whole-row reference
> is
> +        * translated for a child using adjust_appendrel_attrs(). That
> function
> +        * does not handle any upper level Var references.
> +        */
> +       while (IsA(cre->arg, ConvertRowtypeExpr))
> +           cre = castNode(ConvertRowtypeExpr, cre->arg);
> +       var = castNode(Var, cre->arg);
> +       Assert (var->varlevelsup == 0);
> +#endif /* USE_ASSERT_CHECKING */
>
> Isn't it better to throw ERROR as in other cases in pull_vars_clause()?

Actually I noticed that ConvertRowtypeExpr are used to cast a child's
whole row reference expression (not just a Var node) into that of its
parent and not. For example a cast like NULL::child::parent produces a
ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
node. We are interested only in ConvertRowtypeExprs embedding Var
nodes with Var::varattno = 0. I have changed this code to use function
is_converted_whole_row_reference() instead of the above code with
Assert. In order to use that function, I had to take it out of
setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.

> Another thing I noticed about this patch is this:
>
> postgres=# create table prt1 (a int, b int, c varchar) partition by range
> (a);
> postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO
> (250);
> postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) TO
> (500)
> ;
> postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM0000') from
> generate
> _series(0, 499) i where i % 2 = 0;
> postgres=# analyze prt1;
> postgres=# create table prt2 (a int, b int, c varchar) partition by range
> (b);
> postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO
> (250);
> postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) TO
> (500)
> ;
> postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM0000') from
> generate
> _series(0, 499) i where i % 3 = 0;
> postgres=# analyze prt2;
>
> postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text =
> t2::text a
> nd t1.a = t2.b;
> ERROR:  ConvertRowtypeExpr found where not expected
>
> To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to
> pull_vars_clause() in distribute_qual_to_rels() and
> generate_base_implied_equalities_no_const() as well.

Thanks for the catch. I have updated patch with your suggested fix.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/04/25 18:51), Ashutosh Bapat wrote:
> On Tue, Apr 24, 2018 at 4:49 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:

>> Another thing I noticed about this patch is this:
>>
>> postgres=# create table prt1 (a int, b int, c varchar) partition by range
>> (a);
>> postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO
>> (250);
>> postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) TO
>> (500)
>> ;
>> postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM0000') from
>> generate
>> _series(0, 499) i where i % 2 = 0;
>> postgres=# analyze prt1;
>> postgres=# create table prt2 (a int, b int, c varchar) partition by range
>> (b);
>> postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO
>> (250);
>> postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) TO
>> (500)
>> ;
>> postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM0000') from
>> generate
>> _series(0, 499) i where i % 3 = 0;
>> postgres=# analyze prt2;
>>
>> postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text =
>> t2::text a
>> nd t1.a = t2.b;
>> ERROR:  ConvertRowtypeExpr found where not expected
>>
>> To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to
>> pull_vars_clause() in distribute_qual_to_rels() and
>> generate_base_implied_equalities_no_const() as well.
>
> Thanks for the catch. I have updated patch with your suggested fix.

Thanks, but I don't think that's enough.  Consider:

postgres=# create table base_tbl (a int, b int);
postgres=# insert into base_tbl select i % 25, i from generate_series(0, 
499) i where i % 6 = 0;

postgres=# update prt1 t1 set c = 'foo' from prt2 t2 left join (select 
a, b, 1 from base_tbl) ss(c1, c2, c3) on (t2.b = ss.c2) where t1::text = 
t2::text and t1.a = t2.b;
ERROR:  ConvertRowtypeExpr found where not expected

To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to 
pull_var_clause() in find_placeholders_in_expr() as well.

The patch doesn't pass that to pull_var_clause() in other places such as 
fix_placeholder_input_needed_levels() or planner.c, but I don't 
understand the reason why that's OK.

Sorry, I've not finished the review yet, so I'll continue that.

Thanks for updating the patch!

Best regards,
Etsuro Fujita


On Thu, Apr 26, 2018 at 5:36 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/04/25 18:51), Ashutosh Bapat wrote:
>>
>> On Tue, Apr 24, 2018 at 4:49 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>>
>>> o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:
>
>
>>> Another thing I noticed about this patch is this:
>>>
>>> postgres=# create table prt1 (a int, b int, c varchar) partition by range
>>> (a);
>>> postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO
>>> (250);
>>> postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250)
>>> TO
>>> (500)
>>> ;
>>> postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM0000') from
>>> generate
>>> _series(0, 499) i where i % 2 = 0;
>>> postgres=# analyze prt1;
>>> postgres=# create table prt2 (a int, b int, c varchar) partition by range
>>> (b);
>>> postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO
>>> (250);
>>> postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250)
>>> TO
>>> (500)
>>> ;
>>> postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM0000') from
>>> generate
>>> _series(0, 499) i where i % 3 = 0;
>>> postgres=# analyze prt2;
>>>
>>> postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text =
>>> t2::text a
>>> nd t1.a = t2.b;
>>> ERROR:  ConvertRowtypeExpr found where not expected
>>>
>>> To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to
>>> pull_vars_clause() in distribute_qual_to_rels() and
>>> generate_base_implied_equalities_no_const() as well.
>>
>>
>> Thanks for the catch. I have updated patch with your suggested fix.
>
>
> Thanks, but I don't think that's enough.  Consider:
>
> postgres=# create table base_tbl (a int, b int);
> postgres=# insert into base_tbl select i % 25, i from generate_series(0,
> 499) i where i % 6 = 0;
>
> postgres=# update prt1 t1 set c = 'foo' from prt2 t2 left join (select a, b,
> 1 from base_tbl) ss(c1, c2, c3) on (t2.b = ss.c2) where t1::text = t2::text
> and t1.a = t2.b;
> ERROR:  ConvertRowtypeExpr found where not expected
>

Thanks for the test.

> To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to
> pull_var_clause() in find_placeholders_in_expr() as well.
>
> The patch doesn't pass that to pull_var_clause() in other places such as
> fix_placeholder_input_needed_levels() or planner.c, but I don't understand
> the reason why that's OK.

I was trying to be conservative not to include
PVC_RECURSE_CONVERTROWTYPES everywhere. But it looks like because of
inheritance_planner() and partition-wise aggregate commit, we can have
ConvertRowtypeExprs almost everywhere. Added
PVC_RECURSE_CONVERTROWTYPEs in almost all the places except the ones
listed in 0001 patch.

>
> Sorry, I've not finished the review yet, so I'll continue that.

Thanks for the tests and review.

Here's updated patch set.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/04/27 14:40), Ashutosh Bapat wrote:
> Here's updated patch set.

Thanks for updating the patch!  Here are my review comments on patch 
0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:

* This assertion in deparseConvertRowtypeExpr wouldn't always be true 
because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN 
TABLE ADD COLUMN:

+   Assert(parent_desc->natts == child_desc->natts);

Here is such an example:

postgres=# create table fprt1 (a int, b int, c varchar) partition by 
range (a);
postgres=# create table fprt1_p1 (like fprt1);
postgres=# create table fprt1_p2 (like fprt1);
postgres=# create foreign table ftprt1_p1 (a int, b int, c varchar) 
server loopback options (table_name 'fprt1_p1', use_remote_estimate 'true');
postgres=# alter foreign table ftprt1_p1 drop column c;
postgres=# alter foreign table ftprt1_p1 add column c varchar;
postgres=# alter table fprt1 attach partition ftprt1_p1 for values from 
(0) to (250);
postgres=# create foreign table ftprt1_p2 partition of fprt1 for values 
from (250) to (500) server loopback options (table_name 'fprt1_p2', 
use_remote_estimate 'true');
postgres=# insert into fprt1 select i, i, to_char(i/50, 'FM0000') from 
generate_series(0, 499, 2) i;
postgres=# analyze fprt1;
postgres=# analyze fprt1_p1;
postgres=# analyze fprt1_p2;
postgres=# create table fprt2 (a int, b int, c varchar) partition by 
range (b);
postgres=# create table fprt2_p1 (like fprt2);
postgres=# create table fprt2_p2 (like fprt2);
postgres=# create foreign table ftprt2_p1 partition of fprt2 for values 
from (0) to (250) server loopback options (table_name 'fprt2_p1', 
use_remote_estimate 'true');
postgres=# create foreign table ftprt2_p2 partition of fprt2 for values 
from (250) to (500) server loopback options (table_name 'fprt2_p2', 
use_remote_estimate 'true');
postgres=# insert into fprt2 select i, i, to_char(i/50, 'FM0000') from 
generate_series(0, 499, 3) i;
postgres=# analyze fprt2;
postgres=# analyze fprt2_p1;
postgres=# analyze fprt2_p2;
postgres=# set enable_partitionwise_join to true;
postgres=# select t1, t2 from fprt1 t1 join fprt2 t2 on (t1.a = t2.b) 
where t1.a % 25 = 0;
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I think we should remove that assertion.

* But I don't think that change is enough.  Here is another oddity with 
that assertion removed.

postgres=# alter table fprt1 detach partition ftprt1_p1;
postgres=# alter table fprt1 detach partition ftprt1_p2;
postgres=# alter table fprt1 drop column c;
postgres=# alter table fprt1 add column c varchar;
postgres=# alter table fprt1 attach partition ftprt1_p1 for values from 
(0) to (250);
postgres=# alter table fprt1 attach partition ftprt1_p2 for values from 
(250) to (500);
postgres=# set enable_partitionwise_join to true;
postgres=# select t1, t2 from fprt1 t1 join fprt2 t2 on (t1.a = t2.b) 
where t1.a % 25 = 0;
ERROR:  malformed record literal: "(300,300,"(300,300,0006)",0006)"
DETAIL:  Too many columns.
CONTEXT:  processing expression at position 1 in select list

The reason for that would be: in the following, the patch doesn't take 
care of dropped columns in the parent table (ie, columns such that 
attrMap[cnt]=0).

+   /* Construct ROW expression according to the conversion map. */
+   appendStringInfoString(buf, "ROW(");
+   for (cnt = 0; cnt < natts; cnt++)
+   {
+       if (cnt > 0)
+           appendStringInfoString(buf, ", ");
+       deparseColumnRef(buf, child_var->varno, attrMap[cnt], root,
+                        qualify_col);
+   }
+   appendStringInfoChar(buf, ')');

To fix this, I think we should skip the deparseColumnRef processing for 
dropped columns in the parent table.

* I think it would be better to do EXPLAIN with the VERBOSE option to 
the queries this patch added to the regression tests, to see if 
ConvertRowtypeExprs are correctly deparsed in the remote queries.

That's all I have for now.

Best regards,
Etsuro Fujita


On Tue, May 1, 2018 at 5:00 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/04/27 14:40), Ashutosh Bapat wrote:
>>
>> Here's updated patch set.
>
>
> Thanks for updating the patch!  Here are my review comments on patch
> 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:
>
> * This assertion in deparseConvertRowtypeExpr wouldn't always be true
> because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN
> TABLE ADD COLUMN:
>
> +   Assert(parent_desc->natts == child_desc->natts);
>
> Here is such an example:

>
> I think we should remove that assertion.

Removed.

>
> * But I don't think that change is enough.  Here is another oddity with that
> assertion removed.

>
> To fix this, I think we should skip the deparseColumnRef processing for
> dropped columns in the parent table.

Done.

I have changed the test file a bit to test these scenarios.

Thanks for testing.

>
> * I think it would be better to do EXPLAIN with the VERBOSE option to the
> queries this patch added to the regression tests, to see if
> ConvertRowtypeExprs are correctly deparsed in the remote queries.

The reason VERBOSE option was omitted for partition-wise join was to
avoid repeated and excessive output for every child-join. I think that
still applies.

PFA patch-set with the fixes.

I also noticed that the new function deparseConvertRowtypeExpr is not
quite following the naming convention of the other deparseFoo
functions. Foo is usually the type of the node the parser would
produced when the SQL string produced by that function is parsed. That
doesn't hold for the SQL string produced by ConvertRowtypeExpr but
then naming it as generic deparseRowExpr() wouldn't be correct either.
And then there are functions like deparseColumnRef which may produce a
variety of SQL strings which get parsed into different node types e.g.
a whole-row reference would produce ROW(...) which gets parsed as a
RowExpr. Please let me know if you have any suggestions for the name.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/05/08 16:20), Ashutosh Bapat wrote:
> On Tue, May 1, 2018 at 5:00 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> Here are my review comments on patch
>> 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:
>>
>> * This assertion in deparseConvertRowtypeExpr wouldn't always be true
>> because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN
>> TABLE ADD COLUMN:
>>
>> +   Assert(parent_desc->natts == child_desc->natts);

>> I think we should remove that assertion.
>
> Removed.

>> * But I don't think that change is enough.  Here is another oddity with that
>> assertion removed.

>> To fix this, I think we should skip the deparseColumnRef processing for
>> dropped columns in the parent table.
>
> Done.

Thanks for those changes!

> I have changed the test file a bit to test these scenarios.

+1

>> * I think it would be better to do EXPLAIN with the VERBOSE option to the
>> queries this patch added to the regression tests, to see if
>> ConvertRowtypeExprs are correctly deparsed in the remote queries.
>
> The reason VERBOSE option was omitted for partition-wise join was to
> avoid repeated and excessive output for every child-join. I think that
> still applies.

I'd like to leave that for the committer's judge.

> PFA patch-set with the fixes.

Thanks for updating the patch!

> I also noticed that the new function deparseConvertRowtypeExpr is not
> quite following the naming convention of the other deparseFoo
> functions. Foo is usually the type of the node the parser would
> produced when the SQL string produced by that function is parsed. That
> doesn't hold for the SQL string produced by ConvertRowtypeExpr but
> then naming it as generic deparseRowExpr() wouldn't be correct either.
> And then there are functions like deparseColumnRef which may produce a
> variety of SQL strings which get parsed into different node types e.g.
> a whole-row reference would produce ROW(...) which gets parsed as a
> RowExpr. Please let me know if you have any suggestions for the name.

To be honest, I don't have any strong opinion about that.  But I like 
"deparseConvertRowtypeExpr" because that name seems to well represent 
the functionality, so I'd vote for that.

BTW, one thing I'm a bit concerned about is this:

(2018/04/25 18:51), Ashutosh Bapat wrote:
 > Actually I noticed that ConvertRowtypeExpr are used to cast a child's
 > whole row reference expression (not just a Var node) into that of its
 > parent and not. For example a cast like NULL::child::parent produces a
 > ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
 > node. We are interested only in ConvertRowtypeExprs embedding Var
 > nodes with Var::varattno = 0. I have changed this code to use function
 > is_converted_whole_row_reference() instead of the above code with
 > Assert. In order to use that function, I had to take it out of
 > setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.

This change seems a bit confusing to me because the flag bits 
"PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed 
to pull_var_clause look as if it handles any ConvertRowtypeExpr nodes 
from a given clause, but really, it only handles ConvertRowtypeExprs you 
mentioned above.  To make that function easy to understand and use, I 
think it'd be better to use the IsA(node, ConvertRowtypeExpr) test as in 
the first version of the patch, instead of 
is_converted_whole_row_reference, which would be more consistent with 
other cases such as PlaceHolderVar.

Best regards,
Etsuro Fujita


On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
> (2018/04/25 18:51), Ashutosh Bapat wrote:
>> Actually I noticed that ConvertRowtypeExpr are used to cast a child's
>> whole row reference expression (not just a Var node) into that of its
>> parent and not. For example a cast like NULL::child::parent produces a
>> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
>> node. We are interested only in ConvertRowtypeExprs embedding Var
>> nodes with Var::varattno = 0. I have changed this code to use function
>> is_converted_whole_row_reference() instead of the above code with
>> Assert. In order to use that function, I had to take it out of
>> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.
>
> This change seems a bit confusing to me because the flag bits
> "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to
> pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a
> given clause, but really, it only handles ConvertRowtypeExprs you mentioned
> above.  To make that function easy to understand and use, I think it'd be
> better to use the IsA(node, ConvertRowtypeExpr) test as in the first version
> of the patch, instead of is_converted_whole_row_reference, which would be
> more consistent with other cases such as PlaceHolderVar.

I agree that using is_converted_whole_row_reference() is not
consistent with the other nodes that are handled by pull_var_clause().
I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's
being done with those options. But using
is_converted_whole_row_reference() is the correct thing to do since we
are interested only in the whole-row references embedded in
ConvertRowtypeExpr. There can be anything encapsulated in
ConvertRowtypeExpr(), a non-shippable function for example. We don't
want to try to push that down in postgres_fdw's case. Neither in other
cases. So, it boils down to changing names of PVC_*_CONVERTROWTYPES to
something more meaningful. How about PVC_*_CONVERTWHOLEROWS?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/05/10 13:04), Ashutosh Bapat wrote:
> On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/04/25 18:51), Ashutosh Bapat wrote:
>>> Actually I noticed that ConvertRowtypeExpr are used to cast a child's
>>> whole row reference expression (not just a Var node) into that of its
>>> parent and not. For example a cast like NULL::child::parent produces a
>>> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
>>> node. We are interested only in ConvertRowtypeExprs embedding Var
>>> nodes with Var::varattno = 0. I have changed this code to use function
>>> is_converted_whole_row_reference() instead of the above code with
>>> Assert. In order to use that function, I had to take it out of
>>> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.
>>
>> This change seems a bit confusing to me because the flag bits
>> "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to
>> pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a
>> given clause, but really, it only handles ConvertRowtypeExprs you mentioned
>> above.  To make that function easy to understand and use, I think it'd be
>> better to use the IsA(node, ConvertRowtypeExpr) test as in the first version
>> of the patch, instead of is_converted_whole_row_reference, which would be
>> more consistent with other cases such as PlaceHolderVar.
>
> I agree that using is_converted_whole_row_reference() is not
> consistent with the other nodes that are handled by pull_var_clause().
> I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's
> being done with those options. But using
> is_converted_whole_row_reference() is the correct thing to do since we
> are interested only in the whole-row references embedded in
> ConvertRowtypeExpr. There can be anything encapsulated in
> ConvertRowtypeExpr(), a non-shippable function for example. We don't
> want to try to push that down in postgres_fdw's case. Neither in other
> cases.

Yeah, but I think the IsA test would be sufficient to make things work 
correctly because we can assume that there aren't any other nodes than 
Var, PHV, and CRE translating a wholerow value of a child table into a 
wholerow value of its parent table's rowtype in the expression clause to 
which we apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES.

BTW, one thing I noticed about the new version of the patch is: there is 
yet another case where pull_var_clause produces an error.  Here is an 
example:

postgres=# create table t1 (a int, b text);
CREATE TABLE
postgres=# create table t2 (a int, b text);
CREATE TABLE
postgres=# create foreign table ft1 (a int, b text) server loopback 
options (table_name 't1');
CREATE FOREIGN TABLE
postgres=# create foreign table ft2 (a int, b text) server loopback 
options (table_name 't2');
CREATE FOREIGN TABLE
postgres=# insert into ft1 values (1, 'foo');
INSERT 0 1
postgres=# insert into ft1 values (2, 'bar');
INSERT 0 1
postgres=# insert into ft2 values (1, 'test1');
INSERT 0 1
postgres=# insert into ft2 values (2, 'test2');
INSERT 0 1
postgres=# analyze ft1;
ANALYZE
postgres=# analyze ft2;
ANALYZE
postgres=# create table parent (a int, b text);
CREATE TABLE
postgres=# alter foreign table ft1 inherit parent;
ALTER FOREIGN TABLE
postgres=# explain verbose update parent set b = ft2.b from ft2 where 
parent.a = ft2.a returning parent;
ERROR:  ConvertRowtypeExpr found where not expected

To produce this, apply the patch in [1] as well as the new version; 
without that patch in [1], the update query would crash the server (see 
[1] for details).  To fix this, I think we need to pass 
PVC_RECURSE_CONVERTROWTYPES to pull_var_clause in build_remote_returning 
in postgres_fdw.c as well.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5AF43E02.30000%40lab.ntt.co.jp


On Thu, May 10, 2018 at 6:23 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/05/10 13:04), Ashutosh Bapat wrote:
>>
>> On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>>
>>> (2018/04/25 18:51), Ashutosh Bapat wrote:
>>>>
>>>> Actually I noticed that ConvertRowtypeExpr are used to cast a child's
>>>> whole row reference expression (not just a Var node) into that of its
>>>> parent and not. For example a cast like NULL::child::parent produces a
>>>> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
>>>> node. We are interested only in ConvertRowtypeExprs embedding Var
>>>> nodes with Var::varattno = 0. I have changed this code to use function
>>>> is_converted_whole_row_reference() instead of the above code with
>>>> Assert. In order to use that function, I had to take it out of
>>>> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.
>>>
>>>
>>> This change seems a bit confusing to me because the flag bits
>>> "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to
>>> pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a
>>> given clause, but really, it only handles ConvertRowtypeExprs you
>>> mentioned
>>> above.  To make that function easy to understand and use, I think it'd be
>>> better to use the IsA(node, ConvertRowtypeExpr) test as in the first
>>> version
>>> of the patch, instead of is_converted_whole_row_reference, which would be
>>> more consistent with other cases such as PlaceHolderVar.
>>
>>
>> I agree that using is_converted_whole_row_reference() is not
>> consistent with the other nodes that are handled by pull_var_clause().
>> I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's
>> being done with those options. But using
>> is_converted_whole_row_reference() is the correct thing to do since we
>> are interested only in the whole-row references embedded in
>> ConvertRowtypeExpr. There can be anything encapsulated in
>> ConvertRowtypeExpr(), a non-shippable function for example. We don't
>> want to try to push that down in postgres_fdw's case. Neither in other
>> cases.
>
>
> Yeah, but I think the IsA test would be sufficient to make things work
> correctly because we can assume that there aren't any other nodes than Var,
> PHV, and CRE translating a wholerow value of a child table into a wholerow
> value of its parent table's rowtype in the expression clause to which we
> apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES.

Not really.
Consider following case using the same tables fprt1 and fprt2 in test
sql/postgres_fdw.sql
create function row_as_is(a ftprt2_p1) returns ftprt2_p1 as $$begin
return a; end;$$ language plpgsql;

With the change you propose i.e.

diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index f972712..088da14 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -646,8 +646,9 @@ pull_var_clause_walker(Node *node,
pull_var_clause_context *context)
                else
                        elog(ERROR, "PlaceHolderVar found where not expected");
        }
-       else if (is_converted_whole_row_reference(node))
+       else if (IsA(node, ConvertRowtypeExpr))
        {
+               Assert(is_converted_whole_row_reference(node));
                if (context->flags & PVC_INCLUDE_CONVERTROWTYPES)
                {
                        context->varlist = lappend(context->varlist, node);


following query crashes at Assert(is_converted_whole_row_reference(node));

If we remove that assert, it would end up pushing down row_as_is(),
which is a local user defined function, to the foreign server. That's
not desirable since the foreign server may not have that user defined
function.

>
> BTW, one thing I noticed about the new version of the patch is: there is yet
> another case where pull_var_clause produces an error.  Here is an example:
>
> postgres=# create table t1 (a int, b text);
> CREATE TABLE
> postgres=# create table t2 (a int, b text);
> CREATE TABLE
> postgres=# create foreign table ft1 (a int, b text) server loopback options
> (table_name 't1');
> CREATE FOREIGN TABLE
> postgres=# create foreign table ft2 (a int, b text) server loopback options
> (table_name 't2');
> CREATE FOREIGN TABLE
> postgres=# insert into ft1 values (1, 'foo');
> INSERT 0 1
> postgres=# insert into ft1 values (2, 'bar');
> INSERT 0 1
> postgres=# insert into ft2 values (1, 'test1');
> INSERT 0 1
> postgres=# insert into ft2 values (2, 'test2');
> INSERT 0 1
> postgres=# analyze ft1;
> ANALYZE
> postgres=# analyze ft2;
> ANALYZE
> postgres=# create table parent (a int, b text);
> CREATE TABLE
> postgres=# alter foreign table ft1 inherit parent;
> ALTER FOREIGN TABLE
> postgres=# explain verbose update parent set b = ft2.b from ft2 where
> parent.a = ft2.a returning parent;
> ERROR:  ConvertRowtypeExpr found where not expected
>
> To produce this, apply the patch in [1] as well as the new version; without
> that patch in [1], the update query would crash the server (see [1] for
> details).  To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES
> to pull_var_clause in build_remote_returning in postgres_fdw.c as well.

I missed that call to PVC since it was added after I had created my
patches. That's a disadvantage of having changed PVC to use flags
instead of bools. We won't notice such changes. Thanks for pointing it
out. Changed as per your suggestion.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/05/11 16:17), Ashutosh Bapat wrote:
> On Thu, May 10, 2018 at 6:23 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> Yeah, but I think the IsA test would be sufficient to make things work
>> correctly because we can assume that there aren't any other nodes than Var,
>> PHV, and CRE translating a wholerow value of a child table into a wholerow
>> value of its parent table's rowtype in the expression clause to which we
>> apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES.
>
> Not really.
> Consider following case using the same tables fprt1 and fprt2 in test
> sql/postgres_fdw.sql
> create function row_as_is(a ftprt2_p1) returns ftprt2_p1 as $$begin
> return a; end;$$ language plpgsql;
>
> With the change you propose i.e.
>
> diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
> index f972712..088da14 100644
> --- a/src/backend/optimizer/util/var.c
> +++ b/src/backend/optimizer/util/var.c
> @@ -646,8 +646,9 @@ pull_var_clause_walker(Node *node,
> pull_var_clause_context *context)
>                  else
>                          elog(ERROR, "PlaceHolderVar found where not expected");
>          }
> -       else if (is_converted_whole_row_reference(node))
> +       else if (IsA(node, ConvertRowtypeExpr))
>          {
> +               Assert(is_converted_whole_row_reference(node));
>                  if (context->flags&  PVC_INCLUDE_CONVERTROWTYPES)
>                  {
>                          context->varlist = lappend(context->varlist, node);
>
>
> following query crashes at Assert(is_converted_whole_row_reference(node));

I guess you forgot to show the query.

> If we remove that assert, it would end up pushing down row_as_is(),
> which is a local user defined function, to the foreign server. That's
> not desirable since the foreign server may not have that user defined
> function.

I don't understand the counter-example you tried to show, but I think 
that I was wrong here; the IsA test isn't sufficient.

>> BTW, one thing I noticed about the new version of the patch is: there is yet
>> another case where pull_var_clause produces an error.  Here is an example:

>> To produce this, apply the patch in [1] as well as the new version; without
>> that patch in [1], the update query would crash the server (see [1] for
>> details).  To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES
>> to pull_var_clause in build_remote_returning in postgres_fdw.c as well.
>
> I missed that call to PVC since it was added after I had created my
> patches. That's a disadvantage of having changed PVC to use flags
> instead of bools. We won't notice such changes. Thanks for pointing it
> out. Changed as per your suggestion.

Thanks for that change and the updated version!

Yet yet another case where pull_var_clause produces an error:

postgres=# create table list_pt (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table list_ptp1 partition of list_pt for values in (1);
CREATE TABLE
postgres=# alter table list_ptp1 add constraint list_ptp1_check check 
(list_ptp1::list_pt::text != NULL);
ERROR:  ConvertRowtypeExpr found where not expected

The patch kept the flags passed to pull_var_clause in 
src/backend/catalog/heap.c as-is, but I think we need to modify that 
accordingly.  Probably, the flags passed to pull_var_clause in 
CreateTrigger as well.

Having said that, I started to think this approach would make code much 
complicated.  Another idea I came up with to avoid that would be to use 
pull_var_clause as-is and then rewrite wholerows of partitions back to 
ConvertRowtypeExprs translating those wholerows to wholerows of their 
parents tables' rowtypes if necessary.  That would be annoying and a 
little bit inefficient, but the places where we need to rewrite is 
limited: prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC. 
So we could really minimize the code change and avoid the additional 
overhead caused by the is_converted_whole_row_reference test added to 
pull_var_clause.  I think the latter would be rather important, because 
PWJ is disabled by default.  What do you think about that approach?

Best regards,
Etsuro Fujita


On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
> I guess you forgot to show the query.

Sorry. Here's the query.

explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b
where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2;

>
> Yet yet another case where pull_var_clause produces an error:
>
> postgres=# create table list_pt (a int, b text) partition by list (a);
> CREATE TABLE
> postgres=# create table list_ptp1 partition of list_pt for values in (1);
> CREATE TABLE
> postgres=# alter table list_ptp1 add constraint list_ptp1_check check
> (list_ptp1::list_pt::text != NULL);
> ERROR:  ConvertRowtypeExpr found where not expected
>
> The patch kept the flags passed to pull_var_clause in
> src/backend/catalog/heap.c as-is, but I think we need to modify that
> accordingly.  Probably, the flags passed to pull_var_clause in CreateTrigger
> as well.

With all the support that we have added in partitioning for v11, I
think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places,
which were left unchanged in the earlier versions of patches, which
were written when all that support wasn't in v11.

>
> Having said that, I started to think this approach would make code much
> complicated.  Another idea I came up with to avoid that would be to use
> pull_var_clause as-is and then rewrite wholerows of partitions back to
> ConvertRowtypeExprs translating those wholerows to wholerows of their
> parents tables' rowtypes if necessary.  That would be annoying and a little
> bit inefficient, but the places where we need to rewrite is limited:
> prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC.

Other reason why we can't use your approach is we can not decide
whether ConvertRowtypeExpr is needed by just looking at the Var node.
You might say, that we add a ConvertRowtypeExpr if the Var::varno
references a child relation. But that's not true. For example a whole
row reference embedded in ConvertRowtypeExpr added by query
adjustments in inheritance_planner() is not a child's whole-row
reference in the adjusted query. So, it's not clear to me when we add
a ConvertRowtypeExpr and we don't. I am not sure if those are the only
two places which require a fix. Right now it looks like those are the
places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true
in the future, esp. given the amount of work we expect to happen in
the partitioning area. When that happens, fixing all those places in
that way wouldn't be good.

> So we could
> really minimize the code change and avoid the additional overhead caused by
> the is_converted_whole_row_reference test added to pull_var_clause.  I think
> the latter would be rather important, because PWJ is disabled by default.
> What do you think about that approach?

Partition-wise join and partition-wise aggregation are disabled by
default temporarily. We will change it in near future once the memory
consumption issue has been tackled. The features are useful and users
would want those to be enabled by default like other query
optimizations. So, we can't take a decision based on that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>
>> I guess you forgot to show the query.
>
> Sorry. Here's the query.
>
> explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b
> where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2;
>
>>
>> Yet yet another case where pull_var_clause produces an error:
>>
>> postgres=# create table list_pt (a int, b text) partition by list (a);
>> CREATE TABLE
>> postgres=# create table list_ptp1 partition of list_pt for values in (1);
>> CREATE TABLE
>> postgres=# alter table list_ptp1 add constraint list_ptp1_check check
>> (list_ptp1::list_pt::text != NULL);
>> ERROR:  ConvertRowtypeExpr found where not expected
>>
>> The patch kept the flags passed to pull_var_clause in
>> src/backend/catalog/heap.c as-is, but I think we need to modify that
>> accordingly.  Probably, the flags passed to pull_var_clause in CreateTrigger
>> as well.
>
> With all the support that we have added in partitioning for v11, I
> think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places,
> which were left unchanged in the earlier versions of patches, which
> were written when all that support wasn't in v11.
>
>>
>> Having said that, I started to think this approach would make code much
>> complicated.  Another idea I came up with to avoid that would be to use
>> pull_var_clause as-is and then rewrite wholerows of partitions back to
>> ConvertRowtypeExprs translating those wholerows to wholerows of their
>> parents tables' rowtypes if necessary.  That would be annoying and a little
>> bit inefficient, but the places where we need to rewrite is limited:
>> prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC.
>
> Other reason why we can't use your approach is we can not decide
> whether ConvertRowtypeExpr is needed by just looking at the Var node.
> You might say, that we add a ConvertRowtypeExpr if the Var::varno
> references a child relation. But that's not true. For example a whole
> row reference embedded in ConvertRowtypeExpr added by query
> adjustments in inheritance_planner() is not a child's whole-row
> reference in the adjusted query. So, it's not clear to me when we add
> a ConvertRowtypeExpr and we don't. I am not sure if those are the only
> two places which require a fix. Right now it looks like those are the
> places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true
> in the future, esp. given the amount of work we expect to happen in
> the partitioning area. When that happens, fixing all those places in
> that way wouldn't be good.
>
>> So we could
>> really minimize the code change and avoid the additional overhead caused by
>> the is_converted_whole_row_reference test added to pull_var_clause.  I think
>> the latter would be rather important, because PWJ is disabled by default.
>> What do you think about that approach?
>
> Partition-wise join and partition-wise aggregation are disabled by
> default temporarily. We will change it in near future once the memory
> consumption issue has been tackled. The features are useful and users
> would want those to be enabled by default like other query
> optimizations. So, we can't take a decision based on that.

Here's set of updated patches.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/05/14 15:34), Ashutosh Bapat wrote:
> On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com>  wrote:
>> On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> Here's the query.
>>
>> explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b
>> where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2;

Thanks!

>>> Yet yet another case where pull_var_clause produces an error:
>>>
>>> postgres=# create table list_pt (a int, b text) partition by list (a);
>>> CREATE TABLE
>>> postgres=# create table list_ptp1 partition of list_pt for values in (1);
>>> CREATE TABLE
>>> postgres=# alter table list_ptp1 add constraint list_ptp1_check check
>>> (list_ptp1::list_pt::text != NULL);
>>> ERROR:  ConvertRowtypeExpr found where not expected
>>>
>>> The patch kept the flags passed to pull_var_clause in
>>> src/backend/catalog/heap.c as-is, but I think we need to modify that
>>> accordingly.  Probably, the flags passed to pull_var_clause in CreateTrigger
>>> as well.
>>
>> With all the support that we have added in partitioning for v11, I
>> think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places,
>> which were left unchanged in the earlier versions of patches, which
>> were written when all that support wasn't in v11.

Actually, we'd get the same error without using anything in 
partitioning.  Here is an example:

postgres=# create table parent (a int, b text);
CREATE TABLE
postgres=# create table child (a int, b text);
CREATE TABLE
postgres=# alter table child inherit parent ;
ALTER TABLE
postgres=# alter table child add constraint child_check check 
(child::parent::text != NULL);
ERROR:  ConvertRowtypeExpr found where not expected

>>> Having said that, I started to think this approach would make code much
>>> complicated.  Another idea I came up with to avoid that would be to use
>>> pull_var_clause as-is and then rewrite wholerows of partitions back to
>>> ConvertRowtypeExprs translating those wholerows to wholerows of their
>>> parents tables' rowtypes if necessary.  That would be annoying and a little
>>> bit inefficient, but the places where we need to rewrite is limited:
>>> prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC.
>>
>> Other reason why we can't use your approach is we can not decide
>> whether ConvertRowtypeExpr is needed by just looking at the Var node.
>> You might say, that we add a ConvertRowtypeExpr if the Var::varno
>> references a child relation. But that's not true. For example a whole
>> row reference embedded in ConvertRowtypeExpr added by query
>> adjustments in inheritance_planner() is not a child's whole-row
>> reference in the adjusted query.

Sorry, I don't understand this fully.

>> So, it's not clear to me when we add
>> a ConvertRowtypeExpr and we don't.

I think it'd be OK to rewrite so at least in prepare_sort_from_pathkeys 
and build_tlist_to_deparse.

>> I am not sure if those are the only
>> two places which require a fix. Right now it looks like those are the
>> places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true
>> in the future, esp. given the amount of work we expect to happen in
>> the partitioning area. When that happens, fixing all those places in
>> that way wouldn't be good.

I have to admit that the approach I proposed is a band-aid fix.

>>> So we could
>>> really minimize the code change and avoid the additional overhead caused by
>>> the is_converted_whole_row_reference test added to pull_var_clause.  I think
>>> the latter would be rather important, because PWJ is disabled by default.
>>> What do you think about that approach?
>>
>> Partition-wise join and partition-wise aggregation are disabled by
>> default temporarily. We will change it in near future once the memory
>> consumption issue has been tackled. The features are useful and users
>> would want those to be enabled by default like other query
>> optimizations. So, we can't take a decision based on that.

Yeah, I really agree on that point!  However, considering that 
pull_var_clause is used in many places where partitioning is *not* 
involved, I still think it's better to avoid spending extra cycles 
needed only for partitioning in that function.

> Here's set of updated patches.

Thanks for updating the patch!

Here are some other minor comments on patch 
0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:

+   /* Construct the conversion map. */
+   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);

I think it'd be better to pass -1, not 0, as the typmod argument for 
that function because that would be consistent with other places where 
we use that function with named rowtypes (eg, ruleutils.c).

-is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
+is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
+               int *relno, int *colno)

-1 for that change; because 1) we use "var" for implying many things as 
in eg, pull_var_clause and 2) I think it'd make back-patching easy to 
keep the name as-is.

Other than that the patch set looks good to me.

Best regards,
Etsuro Fujita


On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
>>>> So we could
>>>> really minimize the code change and avoid the additional overhead caused
>>>> by
>>>> the is_converted_whole_row_reference test added to pull_var_clause.  I
>>>> think
>>>> the latter would be rather important, because PWJ is disabled by
>>>> default.
>>>> What do you think about that approach?
>>>
>>>
>>> Partition-wise join and partition-wise aggregation are disabled by
>>> default temporarily. We will change it in near future once the memory
>>> consumption issue has been tackled. The features are useful and users
>>> would want those to be enabled by default like other query
>>> optimizations. So, we can't take a decision based on that.
>
>
> Yeah, I really agree on that point!  However, considering that
> pull_var_clause is used in many places where partitioning is *not* involved,
> I still think it's better to avoid spending extra cycles needed only for
> partitioning in that function.

Right. The changes done in inheritance_planner() imply that we can
encounter a ConvertRowtypeExpr even in the expressions for a relation
which is not a child relation in the translated query. It was a child
in the original query but after translating it becomes a parent and
has ConvertRowtypeExpr somewhere there. So, there is no way to know
whether a given expression will have ConvertRowtypeExpr in it. That's
my understanding. But if we can device such a test, I am happy to
apply that test before or witin the call to pull_var_clause().

We don't need that expensive test if user has passed
PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
the shape of expression tree. It would cause more asymmetry in
pull_var_clause(), but we can live with it or change the order of
tests for all the three options. I will differ that to a committer.

> +   /* Construct the conversion map. */
> +   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);
>
> I think it'd be better to pass -1, not 0, as the typmod argument for that
> function because that would be consistent with other places where we use
> that function with named rowtypes (eg, ruleutils.c).

Done.

>
> -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
> +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
> +               int *relno, int *colno)
>
> -1 for that change; because 1) we use "var" for implying many things as in
> eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the
> name as-is.

I think pull_var_clause is the only place where we do that and I don't
like that very much. I also don't like is_subquery_var() name since
it's too specific. We might want that kind of functionality to ship
generic expressions and not just Vars in future. Usually we would be
searching for columns in the subquery targetlist so the name change
looks good to me. IIRC, the function was added one release back, so
backpatching pain, if any, won't be much. But I don't think we should
carry a misnomer for many releases to come. Let's differ this to a
committer.

Here's set of updated patches.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/05/16 22:49), Ashutosh Bapat wrote:
> On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> However, considering that
>> pull_var_clause is used in many places where partitioning is *not* involved,
>> I still think it's better to avoid spending extra cycles needed only for
>> partitioning in that function.
>
> Right. The changes done in inheritance_planner() imply that we can
> encounter a ConvertRowtypeExpr even in the expressions for a relation
> which is not a child relation in the translated query. It was a child
> in the original query but after translating it becomes a parent and
> has ConvertRowtypeExpr somewhere there.

Sorry, I don't understand this.  Could you show me such an example?

> So, there is no way to know
> whether a given expression will have ConvertRowtypeExpr in it. That's
> my understanding. But if we can device such a test, I am happy to
> apply that test before or witin the call to pull_var_clause().
>
> We don't need that expensive test if user has passed
> PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
> the shape of expression tree. It would cause more asymmetry in
> pull_var_clause(), but we can live with it or change the order of
> tests for all the three options. I will differ that to a committer.

Sounds more invasive.  Considering where we are in the development cycle 
for PG11, I think it'd be better to address this by something like the 
approach I proposed.  Anyway, +1 for asking the committer's opinion.

>> +   /* Construct the conversion map. */
>> +   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);
>>
>> I think it'd be better to pass -1, not 0, as the typmod argument for that
>> function because that would be consistent with other places where we use
>> that function with named rowtypes (eg, ruleutils.c).
>
> Done.

Thanks!

>> -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
>> +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
>> +               int *relno, int *colno)
>>
>> -1 for that change; because 1) we use "var" for implying many things as in
>> eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the
>> name as-is.
>
> I think pull_var_clause is the only place where we do that and I don't
> like that very much. I also don't like is_subquery_var() name since
> it's too specific. We might want that kind of functionality to ship
> generic expressions and not just Vars in future. Usually we would be
> searching for columns in the subquery targetlist so the name change
> looks good to me. IIRC, the function was added one release back, so
> backpatching pain, if any, won't be much. But I don't think we should
> carry a misnomer for many releases to come. Let's differ this to a
> committer.

OK

> Here's set of updated patches.

Thanks for updating the patch set!  I noticed followings.  Sorry for not 
having noticed that before.

- On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:

+extern bool
+is_converted_whole_row_reference(Node *node)

I think we should remove "extern" from the definition.

- On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:

+   /* Construct ROW expression according to the conversion map. */
+   appendStringInfoString(buf, "ROW(");
+   for (cnt = 0; cnt < parent_desc->natts; cnt++)
+   {
+       /* Ignore dropped columns. */
+       if (attrMap[cnt] == 0)
+           continue;
+
+       if (cnt > 0)
+           appendStringInfoString(buf, ", ");
+       deparseColumnRef(buf, child_var->varno, attrMap[cnt],
+                        planner_rt_fetch(child_var->varno, root),
+                        qualify_col);
+   }
+   appendStringInfoChar(buf, ')');

The result would start with ", " in the case where the cnt=0 column of 
the parent relation is a dropped one.  Here is an example:

postgres=# create table pt1 (a int, b int) partition by range (b);
CREATE TABLE
postgres=# alter table pt1 drop column a;
ALTER TABLE
postgres=# alter table pt1 add column a int;
ALTER TABLE
postgres=# create table loct11 (like pt1);
CREATE TABLE
postgres=# create foreign table pt1p1 partition of pt1 for values from 
(0) to (100) server loopback options (table_name 'loct11', 
use_remote_estimate 'true');
CREATE FOREIGN TABLE
postgres=# insert into pt1 select i, i from generate_series(0, 99, 2) i;
INSERT 0 50
postgres=# analyze pt1;
ANALYZE
postgres=# analyze pt1p1;
ANALYZE
postgres=# create table pt2 (a int, b int) partition by range (b);
CREATE TABLE
postgres=# create table loct21 (like pt2);
CREATE TABLE
postgres=# create foreign table pt2p1 partition of pt2 for values from 
(0) to (100) server loopback options (table_name 'loct21', 
use_remote_estimate 'true');
CREATE FOREIGN TABLE
postgres=# insert into pt2 select i, i from generate_series(0, 99, 3) i;
INSERT 0 34
postgres=# analyze pt2;
ANALYZE
postgres=# analyze pt2p1;
ANALYZE
postgres=# set enable_partitionwise_join to true;
SET
postgres=# explain verbose select pt1.* from pt1, pt2 where pt1.b = 
pt2.b for update of pt1;
ERROR:  syntax error at or near ","
CONTEXT:  remote SQL command: EXPLAIN SELECT r4.b, r4.a, r4.ctid, CASE 
WHEN (r4.*)::text IS NOT NULL THEN ROW(, r4.b, r4.a) END, CASE WHEN 
(r4.*)::text IS NOT NULL THEN 57467 END, r6.ctid, CASE WHEN (r6.*)::text 
IS NOT NULL THEN ROW(r6.a, r6.b) END, CASE WHEN (r6.*)::text IS NOT NULL 
THEN 57476 END FROM (public.loct11 r4 INNER JOIN public.loct21 r6 ON 
(((r4.b = r6.b)))) FOR UPDATE OF r4

I think we can fix this by adding another flag to indicate whether we 
deparsed the first live column of the relation, as in the "first" bool 
flag in deparseTargetList.

One more thing to add is: the patch adds support for deparsing a 
ConvertRowtypeExpr that translates a wholerow of a childrel into a 
wholerow of its parentrel's rowtype, so by modifying is_foreign_expr 
accordingly, we could push down such CREs in JOIN ON conditions to the 
remote for example.  But that would be an improvement, not a fix, so I 
think we should leave that for another patch for PG12.

Other than that, the patch set looks good to me.

Best regards,
Etsuro Fujita


On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/05/16 22:49), Ashutosh Bapat wrote:
>>
>> On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>>
>>> However, considering that
>>> pull_var_clause is used in many places where partitioning is *not*
>>> involved,
>>> I still think it's better to avoid spending extra cycles needed only for
>>> partitioning in that function.
>>
>>
>> Right. The changes done in inheritance_planner() imply that we can
>> encounter a ConvertRowtypeExpr even in the expressions for a relation
>> which is not a child relation in the translated query. It was a child
>> in the original query but after translating it becomes a parent and
>> has ConvertRowtypeExpr somewhere there.
>
>
> Sorry, I don't understand this.  Could you show me such an example?

Even without inheritance we can induce a ConvertRowtypeExpr on a base
relation which is not "other" relation

create table parent (a int, b int);
create table child () inherits(parent);
alter table child add constraint whole_par_const check ((child::parent).a = 1);

There is no way to tell by looking at the parsed query whether
pull_var_clause() from StoreRelCheck() will encounter a
ConvertRowtypeExpr or not. We could check whether the tables involved
are inherited or not, but that's more expensive than
is_converted_whole_row_reference().

>
>> So, there is no way to know
>> whether a given expression will have ConvertRowtypeExpr in it. That's
>> my understanding. But if we can device such a test, I am happy to
>> apply that test before or witin the call to pull_var_clause().
>>
>> We don't need that expensive test if user has passed
>> PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
>> the shape of expression tree. It would cause more asymmetry in
>> pull_var_clause(), but we can live with it or change the order of
>> tests for all the three options. I will differ that to a committer.
>
>
> Sounds more invasive.  Considering where we are in the development cycle for
> PG11, I think it'd be better to address this by something like the approach
> I proposed.  Anyway, +1 for asking the committer's opinion.

Thanks.

>
> - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:
>
> +extern bool
> +is_converted_whole_row_reference(Node *node)
>
> I think we should remove "extern" from the definition.

Done.

>
> - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:
>
>
> I think we can fix this by adding another flag to indicate whether we
> deparsed the first live column of the relation, as in the "first" bool flag
> in deparseTargetList.

Thanks. Fixed.

>
> One more thing to add is: the patch adds support for deparsing a
> ConvertRowtypeExpr that translates a wholerow of a childrel into a wholerow
> of its parentrel's rowtype, so by modifying is_foreign_expr accordingly, we
> could push down such CREs in JOIN ON conditions to the remote for example.
> But that would be an improvement, not a fix, so I think we should leave that
> for another patch for PG12.

Right.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment
(2018/05/17 23:19), Ashutosh Bapat wrote:
> On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/05/16 22:49), Ashutosh Bapat wrote:
>>> On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:
>>>> However, considering that
>>>> pull_var_clause is used in many places where partitioning is *not*
>>>> involved,
>>>> I still think it's better to avoid spending extra cycles needed only for
>>>> partitioning in that function.
>>>
>>>
>>> Right. The changes done in inheritance_planner() imply that we can
>>> encounter a ConvertRowtypeExpr even in the expressions for a relation
>>> which is not a child relation in the translated query. It was a child
>>> in the original query but after translating it becomes a parent and
>>> has ConvertRowtypeExpr somewhere there.
>>
>>
>> Sorry, I don't understand this.  Could you show me such an example?
>
> Even without inheritance we can induce a ConvertRowtypeExpr on a base
> relation which is not "other" relation
>
> create table parent (a int, b int);
> create table child () inherits(parent);
> alter table child add constraint whole_par_const check ((child::parent).a = 1);
>
> There is no way to tell by looking at the parsed query whether
> pull_var_clause() from StoreRelCheck() will encounter a
> ConvertRowtypeExpr or not. We could check whether the tables involved
> are inherited or not, but that's more expensive than
> is_converted_whole_row_reference().

Yeah, ISTM that the inheritance test makes things worse.

>>> So, there is no way to know
>>> whether a given expression will have ConvertRowtypeExpr in it. That's
>>> my understanding. But if we can device such a test, I am happy to
>>> apply that test before or witin the call to pull_var_clause().
>>>
>>> We don't need that expensive test if user has passed
>>> PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
>>> the shape of expression tree. It would cause more asymmetry in
>>> pull_var_clause(), but we can live with it or change the order of
>>> tests for all the three options. I will differ that to a committer.
>>
>>
>> Sounds more invasive.  Considering where we are in the development cycle for
>> PG11, I think it'd be better to address this by something like the approach
>> I proposed.  Anyway, +1 for asking the committer's opinion.
>
> Thanks.

Let's ask that.

>> - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:
>>
>> +extern bool
>> +is_converted_whole_row_reference(Node *node)
>>
>> I think we should remove "extern" from the definition.
>
> Done.

>> - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:
>>
>>
>> I think we can fix this by adding another flag to indicate whether we
>> deparsed the first live column of the relation, as in the "first" bool flag
>> in deparseTargetList.
>
> Thanks. Fixed.

Thanks for updating the patch set!  Other than pull_var_clause things, 
the updated version looks good to me, so I'll mark this as Ready for 
Committer.  As I said before, I think this issue should be addressed in 
advance of the PG11 release.

Best regards,
Etsuro Fujita


(2018/05/18 16:33), Etsuro Fujita wrote:
> Other than pull_var_clause things,
> the updated version looks good to me, so I'll mark this as Ready for
> Committer.

Since I'm not 100% sure that that is the right way to go, I've been 
rethinking how to fix this issue.  Yet another idea I came up with to 
fix this is to redesign the handling of the tlists for children in the 
partitioning case.  Currently, we build the reltarget for a child by 
applying adjust_appendrel_attrs to the reltarget for its parent in 
set_append_rel_size, which maps a wholerow Var referring to the parent 
rel to a ConvertRowtypeExpr that translates a wholerow of the child rel 
into a wholerow of the parent rel's rowtype.  This works well for the 
non-partitioned inheritance case, but makes complicated the code for 
handling the partitioning case especially when planning 
partitionwise-joins.  And I think this would be the root cause of this 
issue.  I don't think the tlists for the children need to have their 
wholerows transformed into the corresponding ConvertRowtypeExprs *at 
this point*, so what I'd like to propose is to 1) map a parent wholerow 
Var simply to a child wholerow Var, instead (otherwise, the same as 
adjust_appendrel_attrs), when building the reltarget for a child in 
set_append_rel_size, 2) build the tlists for child joins using those 
children's wholerow Vars at path creation time, and 3) adjust those 
wholerow Vars in the tlists for subpaths in the chosen AppendPath so 
that those are transformed into the corresponding ConvertRowtypeExprs, 
at plan creation time (ie, in 
create_append_plan/create_merge_append_plan).  IIUC, this would not 
require any changes to pull_var_clause as proposed by the patch.  This 
would not require any changes to postgres_fdw as proposed by the patch, 
either.  In addition, this would make unnecessary the code added to 
setrefs.c by the partitionwise-join patch.  Maybe I'm missing something 
though.

Best regards,
Etsuro Fujita


On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/05/18 16:33), Etsuro Fujita wrote:
>>
>> Other than pull_var_clause things,
>> the updated version looks good to me, so I'll mark this as Ready for
>> Committer.
>
>
> Since I'm not 100% sure that that is the right way to go, I've been
> rethinking how to fix this issue.  Yet another idea I came up with to fix
> this is to redesign the handling of the tlists for children in the
> partitioning case.  Currently, we build the reltarget for a child by
> applying adjust_appendrel_attrs to the reltarget for its parent in
> set_append_rel_size, which maps a wholerow Var referring to the parent rel
> to a ConvertRowtypeExpr that translates a wholerow of the child rel into a
> wholerow of the parent rel's rowtype.  This works well for the
> non-partitioned inheritance case, but makes complicated the code for
> handling the partitioning case especially when planning partitionwise-joins.
> And I think this would be the root cause of this issue.

Although true, this is not only about targetlist. Even the whole-row
expressions in the conditions, equivalence classes and other
planner/optimizer data structures are translated to
ConvertRowtypeExpr.

>  I don't think the
> tlists for the children need to have their wholerows transformed into the
> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to
> propose is to 1) map a parent wholerow Var simply to a child wholerow Var,
> instead (otherwise, the same as adjust_appendrel_attrs), when building the
> reltarget for a child in set_append_rel_size, 2) build the tlists for child
> joins using those children's wholerow Vars at path creation time, and 3)
> adjust those wholerow Vars in the tlists for subpaths in the chosen
> AppendPath so that those are transformed into the corresponding
> ConvertRowtypeExprs, at plan creation time (ie, in
> create_append_plan/create_merge_append_plan).  IIUC, this would not require
> any changes to pull_var_clause as proposed by the patch.  This would not
> require any changes to postgres_fdw as proposed by the patch, either.  In
> addition, this would make unnecessary the code added to setrefs.c by the
> partitionwise-join patch.  Maybe I'm missing something though.

Not translating whole-row expressions to ConvertRowtypeExpr before
creating paths can lead to a number of anomalies. For example,

1. if there's an index on the whole-row expression of child,
translating parent's whole-row expression to child's whole-row
expression as is will lead to using that index, when in reality the it
can not be used since the condition/ORDER BY clause (which originally
referred the parent's whole-row expression) require the child's
whole-row reassembled as parent's whole-row.
2. Constraints on child'd whole-row expression, will be used to prune
a child when they can not be used since the original condition was on
parent' whole-row expression and not that of the child.
3. Equivalence classes will think that a child whole-row expression
(without ConvertRowtypeExpr) is equivalent to an expression which is
part of the same equivalence class as the parent' whole-row
expression.

Given these serious problems, I don't think we could afford not to
cover a child whole-row reference in ConvertRowtypeExpr.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/06/07 19:42), Ashutosh Bapat wrote:
> On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> Since I'm not 100% sure that that is the right way to go, I've been
>> rethinking how to fix this issue.  Yet another idea I came up with to fix
>> this is to redesign the handling of the tlists for children in the
>> partitioning case.  Currently, we build the reltarget for a child by
>> applying adjust_appendrel_attrs to the reltarget for its parent in
>> set_append_rel_size, which maps a wholerow Var referring to the parent rel
>> to a ConvertRowtypeExpr that translates a wholerow of the child rel into a
>> wholerow of the parent rel's rowtype.  This works well for the
>> non-partitioned inheritance case, but makes complicated the code for
>> handling the partitioning case especially when planning partitionwise-joins.
>> And I think this would be the root cause of this issue.
>
> Although true, this is not only about targetlist. Even the whole-row
> expressions in the conditions, equivalence classes and other
> planner/optimizer data structures are translated to
> ConvertRowtypeExpr.

Yeah, but I mean we modify set_append_rel_size so that we only map a 
parent wholerow Var in the parent tlist to a child wholerow Var in the 
child's tlist; parent wholerow Vars in the parent's other expressions 
such as conditions are transformed into CREs as-is.

>>   I don't think the
>> tlists for the children need to have their wholerows transformed into the
>> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to
>> propose is to 1) map a parent wholerow Var simply to a child wholerow Var,
>> instead (otherwise, the same as adjust_appendrel_attrs), when building the
>> reltarget for a child in set_append_rel_size, 2) build the tlists for child
>> joins using those children's wholerow Vars at path creation time, and 3)
>> adjust those wholerow Vars in the tlists for subpaths in the chosen
>> AppendPath so that those are transformed into the corresponding
>> ConvertRowtypeExprs, at plan creation time (ie, in
>> create_append_plan/create_merge_append_plan).  IIUC, this would not require
>> any changes to pull_var_clause as proposed by the patch.  This would not
>> require any changes to postgres_fdw as proposed by the patch, either.  In
>> addition, this would make unnecessary the code added to setrefs.c by the
>> partitionwise-join patch.  Maybe I'm missing something though.
>
> Not translating whole-row expressions to ConvertRowtypeExpr before
> creating paths can lead to a number of anomalies. For example,
>
> 1. if there's an index on the whole-row expression of child,
> translating parent's whole-row expression to child's whole-row
> expression as is will lead to using that index, when in reality the it
> can not be used since the condition/ORDER BY clause (which originally
> referred the parent's whole-row expression) require the child's
> whole-row reassembled as parent's whole-row.
> 2. Constraints on child'd whole-row expression, will be used to prune
> a child when they can not be used since the original condition was on
> parent' whole-row expression and not that of the child.
> 3. Equivalence classes will think that a child whole-row expression
> (without ConvertRowtypeExpr) is equivalent to an expression which is
> part of the same equivalence class as the parent' whole-row
> expression.

Is that still possible when we only map a parent wholerow Var in the 
parent's tlist to a child wholerow Var in the child's tlist?

Best regards,
Etsuro Fujita


On Mon, Jun 11, 2018 at 4:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/06/07 19:42), Ashutosh Bapat wrote:
>>
>> On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>>
>>> Since I'm not 100% sure that that is the right way to go, I've been
>>> rethinking how to fix this issue.  Yet another idea I came up with to fix
>>> this is to redesign the handling of the tlists for children in the
>>> partitioning case.  Currently, we build the reltarget for a child by
>>> applying adjust_appendrel_attrs to the reltarget for its parent in
>>> set_append_rel_size, which maps a wholerow Var referring to the parent
>>> rel
>>> to a ConvertRowtypeExpr that translates a wholerow of the child rel into
>>> a
>>> wholerow of the parent rel's rowtype.  This works well for the
>>> non-partitioned inheritance case, but makes complicated the code for
>>> handling the partitioning case especially when planning
>>> partitionwise-joins.
>>> And I think this would be the root cause of this issue.
>>
>>
>> Although true, this is not only about targetlist. Even the whole-row
>> expressions in the conditions, equivalence classes and other
>> planner/optimizer data structures are translated to
>> ConvertRowtypeExpr.
>
>
> Yeah, but I mean we modify set_append_rel_size so that we only map a parent
> wholerow Var in the parent tlist to a child wholerow Var in the child's
> tlist; parent wholerow Vars in the parent's other expressions such as
> conditions are transformed into CREs as-is.

What happens to a PlaceHolderVar containing whole-row reference when
that appears in a condition and/or targetlist.

>
>
>>>   I don't think the
>>> tlists for the children need to have their wholerows transformed into the
>>> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to
>>> propose is to 1) map a parent wholerow Var simply to a child wholerow
>>> Var,
>>> instead (otherwise, the same as adjust_appendrel_attrs), when building
>>> the
>>> reltarget for a child in set_append_rel_size, 2) build the tlists for
>>> child
>>> joins using those children's wholerow Vars at path creation time, and 3)
>>> adjust those wholerow Vars in the tlists for subpaths in the chosen
>>> AppendPath so that those are transformed into the corresponding
>>> ConvertRowtypeExprs, at plan creation time (ie, in
>>> create_append_plan/create_merge_append_plan).  IIUC, this would not
>>> require
>>> any changes to pull_var_clause as proposed by the patch.  This would not
>>> require any changes to postgres_fdw as proposed by the patch, either.  In
>>> addition, this would make unnecessary the code added to setrefs.c by the
>>> partitionwise-join patch.  Maybe I'm missing something though.
>>
>>
>> Not translating whole-row expressions to ConvertRowtypeExpr before
>> creating paths can lead to a number of anomalies. For example,
>>
>> 1. if there's an index on the whole-row expression of child,
>> translating parent's whole-row expression to child's whole-row
>> expression as is will lead to using that index, when in reality the it
>> can not be used since the condition/ORDER BY clause (which originally
>> referred the parent's whole-row expression) require the child's
>> whole-row reassembled as parent's whole-row.
>> 2. Constraints on child'd whole-row expression, will be used to prune
>> a child when they can not be used since the original condition was on
>> parent' whole-row expression and not that of the child.
>> 3. Equivalence classes will think that a child whole-row expression
>> (without ConvertRowtypeExpr) is equivalent to an expression which is
>> part of the same equivalence class as the parent' whole-row
>> expression.
>
>
> Is that still possible when we only map a parent wholerow Var in the
> parent's tlist to a child wholerow Var in the child's tlist?

Yes. 1 will affect the choice of index-only scans. We use pathkey
comparisons at a number of places so tlist going out of sync with
equivalence classes can affect a number of places.

build_tlist_to_deparse() is used to deparse targetlist at the time of
creating paths,as well as during the planning. According to your
proposal we will build different tlists during path creation and plan
creation. That doesn't look good. Apart from that your proposal of
changing a child's targetlist at the time of plan creation to use CRE
doesn't help since the original problem as described in [1] happens at
the time of creating plans as described in [1].

The parent's whole-row reference has a different data type than
child's whole-row reference. If we do not cover child's whole-row
reference by CRE, the parent and child targetlists will go out of sync
as far as the type of individual columns are concerned. That too
doesn't look good to me.

[1] https://www.postgresql.org/message-id/CAFjFpRfD5=LsOqG1eUf-vMpm9KoEZBBGJo68D9wkUBJxop_emQ@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/06/14 22:37), Ashutosh Bapat wrote:
> On Mon, Jun 11, 2018 at 4:05 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/06/07 19:42), Ashutosh Bapat wrote:
>>> On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:
>> Yeah, but I mean we modify set_append_rel_size so that we only map a parent
>> wholerow Var in the parent tlist to a child wholerow Var in the child's
>> tlist; parent wholerow Vars in the parent's other expressions such as
>> conditions are transformed into CREs as-is.
>
> What happens to a PlaceHolderVar containing whole-row reference when
> that appears in a condition and/or targetlist.

Whole-row Vars contained in such PHV's expressions will be mapped into 
ConvertRowtypeExprs with adjust_appendrel_attrs.

>>>>    I don't think the
>>>> tlists for the children need to have their wholerows transformed into the
>>>> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to
>>>> propose is to 1) map a parent wholerow Var simply to a child wholerow
>>>> Var,
>>>> instead (otherwise, the same as adjust_appendrel_attrs), when building
>>>> the
>>>> reltarget for a child in set_append_rel_size, 2) build the tlists for
>>>> child
>>>> joins using those children's wholerow Vars at path creation time, and 3)
>>>> adjust those wholerow Vars in the tlists for subpaths in the chosen
>>>> AppendPath so that those are transformed into the corresponding
>>>> ConvertRowtypeExprs, at plan creation time (ie, in
>>>> create_append_plan/create_merge_append_plan).  IIUC, this would not
>>>> require
>>>> any changes to pull_var_clause as proposed by the patch.  This would not
>>>> require any changes to postgres_fdw as proposed by the patch, either.  In
>>>> addition, this would make unnecessary the code added to setrefs.c by the
>>>> partitionwise-join patch.  Maybe I'm missing something though.
>>>
>>>
>>> Not translating whole-row expressions to ConvertRowtypeExpr before
>>> creating paths can lead to a number of anomalies. For example,
>>>
>>> 1. if there's an index on the whole-row expression of child,
>>> translating parent's whole-row expression to child's whole-row
>>> expression as is will lead to using that index, when in reality the it
>>> can not be used since the condition/ORDER BY clause (which originally
>>> referred the parent's whole-row expression) require the child's
>>> whole-row reassembled as parent's whole-row.
>>> 2. Constraints on child'd whole-row expression, will be used to prune
>>> a child when they can not be used since the original condition was on
>>> parent' whole-row expression and not that of the child.
>>> 3. Equivalence classes will think that a child whole-row expression
>>> (without ConvertRowtypeExpr) is equivalent to an expression which is
>>> part of the same equivalence class as the parent' whole-row
>>> expression.
>>
>>
>> Is that still possible when we only map a parent wholerow Var in the
>> parent's tlist to a child wholerow Var in the child's tlist?
>
> Yes. 1 will affect the choice of index-only scans. We use pathkey
> comparisons at a number of places so tlist going out of sync with
> equivalence classes can affect a number of places.

Sorry, I'm still not sure if #1 is really a problem.  Could you show me 
an example for that?

> build_tlist_to_deparse() is used to deparse targetlist at the time of
> creating paths,as well as during the planning. According to your
> proposal we will build different tlists during path creation and plan
> creation. That doesn't look good.

Maybe my explanation was not enough, but my proposal will guarantee that 
the FDW can build the same tlist at path creation time and plan creation 
time because the RelOptInfo's reltarget from which the tlist is created 
doesn't change at all.

> Apart from that your proposal of
> changing a child's targetlist at the time of plan creation to use CRE
> doesn't help since the original problem as described in [1] happens at
> the time of creating plans as described in [1].

I think my proposal will address the original issue.  Actually, I've 
created a patch implementing that proposal.  That's still WIP, but it 
works well even for these cases (and makes code much more simple, as 
mentioned above!)  But I think that patch needs more work, so I'm 
planning to post it next week.

Best regards,
Etsuro Fujita


(2018/06/15 20:56), Etsuro Fujita wrote:
> Actually, I've
> created a patch implementing that proposal.

> But I think that patch needs more work, so I'm
> planning to post it next week.

Here is a patch for that.

* As I said upthread, the patch makes code much more simple; I removed 
all the changes to setrefs.c added by the partitionwise-join patch.  I 
also simplified the logic for building a tlist for a child-join rel. 
The original PWJ computes attr_needed data even for child rels, and 
build the tlist for a child-join by passing to build_joinrel_tlist that 
data for input child rels for the child-join.  But I think that's 
redundant, and it's more straightforward to apply adjust_appendrel_attrs 
to the parent-join's tlist to get the child-join's tlist.  So, I changed 
that way, which made unnecessary all the changes to build_joinrel_tlist 
and placeholder.c added by the PWJ patch, so I removed those as well.

* The patch contains all of the regression tests in the original patch 
proposed by Ashutosh.

Best regards,
Etsuro Fujita

Attachment
On Tue, Jun 19, 2018 at 6:16 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
> * As I said upthread, the patch makes code much more simple; I removed all
> the changes to setrefs.c added by the partitionwise-join patch.  I also
> simplified the logic for building a tlist for a child-join rel. The original
> PWJ computes attr_needed data even for child rels, and build the tlist for a
> child-join by passing to build_joinrel_tlist that data for input child rels
> for the child-join.  But I think that's redundant, and it's more
> straightforward to apply adjust_appendrel_attrs to the parent-join's tlist
> to get the child-join's tlist.  So, I changed that way, which made
> unnecessary all the changes to build_joinrel_tlist and placeholder.c added
> by the PWJ patch, so I removed those as well.
>
> * The patch contains all of the regression tests in the original patch
> proposed by Ashutosh.

I have started reviewing the patch.
+       if (enable_partitionwise_join && rel->top_parent_is_partitioned)
+       {
+           build_childrel_tlist(root, rel, childrel, 1, &appinfo);
+       }

Why do we need rel->top_parent_is_partitioned? If a relation is
partitioned (if (rel->part_scheme), it's either the top parent or is
partition of some other partitioned table. In either case this
condition will be true.

+   /* No work if the child plan is an Append or MergeAppend */
+   if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
+       return;

Why? Probably it's related to the answer to the first question, But I
don't see the connection. Given that partition-wise join will be
applicable level by level, we need to recurse in
adjust_subplan_tlist().

+   /* The child plan should be able to do projection */
+   Assert(is_projection_capable_plan(subplan));
+
Again why? A MergeAppend's subplan could be a Sort plan, which will
not be projection capable.

This is not a full review. I will continue reviewing it further.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Here is a patch for that.
>
> * As I said upthread, the patch makes code much more simple; I removed all
> the changes to setrefs.c added by the partitionwise-join patch.  I also
> simplified the logic for building a tlist for a child-join rel. The original
> PWJ computes attr_needed data even for child rels, and build the tlist for a
> child-join by passing to build_joinrel_tlist that data for input child rels
> for the child-join.  But I think that's redundant, and it's more
> straightforward to apply adjust_appendrel_attrs to the parent-join's tlist
> to get the child-join's tlist.  So, I changed that way, which made
> unnecessary all the changes to build_joinrel_tlist and placeholder.c added
> by the PWJ patch, so I removed those as well.
>
> * The patch contains all of the regression tests in the original patch
> proposed by Ashutosh.

I think this approach is going to run into trouble if the level at
which we have to apply the ConvertRowTypeExpr happens not to be a
projection-capable node.  And, in general, it seems to me that we want
to produce the right outputs at the lowest possible level of the plan
tree.  For instance, suppose that one of the relations being scanned
is not parallel-safe but the others are.  Then, we could end up with a
plan like this:

Append
-> Seq Scan on temp_rela
-> Gather
  -> Parallel Seq Scan on thing1
-> Gather
  -> Parallel Seq Scan on thing2

If a projection is required to convert the row type expression, we
certainly want that to get pushed below the Gather, not to happen at
the Gather level itself.  We certainly don't want it to happen at the
Append level, which can't even handle it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


(2018/06/22 22:54), Ashutosh Bapat wrote:
> I have started reviewing the patch.

Thanks for the review!

> +       if (enable_partitionwise_join&&  rel->top_parent_is_partitioned)
> +       {
> +           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
> +       }
>
> Why do we need rel->top_parent_is_partitioned? If a relation is
> partitioned (if (rel->part_scheme), it's either the top parent or is
> partition of some other partitioned table. In either case this
> condition will be true.

This would be needed to avoid unnecessarily applying 
build_childrel_tlist to child rels of a partitioned table for which we 
don't consider partitionwise join.  Consider:

postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
CREATE TABLE
postgres=# create table lpt_p1 partition of lpt for values in (1);
CREATE TABLE
postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
CREATE TABLE
postgres=# create table test (c1 int, c2 text);
CREATE TABLE
postgres=# explain verbose select * from (select * from lpt union all 
select * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);
                                      QUERY PLAN

--------------------------------------------------------------------------------
----
  Merge Join  (cost=289.92..538.20 rows=16129 width=72)
    Output: lpt_p1.c1, lpt_p1.c2, test.c1, test.c2
    Merge Cond: (test.c1 = lpt_p1.c1)
    ->  Sort  (cost=88.17..91.35 rows=1270 width=36)
          Output: test.c1, test.c2
          Sort Key: test.c1
          ->  Seq Scan on public.test  (cost=0.00..22.70 rows=1270 width=36)
                Output: test.c1, test.c2
    ->  Sort  (cost=201.74..208.09 rows=2540 width=36)
          Output: lpt_p1.c1, lpt_p1.c2
          Sort Key: lpt_p1.c1
          ->  Append  (cost=0.00..58.10 rows=2540 width=36)
                ->  Seq Scan on public.lpt_p1  (cost=0.00..22.70 
rows=1270 width=
36)
                      Output: lpt_p1.c1, lpt_p1.c2
                ->  Seq Scan on public.lpt_p2  (cost=0.00..22.70 
rows=1270 width=
36)
                      Output: lpt_p2.c1, lpt_p2.c2
(16 rows)

In this example, the top parent is not a partitioned table and we don't 
need to consider partitionwise join for that, so we don't need to apply 
that to the child rel lpt_p1 of the partitioned table lpt (and the table 
lpt_p2).

> +   /* No work if the child plan is an Append or MergeAppend */
> +   if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
> +       return;
>
> Why? Probably it's related to the answer to the first question, But I
> don't see the connection. Given that partition-wise join will be
> applicable level by level, we need to recurse in
> adjust_subplan_tlist().

I don't think so; I think if the child plan is an Append or MergeAppend, 
the tlist for each subplan of the Append or MergeAppend would have 
already been adjusted while create_plan_recurse before we are called here.

> +   /* The child plan should be able to do projection */
> +   Assert(is_projection_capable_plan(subplan));
> +
> Again why? A MergeAppend's subplan could be a Sort plan, which will
> not be projection capable.

IIUC, since we are called here right after create_plan_recurse, the 
child plan would be a scan or join unless it's neither Append nor 
MergeAppend.  So it should be projection-capable.  Maybe I'm missing 
something, though.

> This is not a full review. I will continue reviewing it further.

Thanks again.

Best regards,
Etsuro Fujita


(2018/06/22 23:58), Robert Haas wrote:
> I think this approach is going to run into trouble if the level at
> which we have to apply the ConvertRowTypeExpr happens not to be a
> projection-capable node.

Actually, the level we have to do that would be a child rel of a 
partitioned table or a child join of a partitionwise join, so the plan 
node would be a scan or join plan unless the child rel or child join is 
itself partitioned, in which case the plan node would be 
Append/MergeAppend and the proposed patch recursively would apply that 
conversion to child plans for the Append/MergeAppend).

> And, in general, it seems to me that we want
> to produce the right outputs at the lowest possible level of the plan
> tree.  For instance, suppose that one of the relations being scanned
> is not parallel-safe but the others are.  Then, we could end up with a
> plan like this:
>
> Append
> ->  Seq Scan on temp_rela
> ->  Gather
>    ->  Parallel Seq Scan on thing1
> ->  Gather
>    ->  Parallel Seq Scan on thing2
>
> If a projection is required to convert the row type expression, we
> certainly want that to get pushed below the Gather, not to happen at
> the Gather level itself.

IIUC, we currently don't consider such a plan for partition-wise join; 
we'll only consider gathering partial paths for the parent appendrel. 
So, I'm not sure we need to take into account that when applying the 
ConvertRowtypeExpr.  Maybe I'm missing something, though.

> We certainly don't want it to happen at the
> Append level, which can't even handle it.

I think so too.

Best regards,
Etsuro Fujita


Hi Ashutosh, Etsuro, Robert,

On 2018-06-22 10:58:28 -0400, Robert Haas wrote:
> On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
> > Here is a patch for that.
> >
> > * As I said upthread, the patch makes code much more simple; I removed all
> > the changes to setrefs.c added by the partitionwise-join patch.  I also
> > simplified the logic for building a tlist for a child-join rel. The original
> > PWJ computes attr_needed data even for child rels, and build the tlist for a
> > child-join by passing to build_joinrel_tlist that data for input child rels
> > for the child-join.  But I think that's redundant, and it's more
> > straightforward to apply adjust_appendrel_attrs to the parent-join's tlist
> > to get the child-join's tlist.  So, I changed that way, which made
> > unnecessary all the changes to build_joinrel_tlist and placeholder.c added
> > by the PWJ patch, so I removed those as well.
> >
> > * The patch contains all of the regression tests in the original patch
> > proposed by Ashutosh.
> 
> I think this approach is going to run into trouble if the level at
> which we have to apply the ConvertRowTypeExpr happens not to be a
> projection-capable node.

What's the plan forward here? This has been an open item for quite a
while. Robert, are you in agreement with this approach on a high level?
Who is going to drive this forward?

Greetings,

Andres Freund


(2018/07/04 1:35), Andres Freund wrote:
> On 2018-06-22 10:58:28 -0400, Robert Haas wrote:
>> On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>> Here is a patch for that.

>> I think this approach is going to run into trouble if the level at
>> which we have to apply the ConvertRowTypeExpr happens not to be a
>> projection-capable node.
>
> What's the plan forward here?

I still think that this approach would be the right way to go, so I plan 
to polish the patch.

> This has been an open item for quite a
> while. Robert, are you in agreement with this approach on a high level?
> Who is going to drive this forward?

I'm willing to do that if it's OK, but I'd like to get more feedback 
from Robert, Ashutosh or anyone else.

Best regards,
Etsuro Fujita


On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>
>> I have started reviewing the patch.
>
>
> Thanks for the review!
>
>> +       if (enable_partitionwise_join&&  rel->top_parent_is_partitioned)
>> +       {
>> +           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>> +       }
>>
>> Why do we need rel->top_parent_is_partitioned? If a relation is
>> partitioned (if (rel->part_scheme), it's either the top parent or is
>> partition of some other partitioned table. In either case this
>> condition will be true.
>
>
> This would be needed to avoid unnecessarily applying build_childrel_tlist to
> child rels of a partitioned table for which we don't consider partitionwise
> join.  Consider:
>
> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
> CREATE TABLE
> postgres=# create table lpt_p1 partition of lpt for values in (1);
> CREATE TABLE
> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
> CREATE TABLE
> postgres=# create table test (c1 int, c2 text);
> CREATE TABLE
> postgres=# explain verbose select * from (select * from lpt union all select
> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);

--- plan clipped

>
> In this example, the top parent is not a partitioned table and we don't need
> to consider partitionwise join for that, so we don't need to apply that to
> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).

FWIW, the test is not sufficient. In the above example, a simple
select * from lpt inner join test where lpt.c1 = test.c1 would not use
partition-wise join technique. Why not to avoid build_childrel_tlist()
in that case as well? Worst we can change the criteria to use
partition-wise join in future e.g. above case would use partition-wise
join by 1. merging lpt_p1 into corresponding partition of lpt so that
ss is partitioned and 2. repartitioning test or joining test with each
partition of lpt separately. In those cases the changes you are doing
here need to be reverted and put somewhere else. There's already a
patch to reverse the order of join and grouping out there. How would
this work with that.

In general I think set_append_rel_size() or build_simple_rel() is not
the place where we should decide whether the given relation will
participate in a PWJ. Also we should not selectively add a
ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
We should do it for all the child rels or shouldn't do it at all. An
in-between state will produce a hell lot of confusion for any further
optimization. Whenever we change the code around partition-wise
operations in future, we will have to check whether or not a given
child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
have mentioned earlier, I am also not comfortable with the targetlist
of child relations being type inconsistent with that of the parent,
which is a fundamental rule in inheritance. Worst keep them
inconsistent during path creation and make them consistent at the time
of creating plans. A child's whole-row Var has datatype of the child
where as that of parent has datatype of parent. A ConvertRowtypeExpr
is used to fix this inconsistency. That's why I chose to use
pull_var_clause() as a place to fix the problem and not fix
ConvertRowtypeExpr in targetlist itself.

I am also not comfortable in just avoiding ConvertRowtypeExprs in
targetlist and leave them as is in the conditions and ECs etc. This
means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
targetlist will fail at the time of path creation but will succeed at
the time of plan creation.

This is turning more invasive that my approach of fixing it through PVC.

>
>> +   /* No work if the child plan is an Append or MergeAppend */
>> +   if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
>> +       return;
>>
>> Why? Probably it's related to the answer to the first question, But I
>> don't see the connection. Given that partition-wise join will be
>> applicable level by level, we need to recurse in
>> adjust_subplan_tlist().
>
>
> I don't think so; I think if the child plan is an Append or MergeAppend, the
> tlist for each subplan of the Append or MergeAppend would have already been
> adjusted while create_plan_recurse before we are called here.

Ok. Thanks for the clarification. I think we should add a comment.

>
>> +   /* The child plan should be able to do projection */
>> +   Assert(is_projection_capable_plan(subplan));
>> +
>> Again why? A MergeAppend's subplan could be a Sort plan, which will
>> not be projection capable.
>
>
> IIUC, since we are called here right after create_plan_recurse, the child
> plan would be a scan or join unless it's neither Append nor MergeAppend.  So
> it should be projection-capable.  Maybe I'm missing something, though.

That's not true. add_paths_to_append_rel() adds sort paths on scan if
necessary and creates merge append path.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/02 18:46), Etsuro Fujita wrote:
> (2018/06/22 23:58), Robert Haas wrote:
>> And, in general, it seems to me that we want
>> to produce the right outputs at the lowest possible level of the plan
>> tree. For instance, suppose that one of the relations being scanned
>> is not parallel-safe but the others are. Then, we could end up with a
>> plan like this:
>>
>> Append
>> -> Seq Scan on temp_rela
>> -> Gather
>> -> Parallel Seq Scan on thing1
>> -> Gather
>> -> Parallel Seq Scan on thing2
>>
>> If a projection is required to convert the row type expression, we
>> certainly want that to get pushed below the Gather, not to happen at
>> the Gather level itself.
>
> IIUC, we currently don't consider such a plan for partition-wise join;
> we'll only consider gathering partial paths for the parent appendrel.

While updating the patch, I noticed that I'm wrong here.  IIUC, 
apply_scanjoin_target_to_paths would allow us to consider such an Append 
for a partitioned relation when scanjoin_target_parallel_safe=false, as 
it generates new Append paths by recursively adjusting all partitions 
for which we call generate_gather_paths in that case as shown below:

     /*
      * If the scan/join target is not parallel-safe, partial paths cannot
      * generate it.
      */
     if (!scanjoin_target_parallel_safe)
     {
         /*
          * Since we can't generate the final scan/join target, this is our
          * last opportunity to use any partial paths that exist.  We 
don't do
          * this if the case where the target is parallel-safe, since we 
will
          * be able to generate superior paths by doing it after the final
          * scan/join target has been applied.
          *
          * Note that this may invalidate rel->cheapest_total_path, so 
we must
          * not rely on it after this point without first calling 
set_cheapest.
          */
         generate_gather_paths(root, rel, false);

         /* Can't use parallel query above this level. */
         rel->partial_pathlist = NIL;
         rel->consider_parallel = false;
     }

I don't produce a test case where the plan is an Append with Gather 
subplans, but I'm not sure that it's a good thing to allow us to 
consider such a plan because in that plan, each Gather would try to grab 
its own pool of workers.  Am I missing something?

Best regards,
Etsuro Fujita


On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
> I don't produce a test case where the plan is an Append with Gather
> subplans, but I'm not sure that it's a good thing to allow us to consider
> such a plan because in that plan, each Gather would try to grab its own pool
> of workers.  Am I missing something?

If the overall join can not use parallelism, having individual child
joins use parallel query might turn out to be efficient even if done
one child at a time. Parallel append drastically reduces the cases
where something like could be useful, but I don't think we can
theoretically eliminate the need for such a plan.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/04 19:04), Ashutosh Bapat wrote:
> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>> +       if (enable_partitionwise_join&&   rel->top_parent_is_partitioned)
>>> +       {
>>> +           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>>> +       }
>>>
>>> Why do we need rel->top_parent_is_partitioned? If a relation is
>>> partitioned (if (rel->part_scheme), it's either the top parent or is
>>> partition of some other partitioned table. In either case this
>>> condition will be true.
>>
>>
>> This would be needed to avoid unnecessarily applying build_childrel_tlist to
>> child rels of a partitioned table for which we don't consider partitionwise
>> join.  Consider:
>>
>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>> CREATE TABLE
>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>> CREATE TABLE
>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>> CREATE TABLE
>> postgres=# create table test (c1 int, c2 text);
>> CREATE TABLE
>> postgres=# explain verbose select * from (select * from lpt union all select
>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);
>
> --- plan clipped
>
>>
>> In this example, the top parent is not a partitioned table and we don't need
>> to consider partitionwise join for that, so we don't need to apply that to
>> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).
>
> FWIW, the test is not sufficient. In the above example, a simple
> select * from lpt inner join test where lpt.c1 = test.c1 would not use
> partition-wise join technique. Why not to avoid build_childrel_tlist()
> in that case as well?

I might misunderstand your words, but in the above example the patch 
doesn't apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for 
that is because we can avoid adjusting the tlists for the corresponding 
subplans at plan creation time so that whole-row Vars in the tlists are 
transformed into CREs.  I think the overhead of the adjustment is not 
that big, but not zero, so it would be worth avoiding applying 
build_childrel_tlist to partitions if the top parent won't participate 
in a partitionwise-join at all.

> Worst we can change the criteria to use
> partition-wise join in future e.g. above case would use partition-wise
> join by 1. merging lpt_p1 into corresponding partition of lpt so that
> ss is partitioned and 2. repartitioning test or joining test with each
> partition of lpt separately. In those cases the changes you are doing
> here need to be reverted and put somewhere else. There's already a
> patch to reverse the order of join and grouping out there. How would
> this work with that.

Interesting, but that seems like a matter of PG12+.

> In general I think set_append_rel_size() or build_simple_rel() is not
> the place where we should decide whether the given relation will
> participate in a PWJ. Also we should not selectively add a
> ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
> We should do it for all the child rels or shouldn't do it at all.

One thing I thought was to apply build_childrel_tlist for all the child 
rels whether its top parent is a partitioned table or not, but as I 
mentioned above, that would incur needless overhead for adjusting the 
tlists for the child rels that don't involve in a partitionwise join 
such as lpt_p1 and lpt_p2, which I think is not good.

> An
> in-between state will produce a hell lot of confusion for any further
> optimization. Whenever we change the code around partition-wise
> operations in future, we will have to check whether or not a given
> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
> have mentioned earlier, I am also not comfortable with the targetlist
> of child relations being type inconsistent with that of the parent,
> which is a fundamental rule in inheritance. Worst keep them
> inconsistent during path creation and make them consistent at the time
> of creating plans. A child's whole-row Var has datatype of the child
> where as that of parent has datatype of parent.

I don't see any critical issue here.  Could you elaborate a bit more on 
that point?

> A ConvertRowtypeExpr
> is used to fix this inconsistency. That's why I chose to use
> pull_var_clause() as a place to fix the problem and not fix
> ConvertRowtypeExpr in targetlist itself.

I think the biggest issue in the original patch you proposed is that we 
spend extra cycles where partitioning is not involved, which is the 
biggest reason why I think the original patch isn't the right way to go.

> I am also not comfortable in just avoiding ConvertRowtypeExprs in
> targetlist and leave them as is in the conditions and ECs etc. This
> means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
> targetlist will fail at the time of path creation but will succeed at
> the time of plan creation.
>
> This is turning more invasive that my approach of fixing it through PVC.

Sorry, I don't understand this.  Could you show me places where the 
problem happens?

>>> +   /* No work if the child plan is an Append or MergeAppend */
>>> +   if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
>>> +       return;
>>>
>>> Why? Probably it's related to the answer to the first question, But I
>>> don't see the connection. Given that partition-wise join will be
>>> applicable level by level, we need to recurse in
>>> adjust_subplan_tlist().
>>
>>
>> I don't think so; I think if the child plan is an Append or MergeAppend, the
>> tlist for each subplan of the Append or MergeAppend would have already been
>> adjusted while create_plan_recurse before we are called here.
>
> Ok. Thanks for the clarification. I think we should add a comment.

Will do.

>>> +   /* The child plan should be able to do projection */
>>> +   Assert(is_projection_capable_plan(subplan));
>>> +
>>> Again why? A MergeAppend's subplan could be a Sort plan, which will
>>> not be projection capable.
>>
>>
>> IIUC, since we are called here right after create_plan_recurse, the child
>> plan would be a scan or join unless it's neither Append nor MergeAppend.  So
>> it should be projection-capable.  Maybe I'm missing something, though.
>
> That's not true. add_paths_to_append_rel() adds sort paths on scan if
> necessary and creates merge append path.

Really?  I think create_merge_append_path called from 
generate_mergeappend_paths called from add_paths_to_append_rel uses a 
dummy sort path just to compute the cost, but I don't think 
create_merge_append_path (or generate_mergeappend_paths or 
add_paths_to_append_rel) insert a sort path to a scan (or join) path.

Thanks for the comments!

Best regards,
Etsuro Fujita


On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/07/04 19:04), Ashutosh Bapat wrote:
>>
>> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>>
>>> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>>>
>>>> +       if (enable_partitionwise_join&&
>>>> rel->top_parent_is_partitioned)
>>>> +       {
>>>> +           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>>>> +       }
>>>>
>>>> Why do we need rel->top_parent_is_partitioned? If a relation is
>>>> partitioned (if (rel->part_scheme), it's either the top parent or is
>>>> partition of some other partitioned table. In either case this
>>>> condition will be true.
>>>
>>>
>>>
>>> This would be needed to avoid unnecessarily applying build_childrel_tlist
>>> to
>>> child rels of a partitioned table for which we don't consider
>>> partitionwise
>>> join.  Consider:
>>>
>>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>>> CREATE TABLE
>>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>>> CREATE TABLE
>>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>>> CREATE TABLE
>>> postgres=# create table test (c1 int, c2 text);
>>> CREATE TABLE
>>> postgres=# explain verbose select * from (select * from lpt union all
>>> select
>>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);
>>
>>
>> --- plan clipped
>>
>>>
>>> In this example, the top parent is not a partitioned table and we don't
>>> need
>>> to consider partitionwise join for that, so we don't need to apply that
>>> to
>>> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).
>>
>>
>> FWIW, the test is not sufficient. In the above example, a simple
>> select * from lpt inner join test where lpt.c1 = test.c1 would not use
>> partition-wise join technique. Why not to avoid build_childrel_tlist()
>> in that case as well?
>
>
> I might misunderstand your words, but in the above example the patch doesn't
> apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for that is
> because we can avoid adjusting the tlists for the corresponding subplans at
> plan creation time so that whole-row Vars in the tlists are transformed into
> CREs.  I think the overhead of the adjustment is not that big, but not zero,
> so it would be worth avoiding applying build_childrel_tlist to partitions if
> the top parent won't participate in a partitionwise-join at all.

I don't think that answers my question. When we join lpt with test,
your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even
when join between lpt and test is not going to use partition-wise
join. Why? As per your explanation, the condition "if
(enable_partitionwise_join &&  rel->top_parent_is_partitioned)" is
used to avoid applying build_childrel_tlist() when partition-wise join
won't be possible. But it's not covering all the cases.

>
>> Worst we can change the criteria to use
>> partition-wise join in future e.g. above case would use partition-wise
>> join by 1. merging lpt_p1 into corresponding partition of lpt so that
>> ss is partitioned and 2. repartitioning test or joining test with each
>> partition of lpt separately. In those cases the changes you are doing
>> here need to be reverted and put somewhere else. There's already a
>> patch to reverse the order of join and grouping out there. How would
>> this work with that.
>
>
> Interesting, but that seems like a matter of PG12+.

Yes, and I don't think it's a good idea to change this fix for PG12+ again.

>
>> In general I think set_append_rel_size() or build_simple_rel() is not
>> the place where we should decide whether the given relation will
>> participate in a PWJ. Also we should not selectively add a
>> ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
>> We should do it for all the child rels or shouldn't do it at all.
>
>
> One thing I thought was to apply build_childrel_tlist for all the child rels
> whether its top parent is a partitioned table or not, but as I mentioned
> above, that would incur needless overhead for adjusting the tlists for the
> child rels that don't involve in a partitionwise join such as lpt_p1 and
> lpt_p2, which I think is not good.
>
>> An
>> in-between state will produce a hell lot of confusion for any further
>> optimization. Whenever we change the code around partition-wise
>> operations in future, we will have to check whether or not a given
>> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
>> have mentioned earlier, I am also not comfortable with the targetlist
>> of child relations being type inconsistent with that of the parent,
>> which is a fundamental rule in inheritance. Worst keep them
>> inconsistent during path creation and make them consistent at the time
>> of creating plans. A child's whole-row Var has datatype of the child
>> where as that of parent has datatype of parent.
>
>
> I don't see any critical issue here.  Could you elaborate a bit more on that
> point?

I think breaking a fundamental rule like this itself is critical. But
interestingly I am not able to find a case where it becomes a problem.
But may be I haven't tried enough. And the question is if it's not
required to have the targetlists type consistent, why even bother with
ConvertRowtypeExpr addition there? We can use your approach of adding
ConvertRowtypeExpr at the end in all the cases.

>
>> A ConvertRowtypeExpr
>> is used to fix this inconsistency. That's why I chose to use
>> pull_var_clause() as a place to fix the problem and not fix
>> ConvertRowtypeExpr in targetlist itself.
>
>
> I think the biggest issue in the original patch you proposed is that we
> spend extra cycles where partitioning is not involved, which is the biggest
> reason why I think the original patch isn't the right way to go.

When there are no partitions involved, there won't be any
ConvertRowtypeExprs there which means the function
is_converted_whole_row_reference() would just return from the first
line checking IsA() and nullness of node. pull_var_clause() is
structured so that it always incurs IsA() cost whether or not the
corresponding type of node is present in the query or not. E.g. even
if there are no aggregates in the query, we will still incur some
cycles checking IsA(Aggref, node) when that function is called. We
could restructure the condition as IsA(node, ConvertRowtypeExpr) &&
is_converted_whole_row_reference(node) to avoid even the extra cycles
spent in function call.

>
>> I am also not comfortable in just avoiding ConvertRowtypeExprs in
>> targetlist and leave them as is in the conditions and ECs etc. This
>> means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
>> targetlist will fail at the time of path creation but will succeed at
>> the time of plan creation.
>>
>> This is turning more invasive that my approach of fixing it through PVC.
>
>
> Sorry, I don't understand this.  Could you show me places where the problem
> happens?

I think breaking a fundamental rule like this itself is invasive.

>
>>>> +   /* The child plan should be able to do projection */
>>>> +   Assert(is_projection_capable_plan(subplan));
>>>> +
>>>> Again why? A MergeAppend's subplan could be a Sort plan, which will
>>>> not be projection capable.
>>>
>>>
>>>
>>> IIUC, since we are called here right after create_plan_recurse, the child
>>> plan would be a scan or join unless it's neither Append nor MergeAppend.
>>> So
>>> it should be projection-capable.  Maybe I'm missing something, though.
>>
>>
>> That's not true. add_paths_to_append_rel() adds sort paths on scan if
>> necessary and creates merge append path.
>
>
> Really?  I think create_merge_append_path called from
> generate_mergeappend_paths called from add_paths_to_append_rel uses a dummy
> sort path just to compute the cost, but I don't think
> create_merge_append_path (or generate_mergeappend_paths or
> add_paths_to_append_rel) insert a sort path to a scan (or join) path.

You are right. We add sort after adjust_subplan_tlist() has been
called. So, we are fine there.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/04 19:11), Ashutosh Bapat wrote:
> On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>
>> I don't produce a test case where the plan is an Append with Gather
>> subplans, but I'm not sure that it's a good thing to allow us to consider
>> such a plan because in that plan, each Gather would try to grab its own pool
>> of workers.  Am I missing something?
>
> If the overall join can not use parallelism, having individual child
> joins use parallel query might turn out to be efficient even if done
> one child at a time. Parallel append drastically reduces the cases
> where something like could be useful, but I don't think we can
> theoretically eliminate the need for such a plan.

In the case where scanjoin_target_parallel_safe=false, we actually will 
consider such parallel paths using the existing partial paths for the 
parent appendrel in the code path shown in a previous email (note: we 
would already have done generate_partitionwise_join_paths or 
set_append_rel_pathlist for the parent appendrel in query_planner.)  For 
such a parallel path, we might need to do a projection in the Gather 
node for generating the SRF-free scan/join target, but I think that 
would be still efficient.  So, I'm not sure that we really need to 
create child Gathers and generate an Append with those Gathers, as you 
mentioned.

Another thing I noticed is: actually, we don't produce an Append with 
child Gathers in apply_scanjoin_target_to_paths, which I thought we 
would do that in the case of scanjoin_target_parallel_safe=false, but I 
noticed I was wrong.  Sorry for that.  The reason is because in that 
case, even if we create new partial Append paths with child Gathers, we 
don't run generate_gatehr_paths for the newly created partial paths at 
the end of that function shown below, since the parent's 
consider_parallel flag is set to false in that case:

     /*
      * Consider generating Gather or Gather Merge paths.  We must only 
do this
      * if the relation is parallel safe, and we don't do it for child 
rels to
      * avoid creating multiple Gather nodes within the same plan. We 
must do
      * this after all paths have been generated and before 
set_cheapest, since
      * one of the generated paths may turn out to be the cheapest one.
      */
     if (rel->consider_parallel && !IS_OTHER_REL(rel))
         generate_gather_paths(root, rel, false);

Best regards,
Etsuro Fujita


(2018/07/05 20:15), Etsuro Fujita wrote:
> (2018/07/04 19:11), Ashutosh Bapat wrote:
>> On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>
>>> I don't produce a test case where the plan is an Append with Gather
>>> subplans, but I'm not sure that it's a good thing to allow us to
>>> consider
>>> such a plan because in that plan, each Gather would try to grab its
>>> own pool
>>> of workers. Am I missing something?
>>
>> If the overall join can not use parallelism, having individual child
>> joins use parallel query might turn out to be efficient even if done
>> one child at a time. Parallel append drastically reduces the cases
>> where something like could be useful, but I don't think we can
>> theoretically eliminate the need for such a plan.
>
> In the case where scanjoin_target_parallel_safe=false, we actually will
> consider such parallel paths using the existing partial paths for the
> parent appendrel in the code path shown in a previous email (note: we
> would already have done generate_partitionwise_join_paths or
> set_append_rel_pathlist for the parent appendrel in query_planner.) For
> such a parallel path, we might need to do a projection in the Gather
> node for generating the SRF-free scan/join target, but I think that
> would be still efficient. So, I'm not sure that we really need to create
> child Gathers and generate an Append with those Gathers, as you mentioned.
>
> Another thing I noticed is: actually, we don't produce an Append with
> child Gathers in apply_scanjoin_target_to_paths, which I thought we
> would do that in the case of scanjoin_target_parallel_safe=false, but I
> noticed I was wrong. Sorry for that. The reason is because in that case,
> even if we create new partial Append paths with child Gathers, we don't
> run generate_gatehr_paths for the newly created partial paths at the end
> of that function shown below, since the parent's consider_parallel flag
> is set to false in that case:
>
> /*
> * Consider generating Gather or Gather Merge paths. We must only do this
> * if the relation is parallel safe, and we don't do it for child rels to
> * avoid creating multiple Gather nodes within the same plan. We must do
> * this after all paths have been generated and before set_cheapest, since
> * one of the generated paths may turn out to be the cheapest one.
> */
> if (rel->consider_parallel && !IS_OTHER_REL(rel))
> generate_gather_paths(root, rel, false);

One thing to add is: ISTM we need some cleanup for 
apply_scanjoin_target_to_paths.

Best regards,
Etsuro Fujita


On Thu, Jul 5, 2018 at 4:45 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
>
> In the case where scanjoin_target_parallel_safe=false, we actually will
> consider such parallel paths using the existing partial paths for the parent
> appendrel in the code path shown in a previous email (note: we would already
> have done generate_partitionwise_join_paths or set_append_rel_pathlist for
> the parent appendrel in query_planner.)  For such a parallel path, we might
> need to do a projection in the Gather node for generating the SRF-free
> scan/join target, but I think that would be still efficient.  So, I'm not
> sure that we really need to create child Gathers and generate an Append with
> those Gathers, as you mentioned.

I don't think it will be Append of Gathers, but Append where one of
the subplans is Gather. I don't think that possibility is completely
eliminated.

>
> Another thing I noticed is: actually, we don't produce an Append with child
> Gathers in apply_scanjoin_target_to_paths, which I thought we would do that
> in the case of scanjoin_target_parallel_safe=false, but I noticed I was
> wrong.  Sorry for that.  The reason is because in that case, even if we
> create new partial Append paths with child Gathers, we don't run
> generate_gatehr_paths for the newly created partial paths at the end of that
> function shown below, since the parent's consider_parallel flag is set to
> false in that case:
>
>     /*
>      * Consider generating Gather or Gather Merge paths.  We must only do
> this
>      * if the relation is parallel safe, and we don't do it for child rels
> to
>      * avoid creating multiple Gather nodes within the same plan. We must do
>      * this after all paths have been generated and before set_cheapest,
> since
>      * one of the generated paths may turn out to be the cheapest one.
>      */
>     if (rel->consider_parallel && !IS_OTHER_REL(rel))
>         generate_gather_paths(root, rel, false);

Hmm. I don't think that's a great idea since such a plan might turn
out cheaper esp. when there are very few children which could use
parallel query and parallel append is possible at the top parent. But
anyway, that's what it is today. But I think, we shouldn't write code
assuming that an Append will never see a Gather below it. We might see
some plans like that in future and need to change your code.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/05 22:04), Ashutosh Bapat wrote:
> On Thu, Jul 5, 2018 at 4:45 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> Another thing I noticed is: actually, we don't produce an Append with child
>> Gathers in apply_scanjoin_target_to_paths, which I thought we would do that
>> in the case of scanjoin_target_parallel_safe=false, but I noticed I was
>> wrong.  Sorry for that.  The reason is because in that case, even if we
>> create new partial Append paths with child Gathers, we don't run
>> generate_gatehr_paths for the newly created partial paths at the end of that
>> function shown below, since the parent's consider_parallel flag is set to
>> false in that case:

> Hmm. I don't think that's a great idea since such a plan might turn
> out cheaper esp. when there are very few children which could use
> parallel query and parallel append is possible at the top parent.

Actually, I don't think we can create such a Parallel Append, because in 
the case where we have scanjoin_target_parallel_safe=false, child paths 
would not be parallel-safe anymore as those paths are rewritten to have 
parallel-unsafe targets in the recursion of apply_scanjoin_target_to_paths.

> anyway, that's what it is today. But I think, we shouldn't write code
> assuming that an Append will never see a Gather below it. We might see
> some plans like that in future and need to change your code.

I agree that we might allow such an Append in future, but I'm not sure 
we should write code for that in PG11, because 1) that would make code 
complicated than necessary and 2) that might cause overlooking of 
unexpected behavior of the planner; if the planner created such an 
Append erroneously, we would run that by that code without noticing 
that.  To reduce the maintenance burden and the overlooking risk, I 
think it's better to extend code for such a plan when we allow that plan 
for a partitionwise join.

Best regards,
Etsuro Fujita


(2018/07/04 21:37), Ashutosh Bapat wrote:
> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/07/04 19:04), Ashutosh Bapat wrote:
>>> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:
>>>> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>>>> +       if (enable_partitionwise_join&&
>>>>> rel->top_parent_is_partitioned)
>>>>> +       {
>>>>> +           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>>>>> +       }
>>>>>
>>>>> Why do we need rel->top_parent_is_partitioned? If a relation is
>>>>> partitioned (if (rel->part_scheme), it's either the top parent or is
>>>>> partition of some other partitioned table. In either case this
>>>>> condition will be true.

>>>> This would be needed to avoid unnecessarily applying build_childrel_tlist
>>>> to
>>>> child rels of a partitioned table for which we don't consider
>>>> partitionwise
>>>> join.  Consider:
>>>>
>>>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>>>> CREATE TABLE
>>>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>>>> CREATE TABLE
>>>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>>>> CREATE TABLE
>>>> postgres=# create table test (c1 int, c2 text);
>>>> CREATE TABLE
>>>> postgres=# explain verbose select * from (select * from lpt union all
>>>> select
>>>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);

>> I might misunderstand your words, but in the above example the patch doesn't
>> apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for that is
>> because we can avoid adjusting the tlists for the corresponding subplans at
>> plan creation time so that whole-row Vars in the tlists are transformed into
>> CREs.  I think the overhead of the adjustment is not that big, but not zero,
>> so it would be worth avoiding applying build_childrel_tlist to partitions if
>> the top parent won't participate in a partitionwise-join at all.
>
> I don't think that answers my question. When we join lpt with test,
> your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even
> when join between lpt and test is not going to use partition-wise
> join. Why?

Maybe my explanation including the example was not good.  Sorry about 
that, but my patch will *not* apply build_childrel_tlist to lpt_p1 and 
lpt_p2 since the top parent of lpt_p1 and lpt_p2 is the UNION ALL 
subquery and hence not a partitioned table (ie, we have 
rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2).

> As per your explanation, the condition "if
> (enable_partitionwise_join&&   rel->top_parent_is_partitioned)" is
> used to avoid applying build_childrel_tlist() when partition-wise join
> won't be possible. But it's not covering all the cases.

Let me explain about that: 1) my patch won't apply that function to a 
child if its top parent is an appendrel built from a UNION ALL subquery, 
even though the child is a partition of a partitioned table pulled up 
from a leaf subquery into the parent query, like lpt_p1, and 2) my patch 
will apply that function to a child if its top parent is a partitioned 
table, whether or not the partitioned table is involved in a 
partitionwise join.  By #1, we avoid the adjustment work at plan 
creation time, as explained above.  It might be worth trying to be 
smarter about #2 (for example, in the case of a join of a partitioned 
table and a non-partitioned table, since we don't consider a 
partitionwise join for that join, it's better to not apply that function 
to partitions of the partitioned table, to avoid the adjustment work at 
plan creation time), but ISTM we don't have enough information to be 
smarter.

>>> An
>>> in-between state will produce a hell lot of confusion for any further
>>> optimization. Whenever we change the code around partition-wise
>>> operations in future, we will have to check whether or not a given
>>> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
>>> have mentioned earlier, I am also not comfortable with the targetlist
>>> of child relations being type inconsistent with that of the parent,
>>> which is a fundamental rule in inheritance. Worst keep them
>>> inconsistent during path creation and make them consistent at the time
>>> of creating plans. A child's whole-row Var has datatype of the child
>>> where as that of parent has datatype of parent.
>>
>>
>> I don't see any critical issue here.  Could you elaborate a bit more on that
>> point?
>
> I think breaking a fundamental rule like this itself is critical. But
> interestingly I am not able to find a case where it becomes a problem.
> But may be I haven't tried enough. And the question is if it's not
> required to have the targetlists type consistent, why even bother with
> ConvertRowtypeExpr addition there? We can use your approach of adding
> ConvertRowtypeExpr at the end in all the cases.

I think that the tlist of a (not-the-topmost) child relation doesn't 
need to be type-consistent with that of the parent; it has only to 
include all Vars that are needed for higher joinquals and final output 
to the parent appendrel.  (In other words, I think we can build the 
tlist in the same manner as we build tlists of base or join relations in 
the main join tree.)  On the other hand, other expressions such as WHERE 
quals need to be type-consistent with those of the parent; else we would 
create a plan that produces the wrong result.  So, with my patch we 
carry out special handling for the tlist; it includes a child whole-row 
Var, if needed, instead of a CRE converting the Var to the parent 
rowtype.  That allows us to create/process child-join plans using the 
existing planner functions as-is.

>>> A ConvertRowtypeExpr
>>> is used to fix this inconsistency. That's why I chose to use
>>> pull_var_clause() as a place to fix the problem and not fix
>>> ConvertRowtypeExpr in targetlist itself.
>>
>>
>> I think the biggest issue in the original patch you proposed is that we
>> spend extra cycles where partitioning is not involved, which is the biggest
>> reason why I think the original patch isn't the right way to go.
>
> When there are no partitions involved, there won't be any
> ConvertRowtypeExprs there which means the function
> is_converted_whole_row_reference() would just return from the first
> line checking IsA() and nullness of node.

Really?  IIRC, with the original patch we would spend extra cycles in a 
non-partitioned inheritance processing [1].

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5AFC0865.8050802%40lab.ntt.co.jp


On Fri, Jul 6, 2018 at 4:29 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/07/04 21:37), Ashutosh Bapat wrote:
>>
>> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>>
>>> (2018/07/04 19:04), Ashutosh Bapat wrote:
>>>>
>>>> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
>>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:
>>>>>
>>>>> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>>>>>
>>>>>> +       if (enable_partitionwise_join&&
>>>>>> rel->top_parent_is_partitioned)
>>>>>> +       {
>>>>>> +           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>>>>>> +       }
>>>>>>
>>>>>> Why do we need rel->top_parent_is_partitioned? If a relation is
>>>>>> partitioned (if (rel->part_scheme), it's either the top parent or is
>>>>>> partition of some other partitioned table. In either case this
>>>>>> condition will be true.
>
>
>>>>> This would be needed to avoid unnecessarily applying
>>>>> build_childrel_tlist
>>>>> to
>>>>> child rels of a partitioned table for which we don't consider
>>>>> partitionwise
>>>>> join.  Consider:
>>>>>
>>>>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>>>>> CREATE TABLE
>>>>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>>>>> CREATE TABLE
>>>>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>>>>> CREATE TABLE
>>>>> postgres=# create table test (c1 int, c2 text);
>>>>> CREATE TABLE
>>>>> postgres=# explain verbose select * from (select * from lpt union all
>>>>> select
>>>>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);
>
>
>>> I might misunderstand your words, but in the above example the patch
>>> doesn't
>>> apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for that is
>>> because we can avoid adjusting the tlists for the corresponding subplans
>>> at
>>> plan creation time so that whole-row Vars in the tlists are transformed
>>> into
>>> CREs.  I think the overhead of the adjustment is not that big, but not
>>> zero,
>>> so it would be worth avoiding applying build_childrel_tlist to partitions
>>> if
>>> the top parent won't participate in a partitionwise-join at all.
>>
>>
>> I don't think that answers my question. When we join lpt with test,
>> your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even
>> when join between lpt and test is not going to use partition-wise
>> join. Why?
>
>
> Maybe my explanation including the example was not good.  Sorry about that,
> but my patch will *not* apply build_childrel_tlist to lpt_p1 and lpt_p2
> since the top parent of lpt_p1 and lpt_p2 is the UNION ALL subquery and
> hence not a partitioned table (ie, we have
> rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2).
>
>> As per your explanation, the condition "if
>> (enable_partitionwise_join&&   rel->top_parent_is_partitioned)" is
>> used to avoid applying build_childrel_tlist() when partition-wise join
>> won't be possible. But it's not covering all the cases.
>
>
> Let me explain about that: 1) my patch won't apply that function to a child
> if its top parent is an appendrel built from a UNION ALL subquery, even
> though the child is a partition of a partitioned table pulled up from a leaf
> subquery into the parent query, like lpt_p1, and 2) my patch will apply that
> function to a child if its top parent is a partitioned table, whether or not
> the partitioned table is involved in a partitionwise join.  By #1, we avoid
> the adjustment work at plan creation time, as explained above.  It might be
> worth trying to be smarter about #2 (for example, in the case of a join of a
> partitioned table and a non-partitioned table, since we don't consider a
> partitionwise join for that join, it's better to not apply that function to
> partitions of the partitioned table, to avoid the adjustment work at plan
> creation time), but ISTM we don't have enough information to be smarter.

That looks like a kludge to me rather than a proper fix. It's not
clear to me as to when somebody can expect ConvertRowtypeExpr in the
targetlist and when don't while creating paths and to an extent plans.
For example, inside add_paths_to_append_rel() or in
apply_scanjoin_target_to_paths() or for that matter any path creation
or plan creation function, we will sometimes get targetlists with
ConvertRowtypeExpr() and sometime not. How do we know which is when.


>
>>>> A ConvertRowtypeExpr
>>>> is used to fix this inconsistency. That's why I chose to use
>>>> pull_var_clause() as a place to fix the problem and not fix
>>>> ConvertRowtypeExpr in targetlist itself.
>>>
>>>
>>>
>>> I think the biggest issue in the original patch you proposed is that we
>>> spend extra cycles where partitioning is not involved, which is the
>>> biggest
>>> reason why I think the original patch isn't the right way to go.
>>
>>
>> When there are no partitions involved, there won't be any
>> ConvertRowtypeExprs there which means the function
>> is_converted_whole_row_reference() would just return from the first
>> line checking IsA() and nullness of node.
>
>
> Really?  IIRC, with the original patch we would spend extra cycles in a
> non-partitioned inheritance processing [1].

As I said, we do spend cycles in that function testing whether a node
is Aggref or not even when the query doesn't have aggregates or
grouping OR spend cycles in testing whether a node is a PlaceHolderVar
when the query doesn't produce any. So, I don't see any problem with
spending a few cycles testing whether a node is ConvertRowtypeExpr or
not when a ConvertRowtypeExpr is not in the query or command. That's
not a huge performance trouble. I would be happy to change my mind, if
you show me performance different with and without this patch in
planning. I haven't seen any.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/06 20:20), Ashutosh Bapat wrote:
> On Fri, Jul 6, 2018 at 4:29 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/07/04 21:37), Ashutosh Bapat wrote:
>>> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:

>> Let me explain about that: 1) my patch won't apply that function to a child
>> if its top parent is an appendrel built from a UNION ALL subquery, even
>> though the child is a partition of a partitioned table pulled up from a leaf
>> subquery into the parent query, like lpt_p1, and 2) my patch will apply that
>> function to a child if its top parent is a partitioned table, whether or not
>> the partitioned table is involved in a partitionwise join.  By #1, we avoid
>> the adjustment work at plan creation time, as explained above.  It might be
>> worth trying to be smarter about #2 (for example, in the case of a join of a
>> partitioned table and a non-partitioned table, since we don't consider a
>> partitionwise join for that join, it's better to not apply that function to
>> partitions of the partitioned table, to avoid the adjustment work at plan
>> creation time), but ISTM we don't have enough information to be smarter.
>
> That looks like a kludge to me rather than a proper fix. It's not
> clear to me as to when somebody can expect ConvertRowtypeExpr in the
> targetlist and when don't while creating paths and to an extent plans.
> For example, inside add_paths_to_append_rel() or in
> apply_scanjoin_target_to_paths() or for that matter any path creation
> or plan creation function, we will sometimes get targetlists with
> ConvertRowtypeExpr() and sometime not. How do we know which is when.

That is known from a new flag "has_child_wholerow" added to RelOptInfo 
to indicate whether an other relation's reltarget has child wholerow 
Vars instead of ConvertRowtypeExprs.  That flag is initialized to false 
and only set to true in build_childrel_tlist if it adds child wholerow 
Vars to the reltarget of a given other relation.  Though, I don't think 
we need to be conscious about whether the reltarget has child wholerow 
Vars or ConvertRowtypeExprs when creating paths; we would just create 
paths based on the reltarget.  What we need to be careful about is when 
creating an Append/MergeAppend plan; we have to adjust the subplan's 
tlist so child wholerow Vars in that tlist are converted to the 
corresponding ConvertRowtypeExprs, if that tlist has those Vars, which 
is also known from the new flag.

>>>> I think the biggest issue in the original patch you proposed is that we
>>>> spend extra cycles where partitioning is not involved, which is the
>>>> biggest
>>>> reason why I think the original patch isn't the right way to go.
>>>
>>>
>>> When there are no partitions involved, there won't be any
>>> ConvertRowtypeExprs there which means the function
>>> is_converted_whole_row_reference() would just return from the first
>>> line checking IsA() and nullness of node.
>>
>>
>> Really?  IIRC, with the original patch we would spend extra cycles in a
>> non-partitioned inheritance processing [1].
>
> As I said, we do spend cycles in that function testing whether a node
> is Aggref or not even when the query doesn't have aggregates or
> grouping OR spend cycles in testing whether a node is a PlaceHolderVar
> when the query doesn't produce any. So, I don't see any problem with
> spending a few cycles testing whether a node is ConvertRowtypeExpr or
> not when a ConvertRowtypeExpr is not in the query or command. That's
> not a huge performance trouble. I would be happy to change my mind, if
> you show me performance different with and without this patch in
> planning. I haven't seen any.

I have to admit that the case in [1] wouldn't affect the performance, 
but my concern is that there might be some cases where the test affects 
performance.  In contrast, in the approach I proposed, we wouldn't need 
to worry about that because the approach does not modify code that is 
not related to partitioning.  I think that's a good thing.

Best regards,
Etsuro Fujita


On Mon, Jul 9, 2018 at 4:33 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>>
>>
>> As I said, we do spend cycles in that function testing whether a node
>> is Aggref or not even when the query doesn't have aggregates or
>> grouping OR spend cycles in testing whether a node is a PlaceHolderVar
>> when the query doesn't produce any. So, I don't see any problem with
>> spending a few cycles testing whether a node is ConvertRowtypeExpr or
>> not when a ConvertRowtypeExpr is not in the query or command. That's
>> not a huge performance trouble. I would be happy to change my mind, if
>> you show me performance different with and without this patch in
>> planning. I haven't seen any.
>
>
> I have to admit that the case in [1] wouldn't affect the performance, but my
> concern is that there might be some cases where the test affects
> performance.

What are those cases? Can you please provide any numbers supporting your claim?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/09 20:06), Ashutosh Bapat wrote:
> On Mon, Jul 9, 2018 at 4:33 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>> As I said, we do spend cycles in that function testing whether a node
>>> is Aggref or not even when the query doesn't have aggregates or
>>> grouping OR spend cycles in testing whether a node is a PlaceHolderVar
>>> when the query doesn't produce any. So, I don't see any problem with
>>> spending a few cycles testing whether a node is ConvertRowtypeExpr or
>>> not when a ConvertRowtypeExpr is not in the query or command. That's
>>> not a huge performance trouble. I would be happy to change my mind, if
>>> you show me performance different with and without this patch in
>>> planning. I haven't seen any.
>>
>>
>> I have to admit that the case in [1] wouldn't affect the performance, but my
>> concern is that there might be some cases where the test affects
>> performance.
>
> What are those cases? Can you please provide any numbers supporting your claim?

I don't have any numbers right now, so that is nothing but a concern. 
But as I said in a previous email, in the approach I proposed, we don't 
need to spend extra cycles where partitioning is not involved.  I think 
that is a good thing in itself.  No?

Best regards,
Etsuro Fujita


>
> I don't have any numbers right now, so that is nothing but a concern. But as
> I said in a previous email, in the approach I proposed, we don't need to
> spend extra cycles where partitioning is not involved.  I think that is a
> good thing in itself.  No?

At the cost of having targetlist being type inconsistent. I don't have
any testcase either to show that that's a problem in practice. So,
it's a trade-off of a concern vs concern.

Apart from that, in your approach there are extra cycles spent in
traversing the targetlist to add ConvertRowtypeExpr, albeit only when
there is a whole-row expression in the targetlist, when creating
plans. That's not there in my patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/09 20:43), Ashutosh Bapat wrote:
>>
>> I don't have any numbers right now, so that is nothing but a concern. But as
>> I said in a previous email, in the approach I proposed, we don't need to
>> spend extra cycles where partitioning is not involved.  I think that is a
>> good thing in itself.  No?
>
> At the cost of having targetlist being type inconsistent. I don't have
> any testcase either to show that that's a problem in practice. So,
> it's a trade-off of a concern vs concern.

As I said before, I don't see any issue in having such a targetlist, but 
I might be missing something, so I'd like to discuss a bit more about 
that.  Could you tell me the logic/place in the PG code where you think 
the problem might occur.  (IIRC, you mentioned something about that 
before (pathkeys? or index-only scans?), but sorry, I don't understand 
that.)

> Apart from that, in your approach there are extra cycles spent in
> traversing the targetlist to add ConvertRowtypeExpr, albeit only when
> there is a whole-row expression in the targetlist, when creating
> plans. That's not there in my patch.

Right.  That's unfortunate, but I think that that would be still better 
than needing to spent extra cycles where partitioning is not involved. 
And ISTM that the traversing cost is not that large compared to the cost 
we pay before that for query planning for a partitionwise join.

Best regards,
Etsuro Fujita


On Tue, Jul 10, 2018 at 9:08 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/07/09 20:43), Ashutosh Bapat wrote:
>>>
>>>
>>> I don't have any numbers right now, so that is nothing but a concern. But
>>> as
>>> I said in a previous email, in the approach I proposed, we don't need to
>>> spend extra cycles where partitioning is not involved.  I think that is a
>>> good thing in itself.  No?
>>
>>
>> At the cost of having targetlist being type inconsistent. I don't have
>> any testcase either to show that that's a problem in practice. So,
>> it's a trade-off of a concern vs concern.
>
>
> As I said before, I don't see any issue in having such a targetlist, but I
> might be missing something, so I'd like to discuss a bit more about that.
> Could you tell me the logic/place in the PG code where you think the problem
> might occur.  (IIRC, you mentioned something about that before (pathkeys? or
> index-only scans?), but sorry, I don't understand that.)

IIUC, index-only scan will be used when the all the required columns
are covered by an index. If there is an index on the whole-row
reference of the parent, it will be translated into a
ConvertRowtypeExpr of the child when an index on the child is created.
If the targetlist doesn't have ConvertRowtypeExpr, as your patch does,
the planner won't be able to use such an index on the child table. But
I couldn't create an index with a whole-row reference in it. So, I
think this isn't possible right now. But in future if we allow
creating index on whole-row reference or

Pathkey points to an equivalence class, which contains equivalence
members. A parent equivalence class member containing a whole-row
reference gets translated into a child equivalence member containing a
ConvertRowtypeExpr. At places in planner we match equivalence members
to the targetlist entries. This matching will fail unexpectedly when
ConvertRowtypeExpr is removed from a child's targetlist. But again I
couldn't reproduce a problem when such a mismatch arises.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/10 19:58), Ashutosh Bapat wrote:
> On Tue, Jul 10, 2018 at 9:08 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/07/09 20:43), Ashutosh Bapat wrote:
>>> At the cost of having targetlist being type inconsistent. I don't have
>>> any testcase either to show that that's a problem in practice.

>> As I said before, I don't see any issue in having such a targetlist, but I
>> might be missing something, so I'd like to discuss a bit more about that.
>> Could you tell me the logic/place in the PG code where you think the problem
>> might occur.  (IIRC, you mentioned something about that before (pathkeys? or
>> index-only scans?), but sorry, I don't understand that.)
>
> IIUC, index-only scan will be used when the all the required columns
> are covered by an index. If there is an index on the whole-row
> reference of the parent, it will be translated into a
> ConvertRowtypeExpr of the child when an index on the child is created.

I think so too.

> If the targetlist doesn't have ConvertRowtypeExpr, as your patch does,
> the planner won't be able to use such an index on the child table. But
> I couldn't create an index with a whole-row reference in it. So, I
> think this isn't possible right now.

Actually, even if we could create such an index on the child table and 
the targetlist had the ConvertRowtypeExpr, the planner would still not 
be able to use an index-only scan with that index; because 
check_index_only would not consider that an index-only scan is possible 
for that index because that index is an expression index and that 
function currently does not consider that index expressions are able to 
be returned back in an index-only scan.  That behavior of the planner 
might be improved in future, though.

> Pathkey points to an equivalence class, which contains equivalence
> members. A parent equivalence class member containing a whole-row
> reference gets translated into a child equivalence member containing a
> ConvertRowtypeExpr.

I think so too.

> At places in planner we match equivalence members
> to the targetlist entries. This matching will fail unexpectedly when
> ConvertRowtypeExpr is removed from a child's targetlist. But again I
> couldn't reproduce a problem when such a mismatch arises.

IIUC, I don't think the planner assumes that for an equivalence member 
there is an matching entry for that member in the targetlist; what I 
think the planner assumes is: an equivalence member is able to be 
computed from expressions in the targetlist.  So, I think it is safe to 
have whole-row Vars instead of ConvertRowtypeExprs in the targetlist.

Thanks for the explanation!

Best regards,
Etsuro Fujita


On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
>
> Actually, even if we could create such an index on the child table and the
> targetlist had the ConvertRowtypeExpr, the planner would still not be able
> to use an index-only scan with that index; because check_index_only would
> not consider that an index-only scan is possible for that index because that
> index is an expression index and that function currently does not consider
> that index expressions are able to be returned back in an index-only scan.
> That behavior of the planner might be improved in future, though.
>
>> Pathkey points to an equivalence class, which contains equivalence
>> members. A parent equivalence class member containing a whole-row
>> reference gets translated into a child equivalence member containing a
>> ConvertRowtypeExpr.

Right and when we do so, not having ConvertRowtypeExpr in the
targetlist will be a problem.

>
>
> I think so too.
>
>> At places in planner we match equivalence members
>> to the targetlist entries. This matching will fail unexpectedly when
>> ConvertRowtypeExpr is removed from a child's targetlist. But again I
>> couldn't reproduce a problem when such a mismatch arises.
>
>
> IIUC, I don't think the planner assumes that for an equivalence member there
> is an matching entry for that member in the targetlist; what I think the
> planner assumes is: an equivalence member is able to be computed from
> expressions in the targetlist.

This is true. However,

>  So, I think it is safe to have whole-row
> Vars instead of ConvertRowtypeExprs in the targetlist.

when it's looking for an expression, it finds a whole-row expression
so it think it needs to add a ConvertRowtypeExpr on that. But when the
plan is created, there is ConvertRowtypeExpr already, but there is no
way to know that a new ConvertRowtypeExpr is not needed anymore. So,
we may have two ConvertRowtypeExprs giving wrong results.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/11 20:02), Ashutosh Bapat wrote:
> On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> Actually, even if we could create such an index on the child table and the
>> targetlist had the ConvertRowtypeExpr, the planner would still not be able
>> to use an index-only scan with that index; because check_index_only would
>> not consider that an index-only scan is possible for that index because that
>> index is an expression index and that function currently does not consider
>> that index expressions are able to be returned back in an index-only scan.
>> That behavior of the planner might be improved in future, though.

> Right and when we do so, not having ConvertRowtypeExpr in the
> targetlist will be a problem.

Yeah, but I don't think that that's unsolvable; because in that case the 
CRE as an index expression could be converted back to the whole-row 
Var's rowtype by adding another CRE to the index expression for that 
conversion, I suspect that that special handling could allow us to 
support an index-only scan even when having the whole-row Var instead of 
the CRE in the targetlist.  (Having said that, I'm not 100% sure we need 
to solve that problem when we improve the planner, because there doesn't 
seem to me to be enough use-case to justify making the code complicated 
for that.)  Anyway, I think that that would be a matter of future 
versions of PG.

>>> At places in planner we match equivalence members
>>> to the targetlist entries. This matching will fail unexpectedly when
>>> ConvertRowtypeExpr is removed from a child's targetlist. But again I
>>> couldn't reproduce a problem when such a mismatch arises.
>>
>>
>> IIUC, I don't think the planner assumes that for an equivalence member there
>> is an matching entry for that member in the targetlist; what I think the
>> planner assumes is: an equivalence member is able to be computed from
>> expressions in the targetlist.
>
> This is true. However,
>
>>   So, I think it is safe to have whole-row
>> Vars instead of ConvertRowtypeExprs in the targetlist.
>
> when it's looking for an expression, it finds a whole-row expression
> so it think it needs to add a ConvertRowtypeExpr on that. But when the
> plan is created, there is ConvertRowtypeExpr already, but there is no
> way to know that a new ConvertRowtypeExpr is not needed anymore. So,
> we may have two ConvertRowtypeExprs giving wrong results.

Maybe I'm missing something, but I don't think that we need to worry 
about that, because in the approach I proposed, we only add CREs above 
whole-row Vars in the targetlists for subplans of an Append/MergeAppend 
for a partitioned relation at plan creation time.

Best regards,
Etsuro Fujita


On Thu, Jul 12, 2018 at 9:02 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/07/11 20:02), Ashutosh Bapat wrote:
>>
>> On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>>
>>> Actually, even if we could create such an index on the child table and
>>> the
>>> targetlist had the ConvertRowtypeExpr, the planner would still not be
>>> able
>>> to use an index-only scan with that index; because check_index_only would
>>> not consider that an index-only scan is possible for that index because
>>> that
>>> index is an expression index and that function currently does not
>>> consider
>>> that index expressions are able to be returned back in an index-only
>>> scan.
>>> That behavior of the planner might be improved in future, though.
>
>
>> Right and when we do so, not having ConvertRowtypeExpr in the
>> targetlist will be a problem.
>
>
> Yeah, but I don't think that that's unsolvable; because in that case the CRE
> as an index expression could be converted back to the whole-row Var's
> rowtype by adding another CRE to the index expression for that conversion, I
> suspect that that special handling could allow us to support an index-only
> scan even when having the whole-row Var instead of the CRE in the
> targetlist.

I am not able to understand this. Can you please provide an example?

> (Having said that, I'm not 100% sure we need to solve that
> problem when we improve the planner, because there doesn't seem to me to be
> enough use-case to justify making the code complicated for that.)  Anyway, I
> think that that would be a matter of future versions of PG.

I am afraid that a fix which just works only till we change the other
parts of the system is useful only till that time. At that time, it
needs to be replaced with a different fix. If that time is long
enough, that's ok. In this case, I agree that if we haven't heard from
users till now that they need to create indexes on whole-row
expressions, there's chance that we will never implement this feature.

>
>>>> At places in planner we match equivalence members
>>>> to the targetlist entries. This matching will fail unexpectedly when
>>>> ConvertRowtypeExpr is removed from a child's targetlist. But again I
>>>> couldn't reproduce a problem when such a mismatch arises.
>>>
>>>
>>>
>>> IIUC, I don't think the planner assumes that for an equivalence member
>>> there
>>> is an matching entry for that member in the targetlist; what I think the
>>> planner assumes is: an equivalence member is able to be computed from
>>> expressions in the targetlist.
>>
>>
>> This is true. However,
>>
>>>   So, I think it is safe to have whole-row
>>> Vars instead of ConvertRowtypeExprs in the targetlist.
>>
>>
>> when it's looking for an expression, it finds a whole-row expression
>> so it think it needs to add a ConvertRowtypeExpr on that. But when the
>> plan is created, there is ConvertRowtypeExpr already, but there is no
>> way to know that a new ConvertRowtypeExpr is not needed anymore. So,
>> we may have two ConvertRowtypeExprs giving wrong results.
>
>
> Maybe I'm missing something, but I don't think that we need to worry about
> that, because in the approach I proposed, we only add CREs above whole-row
> Vars in the targetlists for subplans of an Append/MergeAppend for a
> partitioned relation at plan creation time.

There's a patch in an adjacent thread started by David Rowley to rip
out Append/MergeAppend when there is only one subplan. So, your
solution won't work there.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/12 13:38), Ashutosh Bapat wrote:
> On Thu, Jul 12, 2018 at 9:02 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/07/11 20:02), Ashutosh Bapat wrote:
>>> On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:
>>>> Actually, even if we could create such an index on the child table and
>>>> the
>>>> targetlist had the ConvertRowtypeExpr, the planner would still not be
>>>> able
>>>> to use an index-only scan with that index; because check_index_only would
>>>> not consider that an index-only scan is possible for that index because
>>>> that
>>>> index is an expression index and that function currently does not
>>>> consider
>>>> that index expressions are able to be returned back in an index-only
>>>> scan.
>>>> That behavior of the planner might be improved in future, though.
>>
>>
>>> Right and when we do so, not having ConvertRowtypeExpr in the
>>> targetlist will be a problem.
>>
>>
>> Yeah, but I don't think that that's unsolvable; because in that case the CRE
>> as an index expression could be converted back to the whole-row Var's
>> rowtype by adding another CRE to the index expression for that conversion, I
>> suspect that that special handling could allow us to support an index-only
>> scan even when having the whole-row Var instead of the CRE in the
>> targetlist.
>
> I am not able to understand this. Can you please provide an example?

Sorry, my explanation was not good.  Let me try.  We can't create an 
inherited index on the whole-row reference in a partitioned table, but 
actually, we can directly create the corresponding index on a child table:

postgres=# create table pt (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table ptp1 (b text, a int check (a in (1)));
CREATE TABLE
postgres=# alter table pt attach partition ptp1 for values in (1);
ALTER TABLE
postgres=# create index ptp1_conv_wr_idx on ptp1 ((ptp1::pt));
CREATE INDEX
postgres=# insert into pt values (1, 'foo');
INSERT 0 1
postgres=# select *, ptp1 as wr, ptp1::pt as conv_wr from ptp1;
   b  | a |   wr    | conv_wr
-----+---+---------+---------
  foo | 1 | (foo,1) | (1,foo)
(1 row)

In this example, the value of the whole-row reference to the child table 
ptp1 for that record is ('foo',1), and that of the index expression for 
that record is (1,'foo').  Those have different column orders, but the 
latter could be mapped to the former by a technique like 
do_convert_tuple.  So, what I suspect is: by producing the value of the 
whole-row reference from that of the index expression like that, we 
could support index-only scans with such an index in the case where we 
have the whole-row reference in the targetlist, not the index expression 
itself.

>> (Having said that, I'm not 100% sure we need to solve that
>> problem when we improve the planner, because there doesn't seem to me to be
>> enough use-case to justify making the code complicated for that.)  Anyway, I
>> think that that would be a matter of future versions of PG.
>
> I am afraid that a fix which just works only till we change the other
> parts of the system is useful only till that time. At that time, it
> needs to be replaced with a different fix.

I agree on that point.

> If that time is long
> enough, that's ok. In this case, I agree that if we haven't heard from
> users till now that they need to create indexes on whole-row
> expressions, there's chance that we will never implement this feature.

I don't object to adding that feature to future versions of PG if required.

>>>>    So, I think it is safe to have whole-row
>>>> Vars instead of ConvertRowtypeExprs in the targetlist.
>>>
>>>
>>> when it's looking for an expression, it finds a whole-row expression
>>> so it think it needs to add a ConvertRowtypeExpr on that. But when the
>>> plan is created, there is ConvertRowtypeExpr already, but there is no
>>> way to know that a new ConvertRowtypeExpr is not needed anymore. So,
>>> we may have two ConvertRowtypeExprs giving wrong results.
>>
>>
>> Maybe I'm missing something, but I don't think that we need to worry about
>> that, because in the approach I proposed, we only add CREs above whole-row
>> Vars in the targetlists for subplans of an Append/MergeAppend for a
>> partitioned relation at plan creation time.
>
> There's a patch in an adjacent thread started by David Rowley to rip
> out Append/MergeAppend when there is only one subplan. So, your
> solution won't work there.

Thanks for sharing that information!  I skimmed the thread.  I haven't 
yet caught up with all the discussions there, so I think I'm missing 
something, but it looks like that we haven't yet reached any consensus 
on the way to go.  In my opinion, I like the approach mentioned in [1]. 
  And if we go that way, my patch seems to fit into that, because in 
that approach the Append/MergeAppend could be removed after adjusting 
the targetlists for its subplans in 
create_append_plan/create_merge_append_plan.  Anyway, I'd like to join 
in that work for PG12.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/1867.1509032831%40sss.pgh.pa.us


On Thu, Jul 12, 2018 at 4:32 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>
> In this example, the value of the whole-row reference to the child table
> ptp1 for that record is ('foo',1), and that of the index expression for that
> record is (1,'foo').  Those have different column orders, but the latter
> could be mapped to the former by a technique like do_convert_tuple.  So,
> what I suspect is: by producing the value of the whole-row reference from
> that of the index expression like that,

The expression in this case would look like ptp1::pt::ptp1 which won't
match targetlist expression ptp1. I am also doubtful that the planner
will be able to deduce that it need to apply an inverse function of
::pt and what exactly such an inverse function is. So index only scan
won't be picked.

> we could support index-only scans
> with such an index in the case where we have the whole-row reference in the
> targetlist, not the index expression itself.

Can you please show an index only scan path being created in this case?


>
>> If that time is long
>> enough, that's ok. In this case, I agree that if we haven't heard from
>> users till now that they need to create indexes on whole-row
>> expressions, there's chance that we will never implement this feature.
>
>
> I don't object to adding that feature to future versions of PG if required.
>
>>>>>    So, I think it is safe to have whole-row
>>>>> Vars instead of ConvertRowtypeExprs in the targetlist.
>>>>
>>>>
>>>>
>>>> when it's looking for an expression, it finds a whole-row expression
>>>> so it think it needs to add a ConvertRowtypeExpr on that. But when the
>>>> plan is created, there is ConvertRowtypeExpr already, but there is no
>>>> way to know that a new ConvertRowtypeExpr is not needed anymore. So,
>>>> we may have two ConvertRowtypeExprs giving wrong results.
>>>
>>>
>>>
>>> Maybe I'm missing something, but I don't think that we need to worry
>>> about
>>> that, because in the approach I proposed, we only add CREs above
>>> whole-row
>>> Vars in the targetlists for subplans of an Append/MergeAppend for a
>>> partitioned relation at plan creation time.
>>
>>
>> There's a patch in an adjacent thread started by David Rowley to rip
>> out Append/MergeAppend when there is only one subplan. So, your
>> solution won't work there.
>
>
> Thanks for sharing that information!  I skimmed the thread.  I haven't yet
> caught up with all the discussions there, so I think I'm missing something,
> but it looks like that we haven't yet reached any consensus on the way to
> go.  In my opinion, I like the approach mentioned in [1].  And if we go that
> way, my patch seems to fit into that, because in that approach the
> Append/MergeAppend could be removed after adjusting the targetlists for its
> subplans in create_append_plan/create_merge_append_plan.  Anyway, I'd like
> to join in that work for PG12.

Whatever may be the outcome of that work, I think what we fix here
shouldn't require to reverted in a few months from now, just so that
that patch works.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/04 11:00), Etsuro Fujita wrote:
> (2018/07/04 1:35), Andres Freund wrote:
>> What's the plan forward here?
>
> I still think that this approach would be the right way to go, so I plan
> to polish the patch.

The approach would not require extra cycles where partitioning is not 
involved as discussed before, and makes code much simpler, which would 
make maintenance and the FDW author's life easy, so I still think that 
the approach would be the direction to go in.  So, I updated the patch. 
  Here are changes:

* I assumed that the child plan passed to adjust_subplan_tlist was a 
scan or join unless it was Append or MergeAppend, but I noticed I was 
wrong; if the topmost scan/join relation is partitioned, 
apply_scanjoin_target_to_paths might have pushed down ProjectSet or 
Result to each of partitions.  So, in that case the child's plan would 
have ProjectSet or Result atop the scan or join plan.  (I could not 
produce such an example, though.)  To address this, I modified 
build_childrel_tlist so that we just apply adjust_appendrel_attrs to the 
parent's targetlist to get the child's if the parent is the topmost 
scan/join relation.  So, we don't need to care about that in 
adjust_subplan_tlist.  I think this change would also increase the 
efficiency for that case.  I added regression test cases for that as well.

* I fixed an oversight in outfuncs.c.

* I renamed member variables I added to RelOptInfo before, and revised 
code/comments a bit.

Attached is an updated version of the patch.

Best regards,
Etsuro Fujita

Attachment
(2018/07/13 23:05), Ashutosh Bapat wrote:
> On Thu, Jul 12, 2018 at 4:32 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> In this example, the value of the whole-row reference to the child table
>> ptp1 for that record is ('foo',1), and that of the index expression for that
>> record is (1,'foo').  Those have different column orders, but the latter
>> could be mapped to the former by a technique like do_convert_tuple.

> The expression in this case would look like ptp1::pt::ptp1 which won't
> match targetlist expression ptp1. I am also doubtful that the planner
> will be able to deduce that it need to apply an inverse function of
> ::pt and what exactly such an inverse function is. So index only scan
> won't be picked.

>> we could support index-only scans
>> with such an index in the case where we have the whole-row reference in the
>> targetlist, not the index expression itself.
>
> Can you please show an index only scan path being created in this case?

We currently don't consider index-only scan with index expressions, so I 
haven't thought in detail yet about how the planner would work.  But 
once we have that index-only scan, I think we could extend that to the 
case mentioned above, by adding this to the planner: if the index 
expression is of the form var::parenttype, consider that (not only the 
expression itself but) var can be returned from the index.  I think the 
expression like ptp1::pt::ptp1 would be useful to get the value of ptp1 
from the index at execution time.

>>> There's a patch in an adjacent thread started by David Rowley to rip
>>> out Append/MergeAppend when there is only one subplan. So, your
>>> solution won't work there.
>>
>>
>> Thanks for sharing that information!  I skimmed the thread.  I haven't yet
>> caught up with all the discussions there, so I think I'm missing something,
>> but it looks like that we haven't yet reached any consensus on the way to
>> go.  In my opinion, I like the approach mentioned in [1].  And if we go that
>> way, my patch seems to fit into that, because in that approach the
>> Append/MergeAppend could be removed after adjusting the targetlists for its
>> subplans in create_append_plan/create_merge_append_plan.  Anyway, I'd like
>> to join in that work for PG12.
>
> Whatever may be the outcome of that work, I think what we fix here
> shouldn't require to reverted in a few months from now, just so that
> that patch works.

I think we could add that optimization without reverting this change 
because the essential part of this change is to make create_plan adjust 
the tlists of the subplans based on the instruction stored into the 
subplans' RelOptInfos (ie, need_adjust_tlist in the second version of 
the patch).  I think this technique could be extended even to the case 
where we have that optimization.

Best regards,
Etsuro Fujita


On Wed, Jul 18, 2018 at 8:00 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> [ new patch ]

+ /*
+ * If the child plan is an Append or MergeAppend, the targetlists of its
+ * subplans would have already been adjusted before we get here, so need
+ * to work here.
+ */
+ if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
+ return;
+
+ /* The child plan should be a scan or join */
+ Assert(is_projection_capable_plan(subplan));

This looks like a terrible design to me.  If somebody in future
invents a new plan type that is not projection-capable, then this
could fail an assertion here and there won't be any simple fix.  And
if you reach here and the child targetlists somehow haven't been
adjusted, then you won't even fail an assertion, you'll just crash (or
maybe return wrong answers).

I'm going to study this some more now, but I really think this is
going in the wrong direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> This looks like a terrible design to me.  If somebody in future
> invents a new plan type that is not projection-capable, then this
> could fail an assertion here and there won't be any simple fix.  And
> if you reach here and the child targetlists somehow haven't been
> adjusted, then you won't even fail an assertion, you'll just crash (or
> maybe return wrong answers).

To elaborate on this thought a bit, it is currently the case that all
scan and join types are projection-capable, and most likely we would
make any such node types added in the future projection-capable as
well, but it still doesn't seem like a good idea to me to have
tangentially-related parts of the system depend on that in subtle
ways.  Also, even today, this would fail if the subplan happened to be
a Sort, and it's not very obvious why that couldn't happen.  If it
can't happen today, it's not obvious why a future code change couldn't
introduce cases where it can happen.

More broadly, the central idea of this patch is that we can set the
target list for a child rel to something other than the target list
that we expect it will eventually produce, and then at plan creation
time fix it.  I think that's a bad idea.  The target list affects lots
of things, like costing.  If we don't insert a ConvertRowTypeExpr into
the child's target list, the costing will be wrong to the extent that
ConvertRowTypeExpr has any cost, which it does.  Any other current or
future property that is computed based on the target list --
parallel-safety, width, security-level, whatever -- could also
potentially end up with a wrong value if the target list as written
does not match the target list as will actually be produced.  It's
true that, in all of these areas, the ConvertRowTypeExpr isn't likely
to have a lot of impact; it probably won't have a big effect on
costing and may not actually cause a problem for the other things that
I mentioned.  On the other hand, if it does cause a serious problem,
what options would we have for fixing it other than to back out your
entire patch and solve the problem some other way?  The idea that
derived properties won't be correct unless they are based on the real
target list is pretty fundamental, and I think deviating from the idea
that we get the correct target list first and then compute the derived
properties afterward is likely to cause real headaches for future
developers.

In short, plan creation time is just the wrong time to change the
plan.  It's just a time to translate the plan from the form needed by
the planner to the form needed by the executor.  It would be fine if
the change you were making were only cosmetic, but it's not.

After looking over both patches, I think Ashutosh Bapat has basically
the right idea.  What he is proposing is a localized fix that doesn't
seem to make any changes to the way things work overall.  I do think
his patches need to be fixed up a bit to avoid conflating
ConvertRowtypeExpr with "converted whole-row reference."  The two are
certainly not equivalent; the latter is a narrow special case.  Some
of the comments could use some more wordsmithing, too, I think.  Apart
from those gripes, though, I think it's solving the problem the
correct way: don't build the wrong plan and try to fix it, just build
the right plan from the beginning.

There are definitely some things not to like about this approach.  In
particular, I definitely agree that treating a converted whole-row
reference specially is not very nice.  It feels like it might be
substantially cleaner to be able to represent this idea as a single
node rather than a combination of ConvertRowtypeExpr and var with
varattno = 0.  Perhaps in the future we ought to consider either (a)
trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
inventing a new WholeRowExpr node that stores two RTIs, one for the
relation that's generating the whole-row reference and the other for
the relation that is controlling the column ordering of the result or
(c) allowing a Var to represent a whole-row expression that has to
produce its outputs according to the ordering of some other RTE.  But
I don't think it's wise or necessary to whack that around just to fix
this bug; it is refactoring or improvement work best left to a future
release.

Also, it is definitely a real shame that we have to build attr_needed
data for each child separately.  Right now, we've got to build a whole
lot of things for each child individually, and that's slow and
inefficient.  We're not going to scale nicely to large partitioning
hierarchies unless we can figure out some way of minimizing the work
we do for each partition.  So, the fact that Fujii-san's approach
avoids needing to compute attr_needed in some cases is very appealing.
However, in my opinion, it's nowhere near appealing enough to justify
trying to do surgery on the target list at plan-creation time.  I
think we need to leave optimizing this to a future effort.
Partition-wise join/aggregate, and partitioning in general, need a lot
of optimization in a lot of places, and fortunately there are people
working on that, but our goal here should just be to fix things up
well enough that we can ship it.  I don't see anything in Ashutosh's
patch that is so ugly that we can't live with it for the time being.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


(2018/07/21 0:26), Robert Haas wrote:
> On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas<robertmhaas@gmail.com>  wrote:
>> This looks like a terrible design to me.  If somebody in future
>> invents a new plan type that is not projection-capable, then this
>> could fail an assertion here and there won't be any simple fix.  And
>> if you reach here and the child targetlists somehow haven't been
>> adjusted, then you won't even fail an assertion, you'll just crash (or
>> maybe return wrong answers).
>
> To elaborate on this thought a bit, it is currently the case that all
> scan and join types are projection-capable, and most likely we would
> make any such node types added in the future projection-capable as
> well, but it still doesn't seem like a good idea to me to have
> tangentially-related parts of the system depend on that in subtle
> ways.

I have to admit that that is not a good idea.  So, I'll update the patch 
so that we don't assume the projection capability of the subplan 
anymore, if we go this way.

> Also, even today, this would fail if the subplan happened to be
> a Sort, and it's not very obvious why that couldn't happen.

You mean the MergeAppend case?  In that case, the patch will adjust the 
subplan before adding a Sort above the subplan, so that could not happen.

> More broadly, the central idea of this patch is that we can set the
> target list for a child rel to something other than the target list
> that we expect it will eventually produce, and then at plan creation
> time fix it.

Yeah, the patch would fix the the final output to the appendrel parent 
at plan creation time if necessary, but it would produces the tlist of 
an intermediate child scan/join node as written in the child relation's 
reltarget at that time.

> I think that's a bad idea.  The target list affects lots
> of things, like costing.  If we don't insert a ConvertRowTypeExpr into
> the child's target list, the costing will be wrong to the extent that
> ConvertRowTypeExpr has any cost, which it does.

Actually, this is not true at least currently, because 
set_append_rel_size doesn't do anything about the costing:

          * NB: the resulting childrel->reltarget->exprs may contain 
arbitrary
          * expressions, which otherwise would not occur in a rel's 
targetlist.
          * Code that might be looking at an appendrel child must cope with
          * such.  (Normally, a rel's targetlist would only include Vars and
          * PlaceHolderVars.)  XXX we do not bother to update the cost 
or width
          * fields of childrel->reltarget; not clear if that would be 
useful.

> Any other current or
> future property that is computed based on the target list --
> parallel-safety, width, security-level, whatever -- could also
> potentially end up with a wrong value if the target list as written
> does not match the target list as will actually be produced.  It's
> true that, in all of these areas, the ConvertRowTypeExpr isn't likely
> to have a lot of impact; it probably won't have a big effect on
> costing and may not actually cause a problem for the other things that
> I mentioned.  On the other hand, if it does cause a serious problem,
> what options would we have for fixing it other than to back out your
> entire patch and solve the problem some other way?  The idea that
> derived properties won't be correct unless they are based on the real
> target list is pretty fundamental, and I think deviating from the idea
> that we get the correct target list first and then compute the derived
> properties afterward is likely to cause real headaches for future
> developers.
>
> In short, plan creation time is just the wrong time to change the
> plan.  It's just a time to translate the plan from the form needed by
> the planner to the form needed by the executor.  It would be fine if
> the change you were making were only cosmetic, but it's not.

Some createplan.c routines already change the tlists of their nodes. 
For example, create_merge_append_plan would add sort columns to each of 
its subplans if necessary.  I think it would be similar to that to add a 
ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist.

> After looking over both patches, I think Ashutosh Bapat has basically
> the right idea.  What he is proposing is a localized fix that doesn't
> seem to make any changes to the way things work overall.

I reviewed his patch, and thought that that would fix the issue, but 
this is about the current design on the child tlist handling in 
partitionise join rather than that patch: it deviates from the 
assumption that we had for the scan/join planning till PG10 that "a 
rel's targetlist would only include Vars and PlaceHolderVars", and turns 
out that there are a lot of places where we need to modify code to 
suppress that deviation, as partly shown in that patch.  So, ISTM that 
the current design in itself is not that localized.

> Partition-wise join/aggregate, and partitioning in general, need a lot
> of optimization in a lot of places, and fortunately there are people
> working on that, but our goal here should just be to fix things up
> well enough that we can ship it.

I agree on that point.

Best regards,
Etsuro Fujita


On Fri, Jul 20, 2018 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
[ ... clipped portion ...]
>
> In short, plan creation time is just the wrong time to change the
> plan.  It's just a time to translate the plan from the form needed by
> the planner to the form needed by the executor.  It would be fine if
> the change you were making were only cosmetic, but it's not.
>

Agree with all that, including the clipped paras.

> After looking over both patches, I think Ashutosh Bapat has basically
> the right idea.  What he is proposing is a localized fix that doesn't
> seem to make any changes to the way things work overall.  I do think
> his patches need to be fixed up a bit to avoid conflating
> ConvertRowtypeExpr with "converted whole-row reference."  The two are
> certainly not equivalent; the latter is a narrow special case.  Some
> of the comments could use some more wordsmithing, too, I think.  Apart
> from those gripes, though, I think it's solving the problem the
> correct way: don't build the wrong plan and try to fix it, just build
> the right plan from the beginning.

I will work on that if everyone agrees that that's the right way to
go. Fujita-san seems to have some arguments still. I have argued on
the same lines as yours but your explanation is better. I don't have
anything more to add. I will wait for that to be resolved.

>
> There are definitely some things not to like about this approach.  In
> particular, I definitely agree that treating a converted whole-row
> reference specially is not very nice.  It feels like it might be
> substantially cleaner to be able to represent this idea as a single
> node rather than a combination of ConvertRowtypeExpr and var with
> varattno = 0.  Perhaps in the future we ought to consider either (a)
> trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
> inventing a new WholeRowExpr node that stores two RTIs, one for the
> relation that's generating the whole-row reference and the other for
> the relation that is controlling the column ordering of the result or
> (c) allowing a Var to represent a whole-row expression that has to
> produce its outputs according to the ordering of some other RTE.

I never liked representing a whole-row expression as a Var worst with
varattno = 0. We should have always casted it as RowExpr(set of vars,
one var per column). This problem wouldn't arise then. Many other
problems wouldn't arise then, I think. I think we do that so that we
can convert a tuple from buffer into a datum and put it in place of
Var with varattno = 0. Given that I prefer (a) or (b) in all the
cases. If not that, then c. But I agree that we have to avoid
ConvertRowtypeExpr being used in this case.

> But
> I don't think it's wise or necessary to whack that around just to fix
> this bug; it is refactoring or improvement work best left to a future
> release.
>
> Also, it is definitely a real shame that we have to build attr_needed
> data for each child separately.  Right now, we've got to build a whole
> lot of things for each child individually, and that's slow and
> inefficient.  We're not going to scale nicely to large partitioning
> hierarchies unless we can figure out some way of minimizing the work
> we do for each partition.  So, the fact that Fujii-san's approach
> avoids needing to compute attr_needed in some cases is very appealing.
> However, in my opinion, it's nowhere near appealing enough to justify
> trying to do surgery on the target list at plan-creation time.  I
> think we need to leave optimizing this to a future effort.
> Partition-wise join/aggregate, and partitioning in general, need a lot
> of optimization in a lot of places, and fortunately there are people
> working on that, but our goal here should just be to fix things up
> well enough that we can ship it.

I agree.

> I don't see anything in Ashutosh's
> patch that is so ugly that we can't live with it for the time being.

Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


On Mon, Jul 23, 2018 at 3:49 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I have to admit that that is not a good idea.  So, I'll update the patch so
> that we don't assume the projection capability of the subplan anymore, if we
> go this way.

Isn't that assumption fundamental to your whole approach?

>> Also, even today, this would fail if the subplan happened to be
>> a Sort, and it's not very obvious why that couldn't happen.
>
> You mean the MergeAppend case?  In that case, the patch will adjust the
> subplan before adding a Sort above the subplan, so that could not happen.

That would only help if create_merge_append_plan() itself inserted a
Sort node.  It wouldn't help if the Sort node came from a child path
manufactured by create_sort_path().

>> I think that's a bad idea.  The target list affects lots
>> of things, like costing.  If we don't insert a ConvertRowTypeExpr into
>> the child's target list, the costing will be wrong to the extent that
>> ConvertRowTypeExpr has any cost, which it does.
>
> Actually, this is not true at least currently, because set_append_rel_size
> doesn't do anything about the costing:

Why would it?  Append can't project, so the cost of any expressions
that appear in its target list is irrelevant.  What is affected is the
cost of the scans below the Append -- see e.g. cost_seqscan(), which
uses the data produced by set_pathtarget_cost_width().

> Some createplan.c routines already change the tlists of their nodes. For
> example, create_merge_append_plan would add sort columns to each of its
> subplans if necessary.  I think it would be similar to that to add a
> ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist.

You have a point, but I think that code is actually not a very good
idea, and I'd like to see us do less of that sort of thing, not more.
Any case in which we change the plan while creating it has many of the
same problems that I discussed in the previous email.  For example,
create_merge_append_path() has to know that a Sort node might get
inserted and set the costing accordingly.  If the callers guaranteed
that the necessary sort path had already been inserted, then we
wouldn't need that special handling.

Also, that code is adding additional columns, computed from the
columns we have available, so that we can sort.  Those extra columns
then get discarded at the next level of the Plan tree.  What you're
trying to do is different.  Perhaps this is too harsh a judgement, but
it looks to me like you're using a deliberately-wrong representation
of the value that you ultimately want to produce and then patching it
up after the fact.  That seems quite a bit worse than what the
existing code is doing.

> I reviewed his patch, and thought that that would fix the issue, but this is
> about the current design on the child tlist handling in partitionise join
> rather than that patch: it deviates from the assumption that we had for the
> scan/join planning till PG10 that "a rel's targetlist would only include
> Vars and PlaceHolderVars", and turns out that there are a lot of places
> where we need to modify code to suppress that deviation, as partly shown in
> that patch.  So, ISTM that the current design in itself is not that
> localized.

I used to think that was the assumption, too, but it seems to me that
Tom told me a while back that it was never really true, and that
assumption was in my head.  Unfortunately, I don't have a link to that
email handy.  Either way, I think the solution is to get the tlist
right from the start and cope with the consequences downstream, not
start with the wrong thing and try to fix it later.  To see an example
of why I think that's a valuable approach, see
11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5, especially the changes in
the regression test outputs.  The old code discovered after it had
generated an Append that it had the wrong tlist, but since the Append
can't project, it had to add a Result node.  With the new code, we get
the children of the Append to produce the right output from the start,
and then the Append just needs to concatenate all that output, so no
Result node is needed.  As noted in the commit message, it also made
the costing more accurate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


(2018/07/24 3:09), Robert Haas wrote:
> On Mon, Jul 23, 2018 at 3:49 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> I have to admit that that is not a good idea.  So, I'll update the patch so
>> that we don't assume the projection capability of the subplan anymore, if we
>> go this way.
>
> Isn't that assumption fundamental to your whole approach?

I don't think so.  What I mean here is: currently the subplan would be a 
scan/join node, but in future we might have eg, a Sort node atop the 
scan/join node, so it would be better to update the patch to handle such 
a case as well.

>>> Also, even today, this would fail if the subplan happened to be
>>> a Sort, and it's not very obvious why that couldn't happen.
>>
>> You mean the MergeAppend case?  In that case, the patch will adjust the
>> subplan before adding a Sort above the subplan, so that could not happen.
>
> That would only help if create_merge_append_plan() itself inserted a
> Sort node.  It wouldn't help if the Sort node came from a child path
> manufactured by create_sort_path().

As I mentioned above, I think we could handle such a case as well.

>>> I think that's a bad idea.  The target list affects lots
>>> of things, like costing.  If we don't insert a ConvertRowTypeExpr into
>>> the child's target list, the costing will be wrong to the extent that
>>> ConvertRowTypeExpr has any cost, which it does.
>>
>> Actually, this is not true at least currently, because set_append_rel_size
>> doesn't do anything about the costing:
>
> Why would it?  Append can't project, so the cost of any expressions
> that appear in its target list is irrelevant.  What is affected is the
> cost of the scans below the Append -- see e.g. cost_seqscan(), which
> uses the data produced by set_pathtarget_cost_width().

By set_rel_size()?

>> Some createplan.c routines already change the tlists of their nodes. For
>> example, create_merge_append_plan would add sort columns to each of its
>> subplans if necessary.  I think it would be similar to that to add a
>> ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist.
>
> You have a point, but I think that code is actually not a very good
> idea, and I'd like to see us do less of that sort of thing, not more.
> Any case in which we change the plan while creating it has many of the
> same problems that I discussed in the previous email.  For example,
> create_merge_append_path() has to know that a Sort node might get
> inserted and set the costing accordingly.  If the callers guaranteed
> that the necessary sort path had already been inserted, then we
> wouldn't need that special handling.

I'm not sure that's a good idea, because I think we have a trade-off 
relation; the more we make create_plan simple, the more we need to make 
earlier states of the planner complicated.

And it looks to me like the partitionwise join code is making earlier 
(and later) stages of the planner too complicated, to make create_plan 
simple.

> Also, that code is adding additional columns, computed from the
> columns we have available, so that we can sort.  Those extra columns
> then get discarded at the next level of the Plan tree.  What you're
> trying to do is different.  Perhaps this is too harsh a judgement, but
> it looks to me like you're using a deliberately-wrong representation
> of the value that you ultimately want to produce and then patching it
> up after the fact.

When considering paritionwise joining, it would make things complicated 
to have a ConvertRowtypeExpr in a partrel's targetlist, because as 
discussed upthread, it deviates from the planner's assumption that a 
rel's targetlist would only include Vars and PHVs.  So, I propose to 
include a child whole-row Var in the targetlist instead, in which case, 
we need to fix the Var after the fact, but can avoid making many other 
parts of the planner complicated.

> That seems quite a bit worse than what the
> existing code is doing.

One idea to address that concern would be to adjust the pathtarget of 
each path created in generate_partitionwise_join_paths accordingly.  I'm 
missing something, though.

>> I reviewed his patch, and thought that that would fix the issue, but this is
>> about the current design on the child tlist handling in partitionise join
>> rather than that patch: it deviates from the assumption that we had for the
>> scan/join planning till PG10 that "a rel's targetlist would only include
>> Vars and PlaceHolderVars", and turns out that there are a lot of places
>> where we need to modify code to suppress that deviation, as partly shown in
>> that patch.  So, ISTM that the current design in itself is not that
>> localized.
>
> I used to think that was the assumption, too, but it seems to me that
> Tom told me a while back that it was never really true, and that
> assumption was in my head.  Unfortunately, I don't have a link to that
> email handy.  Either way, I think the solution is to get the tlist
> right from the start and cope with the consequences downstream, not
> start with the wrong thing and try to fix it later.

I'm not fully convinced that's really the right solution, but I think 
you know better about partitioning (as well as many other things) than 
me, so I think it's better for you to decide what to do for this issue.

Thanks for working on this!

Best regards,
Etsuro Fujita


On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>> Isn't that assumption fundamental to your whole approach?
>
> I don't think so.  What I mean here is: currently the subplan would be a
> scan/join node, but in future we might have eg, a Sort node atop the
> scan/join node, so it would be better to update the patch to handle such a
> case as well.

But how would you do that?

>>>> I think that's a bad idea.  The target list affects lots
>>>> of things, like costing.  If we don't insert a ConvertRowTypeExpr into
>>>> the child's target list, the costing will be wrong to the extent that
>>>> ConvertRowTypeExpr has any cost, which it does.
>>>
>>>
>>> Actually, this is not true at least currently, because
>>> set_append_rel_size
>>> doesn't do anything about the costing:
>>
>>
>> Why would it?  Append can't project, so the cost of any expressions
>> that appear in its target list is irrelevant.  What is affected is the
>> cost of the scans below the Append -- see e.g. cost_seqscan(), which
>> uses the data produced by set_pathtarget_cost_width().
>
> By set_rel_size()?

Sorry, I don't understand what you mean by this.

> I'm not sure that's a good idea, because I think we have a trade-off
> relation; the more we make create_plan simple, the more we need to make
> earlier states of the planner complicated.
>
> And it looks to me like the partitionwise join code is making earlier (and
> later) stages of the planner too complicated, to make create_plan simple.

I think that create_plan is *supposed* to be simple.  Its purpose is
to prune away data that's only needed during planning and add things
that can be computed at the last minute which are needed at execution
time.  Making it do anything else is, in my opinion, not good.

> When considering paritionwise joining, it would make things complicated to
> have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed
> upthread, it deviates from the planner's assumption that a rel's targetlist
> would only include Vars and PHVs.  So, I propose to include a child
> whole-row Var in the targetlist instead, in which case, we need to fix the
> Var after the fact, but can avoid making many other parts of the planner
> complicated.

Well, I could have the wrong idea here, but I tend to think allowing
for ConvertRowTypeExpr elsewhere won't be that bad.

Does anyone else want to weigh in on this issue?  Tom, perhaps?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> I'm not sure that's a good idea, because I think we have a trade-off
>> relation; the more we make create_plan simple, the more we need to make
>> earlier states of the planner complicated.
>> 
>> And it looks to me like the partitionwise join code is making earlier (and
>> later) stages of the planner too complicated, to make create_plan simple.

> I think that create_plan is *supposed* to be simple.  Its purpose is
> to prune away data that's only needed during planning and add things
> that can be computed at the last minute which are needed at execution
> time.  Making it do anything else is, in my opinion, not good.

I tend to agree with Robert on this.  In particular, if you put anything
into create_plan or later that affects cost estimates, you're really doing
it wrong, because changing those estimates might've changed the plan
entirely.  (As I recall, we have a couple of cheats like that now,
associated with dropping no-op projection and subquery scan nodes.
But let's not add more.)

TBH, I think this entire discussion is proving what sheer folly it was
to allow partitions to have rowtypes not identical to their parent.
But I suppose we're stuck with that now.

            regards, tom lane


On Thu, Jul 26, 2018 at 2:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> I'm not sure that's a good idea, because I think we have a trade-off
>>> relation; the more we make create_plan simple, the more we need to make
>>> earlier states of the planner complicated.
>>>
>>> And it looks to me like the partitionwise join code is making earlier (and
>>> later) stages of the planner too complicated, to make create_plan simple.
>
>> I think that create_plan is *supposed* to be simple.  Its purpose is
>> to prune away data that's only needed during planning and add things
>> that can be computed at the last minute which are needed at execution
>> time.  Making it do anything else is, in my opinion, not good.
>
> I tend to agree with Robert on this.  In particular, if you put anything
> into create_plan or later that affects cost estimates, you're really doing
> it wrong, because changing those estimates might've changed the plan
> entirely.  (As I recall, we have a couple of cheats like that now,
> associated with dropping no-op projection and subquery scan nodes.
> But let's not add more.)

+1

>
> TBH, I think this entire discussion is proving what sheer folly it was
> to allow partitions to have rowtypes not identical to their parent.
> But I suppose we're stuck with that now.
>

Every table created has a different rowtype and when the partition's
rowtype is different from that of the partitioned table, we add a
ConvertRowtypeExpr node on top of the whole-row expression.

2153                     if (appinfo->parent_reltype != appinfo->child_reltype)
2154                     {
2155                         ConvertRowtypeExpr *r =
makeNode(ConvertRowtypeExpr);

We may be able to set rowtype of a partition to be same as the rowtype
of a partitioned table when a partition table is created using an
ALTER TABLE syntax. But are you suggesting that we change the reltype
of a partition being attached? What happens when we detach a partition
and let the table live independent of the partitioned table? Do we
create a new reltype?

I am not even going into the problem when we try to attach a partition
with different tuple layout. I do believe that such usecases arise and
rewriting the table being attached is a viable option.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


(2018/07/26 5:27), Robert Haas wrote:
> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>> Isn't that assumption fundamental to your whole approach?
>>
>> I don't think so.  What I mean here is: currently the subplan would be a
>> scan/join node, but in future we might have eg, a Sort node atop the
>> scan/join node, so it would be better to update the patch to handle such a
>> case as well.
>
> But how would you do that?

What I had in mind was to insert a Rusult node with 
inject_projection_plan and adjust the tlist of the Result, as done for 
adding sort columns to a tlist in prepare_sort_from_pathkeys.

>>>>> I think that's a bad idea.  The target list affects lots
>>>>> of things, like costing.  If we don't insert a ConvertRowTypeExpr into
>>>>> the child's target list, the costing will be wrong to the extent that
>>>>> ConvertRowTypeExpr has any cost, which it does.
>>>>
>>>>
>>>> Actually, this is not true at least currently, because
>>>> set_append_rel_size
>>>> doesn't do anything about the costing:
>>>
>>>
>>> Why would it?  Append can't project, so the cost of any expressions
>>> that appear in its target list is irrelevant.  What is affected is the
>>> cost of the scans below the Append -- see e.g. cost_seqscan(), which
>>> uses the data produced by set_pathtarget_cost_width().
>>
>> By set_rel_size()?
>
> Sorry, I don't understand what you mean by this.

I think the data used by such a costing function is computed by 
set_rel_size in set_append_rel_size, not set_pathtarget_cost_width; in 
the case of a plain partition, for example, set_rel_size would call 
set_plain_rel_size, and then set_plain_rel_size would eventually call 
set_rel_width, which sets reltarget->cost, which I think would be used 
by e.g., cost_seqscan.  cost_qual_eval_node, which is called in 
set_rel_width for computing the cost of ConvertRowTypeExpr, ignores that 
expression, so currently, we don't charge any cost for it to the 
partition's reltarget->cost, and to the cost of a scan below the Append.

>> I'm not sure that's a good idea, because I think we have a trade-off
>> relation; the more we make create_plan simple, the more we need to make
>> earlier states of the planner complicated.
>>
>> And it looks to me like the partitionwise join code is making earlier (and
>> later) stages of the planner too complicated, to make create_plan simple.
>
> I think that create_plan is *supposed* to be simple.  Its purpose is
> to prune away data that's only needed during planning and add things
> that can be computed at the last minute which are needed at execution
> time.  Making it do anything else is, in my opinion, not good.

I agree on that point.

>> When considering paritionwise joining, it would make things complicated to
>> have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed
>> upthread, it deviates from the planner's assumption that a rel's targetlist
>> would only include Vars and PHVs.  So, I propose to include a child
>> whole-row Var in the targetlist instead, in which case, we need to fix the
>> Var after the fact, but can avoid making many other parts of the planner
>> complicated.
>
> Well, I could have the wrong idea here, but I tend to think allowing
> for ConvertRowTypeExpr elsewhere won't be that bad.

I still don't like that because in my opinion, changes needed for that 
would not be localized, and that would make code complicated more than 
necessary.

As I mentioned in a previous email, another idea to avoid that would be 
to adjust tlists for children at path creation time, not plan creation 
time; we could adjust the tlist for each of subpaths accumulated for an 
Append/MergeAppend path in add_paths_to_append_rel called from 
set_append_rel_pathlist or generate_partitionwise_join_paths, with 
create_projection_path adding ConvertRowTypeExpr.  It seems unlikely 
that performing create_projection_path to such a subpath would change 
its property of being the cheapest, so it would be safe to adjust the 
tlists that way.  This would not require making create_plan complicated 
anymore.  I might be missing something, though.

Best regards,
Etsuro Fujita


Hi,

On 2018-07-20 08:38:09 -0400, Robert Haas wrote:
> I'm going to study this some more now, but I really think this is
> going in the wrong direction.

We're going to have to get somewhere on this topic soon. This thread has
been open for nearly half a year, and we're late in the beta phase now.
What can we do to get to an agreement soon?

Greetings,

Andres Freund


(2018/07/31 4:06), Andres Freund wrote:
> On 2018-07-20 08:38:09 -0400, Robert Haas wrote:
>> I'm going to study this some more now, but I really think this is
>> going in the wrong direction.
>
> We're going to have to get somewhere on this topic soon. This thread has
> been open for nearly half a year, and we're late in the beta phase now.
> What can we do to get to an agreement soon?

I'd like to propose an updated patch as proposed in [1].

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5B59BA55.30200%40lab.ntt.co.jp


On 2018-Jul-31, Etsuro Fujita wrote:

> (2018/07/31 4:06), Andres Freund wrote:
> > On 2018-07-20 08:38:09 -0400, Robert Haas wrote:
> > > I'm going to study this some more now, but I really think this is
> > > going in the wrong direction.
> > 
> > We're going to have to get somewhere on this topic soon. This thread has
> > been open for nearly half a year, and we're late in the beta phase now.
> > What can we do to get to an agreement soon?
> 
> I'd like to propose an updated patch as proposed in [1].

But there is no patch there, and neither there is agreement from the
other party that the approach described there is sound.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


(2018/07/26 21:11), Etsuro Fujita wrote:
> (2018/07/26 5:27), Robert Haas wrote:
>> Well, I could have the wrong idea here, but I tend to think allowing
>> for ConvertRowTypeExpr elsewhere won't be that bad.
>
> I still don't like that because in my opinion, changes needed for that
> would not be localized, and that would make code complicated more than
> necessary.
>
> As I mentioned in a previous email, another idea to avoid that would be
> to adjust tlists for children at path creation time, not plan creation
> time; we could adjust the tlist for each of subpaths accumulated for an
> Append/MergeAppend path in add_paths_to_append_rel called from
> set_append_rel_pathlist or generate_partitionwise_join_paths, with
> create_projection_path adding ConvertRowTypeExpr. It seems unlikely that
> performing create_projection_path to such a subpath would change its
> property of being the cheapest, so it would be safe to adjust the tlists
> that way. This would not require making create_plan complicated anymore.
> I might be missing something, though.

I updated the patch that way.  Updated patch attached.  I fixed a bug 
and did a bit of cleanups as well.

Best regards,
Etsuro Fujita

Attachment
(2018/08/01 5:31), Alvaro Herrera wrote:
> On 2018-Jul-31, Etsuro Fujita wrote:
>> (2018/07/31 4:06), Andres Freund wrote:
>>> On 2018-07-20 08:38:09 -0400, Robert Haas wrote:
>>>> I'm going to study this some more now, but I really think this is
>>>> going in the wrong direction.
>>>
>>> We're going to have to get somewhere on this topic soon. This thread has
>>> been open for nearly half a year, and we're late in the beta phase now.
>>> What can we do to get to an agreement soon?
>>
>> I'd like to propose an updated patch as proposed in [1].
>
> But there is no patch there, and neither there is agreement from the
> other party that the approach described there is sound.

I posted the updated patch [1].

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5B619D23.4010000%40lab.ntt.co.jp


On Wed, Aug 1, 2018 at 7:46 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I posted the updated patch [1].

Etsuro-san:

I really think we should just go with what Ashutosh had proposed.
Tom, Ashutosh, and I all seem to agree that we shouldn't try to
re-jigger things at create-plan time.  I think that a 3-1 consensus
against your proposal is sufficient to say we shouldn't go that way.

Now, here you've got a new approach which avoids that, which I have
not yet reviewed.  I'll try to spend some time on it this afternoon,
but really, I think it's too late for this.  This bug was reported in
February, and we're supposed to be pushing 11 final out the door in
not much more than a month.  Proposing a new approach in August is not
good.  Can't we just do what Ashutosh proposed for now and revisit
this for v12?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Wed, Aug 1, 2018 at 7:44 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I updated the patch that way.  Updated patch attached.  I fixed a bug and
> did a bit of cleanups as well.

Looking this over from a technical point of view, I think it's better
than what you proposed before but I still don't agree with the
approach.  I don't think there is any getting around the fact that
converted whole-row vars are going to result in some warts.  Ashutosh
inserted most of the necessary warts in the original partition-wise
join patch, but missed some cases in postgres_fdw; his approach is,
basically, to add the matching warts there.  Your approach is to rip
out all of those warts and insert different ones.  You've simplified
build_tlist_index_other_vars() and add_placeholders_to_joinrel() and
build_joinrel_tlist() to basically what they were before
partition-wise join went in.  On the other hand, RelOptInfo grows
three new fields, set_append_rel_size() ends up more complicated than
it was before when you include the node code added in the
build_childrel_tlist function it calls, build_child_joinrel() has a
different set of complications, and most importantly
create_append_path() and create_merge_append_path() now do surgery on
their input paths that knows specifically about the
converted-whole-row var case.  I would be glad to be rid of the stuff
that you're ripping out, but in terms of complexity, I don't think we
really come out ahead with the stuff you're adding.

I'm also still concerned about the design.  In your old approach, you
were creating the paths with with the wrong target list and then
fixing it when you turned the paths into a plan.  In your new
approach, you're still creating the paths with the wrong target list,
but now you're fixing it when you put an Append or MergeAppend path on
top of them.  I still think it's a bad idea to have anything other
than the correct paths in the target list from the beginning.  For one
thing, what happens if no Append or MergeAppend is ever put on top of
them?  One way that can happen today is partition-wise aggregate,
although I haven't been able to demonstrate a bug, so maybe that's
covered somehow.  But in general, with your approach, any code that
looks at the tlist for a child-join has to know that the tlist is to
be used as-is *except that* ConvertRowtypeExpr may need to be
inserted.  I think that special case could be easy to overlook, and it
seems pretty arbitrary.  A tlist is a list of arbitrary expressions to
be produced; with this patch, we'd end up with the tlist being the
list of expressions to be produced in every case *except for*
child-joins containing whole-row-vars.  I just can't get behind a
strange exception like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


(2018/08/02 2:44), Robert Haas wrote:
> Tom, Ashutosh, and I all seem to agree that we shouldn't try to
> re-jigger things at create-plan time.  I think that a 3-1 consensus
> against your proposal is sufficient to say we shouldn't go that way.

Agreed.

> Now, here you've got a new approach which avoids that, which I have
> not yet reviewed.  I'll try to spend some time on it this afternoon,
> but really, I think it's too late for this.  This bug was reported in
> February, and we're supposed to be pushing 11 final out the door in
> not much more than a month.  Proposing a new approach in August is not
> good.

Agreed.

> Can't we just do what Ashutosh proposed for now and revisit
> this for v12?

I think that may be possible.

Best regards,
Etsuro Fujita


(2018/08/02 4:30), Robert Haas wrote:
> On Wed, Aug 1, 2018 at 7:44 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> I updated the patch that way.  Updated patch attached.  I fixed a bug and
>> did a bit of cleanups as well.
>
> Looking this over from a technical point of view, I think it's better
> than what you proposed before but I still don't agree with the
> approach.  I don't think there is any getting around the fact that
> converted whole-row vars are going to result in some warts.  Ashutosh
> inserted most of the necessary warts in the original partition-wise
> join patch, but missed some cases in postgres_fdw; his approach is,
> basically, to add the matching warts there.  Your approach is to rip
> out all of those warts and insert different ones.  You've simplified
> build_tlist_index_other_vars() and add_placeholders_to_joinrel() and
> build_joinrel_tlist() to basically what they were before
> partition-wise join went in.  On the other hand, RelOptInfo grows
> three new fields,

Sorry, I forgot to remove one of the fields that isn't needed anymore. 
RelOptInfo needs two new fields, in the new approach.

> set_append_rel_size() ends up more complicated than
> it was before when you include the node code added in the
> build_childrel_tlist function it calls, build_child_joinrel() has a
> different set of complications, and most importantly
> create_append_path() and create_merge_append_path() now do surgery on
> their input paths that knows specifically about the
> converted-whole-row var case.  I would be glad to be rid of the stuff
> that you're ripping out, but in terms of complexity, I don't think we
> really come out ahead with the stuff you're adding.

The new approach seems to me more localized than what Ashutosh had 
proposed.  One obvious advantage of the new approach is that we no 
longer need changes to postgres_fdw for it to work for partitionwise 
joins, which would apply for any other FDWs, making the FDW authors' 
life easy.

> I'm also still concerned about the design.  In your old approach, you
> were creating the paths with with the wrong target list and then
> fixing it when you turned the paths into a plan.  In your new
> approach, you're still creating the paths with the wrong target list,
> but now you're fixing it when you put an Append or MergeAppend path on
> top of them.  I still think it's a bad idea to have anything other
> than the correct paths in the target list from the beginning.

I don't think so; actually, the child's targetlist would not have to be 
the same as the parent's until the child gets in under the parent's 
Append or MergeAppend.  So, I modified the child's target list so as to 
make it easy to join the child relations.

> For one
> thing, what happens if no Append or MergeAppend is ever put on top of
> them?  One way that can happen today is partition-wise aggregate,
> although I haven't been able to demonstrate a bug, so maybe that's
> covered somehow.

Right.  In the new approach, the targetlist for the topmost child 
scan/join is guaranteed to be the same as that for the topmost parent 
scan/join by the planner work that I added.

> But in general, with your approach, any code that
> looks at the tlist for a child-join has to know that the tlist is to
> be used as-is *except that* ConvertRowtypeExpr may need to be
> inserted.  I think that special case could be easy to overlook, and it
> seems pretty arbitrary.

I think we can check to see whether the conversion is needed, from the 
flags added to RelOptInfo.

> A tlist is a list of arbitrary expressions to
> be produced; with this patch, we'd end up with the tlist being the
> list of expressions to be produced in every case *except for*
> child-joins containing whole-row-vars.  I just can't get behind a
> strange exception like that.

I have to admit that it's weird to adjust the child's targetlist in that 
case when putting the child under the parent's Append/MergeAppend, but 
IMHO I think that would be much localized.

Thanks for the comments!

Best regards,
Etsuro Fujita


On Thu, Aug 2, 2018 at 7:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> The new approach seems to me more localized than what Ashutosh had proposed.
> One obvious advantage of the new approach is that we no longer need changes
> to postgres_fdw for it to work for partitionwise joins, which would apply
> for any other FDWs, making the FDW authors' life easy.

Well, I don't know what to say here expect that I don't agree.  I
think Ashutosh's approach is a pretty natural extension of the system
that we have now.  It involves needing to handle converted whole row
vars in some new places, most of which were handled in the original
patch, but postgres_fdw was missed.  Your approach involves changing
the meaning of the target list, but only in narrow corner cases.  I
completely disagree that we can say it's less invasive.  It may indeed
be less work for extension authors, though, though perhaps at the
expense of moving the conversion from the remote server to the local
one.

>> But in general, with your approach, any code that
>> looks at the tlist for a child-join has to know that the tlist is to
>> be used as-is *except that* ConvertRowtypeExpr may need to be
>> inserted.  I think that special case could be easy to overlook, and it
>> seems pretty arbitrary.
>
> I think we can check to see whether the conversion is needed, from the flags
> added to RelOptInfo.

Sure, I don't dispute that it can be made to work.  I just think it's
an ugly kludge.

> I have to admit that it's weird to adjust the child's targetlist in that
> case when putting the child under the parent's Append/MergeAppend, but IMHO
> I think that would be much localized.

Yeah, I just don't agree at all.

Does anyone else want to weigh in on this?  It seems to me that there
are several people who are quite willing to complain about the fact
that this hasn't been fixed, but not willing to express an opinion
about the shape of the fix.  Either the RMT needs to take executive
action, or we need some more votes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Hi,

On 2018-08-02 15:19:49 -0400, Robert Haas wrote:
> Does anyone else want to weigh in on this?  It seems to me that there
> are several people who are quite willing to complain about the fact
> that this hasn't been fixed, but not willing to express an opinion
> about the shape of the fix.  Either the RMT needs to take executive
> action, or we need some more votes.

(taking my RMT hat off, as it's not been coordinated)

Well, I think it's enough in the weeds that not too many people are
going to have an informed opinion on this, and I'm not sure uninformed
opinions help. But I do think the fact that one of the authors and the
committer are on one side, and Tom seems to tentatively agree, weighs
quite a bit here.  As far as I can tell, it's "just" Etsuro on the other
side - obviously his opinion counts, but it doesn't outweigh others.  I
think if you feel confident, you should be ok just commit the approach
you favor, and I read you as being confident that that's the better
approach.

Greetings,

Andres Freund


Robert Haas <robertmhaas@gmail.com> writes:
> Does anyone else want to weigh in on this?  It seems to me that there
> are several people who are quite willing to complain about the fact
> that this hasn't been fixed, but not willing to express an opinion
> about the shape of the fix.  Either the RMT needs to take executive
> action, or we need some more votes.

[ reads thread ... ]

Well, my vote is that I've got zero faith in either of these patches.

I do not trust Ashutosh's patch because of the point you noted that it
will kick in on ConvertRowtypeExprs that are not related to partitioning.
It's also got a rather fundamental conceptual (or at least documentation)
error in that it says it's making pull_var_clause do certain things to
all ConvertRowtypeExprs, but that's not what the code actually does.
I think the tension around that has a lot to do with the fact that the
patch went through so many revisions, and I don't have any faith that
it's right even yet.  As you mentioned upthread, this might be insoluble
without some representational change for converted whole-row Vars.

As for Etsuro-san's patch, I don't like that for the same reasons you
gave.  Also, I note that back towards the beginning of July it was
said that that patch depends on the assumption of no Gather below
an Append.  That's broken *now*, not in some hypothetical future.
(See the nearby "FailedAssertion on partprune" thread, wherein we're
trying to fix the planner's handling of exactly such plans.)

What I'm thinking might be a more appropriate thing, at least for
getting v11 out the door, is to refuse to generate partitionwise
joins when any whole-row vars are involved.  (Perhaps that's not
enough to dodge all the problems, though?)

Now, that's a bit of a problem for postgres_fdw, because it seems to
insist on injecting WRVs even when the query text does not require any.
Why is that, and can't we get rid of it?  It's horrid for performance
even without any of these other considerations.  But if we fail to get
rid of that in time for v11, it would mean that postgres_fdw can't
participate in PWJ, which isn't great but I think we could live with it
for now.

BTW, what exactly do we think the production status of PWJ is, anyway?
I notice that upthread it was stated that enable_partitionwise_join
defaults to off, as indeed it still does, and we'd turn it on later
when we'd gotten rid of some memory-hogging problems.  If that hasn't
happened yet (and I don't see any open item about considering enabling
PWJ by default for v11), then I have exactly no hesitation about
lobotomizing PWJ as hard as we need to to mask this problem for v11.
I'd certainly prefer a solution along those lines to either of these
patches.

            regards, tom lane


(2018/08/03 5:25), Tom Lane wrote:
> What I'm thinking might be a more appropriate thing, at least for
> getting v11 out the door, is to refuse to generate partitionwise
> joins when any whole-row vars are involved.

Agreed.

> (Perhaps that's not
> enough to dodge all the problems, though?)
>
> Now, that's a bit of a problem for postgres_fdw, because it seems to
> insist on injecting WRVs even when the query text does not require any.
> Why is that, and can't we get rid of it?  It's horrid for performance
> even without any of these other considerations.  But if we fail to get
> rid of that in time for v11, it would mean that postgres_fdw can't
> participate in PWJ, which isn't great but I think we could live with it
> for now.

Sorry, I don't understand this.  Could you elaborate on that a bit more?

> BTW, what exactly do we think the production status of PWJ is, anyway?
> I notice that upthread it was stated that enable_partitionwise_join
> defaults to off, as indeed it still does, and we'd turn it on later
> when we'd gotten rid of some memory-hogging problems.  If that hasn't
> happened yet (and I don't see any open item about considering enabling
> PWJ by default for v11), then I have exactly no hesitation about
> lobotomizing PWJ as hard as we need to to mask this problem for v11.

That hasn't happened yet; I think we left that for PG12, IIRC.

Here is a patch for refusing to generate PWJ paths when WRVs are involved:

* We no longer need to handle WRVs, so I've simplified 
build_joinrel_tlist() and setrefs.c to what they were before 
partition-wise join went in, as in the previous patch.

* attr_needed data for each child is used for building child-joins' 
tlists, but I think we can build those by applying 
adjust_appendrel_attrs to the parent's tlists, without attr_needed.  So, 
I've also removed that as in the previous patch.

Maybe I'm missing something, though.

Best regards,
Etsuro Fujita

Attachment
(2018/08/03 22:18), Etsuro Fujita wrote:
> Here is a patch for refusing to generate PWJ paths when WRVs are involved:
>
> * We no longer need to handle WRVs, so I've simplified
> build_joinrel_tlist() and setrefs.c to what they were before
> partition-wise join went in, as in the previous patch.
>
> * attr_needed data for each child is used for building child-joins'
> tlists, but I think we can build those by applying
> adjust_appendrel_attrs to the parent's tlists, without attr_needed. So,
> I've also removed that as in the previous patch.

One thing to add: as for the latter, we don't need the changes to 
placeholder.c either, so I've also simplified that to what they were 
before partition-wise join went in.

Best regards,
Etsuro Fujita


On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do not trust Ashutosh's patch because of the point you noted that it
> will kick in on ConvertRowtypeExprs that are not related to partitioning.

I don't remember making that point -- can you clarify?

> It's also got a rather fundamental conceptual (or at least documentation)
> error in that it says it's making pull_var_clause do certain things to
> all ConvertRowtypeExprs, but that's not what the code actually does.
> I think the tension around that has a lot to do with the fact that the
> patch went through so many revisions, and I don't have any faith that
> it's right even yet.  As you mentioned upthread, this might be insoluble
> without some representational change for converted whole-row Vars.

Insoluble is a strong word....

> What I'm thinking might be a more appropriate thing, at least for
> getting v11 out the door, is to refuse to generate partitionwise
> joins when any whole-row vars are involved.  (Perhaps that's not
> enough to dodge all the problems, though?)

It's enough to dodge the problem being discussed on this thread.

> Now, that's a bit of a problem for postgres_fdw, because it seems to
> insist on injecting WRVs even when the query text does not require any.
> Why is that, and can't we get rid of it?  It's horrid for performance
> even without any of these other considerations.  But if we fail to get
> rid of that in time for v11, it would mean that postgres_fdw can't
> participate in PWJ, which isn't great but I think we could live with it
> for now.

I don't quite know what you mean here -- postgres_fdw does use
whole-row vars for EPQ handling, which may be what you're thinking
about.

> BTW, what exactly do we think the production status of PWJ is, anyway?
> I notice that upthread it was stated that enable_partitionwise_join
> defaults to off, as indeed it still does, and we'd turn it on later
> when we'd gotten rid of some memory-hogging problems.  If that hasn't
> happened yet (and I don't see any open item about considering enabling
> PWJ by default for v11), then I have exactly no hesitation about
> lobotomizing PWJ as hard as we need to to mask this problem for v11.
> I'd certainly prefer a solution along those lines to either of these
> patches.

Yeah, that hasn't been done yet.  Partition-wise join probably needs a
good bit of work in a number of areas to do all the things that people
will want it to do -- see the thread on advanced partition-matching,
which is an improvement but still doesn't cover every case that could
come up.  In terms of memory usage, I think that we need some
discussion of the approach that should be taken.  I see a couple of
possible things we could do, not necessarily mutually exclusive.

1. We could try to avoid translating all of the parent's data
structures for every child RelOptInfo, either by rejiggering things so
that the child doesn't need all of that data, or by making it able to
use the parent's copy of the data.

2. We could try to recycle memory more quickly.  For example, if we're
planning a partition-wise join of A-B-C-D, first plan paths for
A1-B1-C1-D1 (and all proper subsets of those rels), then throw away
most of the memory, then plan paths for A2-B2-C2-D2, etc.

3. We could generate paths for on "representative" children (e.g. the
biggest ones) and use that to estimate the cost of the partition-wise
plan.  If that plan is selected, then go back and figure out real
paths for the other children.

All of these things help in different ways, and Ashutosh had code for
some version of all of them at various points, but the problem is
quite difficult.  If you have three tables with 1000 partitions each,
then without partition-wise join you need 3000 (partitions) + 3
(baserels) + 3 (level-2 joinrels) + 1 (level-3 joinrel) RelOptInfos.
If you do a partition-wise join, then you get 3000 level-2
child-joinrels and 1000 level-3 child joinrels.  Those RelOptInfo
structures, and the paths attached to them, cannot help but take up
memory.  Perhaps that's OK, and we ought to just say "well, if you
want better plans, you're going to have to allow more memory for
planning".  If not, we have to decide how hard we want to work in
which areas and how good the result needs to be in order to have this
turned on by difficult.

Honestly, I'm pretty impressed that we have added not one but two
members to the RelOptKind enum without as little collateral damage as
there has been.  This issue is annoying and the discussion has gone on
longer than anyone would probably prefer, but it's really pretty minor
in the grand scheme of things, at least IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Now, that's a bit of a problem for postgres_fdw, because it seems to
>> insist on injecting WRVs even when the query text does not require any.
>> Why is that, and can't we get rid of it?

> I don't quite know what you mean here -- postgres_fdw does use
> whole-row vars for EPQ handling, which may be what you're thinking
> about.

As far as I can see from the example that started this thread,
postgres_fdw injects WRVs into a PWJ whether or not the query involves
FOR UPDATE; that's why this bug is reproducible in a query without FOR
UPDATE.  But we shouldn't need any EPQ support in that case.

> Honestly, I'm pretty impressed that we have added not one but two
> members to the RelOptKind enum without as little collateral damage as
> there has been.

Color me a bit more skeptical about the bug density in that, given
that enable_partitionwise_join is off by default; that means you're
not getting a lot of testing.

            regards, tom lane


On Fri, Aug 3, 2018 at 10:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Now, that's a bit of a problem for postgres_fdw, because it seems to
>>> insist on injecting WRVs even when the query text does not require any.
>>> Why is that, and can't we get rid of it?
>
>> I don't quite know what you mean here -- postgres_fdw does use
>> whole-row vars for EPQ handling, which may be what you're thinking
>> about.
>
> As far as I can see from the example that started this thread,
> postgres_fdw injects WRVs into a PWJ whether or not the query involves
> FOR UPDATE; that's why this bug is reproducible in a query without FOR
> UPDATE.  But we shouldn't need any EPQ support in that case.

I might be missing something, but the original reproducer involves a
FOR UPDATE clause, and the expected regression test output in
postgres_fdw.out appears to show a whole-row var in the foreign scan's
target list only in those examples where a locking clause is present.

>> Honestly, I'm pretty impressed that we have added not one but two
>> members to the RelOptKind enum without as little collateral damage as
>> there has been.
>
> Color me a bit more skeptical about the bug density in that, given
> that enable_partitionwise_join is off by default; that means you're
> not getting a lot of testing.

You're hard to please.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Aug 3, 2018 at 10:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> As far as I can see from the example that started this thread,
>> postgres_fdw injects WRVs into a PWJ whether or not the query involves
>> FOR UPDATE; that's why this bug is reproducible in a query without FOR
>> UPDATE.  But we shouldn't need any EPQ support in that case.

> I might be missing something, but the original reproducer involves a
> FOR UPDATE clause, and the expected regression test output in
> postgres_fdw.out appears to show a whole-row var in the foreign scan's
> target list only in those examples where a locking clause is present.

Oh, sigh, never mind that --- I was thinking that only the first of
the two example queries involved FOR UPDATE, but they both do.
So no mystery there after all.

Anyway, what I'm basically suggesting is that we just disable support for
PWJ in the problematic cases in v11.  As long as PWJ isn't even on by
default, that's not much of a loss.  Obviously we'll want to fix it in the
future, but the hour grows late for v11, and I think either of these
patches would need quite a bit more work to be committable.

BTW, I forgot to respond to this:

>> I do not trust Ashutosh's patch because of the point you noted that it
>> will kick in on ConvertRowtypeExprs that are not related to partitioning.
> I don't remember making that point -- can you clarify?

I'm looking at
https://www.postgresql.org/message-id/CA%2BTgmoZSaKq-fYALn5jf6c_X3%3D%3DRb2s8eqLDwGpV%3DLNNhTXYwg%40mail.gmail.com
wherein you said

> There are definitely some things not to like about this approach.  In
> particular, I definitely agree that treating a converted whole-row
> reference specially is not very nice.  It feels like it might be
> substantially cleaner to be able to represent this idea as a single
> node rather than a combination of ConvertRowtypeExpr and var with
> varattno = 0.  Perhaps in the future we ought to consider either (a)
> trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
> inventing a new WholeRowExpr node that stores two RTIs, one for the
> relation that's generating the whole-row reference and the other for
> the relation that is controlling the column ordering of the result or
> (c) allowing a Var to represent a whole-row expression that has to
> produce its outputs according to the ordering of some other RTE.  But
> I don't think it's wise or necessary to whack that around just to fix
> this bug; it is refactoring or improvement work best left to a future
> release.

I agree with all of that except your conclusion that it's unnecessary
to do it to fix this bug.  I find that entirely unsupported and over-
optimistic, especially in view of the number of iterations Ashutosh
went through trying to fix the fallout from not making a clear
distinction.

            regards, tom lane


On Fri, Aug 3, 2018 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, what I'm basically suggesting is that we just disable support for
> PWJ in the problematic cases in v11.  As long as PWJ isn't even on by
> default, that's not much of a loss.  Obviously we'll want to fix it in the
> future, but the hour grows late for v11, and I think either of these
> patches would need quite a bit more work to be committable.

That's not my first choice, but I'm OK with it.

>> There are definitely some things not to like about this approach.  In
>> particular, I definitely agree that treating a converted whole-row
>> reference specially is not very nice.  It feels like it might be
>> substantially cleaner to be able to represent this idea as a single
>> node rather than a combination of ConvertRowtypeExpr and var with
>> varattno = 0.  Perhaps in the future we ought to consider either (a)
>> trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
>> inventing a new WholeRowExpr node that stores two RTIs, one for the
>> relation that's generating the whole-row reference and the other for
>> the relation that is controlling the column ordering of the result or
>> (c) allowing a Var to represent a whole-row expression that has to
>> produce its outputs according to the ordering of some other RTE.  But
>> I don't think it's wise or necessary to whack that around just to fix
>> this bug; it is refactoring or improvement work best left to a future
>> release.
>
> I agree with all of that except your conclusion that it's unnecessary
> to do it to fix this bug.  I find that entirely unsupported and over-
> optimistic, especially in view of the number of iterations Ashutosh
> went through trying to fix the fallout from not making a clear
> distinction.

I don't think that the question of how a converted whole-row expr can
really make a difference between being able to fix this bug and not
being able to fix this bug.  It's just a representational choice.  If
we decide that a converted whole-row expr is represented as
ConvertRowTypeExpr on top of a Var, then we have to check for that.
If we revise the representation to use a Var node directly, or to use
a new WholeRowVar node, then we'll just need to check for those things
instead.  I think that the shape of the fix itself does not change.
Moreover, it's pretty much a 1:1 mapping.  It's not like, say,
replacing plans with paths in the whole upper half of the planner,
where the representation is different enough that things that would
have been very difficult to manage in the old representation become
relatively simple in the new one.  It's basically just a question of
how you spell it.  Writing a test for ConvertRowTypeExpr atop Var and
using it a bunch of places is not beautiful, and it's certainly going
to be marginally slower than a direct test for WholeRowVar or some
other representation that bundles the whole thing into one node, but I
think it should work just fine, and the representation can be revised
later in a separate patch once we agree on an approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


(2018/08/03 22:28), Etsuro Fujita wrote:
> (2018/08/03 22:18), Etsuro Fujita wrote:
>> Here is a patch for refusing to generate PWJ paths when WRVs are
>> involved:
>>
>> * We no longer need to handle WRVs, so I've simplified
>> build_joinrel_tlist() and setrefs.c to what they were before
>> partition-wise join went in, as in the previous patch.
>>
>> * attr_needed data for each child is used for building child-joins'
>> tlists, but I think we can build those by applying
>> adjust_appendrel_attrs to the parent's tlists, without attr_needed. So,
>> I've also removed that as in the previous patch.
>
> One thing to add: as for the latter, we don't need the changes to
> placeholder.c either, so I've also simplified that to what they were
> before partition-wise join went in.

*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
***************
*** 3960,3965 **** create_ordinary_grouping_paths(PlannerInfo *root, 
RelOptInfo *input_rel,
--- 3960,3967 ----
      */
     if (extra->patype != PARTITIONWISE_AGGREGATE_NONE &&
         input_rel->part_scheme && input_rel->part_rels &&
+       (input_rel->reloptkind == RELOPT_BASEREL ||
+        input_rel->consider_partitionwise_join) &&
         !IS_DUMMY_REL(input_rel))
     {
         /*
***************
*** 6913,6919 **** apply_scanjoin_target_to_paths(PlannerInfo *root,
      * projection-capable, that might save a separate Result node, and 
it also
      * is important for partitionwise aggregate.
      */
!   if (rel->part_scheme && rel->part_rels)
     {
         int         partition_idx;
         List       *live_children = NIL;
--- 6915,6923 ----
      * projection-capable, that might save a separate Result node, and 
it also
      * is important for partitionwise aggregate.
      */
!   if (rel->part_scheme && rel->part_rels &&
!       (rel->reloptkind == RELOPT_BASEREL ||
!        rel->consider_partitionwise_join))
     {
         int         partition_idx;
         List       *live_children = NIL;

In the above I used the test whether the relation's reloptkind is 
RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a 
multi-level partitioned table.  So I fixed that and added regression 
test cases for that.  I also revised comments a bit.  Attached is an 
updated version of the patch.

Best regards,
Etsuro Fujita

Attachment
On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> In the above I used the test whether the relation's reloptkind is
> RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a
> multi-level partitioned table.  So I fixed that and added regression test
> cases for that.  I also revised comments a bit.  Attached is an updated
> version of the patch.

+       /* If so, consider partitionwise joins for that join. */
+       if (IS_PARTITIONED_REL(joinrel))
+               joinrel->consider_partitionwise_join = true;

Maybe this should assert that the inner and outer rels have
consider_partitionwise_join set.  There is an Assert quite a bit
earlier in the function that the parent join have it set, but I think
it might make sense to check the children have it set whenever we set
the flag.

Aside from that I don't really have any suggestions on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


(2018/08/13 11:57), Robert Haas wrote:
> On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> In the above I used the test whether the relation's reloptkind is
>> RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a
>> multi-level partitioned table.  So I fixed that and added regression test
>> cases for that.  I also revised comments a bit.  Attached is an updated
>> version of the patch.
>
> +       /* If so, consider partitionwise joins for that join. */
> +       if (IS_PARTITIONED_REL(joinrel))
> +               joinrel->consider_partitionwise_join = true;
>
> Maybe this should assert that the inner and outer rels have
> consider_partitionwise_join set.  There is an Assert quite a bit
> earlier in the function that the parent join have it set, but I think
> it might make sense to check the children have it set whenever we set
> the flag.

Agreed.  Done.

One thing I noticed might be an improvement is to skip 
build_joinrel_partition_info if the given joinrel will be to have 
consider_partitionwise_join=false; in the previous patch, that function 
created the joinrel's partition info such as part_scheme and part_rels 
if the joinrel is considered as partitioned, independently of the flag 
consider_partitionwise_join for it, but if that flag is false, we don't 
generate PWJ paths for the joinrel, so we would not need to create that 
partition info at all.  This would not only avoid unnecessary processing 
in that function, but also make unnecessary the changes I made to 
try_partitionwise_join, generate_partitionwise_join_paths, 
apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So 
I updated the patch that way.  Please find attached an updated version 
of the patch.

Thanks for the review!

Best regards,
Etsuro Fujita

Attachment
On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> One thing I noticed might be an improvement is to skip
> build_joinrel_partition_info if the given joinrel will be to have
> consider_partitionwise_join=false; in the previous patch, that function
> created the joinrel's partition info such as part_scheme and part_rels if
> the joinrel is considered as partitioned, independently of the flag
> consider_partitionwise_join for it, but if that flag is false, we don't
> generate PWJ paths for the joinrel, so we would not need to create that
> partition info at all.  This would not only avoid unnecessary processing in
> that function, but also make unnecessary the changes I made to
> try_partitionwise_join, generate_partitionwise_join_paths,
> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
> updated the patch that way.  Please find attached an updated version of the
> patch.

I guess the question is whether there are (or might be in the future)
other dependencies on part_scheme.  For example, it looks like
partition pruning uses it.  I'm not sure whether partition pruning
supports a plan like:

Append
-> Nested Loop
  -> Seq Scan on p1
  -> Index Scan on q1
<repeat the above for p2/q2 etc.>

If it doesn't, that's just an implementation restriction; somebody
might want to fix things so it works, if it doesn't already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


> On Aug 14, 2018, at 11:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> One thing I noticed might be an improvement is to skip
>> build_joinrel_partition_info if the given joinrel will be to have
>> consider_partitionwise_join=false; in the previous patch, that function
>> created the joinrel's partition info such as part_scheme and part_rels if
>> the joinrel is considered as partitioned, independently of the flag
>> consider_partitionwise_join for it, but if that flag is false, we don't
>> generate PWJ paths for the joinrel, so we would not need to create that
>> partition info at all.  This would not only avoid unnecessary processing in
>> that function, but also make unnecessary the changes I made to
>> try_partitionwise_join, generate_partitionwise_join_paths,
>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
>> updated the patch that way.  Please find attached an updated version of the
>> patch.
>
> I guess the question is whether there are (or might be in the future)
> other dependencies on part_scheme.  For example, it looks like
> partition pruning uses it.  I'm not sure whether partition pruning
> supports a plan like:
>
> Append
> -> Nested Loop
>  -> Seq Scan on p1
>  -> Index Scan on q1
> <repeat the above for p2/q2 etc.>
>
> If it doesn't, that's just an implementation restriction; somebody
> might want to fix things so it works, if it doesn't already.

While we (the RMT) are happy to see there has been progress on this thread,
11 Beta 3 was released containing this problem and the time to adequately
test this feature prior to GA release is rapidly dwindling.

The RMT would like to see this prioritized and resolved. At this point we have
considered the option of having partitionwise joins completely disabled in
PostgreSQL 11. This is preferably not the path we want to choose, but this
option is now on the table and we will issue an ultimatum soon if we do not see
further progress.

In the previous discussion, the generally accepted solution for PostgreSQL 11
seemed to be to disable the problematic case and any further work would be for
PostgreSQL 12 only. If something is different, please indicate why ASAP and
work to resolve. The RMT will also conduct some additional technical analysis
as well.

Sincerely,

Alvaro, Andres, Jonathan

Attachment
(2018/08/15 0:51), Robert Haas wrote:
> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> One thing I noticed might be an improvement is to skip
>> build_joinrel_partition_info if the given joinrel will be to have
>> consider_partitionwise_join=false; in the previous patch, that function
>> created the joinrel's partition info such as part_scheme and part_rels if
>> the joinrel is considered as partitioned, independently of the flag
>> consider_partitionwise_join for it, but if that flag is false, we don't
>> generate PWJ paths for the joinrel, so we would not need to create that
>> partition info at all.  This would not only avoid unnecessary processing in
>> that function, but also make unnecessary the changes I made to
>> try_partitionwise_join, generate_partitionwise_join_paths,
>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
>> updated the patch that way.  Please find attached an updated version of the
>> patch.
>
> I guess the question is whether there are (or might be in the future)
> other dependencies on part_scheme.  For example, it looks like
> partition pruning uses it.  I'm not sure whether partition pruning
> supports a plan like:
>
> Append
> ->  Nested Loop
>    ->  Seq Scan on p1
>    ->  Index Scan on q1
> <repeat the above for p2/q2 etc.>

I'm not sure that either, but if a join relation doesn't have 
part_scheme set, it means that that relation is considered as 
non-partitioned, as in the case when enable_partitionwise_join is off, 
so there would be no PWJ paths generated for it, to begin with.  So in 
that case, ISTM that we don't need to worry about that at least for 
partition pruning.

Best regards,
Etsuro Fujita


On 2018/08/15 12:25, Etsuro Fujita wrote:
> (2018/08/15 0:51), Robert Haas wrote:
>> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>>> One thing I noticed might be an improvement is to skip
>>> build_joinrel_partition_info if the given joinrel will be to have
>>> consider_partitionwise_join=false; in the previous patch, that function
>>> created the joinrel's partition info such as part_scheme and part_rels if
>>> the joinrel is considered as partitioned, independently of the flag
>>> consider_partitionwise_join for it, but if that flag is false, we don't
>>> generate PWJ paths for the joinrel, so we would not need to create that
>>> partition info at all.  This would not only avoid unnecessary
>>> processing in
>>> that function, but also make unnecessary the changes I made to
>>> try_partitionwise_join, generate_partitionwise_join_paths,
>>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
>>> updated the patch that way.  Please find attached an updated version of
>>> the
>>> patch.
>>
>> I guess the question is whether there are (or might be in the future)
>> other dependencies on part_scheme.  For example, it looks like
>> partition pruning uses it.  I'm not sure whether partition pruning
>> supports a plan like:
>>
>> Append
>> ->  Nested Loop
>>    ->  Seq Scan on p1
>>    ->  Index Scan on q1
>> <repeat the above for p2/q2 etc.>
> 
> I'm not sure that either, but if a join relation doesn't have part_scheme
> set, it means that that relation is considered as non-partitioned, as in
> the case when enable_partitionwise_join is off, so there would be no PWJ
> paths generated for it, to begin with.  So in that case, ISTM that we
> don't need to worry about that at least for partition pruning.

Fwiw, partition pruning works only for base rels, which applies to both
planning-time pruning (pruning is performed only during base rel size
estimation) and run-time pruning (we'll add pruning info to the Append
plan only if the source AppendPath's parent rel is a base rel).

Thanks,
Amit



(2018/08/15 13:04), Amit Langote wrote:
> On 2018/08/15 12:25, Etsuro Fujita wrote:
>> (2018/08/15 0:51), Robert Haas wrote:
>>> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:
>>>> One thing I noticed might be an improvement is to skip
>>>> build_joinrel_partition_info if the given joinrel will be to have
>>>> consider_partitionwise_join=false; in the previous patch, that function
>>>> created the joinrel's partition info such as part_scheme and part_rels if
>>>> the joinrel is considered as partitioned, independently of the flag
>>>> consider_partitionwise_join for it, but if that flag is false, we don't
>>>> generate PWJ paths for the joinrel, so we would not need to create that
>>>> partition info at all.  This would not only avoid unnecessary
>>>> processing in
>>>> that function, but also make unnecessary the changes I made to
>>>> try_partitionwise_join, generate_partitionwise_join_paths,
>>>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
>>>> updated the patch that way.  Please find attached an updated version of
>>>> the
>>>> patch.
>>>
>>> I guess the question is whether there are (or might be in the future)
>>> other dependencies on part_scheme.  For example, it looks like
>>> partition pruning uses it.  I'm not sure whether partition pruning
>>> supports a plan like:
>>>
>>> Append
>>> ->   Nested Loop
>>>     ->   Seq Scan on p1
>>>     ->   Index Scan on q1
>>> <repeat the above for p2/q2 etc.>
>>
>> I'm not sure that either, but if a join relation doesn't have part_scheme
>> set, it means that that relation is considered as non-partitioned, as in
>> the case when enable_partitionwise_join is off, so there would be no PWJ
>> paths generated for it, to begin with.  So in that case, ISTM that we
>> don't need to worry about that at least for partition pruning.
>
> Fwiw, partition pruning works only for base rels, which applies to both
> planning-time pruning (pruning is performed only during base rel size
> estimation) and run-time pruning (we'll add pruning info to the Append
> plan only if the source AppendPath's parent rel is a base rel).

Thanks for that, Amit!

I looked into the question for the join or upper rel case, but couldn't 
find any places in the PG11 code where we assume that a join or upper 
rel has non-NULL part_scheme, as described below:

* Both try_partitionwise_join() and generate_partitionwise_join_paths() 
check whether a join rel to be considered has non-NULL part_scheme, 
through the IS_PARTITIONED_REL macro:

#define IS_PARTITIONED_REL(rel) \
     ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
      (rel)->part_rels && !(IS_DUMMY_REL(rel)))

If IS_PARTITIONED_REL, the former adds paths to child-joins, and the 
latter builds PWJ paths; but both don't assume non-NULL part_scheme.

* apply_scanjoin_target_to_paths() checks whether the topmost scan/join 
rel has non-NULL part_scheme, and if so, it recursively adjusts all 
partitions; it doesn't assume non-NULL part_scheme.

* create_ordinary_grouping_paths() also checks whether the topmost 
scan/join rel adjusted by apply_scanjoin_target_to_paths() has non-NULL 
part_scheme, and if so, it considers PWA paths; it doesn't assume 
non-NULL part_scheme either.

* add_paths_to_append_rel(), which is called from each of the above (ie, 
generate_partitionwise_join_paths(), apply_scanjoin_target_to_paths(), 
and create_partitionwise_grouping_paths() in 
create_ordinary_grouping_paths() for creating PWJ paths, adjusting PWJ 
paths, and creating PWA paths respectively) also checks whether a rel to 
be considered has non-NULL part_scheme, but never assumes that.  And 
actually, if the rel's part_scheme is NULL, add_paths_to_append_rel() 
wouldn't be called for the rel, because of the reason described above.

Thanks for the comments, Robert!

Best regards,
Etsuro Fujita


On Wed, Aug 15, 2018 at 7:35 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Thanks for the comments, Robert!

Given the comments from the RMT, and also on general principle, it
seems like we really need to get on with committing something here.
It's my understanding you plan to do that, since it's your patch.
When?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


(2018/08/15 23:40), Robert Haas wrote:
> Given the comments from the RMT, and also on general principle, it
> seems like we really need to get on with committing something here.
> It's my understanding you plan to do that, since it's your patch.
> When?

I plan to do that late next week as I'll go on leave until next Tuesday.

Best regards,
Etsuro Fujita


(2018/08/16 12:00), Etsuro Fujita wrote:
> (2018/08/15 23:40), Robert Haas wrote:
>> Given the comments from the RMT, and also on general principle, it
>> seems like we really need to get on with committing something here.
>> It's my understanding you plan to do that, since it's your patch.
>> When?
>
> I plan to do that late next week as I'll go on leave until next Tuesday.

I added the commit message.  Please find attached an updated version of 
the patch.  Does that make sense?  If there are not objections, I'll 
push this tomorrow.

Best regards,
Etsuro Fujita

Attachment
(2018/08/22 20:08), Etsuro Fujita wrote:
> (2018/08/16 12:00), Etsuro Fujita wrote:
>> (2018/08/15 23:40), Robert Haas wrote:
>>> Given the comments from the RMT, and also on general principle, it
>>> seems like we really need to get on with committing something here.
>>> It's my understanding you plan to do that, since it's your patch.
>>> When?
>>
>> I plan to do that late next week as I'll go on leave until next Tuesday.
>
> I added the commit message. Please find attached an updated version of
> the patch. Does that make sense? If there are not objections, I'll push
> this tomorrow.

I tried this today, but doing git behind the corporate firewall doesn't 
work.  I don't know the clear cause of that, so I'll investigate that 
tomorrow.

Best regards,
Etsuro Fujita


On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
> I tried this today, but doing git behind the corporate firewall doesn't
> work.  I don't know the clear cause of that, so I'll investigate that
> tomorrow.

You may be able to tweak that by using https as origin point or proper
git proxy settings?
--
Michael

Attachment
(2018/08/24 11:47), Michael Paquier wrote:
> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>> I tried this today, but doing git behind the corporate firewall doesn't
>> work.  I don't know the clear cause of that, so I'll investigate that
>> tomorrow.
>
> You may be able to tweak that by using https as origin point or proper
> git proxy settings?

Yeah, my proxy settings were not correct.  With the help of my 
colleagues Horiguchi-san and Yamada-san, I corrected them but still 
can't clone the master repository.  Running git with GIT_CURL_VERBOSE 
shows that there is another issue in my terminal environment, so I'm 
trying to resolve that.

Thanks for the advice!

Best regards,
Etsuro Fujita


> On Aug 24, 2018, at 8:38 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>
> (2018/08/24 11:47), Michael Paquier wrote:
>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>> I tried this today, but doing git behind the corporate firewall doesn't
>>> work.  I don't know the clear cause of that, so I'll investigate that
>>> tomorrow.
>>
>> You may be able to tweak that by using https as origin point or proper
>> git proxy settings?
>
> Yeah, my proxy settings were not correct.  With the help of my colleagues Horiguchi-san and Yamada-san, I corrected
thembut still can't clone the master repository.  Running git with GIT_CURL_VERBOSE shows that there is another issue
inmy terminal environment, so I'm trying to resolve that. 

Are there any updates on getting this patch committed?

Thanks,

Jonathan

Attachment
(2018/08/29 0:21), Jonathan S. Katz wrote:
>> On Aug 24, 2018, at 8:38 AM, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/08/24 11:47), Michael Paquier wrote:
>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>>> I tried this today, but doing git behind the corporate firewall doesn't
>>>> work.  I don't know the clear cause of that, so I'll investigate that
>>>> tomorrow.
>>>
>>> You may be able to tweak that by using https as origin point or proper
>>> git proxy settings?
>>
>> Yeah, my proxy settings were not correct.  With the help of my colleagues Horiguchi-san and Yamada-san, I corrected
thembut still can't clone the master repository.  Running git with GIT_CURL_VERBOSE shows that there is another issue
inmy terminal environment, so I'm trying to resolve that.
 
>
> Are there any updates on getting this patch committed?

That investigation has shown that the cause is my company firewall, not 
my terminal environment; that firewall has to be configured to allow me 
to access to that repository.  So, I'm asking my company about that.

Best regards,
Etsuro Fujita


(2018/08/29 18:40), Etsuro Fujita wrote:
> (2018/08/29 0:21), Jonathan S. Katz wrote:
>>> On Aug 24, 2018, at 8:38 AM, Etsuro
>>> Fujita<fujita.etsuro@lab.ntt.co.jp> wrote:
>>> (2018/08/24 11:47), Michael Paquier wrote:
>>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>>>> I tried this today, but doing git behind the corporate firewall
>>>>> doesn't
>>>>> work. I don't know the clear cause of that, so I'll investigate that
>>>>> tomorrow.
>>>>
>>>> You may be able to tweak that by using https as origin point or proper
>>>> git proxy settings?
>>>
>>> Yeah, my proxy settings were not correct. With the help of my
>>> colleagues Horiguchi-san and Yamada-san, I corrected them but still
>>> can't clone the master repository. Running git with GIT_CURL_VERBOSE
>>> shows that there is another issue in my terminal environment, so I'm
>>> trying to resolve that.
>>
>> Are there any updates on getting this patch committed?
>
> That investigation has shown that the cause is my company firewall, not
> my terminal environment; that firewall has to be configured to allow me
> to access to that repository. So, I'm asking my company about that.

I got the approval from my boss today, so I think that I can have that 
access soon.

Best regards,
Etsuro Fujita


(2018/08/30 20:25), Etsuro Fujita wrote:
> (2018/08/29 18:40), Etsuro Fujita wrote:
>> (2018/08/29 0:21), Jonathan S. Katz wrote:
>>>> On Aug 24, 2018, at 8:38 AM, Etsuro
>>>> Fujita<fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> (2018/08/24 11:47), Michael Paquier wrote:
>>>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>>>>> I tried this today, but doing git behind the corporate firewall
>>>>>> doesn't
>>>>>> work. I don't know the clear cause of that, so I'll investigate that
>>>>>> tomorrow.
>>>>>
>>>>> You may be able to tweak that by using https as origin point or proper
>>>>> git proxy settings?
>>>>
>>>> Yeah, my proxy settings were not correct. With the help of my
>>>> colleagues Horiguchi-san and Yamada-san, I corrected them but still
>>>> can't clone the master repository. Running git with GIT_CURL_VERBOSE
>>>> shows that there is another issue in my terminal environment, so I'm
>>>> trying to resolve that.
>>>
>>> Are there any updates on getting this patch committed?
>>
>> That investigation has shown that the cause is my company firewall, not
>> my terminal environment; that firewall has to be configured to allow me
>> to access to that repository. So, I'm asking my company about that.
>
> I got the approval from my boss today, so I think that I can have that
> access soon.

I fixed typos in the commit message, which Alvaro pointed out off-list, 
and revised the message a little bit.  Also, I reread the patch and 
noticed that the latest version includes now-unnecessary regression test 
cases; those were added to the previous version 
(refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that 
apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() 
work for cases where whole-row Vars are involved, because I modified 
those functions, but the latest version doesn't touch those functions 
anymore, so those test cases seems unnecessary.  So I removed those (no 
other changes), and committed the patch.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5B683F60.2070806%40lab.ntt.co.jp


On 8/31/18 7:54 AM, Etsuro Fujita wrote:
> (2018/08/30 20:25), Etsuro Fujita wrote:
>> (2018/08/29 18:40), Etsuro Fujita wrote:
>>> (2018/08/29 0:21), Jonathan S. Katz wrote:
>>>>> On Aug 24, 2018, at 8:38 AM, Etsuro
>>>>> Fujita<fujita.etsuro@lab.ntt.co.jp> wrote:
>>>>> (2018/08/24 11:47), Michael Paquier wrote:
>>>>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>>>>>> I tried this today, but doing git behind the corporate firewall
>>>>>>> doesn't
>>>>>>> work. I don't know the clear cause of that, so I'll investigate
>>>>>>> that
>>>>>>> tomorrow.
>>>>>>
>>>>>> You may be able to tweak that by using https as origin point or
>>>>>> proper
>>>>>> git proxy settings?
>>>>>
>>>>> Yeah, my proxy settings were not correct. With the help of my
>>>>> colleagues Horiguchi-san and Yamada-san, I corrected them but still
>>>>> can't clone the master repository. Running git with GIT_CURL_VERBOSE
>>>>> shows that there is another issue in my terminal environment, so I'm
>>>>> trying to resolve that.
>>>>
>>>> Are there any updates on getting this patch committed?
>>>
>>> That investigation has shown that the cause is my company firewall, not
>>> my terminal environment; that firewall has to be configured to allow me
>>> to access to that repository. So, I'm asking my company about that.
>>
>> I got the approval from my boss today, so I think that I can have that
>> access soon.
>
> I fixed typos in the commit message, which Alvaro pointed out
> off-list, and revised the message a little bit.  Also, I reread the
> patch and noticed that the latest version includes now-unnecessary
> regression test cases; those were added to the previous version
> (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that
> apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths()
> work for cases where whole-row Vars are involved, because I modified
> those functions, but the latest version doesn't touch those functions
> anymore, so those test cases seems unnecessary.  So I removed those
> (no other changes), and committed the patch.

Thank you for taking care of that and for committing the patch. I have
now closed this issues on the open items list.

Jonathan



Attachment
(2018/08/31 21:30), Jonathan S. Katz wrote:
> On 8/31/18 7:54 AM, Etsuro Fujita wrote:
>> (2018/08/30 20:25), Etsuro Fujita wrote:
>>> (2018/08/29 18:40), Etsuro Fujita wrote:
>>>> (2018/08/29 0:21), Jonathan S. Katz wrote:
>>>>>> On Aug 24, 2018, at 8:38 AM, Etsuro
>>>>>> Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote:
>>>>>> (2018/08/24 11:47), Michael Paquier wrote:
>>>>>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>>>>>>> I tried this today, but doing git behind the corporate firewall
>>>>>>>> doesn't
>>>>>>>> work. I don't know the clear cause of that, so I'll investigate
>>>>>>>> that
>>>>>>>> tomorrow.
>>>>>>>
>>>>>>> You may be able to tweak that by using https as origin point or
>>>>>>> proper
>>>>>>> git proxy settings?
>>>>>>
>>>>>> Yeah, my proxy settings were not correct. With the help of my
>>>>>> colleagues Horiguchi-san and Yamada-san, I corrected them but still
>>>>>> can't clone the master repository. Running git with GIT_CURL_VERBOSE
>>>>>> shows that there is another issue in my terminal environment, so I'm
>>>>>> trying to resolve that.
>>>>>
>>>>> Are there any updates on getting this patch committed?
>>>>
>>>> That investigation has shown that the cause is my company firewall, not
>>>> my terminal environment; that firewall has to be configured to allow me
>>>> to access to that repository. So, I'm asking my company about that.
>>>
>>> I got the approval from my boss today, so I think that I can have that
>>> access soon.
>>
>> I fixed typos in the commit message, which Alvaro pointed out
>> off-list, and revised the message a little bit.  Also, I reread the
>> patch and noticed that the latest version includes now-unnecessary
>> regression test cases; those were added to the previous version
>> (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that
>> apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths()
>> work for cases where whole-row Vars are involved, because I modified
>> those functions, but the latest version doesn't touch those functions
>> anymore, so those test cases seems unnecessary.  So I removed those
>> (no other changes), and committed the patch.
>
> Thank you for taking care of that and for committing the patch. I have
> now closed this issues on the open items list.

Thanks!

Best regards,
Etsuro Fujita


On 2018/08/31 21:40, Etsuro Fujita wrote:
> (2018/08/31 21:30), Jonathan S. Katz wrote:
>> Thank you for taking care of that and for committing the patch. I have
>> now closed this issues on the open items list.
> 
> Thanks!

I noticed that the CF entry for this was not closed.  As of this morning,
it's been moved to the next 2018-11 CF:

https://commitfest.postgresql.org/20/1554/

I tried to close it as being committed, but couldn't do so, because I
can't find Fujita-san's name in the list of committers in the CF app's
drop down box that lists all committers.

Thanks,
Amit



On Tue, Oct 02, 2018 at 04:11:54PM +0900, Amit Langote wrote:
> I tried to close it as being committed, but couldn't do so, because I
> can't find Fujita-san's name in the list of committers in the CF app's
> drop down box that lists all committers.

Indeed, Fujita-san has been added to the list.  And I switched this CF
entry at the same time.
--
Michael

Attachment
(2018/10/02 16:45), Michael Paquier wrote:
> On Tue, Oct 02, 2018 at 04:11:54PM +0900, Amit Langote wrote:
>> I tried to close it as being committed, but couldn't do so, because I
>> can't find Fujita-san's name in the list of committers in the CF app's
>> drop down box that lists all committers.
>
> Indeed, Fujita-san has been added to the list.  And I switched this CF
> entry at the same time.

Thank you guys!

Best regards,
Etsuro Fujita


Partition-wise join with whole row vars

From
Alexander Pyhalov
Date:
Hello.

I was looking at enabling partition-wise join with whole row vars. My 
main motivation
was to enable push down DML with partition-wise join in postgres_fdw. 
The work is
based on the earlier patches of Ashutosh Bapat [1].

Partition-wise joins are disabled when whole row vars from relations, 
participating in join, are required to be selected.
Using whole row vars is not frequent by itself, but happens when FDW is 
used in DML or SELECT
FOR UPDATE/ FOR SHARE queries. I'd like to remove this restriction, 
because this will make it possible
to do direct modify operations DML joins in postgres_fdw.
Currently there's an issue in PostgreSQL that when we want to get whole 
row var of child table, we should
translate the record type of child table to the record type of the 
parent table. So, there appear additional
ConvertRowtyeExpressions to cast types and different places in planner 
(and in postgres_fdw) now should know about this.

Here's the series of patches, originally proposed by Ashutosh Bapat, 
rebased and updated to work with current master.

What was done:
1) Reverting changes, that forbid PWJ when whole row vars are required. 
This also restores logic in setrefs.c, necessary to
deal with ConvertRowtyeExpressions.

2) Applying modified patch from the original series to handle 
ConvertRowtypeExprs in pull_vars_clause(). Unlike original patch,
default pull_var_clause_walker() behavior is unchanged. However, when 
PVC_INCLUDE_CONVERTROWTYPES flag is specified, converted
whole row references are returned as is and are not recursed to. This is 
done in such a way to avoid modifying all function consumers.

3) Modified one of the original patches to handle ConvertRowtypeExpr in 
find_computable_ec_member().

4) The next patch modifies logic to search for ConvertRowtypeExpr in 
search_indexed_tlist_for_non_var() - at least for rowid vars 
varnullingrels
can be omitted, so we avoid looking at them when comparing different 
ConvertRowtypeExprs. It seems to be working as expected, but I'm not
deadly sure about this. Perhaps, we also should consider 
context->nrm_match?

5) The next patch is the original one - pulling ConvertRowtypeExpr in  
build_tlist_to_deparse() and deparsing ConvertRowtypeExpr().

6) The rest is fix for postgres_fdw to correctly deparse DML.
The first tricky part here is that in UPDATE SET clause arbitrary 
relations can appear, and currently there's no API to handle this.
So I've modified get_translated_update_targetlist() to do proper 
adjustments.
The second is work with returning lists - we should set tableoids for 
the returned values. This is done by modifying returning filters
to save this information and later to set it in returned tuples.

What this gives us - is the posibility of partition-wise joins for 
foreign DML, like following:

EXPLAIN (COSTS OFF, VERBOSE)
DELETE FROM fprt1 USING fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 30 = 
29;

                                                             QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Delete on public.fprt1
    Foreign Delete on public.ftprt1_p1 fprt1_1
    Foreign Delete on public.ftprt1_p2 fprt1_2
     ->  Append
           ->  Foreign Delete
                Remote SQL: DELETE FROM public.fprt1_p1 r3 USING 
public.fprt2_p1 r5 WHERE ((r3.a = r5.b)) AND (((r5.a % 30) = 29))
           ->  Foreign Delete
                Remote SQL: DELETE FROM public.fprt1_p2 r4 USING 
public.fprt2_p2 r6 WHERE ((r4.a = r6.b)) AND (((r6.a % 30) = 29))


[1] 
https://www.postgresql.org/message-id/CAFjFpRc8ZoDm0%2Bzhx%2BMckwGyEqkOzWcpVqbvjaxwdGarZSNrmA%40mail.gmail.com
-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Attachment