Thread: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Rajkumar Raghuwanshi
Date:
Hi,
I am getting "ERROR: unexpected expression in subquery output" and "ERROR: variable not found in subplan target lists" errors, for "FOR UPDATE" with postgres_fdw. (when set enable_partition_wise_join to true);Attached patch have queries which are throwing mentioned error on running make check in contrib/postgres_fdw folder.
An independent test case to reproduce this is given below.
SET enable_partition_wise_join TO true;
CREATE EXTENSION postgres_fdw;
CREATE SERVER fdw_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'postgres', port '5432', use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER fdw_server;
CREATE TABLE pt1 ( c1 int NOT NULL, c2 int NOT NULL, c3 text)PARTITION BY RANGE(c1);
CREATE TABLE pt1p1 (like pt1);
CREATE TABLE pt1p2 (like pt1);
CREATE TABLE pt2 (c1 int NOT NULL, c2 int NOT NULL, c3 text) PARTITION BY RANGE(c1);
CREATE TABLE pt2p1 (like pt2);
CREATE TABLE pt2p2 (like pt2);
INSERT INTO pt1p1 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(-100, -1) id;
INSERT INTO pt1p2 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(1, 99) id;
INSERT INTO pt2p1 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(-100, -1) id;
INSERT INTO pt2p2 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(1, 99) id;
CREATE FOREIGN TABLE ft1p1 PARTITION OF pt1 FOR VALUES FROM (MINVALUE) TO (0) SERVER fdw_server OPTIONS (table_name 'pt1p1');
CREATE FOREIGN TABLE ft1p2 PARTITION OF pt1 FOR VALUES FROM (0) TO (MAXVALUE) SERVER fdw_server OPTIONS (table_name 'pt1p2');
CREATE FOREIGN TABLE ft2p1 PARTITION OF pt2 FOR VALUES FROM (MINVALUE) TO (0) SERVER fdw_server OPTIONS (table_name 'pt2p1');
CREATE FOREIGN TABLE ft2p2 PARTITION OF pt2 FOR VALUES FROM (0) TO (MAXVALUE) SERVER fdw_server OPTIONS (table_name 'pt2p2');
ANALYZE pt1;
ANALYZE pt2;
ANALYZE ft1p1;
ANALYZE ft1p2;
ANALYZE ft2p1;
ANALYZE ft2p2;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1 WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM pt2 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
ERROR: unexpected expression in subquery output
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1) t2 FULL JOIN (SELECT c1 FROM pt1) t3 ON (t2.c1 = t3.c1)) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
ERROR: variable not found in subplan target lists
SET enable_partition_wise_join TO true;
CREATE EXTENSION postgres_fdw;
CREATE SERVER fdw_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'postgres', port '5432', use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER fdw_server;
CREATE TABLE pt1 ( c1 int NOT NULL, c2 int NOT NULL, c3 text)PARTITION BY RANGE(c1);
CREATE TABLE pt1p1 (like pt1);
CREATE TABLE pt1p2 (like pt1);
CREATE TABLE pt2 (c1 int NOT NULL, c2 int NOT NULL, c3 text) PARTITION BY RANGE(c1);
CREATE TABLE pt2p1 (like pt2);
CREATE TABLE pt2p2 (like pt2);
INSERT INTO pt1p1 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(-100, -1) id;
INSERT INTO pt1p2 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(1, 99) id;
INSERT INTO pt2p1 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(-100, -1) id;
INSERT INTO pt2p2 SELECT id, id + 1,'AAA' || to_char(id, 'FM000') FROM generate_series(1, 99) id;
CREATE FOREIGN TABLE ft1p1 PARTITION OF pt1 FOR VALUES FROM (MINVALUE) TO (0) SERVER fdw_server OPTIONS (table_name 'pt1p1');
CREATE FOREIGN TABLE ft1p2 PARTITION OF pt1 FOR VALUES FROM (0) TO (MAXVALUE) SERVER fdw_server OPTIONS (table_name 'pt1p2');
CREATE FOREIGN TABLE ft2p1 PARTITION OF pt2 FOR VALUES FROM (MINVALUE) TO (0) SERVER fdw_server OPTIONS (table_name 'pt2p1');
CREATE FOREIGN TABLE ft2p2 PARTITION OF pt2 FOR VALUES FROM (0) TO (MAXVALUE) SERVER fdw_server OPTIONS (table_name 'pt2p2');
ANALYZE pt1;
ANALYZE pt2;
ANALYZE ft1p1;
ANALYZE ft1p2;
ANALYZE ft2p1;
ANALYZE ft2p2;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1 WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM pt2 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
ERROR: unexpected expression in subquery output
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1) t2 FULL JOIN (SELECT c1 FROM pt1) t3 ON (t2.c1 = t3.c1)) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
ERROR: variable not found in subplan target lists
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Mon, Feb 12, 2018 at 6:00 PM, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote: > Hi, > > I am getting "ERROR: unexpected expression in subquery output" and "ERROR: > variable not found in subplan target lists" errors, for "FOR UPDATE" with > postgres_fdw. (when set enable_partition_wise_join to true); > > Attached patch have queries which are throwing mentioned error on running > make check in contrib/postgres_fdw folder. > Thanks a lot for the patch. I can apply it and reproduce the error. Here's my analysis of the bug. The node for which this error comes is a ConvertRowtypeExpr node with Var::varattno = 0 under it. Whole row reference of the parent is converted to ConvertRowtypeExpr with whole row of child as an argument. When partition-wise join is used, targetlist of child-joins contain such ConvertRowtypeExpr when the parent-join's targetlist has whole-row references of joining partitioned tables. When we deparse the targetlist of join pushed down by postgres FDW, build_tlist_to_deparse() pulls only Var nodes nodes from the join's targetlist. So it pulls Var reprensenting a whole-row reference of a child from a ConvertRowtypeExpr, when building targetlist to be deparsed for a child-join with whole-row references. This targetlist is then saved as fdw_scan_tlist in ForeignScanPlan. This causes two problems shown by the two queries below > > EXPLAIN (VERBOSE, COSTS OFF) > SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1 WHERE c1 = 50) t1 INNER > JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1 WHERE c1 between 50 and > 60) t2 FULL JOIN (SELECT c1 FROM pt2 WHERE c1 between 50 and 60) t3 ON > (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) > ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1; > ERROR: unexpected expression in subquery output get_relation_column_alias_ids() uses foreignrel's targetlist (foreignrel->reltarget->exprs) as it is to locate given node to be deparsed. If the joining relation corresponding to ConvertRowtypeExpr is deparsed as a subquery, this function is called with whole-row reference node (Var node with varattno = 0). But the relation's targetlist doesn't contain its whole-row reference directly but has it embedded in ConvertRowtypeExpr. So, the function doesn't find the given node and throws error. > > EXPLAIN (VERBOSE, COSTS OFF) > SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1) t1 INNER JOIN (SELECT > t2.c1, t3.c1 FROM (SELECT c1 FROM pt1) t2 FULL JOIN (SELECT c1 FROM pt1) t3 > ON (t2.c1 = t3.c1)) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE > OF t1; > ERROR: variable not found in subplan target lists When there is possibility of EvalPlanQual being called, we construct local join plan matching the pushed down foreign join. In postgresGetForeignPlan() after we have built the local join plan, the topmost plan node's targetlist is changed to fdw_scan_tlist to match the output of the ForeignScan node. As explained above, this targetlist contains a bare reference to whole-row reference of a child relation if the child-join's targetlist contains a ConvertRowtypeExpr. When changing the topmost plan node's targetlist, we do not modify the targetlists of its left and right tree nodes. The left/right plan involving corresponding child relation will have ConvertRowtypeExpr expression in its targetlist, but not whole-row reference directly. When the topmost local join plan node's targetlist is processed by set_plan_ref(), it throws error "variable not found in subplan target lists" since it doesn't find bare whole-row reference of the child relation in subplan's targetlists. The problem can be solved in two ways: 1. Push down ConvertRowtypeExpr and include it in the pushed down targetlist. This would solve both the problems described above. Both set_plan_ref() and get_relation_column_alias_ids() will find ConvertRowtypeExpr, they are looking for and won't throw an error. This requires two parts a. In build_tlist_to_deparse(), instead of pulling Var node from ConvertRowtypeExpr, we pull whole ConvertRowtypeExpr and include it in the targetlist being deparsed which is also used as to set fdw_scan_tlist. In order to pull any ConvertRowtypeExpr's in the local quals, which may be hidden in the expression tree, we will add two more options to flags viz. PVC_INCLUDE_CONVERTROWTYPEEXPR and PVC_RECURSE_CONVERTROWTYPEEXPR. Unlike the other PVC_* options, which do not default to a value, we may want to default to PVC_RECURSE_CONVERTROWTYPEEXPR if nothing is specified. That will avoid, possibly updating every pull_var_clause call with PVC_RECURSE_CONVERTROWTYPEEXPR. b. deparse ConvertRowtypeExpr For this we need to get the conversion map between the parent and child. We then deparse ConvertRowtypeExpr as a ROW() with the attributes of child rearranged per the conversion map. A multi-level partitioned table will have nested ConvertRowtypeExpr. To deparse such expressions, we need to find the conversion map between the topmost parent and the child, by ignoring any intermediate parents. 2. Modify the local join plan entirely to contain whole row reference of child relations instead of ConvertRowtypeExpr and deparse a ConvertRowtypeExpr as a whole-row reference in a subquery. Again we need two part solution: a. For this we need to write a walker which walks the plan tree distributing the Vars in the topmost targetlist to the left and right subtrees. Thus we replace ConvertRowtypeExpr with corresponding whole-row references in the whole plan tree. b. When get_relation_column_alias_ids() encounters a ConvertRowtypeExpr, it pulls out the embedded whole row reference and returns the corresponding column id. deparseExpr() calls deparseVar() by pulling out the embedded whole-row reference Var when it encouters ConvertRowtypeExpr. Comments, thoughts any other solutions? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/02/13 21:51), Ashutosh Bapat wrote: > Here's my analysis of the bug. > > The node for which this error comes is a ConvertRowtypeExpr node with > Var::varattno = 0 under it. Whole row reference of the parent is converted to > ConvertRowtypeExpr with whole row of child as an argument. When partition-wise > join is used, targetlist of child-joins contain such ConvertRowtypeExpr when > the parent-join's targetlist has whole-row references of joining > partitioned tables. > > When we deparse the targetlist of join pushed down by postgres FDW, > build_tlist_to_deparse() pulls only Var nodes nodes from the join's targetlist. > So it pulls Var reprensenting a whole-row reference of a child from a > ConvertRowtypeExpr, when building targetlist to be deparsed for a child-join > with whole-row references. This targetlist is then saved as fdw_scan_tlist in > ForeignScanPlan. > > This causes two problems shown by the two queries below >> EXPLAIN (VERBOSE, COSTS OFF) >> SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1 WHERE c1 = 50) t1 INNER >> JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1 WHERE c1 between 50 and >> 60) t2 FULL JOIN (SELECT c1 FROM pt2 WHERE c1 between 50 and 60) t3 ON >> (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) >> ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1; >> ERROR: unexpected expression in subquery output > > get_relation_column_alias_ids() uses foreignrel's targetlist > (foreignrel->reltarget->exprs) as it is to locate given node to be deparsed. > If the joining relation corresponding to ConvertRowtypeExpr is deparsed as a > subquery, this function is called with whole-row reference node (Var node with > varattno = 0). But the relation's targetlist doesn't contain its whole-row > reference directly but has it embedded in ConvertRowtypeExpr. So, the function > doesn't find the given node and throws error. >> EXPLAIN (VERBOSE, COSTS OFF) >> SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1) t1 INNER JOIN (SELECT >> t2.c1, t3.c1 FROM (SELECT c1 FROM pt1) t2 FULL JOIN (SELECT c1 FROM pt1) t3 >> ON (t2.c1 = t3.c1)) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE >> OF t1; >> ERROR: variable not found in subplan target lists > > When there is possibility of EvalPlanQual being called, we construct local > join plan matching the pushed down foreign join. In postgresGetForeignPlan() > after we have built the local join plan, the topmost plan node's targetlist is > changed to fdw_scan_tlist to match the output of the ForeignScan node. As > explained above, this targetlist contains a bare reference to whole-row > reference of a child relation if the child-join's targetlist contains a > ConvertRowtypeExpr. When changing the topmost plan node's targetlist, we do not > modify the targetlists of its left and right tree nodes. The left/right plan > involving corresponding child relation will have ConvertRowtypeExpr expression > in its targetlist, but not whole-row reference directly. When the topmost local > join plan node's targetlist is processed by set_plan_ref(), it throws error > "variable not found in subplan target lists" since it doesn't find bare > whole-row reference of the child relation in subplan's targetlists. Thanks for the analysis! > The problem can be solved in two ways: > > 1. Push down ConvertRowtypeExpr and include it in the pushed down targetlist. > This would solve both the problems described above. Both set_plan_ref() and > get_relation_column_alias_ids() will find ConvertRowtypeExpr, they are looking > for and won't throw an error. > > This requires two parts > a. In build_tlist_to_deparse(), instead of pulling > Var node from ConvertRowtypeExpr, we pull whole ConvertRowtypeExpr and include > it in the targetlist being deparsed which is also used as to set > fdw_scan_tlist. In order to pull any ConvertRowtypeExpr's in the local quals, > which may be hidden in the expression tree, we will add two more options to > flags viz. PVC_INCLUDE_CONVERTROWTYPEEXPR and PVC_RECURSE_CONVERTROWTYPEEXPR. > Unlike the other PVC_* options, which do not default to a value, we may want to > default to PVC_RECURSE_CONVERTROWTYPEEXPR if nothing is specified. That will > avoid, possibly updating every pull_var_clause call with > PVC_RECURSE_CONVERTROWTYPEEXPR. > b. deparse ConvertRowtypeExpr > For this we need to get the conversion map between the parent and child. We > then deparse ConvertRowtypeExpr as a ROW() with the attributes of child > rearranged per the conversion map. A multi-level partitioned table will have > nested ConvertRowtypeExpr. To deparse such expressions, we need to find the > conversion map between the topmost parent and the child, by ignoring any > intermediate parents. > > 2. Modify the local join plan entirely to contain whole row reference of > child relations instead of ConvertRowtypeExpr and deparse a ConvertRowtypeExpr > as a whole-row reference in a subquery. > Again we need two part solution: > a. For this we need to write a walker which walks the plan tree distributing the > Vars in the topmost targetlist to the left and right subtrees. Thus we replace > ConvertRowtypeExpr with corresponding whole-row references in the whole plan > tree. > b. When get_relation_column_alias_ids() encounters a > ConvertRowtypeExpr, it pulls > out the embedded whole row reference and returns the corresponding column id. > deparseExpr() calls deparseVar() by pulling out the embedded whole-row > reference Var when it encouters ConvertRowtypeExpr. I'd vote for #1, but ISTM that that's more like a feature, not a fix. Pushing down ConvertRowtypeExprs to the remote seems to me to be the same problem as pushing down PHVs to the remote, which is definitely a feature development. I think a fix for this would be to just give up pushing down a child join in foreign_join_ok or somewhere else if the reltarget of the child-join relation contains any ConvertRowtypeExprs. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Wed, Feb 14, 2018 at 2:50 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > I'd vote for #1, but ISTM that that's more like a feature, not a fix. > Pushing down ConvertRowtypeExprs to the remote seems to me to be the same > problem as pushing down PHVs to the remote, which is definitely a feature > development. I think a fix for this would be to just give up pushing down a > child join in foreign_join_ok or somewhere else if the reltarget of the > child-join relation contains any ConvertRowtypeExprs. That means we won't be able to push down any scans under a DML on a partitioned table. That will be too restrictive. I think the case of PHV and ConvertRowtypeExprs are different. The the former we need a subquery to push PHVs, whereas to deparse ConvertRowtypeExpr on the nullable side we can use the same strategy as whole-row expression deparsing. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Tue, Feb 13, 2018 at 6:21 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > > 1. Push down ConvertRowtypeExpr and include it in the pushed down targetlist. > This would solve both the problems described above. Both set_plan_ref() and > get_relation_column_alias_ids() will find ConvertRowtypeExpr, they are looking > for and won't throw an error. > > This requires two parts > a. In build_tlist_to_deparse(), instead of pulling > Var node from ConvertRowtypeExpr, we pull whole ConvertRowtypeExpr and include > it in the targetlist being deparsed which is also used as to set > fdw_scan_tlist. In order to pull any ConvertRowtypeExpr's in the local quals, > which may be hidden in the expression tree, we will add two more options to > flags viz. PVC_INCLUDE_CONVERTROWTYPEEXPR and PVC_RECURSE_CONVERTROWTYPEEXPR. > Unlike the other PVC_* options, which do not default to a value, we may want to > default to PVC_RECURSE_CONVERTROWTYPEEXPR if nothing is specified. That will > avoid, possibly updating every pull_var_clause call with > PVC_RECURSE_CONVERTROWTYPEEXPR. > b. deparse ConvertRowtypeExpr > For this we need to get the conversion map between the parent and child. We > then deparse ConvertRowtypeExpr as a ROW() with the attributes of child > rearranged per the conversion map. A multi-level partitioned table will have > nested ConvertRowtypeExpr. To deparse such expressions, we need to find the > conversion map between the topmost parent and the child, by ignoring any > intermediate parents. > > Here's patchset implementing this solution. 0001 adds PVC_*_CONVERTROWTYPEEXPR to pull_var_clause() and adjusts its callers. 0002 fixes a similar bug for regular partitioned tables. The patch has testcase. The commit message explains the bug in more detail. 0003 has postgres_fdw fixes as outlined above with tests. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/02/20 18:13), Ashutosh Bapat wrote: > Here's patchset implementing this solution. > > 0001 adds PVC_*_CONVERTROWTYPEEXPR to pull_var_clause() and adjusts its callers. > > 0002 fixes a similar bug for regular partitioned tables. The patch has > testcase. The commit message explains the bug in more detail. > > 0003 has postgres_fdw fixes as outlined above with tests. The patch 0003 doesn't applies to the latest HEAD any more, so please update the patch. I think this should be an open item for PG11. My apologizes for not having reviewed the patch in the last commitfest. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Tue, Apr 17, 2018 at 2:05 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/02/20 18:13), Ashutosh Bapat wrote: >> >> Here's patchset implementing this solution. >> >> 0001 adds PVC_*_CONVERTROWTYPEEXPR to pull_var_clause() and adjusts its >> callers. >> >> 0002 fixes a similar bug for regular partitioned tables. The patch has >> testcase. The commit message explains the bug in more detail. >> >> 0003 has postgres_fdw fixes as outlined above with tests. > > > The patch 0003 doesn't applies to the latest HEAD any more, so please update > the patch. I think this should be an open item for PG11. My apologizes for > not having reviewed the patch in the last commitfest. Here's updated patch-set. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/04/17 18:43), Ashutosh Bapat wrote: > On Tue, Apr 17, 2018 at 2:05 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/02/20 18:13), Ashutosh Bapat wrote: >>> >>> Here's patchset implementing this solution. >>> >>> 0001 adds PVC_*_CONVERTROWTYPEEXPR to pull_var_clause() and adjusts its >>> callers. >>> >>> 0002 fixes a similar bug for regular partitioned tables. The patch has >>> testcase. The commit message explains the bug in more detail. >>> >>> 0003 has postgres_fdw fixes as outlined above with tests. >> >> >> The patch 0003 doesn't applies to the latest HEAD any more, so please update >> the patch. I think this should be an open item for PG11. My apologizes for >> not having reviewed the patch in the last commitfest. > > Here's updated patch-set. Thanks for updating the patch set! Will review. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/04/17 19:00), Etsuro Fujita wrote: > (2018/04/17 18:43), Ashutosh Bapat wrote: >> Here's updated patch-set. > > Will review. I started reviewing this. o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: + else if (IsA(node, ConvertRowtypeExpr)) + { +#ifdef USE_ASSERT_CHECKING + ConvertRowtypeExpr *cre = castNode(ConvertRowtypeExpr, node); + Var *var; + + /* + * ConvertRowtypeExprs only result when parent's whole-row reference is + * translated for a child using adjust_appendrel_attrs(). That function + * does not handle any upper level Var references. + */ + while (IsA(cre->arg, ConvertRowtypeExpr)) + cre = castNode(ConvertRowtypeExpr, cre->arg); + var = castNode(Var, cre->arg); + Assert (var->varlevelsup == 0); +#endif /* USE_ASSERT_CHECKING */ Isn't it better to throw ERROR as in other cases in pull_vars_clause()? Another thing I noticed about this patch is this: postgres=# create table prt1 (a int, b int, c varchar) partition by range (a); postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO (250); postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) TO (500) ; postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM0000') from generate _series(0, 499) i where i % 2 = 0; postgres=# analyze prt1; postgres=# create table prt2 (a int, b int, c varchar) partition by range (b); postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO (250); postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) TO (500) ; postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM0000') from generate _series(0, 499) i where i % 3 = 0; postgres=# analyze prt2; postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text = t2::text a nd t1.a = t2.b; ERROR: ConvertRowtypeExpr found where not expected To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to pull_vars_clause() in distribute_qual_to_rels() and generate_base_implied_equalities_no_const() as well. That's all I have for now. Will continue the review. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Tue, Apr 24, 2018 at 4:49 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/04/17 19:00), Etsuro Fujita wrote: >> >> (2018/04/17 18:43), Ashutosh Bapat wrote: >>> >>> Here's updated patch-set. >> >> >> Will review. > > > I started reviewing this. > > o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: > > + else if (IsA(node, ConvertRowtypeExpr)) > + { > +#ifdef USE_ASSERT_CHECKING > + ConvertRowtypeExpr *cre = castNode(ConvertRowtypeExpr, node); > + Var *var; > + > + /* > + * ConvertRowtypeExprs only result when parent's whole-row reference > is > + * translated for a child using adjust_appendrel_attrs(). That > function > + * does not handle any upper level Var references. > + */ > + while (IsA(cre->arg, ConvertRowtypeExpr)) > + cre = castNode(ConvertRowtypeExpr, cre->arg); > + var = castNode(Var, cre->arg); > + Assert (var->varlevelsup == 0); > +#endif /* USE_ASSERT_CHECKING */ > > Isn't it better to throw ERROR as in other cases in pull_vars_clause()? Actually I noticed that ConvertRowtypeExpr are used to cast a child's whole row reference expression (not just a Var node) into that of its parent and not. For example a cast like NULL::child::parent produces a ConvertRowtypeExpr encapsulating a NULL constant node and not a Var node. We are interested only in ConvertRowtypeExprs embedding Var nodes with Var::varattno = 0. I have changed this code to use function is_converted_whole_row_reference() instead of the above code with Assert. In order to use that function, I had to take it out of setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. > Another thing I noticed about this patch is this: > > postgres=# create table prt1 (a int, b int, c varchar) partition by range > (a); > postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO > (250); > postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) TO > (500) > ; > postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM0000') from > generate > _series(0, 499) i where i % 2 = 0; > postgres=# analyze prt1; > postgres=# create table prt2 (a int, b int, c varchar) partition by range > (b); > postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO > (250); > postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) TO > (500) > ; > postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM0000') from > generate > _series(0, 499) i where i % 3 = 0; > postgres=# analyze prt2; > > postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text = > t2::text a > nd t1.a = t2.b; > ERROR: ConvertRowtypeExpr found where not expected > > To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to > pull_vars_clause() in distribute_qual_to_rels() and > generate_base_implied_equalities_no_const() as well. Thanks for the catch. I have updated patch with your suggested fix. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/04/25 18:51), Ashutosh Bapat wrote: > On Tue, Apr 24, 2018 at 4:49 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: >> Another thing I noticed about this patch is this: >> >> postgres=# create table prt1 (a int, b int, c varchar) partition by range >> (a); >> postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO >> (250); >> postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) TO >> (500) >> ; >> postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM0000') from >> generate >> _series(0, 499) i where i % 2 = 0; >> postgres=# analyze prt1; >> postgres=# create table prt2 (a int, b int, c varchar) partition by range >> (b); >> postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO >> (250); >> postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) TO >> (500) >> ; >> postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM0000') from >> generate >> _series(0, 499) i where i % 3 = 0; >> postgres=# analyze prt2; >> >> postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text = >> t2::text a >> nd t1.a = t2.b; >> ERROR: ConvertRowtypeExpr found where not expected >> >> To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to >> pull_vars_clause() in distribute_qual_to_rels() and >> generate_base_implied_equalities_no_const() as well. > > Thanks for the catch. I have updated patch with your suggested fix. Thanks, but I don't think that's enough. Consider: postgres=# create table base_tbl (a int, b int); postgres=# insert into base_tbl select i % 25, i from generate_series(0, 499) i where i % 6 = 0; postgres=# update prt1 t1 set c = 'foo' from prt2 t2 left join (select a, b, 1 from base_tbl) ss(c1, c2, c3) on (t2.b = ss.c2) where t1::text = t2::text and t1.a = t2.b; ERROR: ConvertRowtypeExpr found where not expected To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to pull_var_clause() in find_placeholders_in_expr() as well. The patch doesn't pass that to pull_var_clause() in other places such as fix_placeholder_input_needed_levels() or planner.c, but I don't understand the reason why that's OK. Sorry, I've not finished the review yet, so I'll continue that. Thanks for updating the patch! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Thu, Apr 26, 2018 at 5:36 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/04/25 18:51), Ashutosh Bapat wrote: >> >> On Tue, Apr 24, 2018 at 4:49 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> >>> o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: > > >>> Another thing I noticed about this patch is this: >>> >>> postgres=# create table prt1 (a int, b int, c varchar) partition by range >>> (a); >>> postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO >>> (250); >>> postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) >>> TO >>> (500) >>> ; >>> postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM0000') from >>> generate >>> _series(0, 499) i where i % 2 = 0; >>> postgres=# analyze prt1; >>> postgres=# create table prt2 (a int, b int, c varchar) partition by range >>> (b); >>> postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO >>> (250); >>> postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) >>> TO >>> (500) >>> ; >>> postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM0000') from >>> generate >>> _series(0, 499) i where i % 3 = 0; >>> postgres=# analyze prt2; >>> >>> postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text = >>> t2::text a >>> nd t1.a = t2.b; >>> ERROR: ConvertRowtypeExpr found where not expected >>> >>> To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to >>> pull_vars_clause() in distribute_qual_to_rels() and >>> generate_base_implied_equalities_no_const() as well. >> >> >> Thanks for the catch. I have updated patch with your suggested fix. > > > Thanks, but I don't think that's enough. Consider: > > postgres=# create table base_tbl (a int, b int); > postgres=# insert into base_tbl select i % 25, i from generate_series(0, > 499) i where i % 6 = 0; > > postgres=# update prt1 t1 set c = 'foo' from prt2 t2 left join (select a, b, > 1 from base_tbl) ss(c1, c2, c3) on (t2.b = ss.c2) where t1::text = t2::text > and t1.a = t2.b; > ERROR: ConvertRowtypeExpr found where not expected > Thanks for the test. > To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to > pull_var_clause() in find_placeholders_in_expr() as well. > > The patch doesn't pass that to pull_var_clause() in other places such as > fix_placeholder_input_needed_levels() or planner.c, but I don't understand > the reason why that's OK. I was trying to be conservative not to include PVC_RECURSE_CONVERTROWTYPES everywhere. But it looks like because of inheritance_planner() and partition-wise aggregate commit, we can have ConvertRowtypeExprs almost everywhere. Added PVC_RECURSE_CONVERTROWTYPEs in almost all the places except the ones listed in 0001 patch. > > Sorry, I've not finished the review yet, so I'll continue that. Thanks for the tests and review. Here's updated patch set. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/04/27 14:40), Ashutosh Bapat wrote: > Here's updated patch set. Thanks for updating the patch! Here are my review comments on patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: * This assertion in deparseConvertRowtypeExpr wouldn't always be true because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN TABLE ADD COLUMN: + Assert(parent_desc->natts == child_desc->natts); Here is such an example: postgres=# create table fprt1 (a int, b int, c varchar) partition by range (a); postgres=# create table fprt1_p1 (like fprt1); postgres=# create table fprt1_p2 (like fprt1); postgres=# create foreign table ftprt1_p1 (a int, b int, c varchar) server loopback options (table_name 'fprt1_p1', use_remote_estimate 'true'); postgres=# alter foreign table ftprt1_p1 drop column c; postgres=# alter foreign table ftprt1_p1 add column c varchar; postgres=# alter table fprt1 attach partition ftprt1_p1 for values from (0) to (250); postgres=# create foreign table ftprt1_p2 partition of fprt1 for values from (250) to (500) server loopback options (table_name 'fprt1_p2', use_remote_estimate 'true'); postgres=# insert into fprt1 select i, i, to_char(i/50, 'FM0000') from generate_series(0, 499, 2) i; postgres=# analyze fprt1; postgres=# analyze fprt1_p1; postgres=# analyze fprt1_p2; postgres=# create table fprt2 (a int, b int, c varchar) partition by range (b); postgres=# create table fprt2_p1 (like fprt2); postgres=# create table fprt2_p2 (like fprt2); postgres=# create foreign table ftprt2_p1 partition of fprt2 for values from (0) to (250) server loopback options (table_name 'fprt2_p1', use_remote_estimate 'true'); postgres=# create foreign table ftprt2_p2 partition of fprt2 for values from (250) to (500) server loopback options (table_name 'fprt2_p2', use_remote_estimate 'true'); postgres=# insert into fprt2 select i, i, to_char(i/50, 'FM0000') from generate_series(0, 499, 3) i; postgres=# analyze fprt2; postgres=# analyze fprt2_p1; postgres=# analyze fprt2_p2; postgres=# set enable_partitionwise_join to true; postgres=# select t1, t2 from fprt1 t1 join fprt2 t2 on (t1.a = t2.b) where t1.a % 25 = 0; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. I think we should remove that assertion. * But I don't think that change is enough. Here is another oddity with that assertion removed. postgres=# alter table fprt1 detach partition ftprt1_p1; postgres=# alter table fprt1 detach partition ftprt1_p2; postgres=# alter table fprt1 drop column c; postgres=# alter table fprt1 add column c varchar; postgres=# alter table fprt1 attach partition ftprt1_p1 for values from (0) to (250); postgres=# alter table fprt1 attach partition ftprt1_p2 for values from (250) to (500); postgres=# set enable_partitionwise_join to true; postgres=# select t1, t2 from fprt1 t1 join fprt2 t2 on (t1.a = t2.b) where t1.a % 25 = 0; ERROR: malformed record literal: "(300,300,"(300,300,0006)",0006)" DETAIL: Too many columns. CONTEXT: processing expression at position 1 in select list The reason for that would be: in the following, the patch doesn't take care of dropped columns in the parent table (ie, columns such that attrMap[cnt]=0). + /* Construct ROW expression according to the conversion map. */ + appendStringInfoString(buf, "ROW("); + for (cnt = 0; cnt < natts; cnt++) + { + if (cnt > 0) + appendStringInfoString(buf, ", "); + deparseColumnRef(buf, child_var->varno, attrMap[cnt], root, + qualify_col); + } + appendStringInfoChar(buf, ')'); To fix this, I think we should skip the deparseColumnRef processing for dropped columns in the parent table. * I think it would be better to do EXPLAIN with the VERBOSE option to the queries this patch added to the regression tests, to see if ConvertRowtypeExprs are correctly deparsed in the remote queries. That's all I have for now. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Tue, May 1, 2018 at 5:00 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/04/27 14:40), Ashutosh Bapat wrote: >> >> Here's updated patch set. > > > Thanks for updating the patch! Here are my review comments on patch > 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: > > * This assertion in deparseConvertRowtypeExpr wouldn't always be true > because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN > TABLE ADD COLUMN: > > + Assert(parent_desc->natts == child_desc->natts); > > Here is such an example: > > I think we should remove that assertion. Removed. > > * But I don't think that change is enough. Here is another oddity with that > assertion removed. > > To fix this, I think we should skip the deparseColumnRef processing for > dropped columns in the parent table. Done. I have changed the test file a bit to test these scenarios. Thanks for testing. > > * I think it would be better to do EXPLAIN with the VERBOSE option to the > queries this patch added to the regression tests, to see if > ConvertRowtypeExprs are correctly deparsed in the remote queries. The reason VERBOSE option was omitted for partition-wise join was to avoid repeated and excessive output for every child-join. I think that still applies. PFA patch-set with the fixes. I also noticed that the new function deparseConvertRowtypeExpr is not quite following the naming convention of the other deparseFoo functions. Foo is usually the type of the node the parser would produced when the SQL string produced by that function is parsed. That doesn't hold for the SQL string produced by ConvertRowtypeExpr but then naming it as generic deparseRowExpr() wouldn't be correct either. And then there are functions like deparseColumnRef which may produce a variety of SQL strings which get parsed into different node types e.g. a whole-row reference would produce ROW(...) which gets parsed as a RowExpr. Please let me know if you have any suggestions for the name. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/05/08 16:20), Ashutosh Bapat wrote: > On Tue, May 1, 2018 at 5:00 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Here are my review comments on patch >> 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: >> >> * This assertion in deparseConvertRowtypeExpr wouldn't always be true >> because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN >> TABLE ADD COLUMN: >> >> + Assert(parent_desc->natts == child_desc->natts); >> I think we should remove that assertion. > > Removed. >> * But I don't think that change is enough. Here is another oddity with that >> assertion removed. >> To fix this, I think we should skip the deparseColumnRef processing for >> dropped columns in the parent table. > > Done. Thanks for those changes! > I have changed the test file a bit to test these scenarios. +1 >> * I think it would be better to do EXPLAIN with the VERBOSE option to the >> queries this patch added to the regression tests, to see if >> ConvertRowtypeExprs are correctly deparsed in the remote queries. > > The reason VERBOSE option was omitted for partition-wise join was to > avoid repeated and excessive output for every child-join. I think that > still applies. I'd like to leave that for the committer's judge. > PFA patch-set with the fixes. Thanks for updating the patch! > I also noticed that the new function deparseConvertRowtypeExpr is not > quite following the naming convention of the other deparseFoo > functions. Foo is usually the type of the node the parser would > produced when the SQL string produced by that function is parsed. That > doesn't hold for the SQL string produced by ConvertRowtypeExpr but > then naming it as generic deparseRowExpr() wouldn't be correct either. > And then there are functions like deparseColumnRef which may produce a > variety of SQL strings which get parsed into different node types e.g. > a whole-row reference would produce ROW(...) which gets parsed as a > RowExpr. Please let me know if you have any suggestions for the name. To be honest, I don't have any strong opinion about that. But I like "deparseConvertRowtypeExpr" because that name seems to well represent the functionality, so I'd vote for that. BTW, one thing I'm a bit concerned about is this: (2018/04/25 18:51), Ashutosh Bapat wrote: > Actually I noticed that ConvertRowtypeExpr are used to cast a child's > whole row reference expression (not just a Var node) into that of its > parent and not. For example a cast like NULL::child::parent produces a > ConvertRowtypeExpr encapsulating a NULL constant node and not a Var > node. We are interested only in ConvertRowtypeExprs embedding Var > nodes with Var::varattno = 0. I have changed this code to use function > is_converted_whole_row_reference() instead of the above code with > Assert. In order to use that function, I had to take it out of > setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. This change seems a bit confusing to me because the flag bits "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a given clause, but really, it only handles ConvertRowtypeExprs you mentioned above. To make that function easy to understand and use, I think it'd be better to use the IsA(node, ConvertRowtypeExpr) test as in the first version of the patch, instead of is_converted_whole_row_reference, which would be more consistent with other cases such as PlaceHolderVar. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > (2018/04/25 18:51), Ashutosh Bapat wrote: >> Actually I noticed that ConvertRowtypeExpr are used to cast a child's >> whole row reference expression (not just a Var node) into that of its >> parent and not. For example a cast like NULL::child::parent produces a >> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var >> node. We are interested only in ConvertRowtypeExprs embedding Var >> nodes with Var::varattno = 0. I have changed this code to use function >> is_converted_whole_row_reference() instead of the above code with >> Assert. In order to use that function, I had to take it out of >> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. > > This change seems a bit confusing to me because the flag bits > "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to > pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a > given clause, but really, it only handles ConvertRowtypeExprs you mentioned > above. To make that function easy to understand and use, I think it'd be > better to use the IsA(node, ConvertRowtypeExpr) test as in the first version > of the patch, instead of is_converted_whole_row_reference, which would be > more consistent with other cases such as PlaceHolderVar. I agree that using is_converted_whole_row_reference() is not consistent with the other nodes that are handled by pull_var_clause(). I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's being done with those options. But using is_converted_whole_row_reference() is the correct thing to do since we are interested only in the whole-row references embedded in ConvertRowtypeExpr. There can be anything encapsulated in ConvertRowtypeExpr(), a non-shippable function for example. We don't want to try to push that down in postgres_fdw's case. Neither in other cases. So, it boils down to changing names of PVC_*_CONVERTROWTYPES to something more meaningful. How about PVC_*_CONVERTWHOLEROWS? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/05/10 13:04), Ashutosh Bapat wrote: > On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/04/25 18:51), Ashutosh Bapat wrote: >>> Actually I noticed that ConvertRowtypeExpr are used to cast a child's >>> whole row reference expression (not just a Var node) into that of its >>> parent and not. For example a cast like NULL::child::parent produces a >>> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var >>> node. We are interested only in ConvertRowtypeExprs embedding Var >>> nodes with Var::varattno = 0. I have changed this code to use function >>> is_converted_whole_row_reference() instead of the above code with >>> Assert. In order to use that function, I had to take it out of >>> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. >> >> This change seems a bit confusing to me because the flag bits >> "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to >> pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a >> given clause, but really, it only handles ConvertRowtypeExprs you mentioned >> above. To make that function easy to understand and use, I think it'd be >> better to use the IsA(node, ConvertRowtypeExpr) test as in the first version >> of the patch, instead of is_converted_whole_row_reference, which would be >> more consistent with other cases such as PlaceHolderVar. > > I agree that using is_converted_whole_row_reference() is not > consistent with the other nodes that are handled by pull_var_clause(). > I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's > being done with those options. But using > is_converted_whole_row_reference() is the correct thing to do since we > are interested only in the whole-row references embedded in > ConvertRowtypeExpr. There can be anything encapsulated in > ConvertRowtypeExpr(), a non-shippable function for example. We don't > want to try to push that down in postgres_fdw's case. Neither in other > cases. Yeah, but I think the IsA test would be sufficient to make things work correctly because we can assume that there aren't any other nodes than Var, PHV, and CRE translating a wholerow value of a child table into a wholerow value of its parent table's rowtype in the expression clause to which we apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES. BTW, one thing I noticed about the new version of the patch is: there is yet another case where pull_var_clause produces an error. Here is an example: postgres=# create table t1 (a int, b text); CREATE TABLE postgres=# create table t2 (a int, b text); CREATE TABLE postgres=# create foreign table ft1 (a int, b text) server loopback options (table_name 't1'); CREATE FOREIGN TABLE postgres=# create foreign table ft2 (a int, b text) server loopback options (table_name 't2'); CREATE FOREIGN TABLE postgres=# insert into ft1 values (1, 'foo'); INSERT 0 1 postgres=# insert into ft1 values (2, 'bar'); INSERT 0 1 postgres=# insert into ft2 values (1, 'test1'); INSERT 0 1 postgres=# insert into ft2 values (2, 'test2'); INSERT 0 1 postgres=# analyze ft1; ANALYZE postgres=# analyze ft2; ANALYZE postgres=# create table parent (a int, b text); CREATE TABLE postgres=# alter foreign table ft1 inherit parent; ALTER FOREIGN TABLE postgres=# explain verbose update parent set b = ft2.b from ft2 where parent.a = ft2.a returning parent; ERROR: ConvertRowtypeExpr found where not expected To produce this, apply the patch in [1] as well as the new version; without that patch in [1], the update query would crash the server (see [1] for details). To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to pull_var_clause in build_remote_returning in postgres_fdw.c as well. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5AF43E02.30000%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Thu, May 10, 2018 at 6:23 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/05/10 13:04), Ashutosh Bapat wrote: >> >> On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> >>> (2018/04/25 18:51), Ashutosh Bapat wrote: >>>> >>>> Actually I noticed that ConvertRowtypeExpr are used to cast a child's >>>> whole row reference expression (not just a Var node) into that of its >>>> parent and not. For example a cast like NULL::child::parent produces a >>>> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var >>>> node. We are interested only in ConvertRowtypeExprs embedding Var >>>> nodes with Var::varattno = 0. I have changed this code to use function >>>> is_converted_whole_row_reference() instead of the above code with >>>> Assert. In order to use that function, I had to take it out of >>>> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. >>> >>> >>> This change seems a bit confusing to me because the flag bits >>> "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to >>> pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a >>> given clause, but really, it only handles ConvertRowtypeExprs you >>> mentioned >>> above. To make that function easy to understand and use, I think it'd be >>> better to use the IsA(node, ConvertRowtypeExpr) test as in the first >>> version >>> of the patch, instead of is_converted_whole_row_reference, which would be >>> more consistent with other cases such as PlaceHolderVar. >> >> >> I agree that using is_converted_whole_row_reference() is not >> consistent with the other nodes that are handled by pull_var_clause(). >> I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's >> being done with those options. But using >> is_converted_whole_row_reference() is the correct thing to do since we >> are interested only in the whole-row references embedded in >> ConvertRowtypeExpr. There can be anything encapsulated in >> ConvertRowtypeExpr(), a non-shippable function for example. We don't >> want to try to push that down in postgres_fdw's case. Neither in other >> cases. > > > Yeah, but I think the IsA test would be sufficient to make things work > correctly because we can assume that there aren't any other nodes than Var, > PHV, and CRE translating a wholerow value of a child table into a wholerow > value of its parent table's rowtype in the expression clause to which we > apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES. Not really. Consider following case using the same tables fprt1 and fprt2 in test sql/postgres_fdw.sql create function row_as_is(a ftprt2_p1) returns ftprt2_p1 as $$begin return a; end;$$ language plpgsql; With the change you propose i.e. diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c index f972712..088da14 100644 --- a/src/backend/optimizer/util/var.c +++ b/src/backend/optimizer/util/var.c @@ -646,8 +646,9 @@ pull_var_clause_walker(Node *node, pull_var_clause_context *context) else elog(ERROR, "PlaceHolderVar found where not expected"); } - else if (is_converted_whole_row_reference(node)) + else if (IsA(node, ConvertRowtypeExpr)) { + Assert(is_converted_whole_row_reference(node)); if (context->flags & PVC_INCLUDE_CONVERTROWTYPES) { context->varlist = lappend(context->varlist, node); following query crashes at Assert(is_converted_whole_row_reference(node)); If we remove that assert, it would end up pushing down row_as_is(), which is a local user defined function, to the foreign server. That's not desirable since the foreign server may not have that user defined function. > > BTW, one thing I noticed about the new version of the patch is: there is yet > another case where pull_var_clause produces an error. Here is an example: > > postgres=# create table t1 (a int, b text); > CREATE TABLE > postgres=# create table t2 (a int, b text); > CREATE TABLE > postgres=# create foreign table ft1 (a int, b text) server loopback options > (table_name 't1'); > CREATE FOREIGN TABLE > postgres=# create foreign table ft2 (a int, b text) server loopback options > (table_name 't2'); > CREATE FOREIGN TABLE > postgres=# insert into ft1 values (1, 'foo'); > INSERT 0 1 > postgres=# insert into ft1 values (2, 'bar'); > INSERT 0 1 > postgres=# insert into ft2 values (1, 'test1'); > INSERT 0 1 > postgres=# insert into ft2 values (2, 'test2'); > INSERT 0 1 > postgres=# analyze ft1; > ANALYZE > postgres=# analyze ft2; > ANALYZE > postgres=# create table parent (a int, b text); > CREATE TABLE > postgres=# alter foreign table ft1 inherit parent; > ALTER FOREIGN TABLE > postgres=# explain verbose update parent set b = ft2.b from ft2 where > parent.a = ft2.a returning parent; > ERROR: ConvertRowtypeExpr found where not expected > > To produce this, apply the patch in [1] as well as the new version; without > that patch in [1], the update query would crash the server (see [1] for > details). To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES > to pull_var_clause in build_remote_returning in postgres_fdw.c as well. I missed that call to PVC since it was added after I had created my patches. That's a disadvantage of having changed PVC to use flags instead of bools. We won't notice such changes. Thanks for pointing it out. Changed as per your suggestion. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/05/11 16:17), Ashutosh Bapat wrote: > On Thu, May 10, 2018 at 6:23 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Yeah, but I think the IsA test would be sufficient to make things work >> correctly because we can assume that there aren't any other nodes than Var, >> PHV, and CRE translating a wholerow value of a child table into a wholerow >> value of its parent table's rowtype in the expression clause to which we >> apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES. > > Not really. > Consider following case using the same tables fprt1 and fprt2 in test > sql/postgres_fdw.sql > create function row_as_is(a ftprt2_p1) returns ftprt2_p1 as $$begin > return a; end;$$ language plpgsql; > > With the change you propose i.e. > > diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c > index f972712..088da14 100644 > --- a/src/backend/optimizer/util/var.c > +++ b/src/backend/optimizer/util/var.c > @@ -646,8 +646,9 @@ pull_var_clause_walker(Node *node, > pull_var_clause_context *context) > else > elog(ERROR, "PlaceHolderVar found where not expected"); > } > - else if (is_converted_whole_row_reference(node)) > + else if (IsA(node, ConvertRowtypeExpr)) > { > + Assert(is_converted_whole_row_reference(node)); > if (context->flags& PVC_INCLUDE_CONVERTROWTYPES) > { > context->varlist = lappend(context->varlist, node); > > > following query crashes at Assert(is_converted_whole_row_reference(node)); I guess you forgot to show the query. > If we remove that assert, it would end up pushing down row_as_is(), > which is a local user defined function, to the foreign server. That's > not desirable since the foreign server may not have that user defined > function. I don't understand the counter-example you tried to show, but I think that I was wrong here; the IsA test isn't sufficient. >> BTW, one thing I noticed about the new version of the patch is: there is yet >> another case where pull_var_clause produces an error. Here is an example: >> To produce this, apply the patch in [1] as well as the new version; without >> that patch in [1], the update query would crash the server (see [1] for >> details). To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES >> to pull_var_clause in build_remote_returning in postgres_fdw.c as well. > > I missed that call to PVC since it was added after I had created my > patches. That's a disadvantage of having changed PVC to use flags > instead of bools. We won't notice such changes. Thanks for pointing it > out. Changed as per your suggestion. Thanks for that change and the updated version! Yet yet another case where pull_var_clause produces an error: postgres=# create table list_pt (a int, b text) partition by list (a); CREATE TABLE postgres=# create table list_ptp1 partition of list_pt for values in (1); CREATE TABLE postgres=# alter table list_ptp1 add constraint list_ptp1_check check (list_ptp1::list_pt::text != NULL); ERROR: ConvertRowtypeExpr found where not expected The patch kept the flags passed to pull_var_clause in src/backend/catalog/heap.c as-is, but I think we need to modify that accordingly. Probably, the flags passed to pull_var_clause in CreateTrigger as well. Having said that, I started to think this approach would make code much complicated. Another idea I came up with to avoid that would be to use pull_var_clause as-is and then rewrite wholerows of partitions back to ConvertRowtypeExprs translating those wholerows to wholerows of their parents tables' rowtypes if necessary. That would be annoying and a little bit inefficient, but the places where we need to rewrite is limited: prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC. So we could really minimize the code change and avoid the additional overhead caused by the is_converted_whole_row_reference test added to pull_var_clause. I think the latter would be rather important, because PWJ is disabled by default. What do you think about that approach? Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > I guess you forgot to show the query. Sorry. Here's the query. explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2; > > Yet yet another case where pull_var_clause produces an error: > > postgres=# create table list_pt (a int, b text) partition by list (a); > CREATE TABLE > postgres=# create table list_ptp1 partition of list_pt for values in (1); > CREATE TABLE > postgres=# alter table list_ptp1 add constraint list_ptp1_check check > (list_ptp1::list_pt::text != NULL); > ERROR: ConvertRowtypeExpr found where not expected > > The patch kept the flags passed to pull_var_clause in > src/backend/catalog/heap.c as-is, but I think we need to modify that > accordingly. Probably, the flags passed to pull_var_clause in CreateTrigger > as well. With all the support that we have added in partitioning for v11, I think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places, which were left unchanged in the earlier versions of patches, which were written when all that support wasn't in v11. > > Having said that, I started to think this approach would make code much > complicated. Another idea I came up with to avoid that would be to use > pull_var_clause as-is and then rewrite wholerows of partitions back to > ConvertRowtypeExprs translating those wholerows to wholerows of their > parents tables' rowtypes if necessary. That would be annoying and a little > bit inefficient, but the places where we need to rewrite is limited: > prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC. Other reason why we can't use your approach is we can not decide whether ConvertRowtypeExpr is needed by just looking at the Var node. You might say, that we add a ConvertRowtypeExpr if the Var::varno references a child relation. But that's not true. For example a whole row reference embedded in ConvertRowtypeExpr added by query adjustments in inheritance_planner() is not a child's whole-row reference in the adjusted query. So, it's not clear to me when we add a ConvertRowtypeExpr and we don't. I am not sure if those are the only two places which require a fix. Right now it looks like those are the places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true in the future, esp. given the amount of work we expect to happen in the partitioning area. When that happens, fixing all those places in that way wouldn't be good. > So we could > really minimize the code change and avoid the additional overhead caused by > the is_converted_whole_row_reference test added to pull_var_clause. I think > the latter would be rather important, because PWJ is disabled by default. > What do you think about that approach? Partition-wise join and partition-wise aggregation are disabled by default temporarily. We will change it in near future once the memory consumption issue has been tackled. The features are useful and users would want those to be enabled by default like other query optimizations. So, we can't take a decision based on that. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> >> I guess you forgot to show the query. > > Sorry. Here's the query. > > explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b > where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2; > >> >> Yet yet another case where pull_var_clause produces an error: >> >> postgres=# create table list_pt (a int, b text) partition by list (a); >> CREATE TABLE >> postgres=# create table list_ptp1 partition of list_pt for values in (1); >> CREATE TABLE >> postgres=# alter table list_ptp1 add constraint list_ptp1_check check >> (list_ptp1::list_pt::text != NULL); >> ERROR: ConvertRowtypeExpr found where not expected >> >> The patch kept the flags passed to pull_var_clause in >> src/backend/catalog/heap.c as-is, but I think we need to modify that >> accordingly. Probably, the flags passed to pull_var_clause in CreateTrigger >> as well. > > With all the support that we have added in partitioning for v11, I > think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places, > which were left unchanged in the earlier versions of patches, which > were written when all that support wasn't in v11. > >> >> Having said that, I started to think this approach would make code much >> complicated. Another idea I came up with to avoid that would be to use >> pull_var_clause as-is and then rewrite wholerows of partitions back to >> ConvertRowtypeExprs translating those wholerows to wholerows of their >> parents tables' rowtypes if necessary. That would be annoying and a little >> bit inefficient, but the places where we need to rewrite is limited: >> prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC. > > Other reason why we can't use your approach is we can not decide > whether ConvertRowtypeExpr is needed by just looking at the Var node. > You might say, that we add a ConvertRowtypeExpr if the Var::varno > references a child relation. But that's not true. For example a whole > row reference embedded in ConvertRowtypeExpr added by query > adjustments in inheritance_planner() is not a child's whole-row > reference in the adjusted query. So, it's not clear to me when we add > a ConvertRowtypeExpr and we don't. I am not sure if those are the only > two places which require a fix. Right now it looks like those are the > places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true > in the future, esp. given the amount of work we expect to happen in > the partitioning area. When that happens, fixing all those places in > that way wouldn't be good. > >> So we could >> really minimize the code change and avoid the additional overhead caused by >> the is_converted_whole_row_reference test added to pull_var_clause. I think >> the latter would be rather important, because PWJ is disabled by default. >> What do you think about that approach? > > Partition-wise join and partition-wise aggregation are disabled by > default temporarily. We will change it in near future once the memory > consumption issue has been tackled. The features are useful and users > would want those to be enabled by default like other query > optimizations. So, we can't take a decision based on that. Here's set of updated patches. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/05/14 15:34), Ashutosh Bapat wrote: > On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: >> On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >> Here's the query. >> >> explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b >> where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2; Thanks! >>> Yet yet another case where pull_var_clause produces an error: >>> >>> postgres=# create table list_pt (a int, b text) partition by list (a); >>> CREATE TABLE >>> postgres=# create table list_ptp1 partition of list_pt for values in (1); >>> CREATE TABLE >>> postgres=# alter table list_ptp1 add constraint list_ptp1_check check >>> (list_ptp1::list_pt::text != NULL); >>> ERROR: ConvertRowtypeExpr found where not expected >>> >>> The patch kept the flags passed to pull_var_clause in >>> src/backend/catalog/heap.c as-is, but I think we need to modify that >>> accordingly. Probably, the flags passed to pull_var_clause in CreateTrigger >>> as well. >> >> With all the support that we have added in partitioning for v11, I >> think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places, >> which were left unchanged in the earlier versions of patches, which >> were written when all that support wasn't in v11. Actually, we'd get the same error without using anything in partitioning. Here is an example: postgres=# create table parent (a int, b text); CREATE TABLE postgres=# create table child (a int, b text); CREATE TABLE postgres=# alter table child inherit parent ; ALTER TABLE postgres=# alter table child add constraint child_check check (child::parent::text != NULL); ERROR: ConvertRowtypeExpr found where not expected >>> Having said that, I started to think this approach would make code much >>> complicated. Another idea I came up with to avoid that would be to use >>> pull_var_clause as-is and then rewrite wholerows of partitions back to >>> ConvertRowtypeExprs translating those wholerows to wholerows of their >>> parents tables' rowtypes if necessary. That would be annoying and a little >>> bit inefficient, but the places where we need to rewrite is limited: >>> prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC. >> >> Other reason why we can't use your approach is we can not decide >> whether ConvertRowtypeExpr is needed by just looking at the Var node. >> You might say, that we add a ConvertRowtypeExpr if the Var::varno >> references a child relation. But that's not true. For example a whole >> row reference embedded in ConvertRowtypeExpr added by query >> adjustments in inheritance_planner() is not a child's whole-row >> reference in the adjusted query. Sorry, I don't understand this fully. >> So, it's not clear to me when we add >> a ConvertRowtypeExpr and we don't. I think it'd be OK to rewrite so at least in prepare_sort_from_pathkeys and build_tlist_to_deparse. >> I am not sure if those are the only >> two places which require a fix. Right now it looks like those are the >> places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true >> in the future, esp. given the amount of work we expect to happen in >> the partitioning area. When that happens, fixing all those places in >> that way wouldn't be good. I have to admit that the approach I proposed is a band-aid fix. >>> So we could >>> really minimize the code change and avoid the additional overhead caused by >>> the is_converted_whole_row_reference test added to pull_var_clause. I think >>> the latter would be rather important, because PWJ is disabled by default. >>> What do you think about that approach? >> >> Partition-wise join and partition-wise aggregation are disabled by >> default temporarily. We will change it in near future once the memory >> consumption issue has been tackled. The features are useful and users >> would want those to be enabled by default like other query >> optimizations. So, we can't take a decision based on that. Yeah, I really agree on that point! However, considering that pull_var_clause is used in many places where partitioning is *not* involved, I still think it's better to avoid spending extra cycles needed only for partitioning in that function. > Here's set of updated patches. Thanks for updating the patch! Here are some other minor comments on patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: + /* Construct the conversion map. */ + parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0); I think it'd be better to pass -1, not 0, as the typmod argument for that function because that would be consistent with other places where we use that function with named rowtypes (eg, ruleutils.c). -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno) +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel, + int *relno, int *colno) -1 for that change; because 1) we use "var" for implying many things as in eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the name as-is. Other than that the patch set looks good to me. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > >>>> So we could >>>> really minimize the code change and avoid the additional overhead caused >>>> by >>>> the is_converted_whole_row_reference test added to pull_var_clause. I >>>> think >>>> the latter would be rather important, because PWJ is disabled by >>>> default. >>>> What do you think about that approach? >>> >>> >>> Partition-wise join and partition-wise aggregation are disabled by >>> default temporarily. We will change it in near future once the memory >>> consumption issue has been tackled. The features are useful and users >>> would want those to be enabled by default like other query >>> optimizations. So, we can't take a decision based on that. > > > Yeah, I really agree on that point! However, considering that > pull_var_clause is used in many places where partitioning is *not* involved, > I still think it's better to avoid spending extra cycles needed only for > partitioning in that function. Right. The changes done in inheritance_planner() imply that we can encounter a ConvertRowtypeExpr even in the expressions for a relation which is not a child relation in the translated query. It was a child in the original query but after translating it becomes a parent and has ConvertRowtypeExpr somewhere there. So, there is no way to know whether a given expression will have ConvertRowtypeExpr in it. That's my understanding. But if we can device such a test, I am happy to apply that test before or witin the call to pull_var_clause(). We don't need that expensive test if user has passed PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check the shape of expression tree. It would cause more asymmetry in pull_var_clause(), but we can live with it or change the order of tests for all the three options. I will differ that to a committer. > + /* Construct the conversion map. */ > + parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0); > > I think it'd be better to pass -1, not 0, as the typmod argument for that > function because that would be consistent with other places where we use > that function with named rowtypes (eg, ruleutils.c). Done. > > -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno) > +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel, > + int *relno, int *colno) > > -1 for that change; because 1) we use "var" for implying many things as in > eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the > name as-is. I think pull_var_clause is the only place where we do that and I don't like that very much. I also don't like is_subquery_var() name since it's too specific. We might want that kind of functionality to ship generic expressions and not just Vars in future. Usually we would be searching for columns in the subquery targetlist so the name change looks good to me. IIRC, the function was added one release back, so backpatching pain, if any, won't be much. But I don't think we should carry a misnomer for many releases to come. Let's differ this to a committer. Here's set of updated patches. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/05/16 22:49), Ashutosh Bapat wrote: > On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> However, considering that >> pull_var_clause is used in many places where partitioning is *not* involved, >> I still think it's better to avoid spending extra cycles needed only for >> partitioning in that function. > > Right. The changes done in inheritance_planner() imply that we can > encounter a ConvertRowtypeExpr even in the expressions for a relation > which is not a child relation in the translated query. It was a child > in the original query but after translating it becomes a parent and > has ConvertRowtypeExpr somewhere there. Sorry, I don't understand this. Could you show me such an example? > So, there is no way to know > whether a given expression will have ConvertRowtypeExpr in it. That's > my understanding. But if we can device such a test, I am happy to > apply that test before or witin the call to pull_var_clause(). > > We don't need that expensive test if user has passed > PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check > the shape of expression tree. It would cause more asymmetry in > pull_var_clause(), but we can live with it or change the order of > tests for all the three options. I will differ that to a committer. Sounds more invasive. Considering where we are in the development cycle for PG11, I think it'd be better to address this by something like the approach I proposed. Anyway, +1 for asking the committer's opinion. >> + /* Construct the conversion map. */ >> + parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0); >> >> I think it'd be better to pass -1, not 0, as the typmod argument for that >> function because that would be consistent with other places where we use >> that function with named rowtypes (eg, ruleutils.c). > > Done. Thanks! >> -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno) >> +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel, >> + int *relno, int *colno) >> >> -1 for that change; because 1) we use "var" for implying many things as in >> eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the >> name as-is. > > I think pull_var_clause is the only place where we do that and I don't > like that very much. I also don't like is_subquery_var() name since > it's too specific. We might want that kind of functionality to ship > generic expressions and not just Vars in future. Usually we would be > searching for columns in the subquery targetlist so the name change > looks good to me. IIRC, the function was added one release back, so > backpatching pain, if any, won't be much. But I don't think we should > carry a misnomer for many releases to come. Let's differ this to a > committer. OK > Here's set of updated patches. Thanks for updating the patch set! I noticed followings. Sorry for not having noticed that before. - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: +extern bool +is_converted_whole_row_reference(Node *node) I think we should remove "extern" from the definition. - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: + /* Construct ROW expression according to the conversion map. */ + appendStringInfoString(buf, "ROW("); + for (cnt = 0; cnt < parent_desc->natts; cnt++) + { + /* Ignore dropped columns. */ + if (attrMap[cnt] == 0) + continue; + + if (cnt > 0) + appendStringInfoString(buf, ", "); + deparseColumnRef(buf, child_var->varno, attrMap[cnt], + planner_rt_fetch(child_var->varno, root), + qualify_col); + } + appendStringInfoChar(buf, ')'); The result would start with ", " in the case where the cnt=0 column of the parent relation is a dropped one. Here is an example: postgres=# create table pt1 (a int, b int) partition by range (b); CREATE TABLE postgres=# alter table pt1 drop column a; ALTER TABLE postgres=# alter table pt1 add column a int; ALTER TABLE postgres=# create table loct11 (like pt1); CREATE TABLE postgres=# create foreign table pt1p1 partition of pt1 for values from (0) to (100) server loopback options (table_name 'loct11', use_remote_estimate 'true'); CREATE FOREIGN TABLE postgres=# insert into pt1 select i, i from generate_series(0, 99, 2) i; INSERT 0 50 postgres=# analyze pt1; ANALYZE postgres=# analyze pt1p1; ANALYZE postgres=# create table pt2 (a int, b int) partition by range (b); CREATE TABLE postgres=# create table loct21 (like pt2); CREATE TABLE postgres=# create foreign table pt2p1 partition of pt2 for values from (0) to (100) server loopback options (table_name 'loct21', use_remote_estimate 'true'); CREATE FOREIGN TABLE postgres=# insert into pt2 select i, i from generate_series(0, 99, 3) i; INSERT 0 34 postgres=# analyze pt2; ANALYZE postgres=# analyze pt2p1; ANALYZE postgres=# set enable_partitionwise_join to true; SET postgres=# explain verbose select pt1.* from pt1, pt2 where pt1.b = pt2.b for update of pt1; ERROR: syntax error at or near "," CONTEXT: remote SQL command: EXPLAIN SELECT r4.b, r4.a, r4.ctid, CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(, r4.b, r4.a) END, CASE WHEN (r4.*)::text IS NOT NULL THEN 57467 END, r6.ctid, CASE WHEN (r6.*)::text IS NOT NULL THEN ROW(r6.a, r6.b) END, CASE WHEN (r6.*)::text IS NOT NULL THEN 57476 END FROM (public.loct11 r4 INNER JOIN public.loct21 r6 ON (((r4.b = r6.b)))) FOR UPDATE OF r4 I think we can fix this by adding another flag to indicate whether we deparsed the first live column of the relation, as in the "first" bool flag in deparseTargetList. One more thing to add is: the patch adds support for deparsing a ConvertRowtypeExpr that translates a wholerow of a childrel into a wholerow of its parentrel's rowtype, so by modifying is_foreign_expr accordingly, we could push down such CREs in JOIN ON conditions to the remote for example. But that would be an improvement, not a fix, so I think we should leave that for another patch for PG12. Other than that, the patch set looks good to me. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/05/16 22:49), Ashutosh Bapat wrote: >> >> On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> >>> However, considering that >>> pull_var_clause is used in many places where partitioning is *not* >>> involved, >>> I still think it's better to avoid spending extra cycles needed only for >>> partitioning in that function. >> >> >> Right. The changes done in inheritance_planner() imply that we can >> encounter a ConvertRowtypeExpr even in the expressions for a relation >> which is not a child relation in the translated query. It was a child >> in the original query but after translating it becomes a parent and >> has ConvertRowtypeExpr somewhere there. > > > Sorry, I don't understand this. Could you show me such an example? Even without inheritance we can induce a ConvertRowtypeExpr on a base relation which is not "other" relation create table parent (a int, b int); create table child () inherits(parent); alter table child add constraint whole_par_const check ((child::parent).a = 1); There is no way to tell by looking at the parsed query whether pull_var_clause() from StoreRelCheck() will encounter a ConvertRowtypeExpr or not. We could check whether the tables involved are inherited or not, but that's more expensive than is_converted_whole_row_reference(). > >> So, there is no way to know >> whether a given expression will have ConvertRowtypeExpr in it. That's >> my understanding. But if we can device such a test, I am happy to >> apply that test before or witin the call to pull_var_clause(). >> >> We don't need that expensive test if user has passed >> PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check >> the shape of expression tree. It would cause more asymmetry in >> pull_var_clause(), but we can live with it or change the order of >> tests for all the three options. I will differ that to a committer. > > > Sounds more invasive. Considering where we are in the development cycle for > PG11, I think it'd be better to address this by something like the approach > I proposed. Anyway, +1 for asking the committer's opinion. Thanks. > > - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: > > +extern bool > +is_converted_whole_row_reference(Node *node) > > I think we should remove "extern" from the definition. Done. > > - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: > > > I think we can fix this by adding another flag to indicate whether we > deparsed the first live column of the relation, as in the "first" bool flag > in deparseTargetList. Thanks. Fixed. > > One more thing to add is: the patch adds support for deparsing a > ConvertRowtypeExpr that translates a wholerow of a childrel into a wholerow > of its parentrel's rowtype, so by modifying is_foreign_expr accordingly, we > could push down such CREs in JOIN ON conditions to the remote for example. > But that would be an improvement, not a fix, so I think we should leave that > for another patch for PG12. Right. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/05/17 23:19), Ashutosh Bapat wrote: > On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/05/16 22:49), Ashutosh Bapat wrote: >>> On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> However, considering that >>>> pull_var_clause is used in many places where partitioning is *not* >>>> involved, >>>> I still think it's better to avoid spending extra cycles needed only for >>>> partitioning in that function. >>> >>> >>> Right. The changes done in inheritance_planner() imply that we can >>> encounter a ConvertRowtypeExpr even in the expressions for a relation >>> which is not a child relation in the translated query. It was a child >>> in the original query but after translating it becomes a parent and >>> has ConvertRowtypeExpr somewhere there. >> >> >> Sorry, I don't understand this. Could you show me such an example? > > Even without inheritance we can induce a ConvertRowtypeExpr on a base > relation which is not "other" relation > > create table parent (a int, b int); > create table child () inherits(parent); > alter table child add constraint whole_par_const check ((child::parent).a = 1); > > There is no way to tell by looking at the parsed query whether > pull_var_clause() from StoreRelCheck() will encounter a > ConvertRowtypeExpr or not. We could check whether the tables involved > are inherited or not, but that's more expensive than > is_converted_whole_row_reference(). Yeah, ISTM that the inheritance test makes things worse. >>> So, there is no way to know >>> whether a given expression will have ConvertRowtypeExpr in it. That's >>> my understanding. But if we can device such a test, I am happy to >>> apply that test before or witin the call to pull_var_clause(). >>> >>> We don't need that expensive test if user has passed >>> PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check >>> the shape of expression tree. It would cause more asymmetry in >>> pull_var_clause(), but we can live with it or change the order of >>> tests for all the three options. I will differ that to a committer. >> >> >> Sounds more invasive. Considering where we are in the development cycle for >> PG11, I think it'd be better to address this by something like the approach >> I proposed. Anyway, +1 for asking the committer's opinion. > > Thanks. Let's ask that. >> - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: >> >> +extern bool >> +is_converted_whole_row_reference(Node *node) >> >> I think we should remove "extern" from the definition. > > Done. >> - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: >> >> >> I think we can fix this by adding another flag to indicate whether we >> deparsed the first live column of the relation, as in the "first" bool flag >> in deparseTargetList. > > Thanks. Fixed. Thanks for updating the patch set! Other than pull_var_clause things, the updated version looks good to me, so I'll mark this as Ready for Committer. As I said before, I think this issue should be addressed in advance of the PG11 release. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/05/18 16:33), Etsuro Fujita wrote: > Other than pull_var_clause things, > the updated version looks good to me, so I'll mark this as Ready for > Committer. Since I'm not 100% sure that that is the right way to go, I've been rethinking how to fix this issue. Yet another idea I came up with to fix this is to redesign the handling of the tlists for children in the partitioning case. Currently, we build the reltarget for a child by applying adjust_appendrel_attrs to the reltarget for its parent in set_append_rel_size, which maps a wholerow Var referring to the parent rel to a ConvertRowtypeExpr that translates a wholerow of the child rel into a wholerow of the parent rel's rowtype. This works well for the non-partitioned inheritance case, but makes complicated the code for handling the partitioning case especially when planning partitionwise-joins. And I think this would be the root cause of this issue. I don't think the tlists for the children need to have their wholerows transformed into the corresponding ConvertRowtypeExprs *at this point*, so what I'd like to propose is to 1) map a parent wholerow Var simply to a child wholerow Var, instead (otherwise, the same as adjust_appendrel_attrs), when building the reltarget for a child in set_append_rel_size, 2) build the tlists for child joins using those children's wholerow Vars at path creation time, and 3) adjust those wholerow Vars in the tlists for subpaths in the chosen AppendPath so that those are transformed into the corresponding ConvertRowtypeExprs, at plan creation time (ie, in create_append_plan/create_merge_append_plan). IIUC, this would not require any changes to pull_var_clause as proposed by the patch. This would not require any changes to postgres_fdw as proposed by the patch, either. In addition, this would make unnecessary the code added to setrefs.c by the partitionwise-join patch. Maybe I'm missing something though. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/05/18 16:33), Etsuro Fujita wrote: >> >> Other than pull_var_clause things, >> the updated version looks good to me, so I'll mark this as Ready for >> Committer. > > > Since I'm not 100% sure that that is the right way to go, I've been > rethinking how to fix this issue. Yet another idea I came up with to fix > this is to redesign the handling of the tlists for children in the > partitioning case. Currently, we build the reltarget for a child by > applying adjust_appendrel_attrs to the reltarget for its parent in > set_append_rel_size, which maps a wholerow Var referring to the parent rel > to a ConvertRowtypeExpr that translates a wholerow of the child rel into a > wholerow of the parent rel's rowtype. This works well for the > non-partitioned inheritance case, but makes complicated the code for > handling the partitioning case especially when planning partitionwise-joins. > And I think this would be the root cause of this issue. Although true, this is not only about targetlist. Even the whole-row expressions in the conditions, equivalence classes and other planner/optimizer data structures are translated to ConvertRowtypeExpr. > I don't think the > tlists for the children need to have their wholerows transformed into the > corresponding ConvertRowtypeExprs *at this point*, so what I'd like to > propose is to 1) map a parent wholerow Var simply to a child wholerow Var, > instead (otherwise, the same as adjust_appendrel_attrs), when building the > reltarget for a child in set_append_rel_size, 2) build the tlists for child > joins using those children's wholerow Vars at path creation time, and 3) > adjust those wholerow Vars in the tlists for subpaths in the chosen > AppendPath so that those are transformed into the corresponding > ConvertRowtypeExprs, at plan creation time (ie, in > create_append_plan/create_merge_append_plan). IIUC, this would not require > any changes to pull_var_clause as proposed by the patch. This would not > require any changes to postgres_fdw as proposed by the patch, either. In > addition, this would make unnecessary the code added to setrefs.c by the > partitionwise-join patch. Maybe I'm missing something though. Not translating whole-row expressions to ConvertRowtypeExpr before creating paths can lead to a number of anomalies. For example, 1. if there's an index on the whole-row expression of child, translating parent's whole-row expression to child's whole-row expression as is will lead to using that index, when in reality the it can not be used since the condition/ORDER BY clause (which originally referred the parent's whole-row expression) require the child's whole-row reassembled as parent's whole-row. 2. Constraints on child'd whole-row expression, will be used to prune a child when they can not be used since the original condition was on parent' whole-row expression and not that of the child. 3. Equivalence classes will think that a child whole-row expression (without ConvertRowtypeExpr) is equivalent to an expression which is part of the same equivalence class as the parent' whole-row expression. Given these serious problems, I don't think we could afford not to cover a child whole-row reference in ConvertRowtypeExpr. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/06/07 19:42), Ashutosh Bapat wrote: > On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Since I'm not 100% sure that that is the right way to go, I've been >> rethinking how to fix this issue. Yet another idea I came up with to fix >> this is to redesign the handling of the tlists for children in the >> partitioning case. Currently, we build the reltarget for a child by >> applying adjust_appendrel_attrs to the reltarget for its parent in >> set_append_rel_size, which maps a wholerow Var referring to the parent rel >> to a ConvertRowtypeExpr that translates a wholerow of the child rel into a >> wholerow of the parent rel's rowtype. This works well for the >> non-partitioned inheritance case, but makes complicated the code for >> handling the partitioning case especially when planning partitionwise-joins. >> And I think this would be the root cause of this issue. > > Although true, this is not only about targetlist. Even the whole-row > expressions in the conditions, equivalence classes and other > planner/optimizer data structures are translated to > ConvertRowtypeExpr. Yeah, but I mean we modify set_append_rel_size so that we only map a parent wholerow Var in the parent tlist to a child wholerow Var in the child's tlist; parent wholerow Vars in the parent's other expressions such as conditions are transformed into CREs as-is. >> I don't think the >> tlists for the children need to have their wholerows transformed into the >> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to >> propose is to 1) map a parent wholerow Var simply to a child wholerow Var, >> instead (otherwise, the same as adjust_appendrel_attrs), when building the >> reltarget for a child in set_append_rel_size, 2) build the tlists for child >> joins using those children's wholerow Vars at path creation time, and 3) >> adjust those wholerow Vars in the tlists for subpaths in the chosen >> AppendPath so that those are transformed into the corresponding >> ConvertRowtypeExprs, at plan creation time (ie, in >> create_append_plan/create_merge_append_plan). IIUC, this would not require >> any changes to pull_var_clause as proposed by the patch. This would not >> require any changes to postgres_fdw as proposed by the patch, either. In >> addition, this would make unnecessary the code added to setrefs.c by the >> partitionwise-join patch. Maybe I'm missing something though. > > Not translating whole-row expressions to ConvertRowtypeExpr before > creating paths can lead to a number of anomalies. For example, > > 1. if there's an index on the whole-row expression of child, > translating parent's whole-row expression to child's whole-row > expression as is will lead to using that index, when in reality the it > can not be used since the condition/ORDER BY clause (which originally > referred the parent's whole-row expression) require the child's > whole-row reassembled as parent's whole-row. > 2. Constraints on child'd whole-row expression, will be used to prune > a child when they can not be used since the original condition was on > parent' whole-row expression and not that of the child. > 3. Equivalence classes will think that a child whole-row expression > (without ConvertRowtypeExpr) is equivalent to an expression which is > part of the same equivalence class as the parent' whole-row > expression. Is that still possible when we only map a parent wholerow Var in the parent's tlist to a child wholerow Var in the child's tlist? Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Mon, Jun 11, 2018 at 4:05 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/06/07 19:42), Ashutosh Bapat wrote: >> >> On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> >>> Since I'm not 100% sure that that is the right way to go, I've been >>> rethinking how to fix this issue. Yet another idea I came up with to fix >>> this is to redesign the handling of the tlists for children in the >>> partitioning case. Currently, we build the reltarget for a child by >>> applying adjust_appendrel_attrs to the reltarget for its parent in >>> set_append_rel_size, which maps a wholerow Var referring to the parent >>> rel >>> to a ConvertRowtypeExpr that translates a wholerow of the child rel into >>> a >>> wholerow of the parent rel's rowtype. This works well for the >>> non-partitioned inheritance case, but makes complicated the code for >>> handling the partitioning case especially when planning >>> partitionwise-joins. >>> And I think this would be the root cause of this issue. >> >> >> Although true, this is not only about targetlist. Even the whole-row >> expressions in the conditions, equivalence classes and other >> planner/optimizer data structures are translated to >> ConvertRowtypeExpr. > > > Yeah, but I mean we modify set_append_rel_size so that we only map a parent > wholerow Var in the parent tlist to a child wholerow Var in the child's > tlist; parent wholerow Vars in the parent's other expressions such as > conditions are transformed into CREs as-is. What happens to a PlaceHolderVar containing whole-row reference when that appears in a condition and/or targetlist. > > >>> I don't think the >>> tlists for the children need to have their wholerows transformed into the >>> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to >>> propose is to 1) map a parent wholerow Var simply to a child wholerow >>> Var, >>> instead (otherwise, the same as adjust_appendrel_attrs), when building >>> the >>> reltarget for a child in set_append_rel_size, 2) build the tlists for >>> child >>> joins using those children's wholerow Vars at path creation time, and 3) >>> adjust those wholerow Vars in the tlists for subpaths in the chosen >>> AppendPath so that those are transformed into the corresponding >>> ConvertRowtypeExprs, at plan creation time (ie, in >>> create_append_plan/create_merge_append_plan). IIUC, this would not >>> require >>> any changes to pull_var_clause as proposed by the patch. This would not >>> require any changes to postgres_fdw as proposed by the patch, either. In >>> addition, this would make unnecessary the code added to setrefs.c by the >>> partitionwise-join patch. Maybe I'm missing something though. >> >> >> Not translating whole-row expressions to ConvertRowtypeExpr before >> creating paths can lead to a number of anomalies. For example, >> >> 1. if there's an index on the whole-row expression of child, >> translating parent's whole-row expression to child's whole-row >> expression as is will lead to using that index, when in reality the it >> can not be used since the condition/ORDER BY clause (which originally >> referred the parent's whole-row expression) require the child's >> whole-row reassembled as parent's whole-row. >> 2. Constraints on child'd whole-row expression, will be used to prune >> a child when they can not be used since the original condition was on >> parent' whole-row expression and not that of the child. >> 3. Equivalence classes will think that a child whole-row expression >> (without ConvertRowtypeExpr) is equivalent to an expression which is >> part of the same equivalence class as the parent' whole-row >> expression. > > > Is that still possible when we only map a parent wholerow Var in the > parent's tlist to a child wholerow Var in the child's tlist? Yes. 1 will affect the choice of index-only scans. We use pathkey comparisons at a number of places so tlist going out of sync with equivalence classes can affect a number of places. build_tlist_to_deparse() is used to deparse targetlist at the time of creating paths,as well as during the planning. According to your proposal we will build different tlists during path creation and plan creation. That doesn't look good. Apart from that your proposal of changing a child's targetlist at the time of plan creation to use CRE doesn't help since the original problem as described in [1] happens at the time of creating plans as described in [1]. The parent's whole-row reference has a different data type than child's whole-row reference. If we do not cover child's whole-row reference by CRE, the parent and child targetlists will go out of sync as far as the type of individual columns are concerned. That too doesn't look good to me. [1] https://www.postgresql.org/message-id/CAFjFpRfD5=LsOqG1eUf-vMpm9KoEZBBGJo68D9wkUBJxop_emQ@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/06/14 22:37), Ashutosh Bapat wrote: > On Mon, Jun 11, 2018 at 4:05 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/06/07 19:42), Ashutosh Bapat wrote: >>> On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >> Yeah, but I mean we modify set_append_rel_size so that we only map a parent >> wholerow Var in the parent tlist to a child wholerow Var in the child's >> tlist; parent wholerow Vars in the parent's other expressions such as >> conditions are transformed into CREs as-is. > > What happens to a PlaceHolderVar containing whole-row reference when > that appears in a condition and/or targetlist. Whole-row Vars contained in such PHV's expressions will be mapped into ConvertRowtypeExprs with adjust_appendrel_attrs. >>>> I don't think the >>>> tlists for the children need to have their wholerows transformed into the >>>> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to >>>> propose is to 1) map a parent wholerow Var simply to a child wholerow >>>> Var, >>>> instead (otherwise, the same as adjust_appendrel_attrs), when building >>>> the >>>> reltarget for a child in set_append_rel_size, 2) build the tlists for >>>> child >>>> joins using those children's wholerow Vars at path creation time, and 3) >>>> adjust those wholerow Vars in the tlists for subpaths in the chosen >>>> AppendPath so that those are transformed into the corresponding >>>> ConvertRowtypeExprs, at plan creation time (ie, in >>>> create_append_plan/create_merge_append_plan). IIUC, this would not >>>> require >>>> any changes to pull_var_clause as proposed by the patch. This would not >>>> require any changes to postgres_fdw as proposed by the patch, either. In >>>> addition, this would make unnecessary the code added to setrefs.c by the >>>> partitionwise-join patch. Maybe I'm missing something though. >>> >>> >>> Not translating whole-row expressions to ConvertRowtypeExpr before >>> creating paths can lead to a number of anomalies. For example, >>> >>> 1. if there's an index on the whole-row expression of child, >>> translating parent's whole-row expression to child's whole-row >>> expression as is will lead to using that index, when in reality the it >>> can not be used since the condition/ORDER BY clause (which originally >>> referred the parent's whole-row expression) require the child's >>> whole-row reassembled as parent's whole-row. >>> 2. Constraints on child'd whole-row expression, will be used to prune >>> a child when they can not be used since the original condition was on >>> parent' whole-row expression and not that of the child. >>> 3. Equivalence classes will think that a child whole-row expression >>> (without ConvertRowtypeExpr) is equivalent to an expression which is >>> part of the same equivalence class as the parent' whole-row >>> expression. >> >> >> Is that still possible when we only map a parent wholerow Var in the >> parent's tlist to a child wholerow Var in the child's tlist? > > Yes. 1 will affect the choice of index-only scans. We use pathkey > comparisons at a number of places so tlist going out of sync with > equivalence classes can affect a number of places. Sorry, I'm still not sure if #1 is really a problem. Could you show me an example for that? > build_tlist_to_deparse() is used to deparse targetlist at the time of > creating paths,as well as during the planning. According to your > proposal we will build different tlists during path creation and plan > creation. That doesn't look good. Maybe my explanation was not enough, but my proposal will guarantee that the FDW can build the same tlist at path creation time and plan creation time because the RelOptInfo's reltarget from which the tlist is created doesn't change at all. > Apart from that your proposal of > changing a child's targetlist at the time of plan creation to use CRE > doesn't help since the original problem as described in [1] happens at > the time of creating plans as described in [1]. I think my proposal will address the original issue. Actually, I've created a patch implementing that proposal. That's still WIP, but it works well even for these cases (and makes code much more simple, as mentioned above!) But I think that patch needs more work, so I'm planning to post it next week. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/06/15 20:56), Etsuro Fujita wrote: > Actually, I've > created a patch implementing that proposal. > But I think that patch needs more work, so I'm > planning to post it next week. Here is a patch for that. * As I said upthread, the patch makes code much more simple; I removed all the changes to setrefs.c added by the partitionwise-join patch. I also simplified the logic for building a tlist for a child-join rel. The original PWJ computes attr_needed data even for child rels, and build the tlist for a child-join by passing to build_joinrel_tlist that data for input child rels for the child-join. But I think that's redundant, and it's more straightforward to apply adjust_appendrel_attrs to the parent-join's tlist to get the child-join's tlist. So, I changed that way, which made unnecessary all the changes to build_joinrel_tlist and placeholder.c added by the PWJ patch, so I removed those as well. * The patch contains all of the regression tests in the original patch proposed by Ashutosh. Best regards, Etsuro Fujita
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Tue, Jun 19, 2018 at 6:16 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > * As I said upthread, the patch makes code much more simple; I removed all > the changes to setrefs.c added by the partitionwise-join patch. I also > simplified the logic for building a tlist for a child-join rel. The original > PWJ computes attr_needed data even for child rels, and build the tlist for a > child-join by passing to build_joinrel_tlist that data for input child rels > for the child-join. But I think that's redundant, and it's more > straightforward to apply adjust_appendrel_attrs to the parent-join's tlist > to get the child-join's tlist. So, I changed that way, which made > unnecessary all the changes to build_joinrel_tlist and placeholder.c added > by the PWJ patch, so I removed those as well. > > * The patch contains all of the regression tests in the original patch > proposed by Ashutosh. I have started reviewing the patch. + if (enable_partitionwise_join && rel->top_parent_is_partitioned) + { + build_childrel_tlist(root, rel, childrel, 1, &appinfo); + } Why do we need rel->top_parent_is_partitioned? If a relation is partitioned (if (rel->part_scheme), it's either the top parent or is partition of some other partitioned table. In either case this condition will be true. + /* No work if the child plan is an Append or MergeAppend */ + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) + return; Why? Probably it's related to the answer to the first question, But I don't see the connection. Given that partition-wise join will be applicable level by level, we need to recurse in adjust_subplan_tlist(). + /* The child plan should be able to do projection */ + Assert(is_projection_capable_plan(subplan)); + Again why? A MergeAppend's subplan could be a Sort plan, which will not be projection capable. This is not a full review. I will continue reviewing it further. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > Here is a patch for that. > > * As I said upthread, the patch makes code much more simple; I removed all > the changes to setrefs.c added by the partitionwise-join patch. I also > simplified the logic for building a tlist for a child-join rel. The original > PWJ computes attr_needed data even for child rels, and build the tlist for a > child-join by passing to build_joinrel_tlist that data for input child rels > for the child-join. But I think that's redundant, and it's more > straightforward to apply adjust_appendrel_attrs to the parent-join's tlist > to get the child-join's tlist. So, I changed that way, which made > unnecessary all the changes to build_joinrel_tlist and placeholder.c added > by the PWJ patch, so I removed those as well. > > * The patch contains all of the regression tests in the original patch > proposed by Ashutosh. I think this approach is going to run into trouble if the level at which we have to apply the ConvertRowTypeExpr happens not to be a projection-capable node. And, in general, it seems to me that we want to produce the right outputs at the lowest possible level of the plan tree. For instance, suppose that one of the relations being scanned is not parallel-safe but the others are. Then, we could end up with a plan like this: Append -> Seq Scan on temp_rela -> Gather -> Parallel Seq Scan on thing1 -> Gather -> Parallel Seq Scan on thing2 If a projection is required to convert the row type expression, we certainly want that to get pushed below the Gather, not to happen at the Gather level itself. We certainly don't want it to happen at the Append level, which can't even handle it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/06/22 22:54), Ashutosh Bapat wrote: > I have started reviewing the patch. Thanks for the review! > + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) > + { > + build_childrel_tlist(root, rel, childrel, 1,&appinfo); > + } > > Why do we need rel->top_parent_is_partitioned? If a relation is > partitioned (if (rel->part_scheme), it's either the top parent or is > partition of some other partitioned table. In either case this > condition will be true. This would be needed to avoid unnecessarily applying build_childrel_tlist to child rels of a partitioned table for which we don't consider partitionwise join. Consider: postgres=# create table lpt (c1 int, c2 text) partition by list (c1); CREATE TABLE postgres=# create table lpt_p1 partition of lpt for values in (1); CREATE TABLE postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); CREATE TABLE postgres=# create table test (c1 int, c2 text); CREATE TABLE postgres=# explain verbose select * from (select * from lpt union all select * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); QUERY PLAN -------------------------------------------------------------------------------- ---- Merge Join (cost=289.92..538.20 rows=16129 width=72) Output: lpt_p1.c1, lpt_p1.c2, test.c1, test.c2 Merge Cond: (test.c1 = lpt_p1.c1) -> Sort (cost=88.17..91.35 rows=1270 width=36) Output: test.c1, test.c2 Sort Key: test.c1 -> Seq Scan on public.test (cost=0.00..22.70 rows=1270 width=36) Output: test.c1, test.c2 -> Sort (cost=201.74..208.09 rows=2540 width=36) Output: lpt_p1.c1, lpt_p1.c2 Sort Key: lpt_p1.c1 -> Append (cost=0.00..58.10 rows=2540 width=36) -> Seq Scan on public.lpt_p1 (cost=0.00..22.70 rows=1270 width= 36) Output: lpt_p1.c1, lpt_p1.c2 -> Seq Scan on public.lpt_p2 (cost=0.00..22.70 rows=1270 width= 36) Output: lpt_p2.c1, lpt_p2.c2 (16 rows) In this example, the top parent is not a partitioned table and we don't need to consider partitionwise join for that, so we don't need to apply that to the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). > + /* No work if the child plan is an Append or MergeAppend */ > + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) > + return; > > Why? Probably it's related to the answer to the first question, But I > don't see the connection. Given that partition-wise join will be > applicable level by level, we need to recurse in > adjust_subplan_tlist(). I don't think so; I think if the child plan is an Append or MergeAppend, the tlist for each subplan of the Append or MergeAppend would have already been adjusted while create_plan_recurse before we are called here. > + /* The child plan should be able to do projection */ > + Assert(is_projection_capable_plan(subplan)); > + > Again why? A MergeAppend's subplan could be a Sort plan, which will > not be projection capable. IIUC, since we are called here right after create_plan_recurse, the child plan would be a scan or join unless it's neither Append nor MergeAppend. So it should be projection-capable. Maybe I'm missing something, though. > This is not a full review. I will continue reviewing it further. Thanks again. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/06/22 23:58), Robert Haas wrote: > I think this approach is going to run into trouble if the level at > which we have to apply the ConvertRowTypeExpr happens not to be a > projection-capable node. Actually, the level we have to do that would be a child rel of a partitioned table or a child join of a partitionwise join, so the plan node would be a scan or join plan unless the child rel or child join is itself partitioned, in which case the plan node would be Append/MergeAppend and the proposed patch recursively would apply that conversion to child plans for the Append/MergeAppend). > And, in general, it seems to me that we want > to produce the right outputs at the lowest possible level of the plan > tree. For instance, suppose that one of the relations being scanned > is not parallel-safe but the others are. Then, we could end up with a > plan like this: > > Append > -> Seq Scan on temp_rela > -> Gather > -> Parallel Seq Scan on thing1 > -> Gather > -> Parallel Seq Scan on thing2 > > If a projection is required to convert the row type expression, we > certainly want that to get pushed below the Gather, not to happen at > the Gather level itself. IIUC, we currently don't consider such a plan for partition-wise join; we'll only consider gathering partial paths for the parent appendrel. So, I'm not sure we need to take into account that when applying the ConvertRowtypeExpr. Maybe I'm missing something, though. > We certainly don't want it to happen at the > Append level, which can't even handle it. I think so too. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Andres Freund
Date:
Hi Ashutosh, Etsuro, Robert, On 2018-06-22 10:58:28 -0400, Robert Haas wrote: > On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: > > Here is a patch for that. > > > > * As I said upthread, the patch makes code much more simple; I removed all > > the changes to setrefs.c added by the partitionwise-join patch. I also > > simplified the logic for building a tlist for a child-join rel. The original > > PWJ computes attr_needed data even for child rels, and build the tlist for a > > child-join by passing to build_joinrel_tlist that data for input child rels > > for the child-join. But I think that's redundant, and it's more > > straightforward to apply adjust_appendrel_attrs to the parent-join's tlist > > to get the child-join's tlist. So, I changed that way, which made > > unnecessary all the changes to build_joinrel_tlist and placeholder.c added > > by the PWJ patch, so I removed those as well. > > > > * The patch contains all of the regression tests in the original patch > > proposed by Ashutosh. > > I think this approach is going to run into trouble if the level at > which we have to apply the ConvertRowTypeExpr happens not to be a > projection-capable node. What's the plan forward here? This has been an open item for quite a while. Robert, are you in agreement with this approach on a high level? Who is going to drive this forward? Greetings, Andres Freund
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/04 1:35), Andres Freund wrote: > On 2018-06-22 10:58:28 -0400, Robert Haas wrote: >> On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> Here is a patch for that. >> I think this approach is going to run into trouble if the level at >> which we have to apply the ConvertRowTypeExpr happens not to be a >> projection-capable node. > > What's the plan forward here? I still think that this approach would be the right way to go, so I plan to polish the patch. > This has been an open item for quite a > while. Robert, are you in agreement with this approach on a high level? > Who is going to drive this forward? I'm willing to do that if it's OK, but I'd like to get more feedback from Robert, Ashutosh or anyone else. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/06/22 22:54), Ashutosh Bapat wrote: >> >> I have started reviewing the patch. > > > Thanks for the review! > >> + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) >> + { >> + build_childrel_tlist(root, rel, childrel, 1,&appinfo); >> + } >> >> Why do we need rel->top_parent_is_partitioned? If a relation is >> partitioned (if (rel->part_scheme), it's either the top parent or is >> partition of some other partitioned table. In either case this >> condition will be true. > > > This would be needed to avoid unnecessarily applying build_childrel_tlist to > child rels of a partitioned table for which we don't consider partitionwise > join. Consider: > > postgres=# create table lpt (c1 int, c2 text) partition by list (c1); > CREATE TABLE > postgres=# create table lpt_p1 partition of lpt for values in (1); > CREATE TABLE > postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); > CREATE TABLE > postgres=# create table test (c1 int, c2 text); > CREATE TABLE > postgres=# explain verbose select * from (select * from lpt union all select > * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); --- plan clipped > > In this example, the top parent is not a partitioned table and we don't need > to consider partitionwise join for that, so we don't need to apply that to > the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). FWIW, the test is not sufficient. In the above example, a simple select * from lpt inner join test where lpt.c1 = test.c1 would not use partition-wise join technique. Why not to avoid build_childrel_tlist() in that case as well? Worst we can change the criteria to use partition-wise join in future e.g. above case would use partition-wise join by 1. merging lpt_p1 into corresponding partition of lpt so that ss is partitioned and 2. repartitioning test or joining test with each partition of lpt separately. In those cases the changes you are doing here need to be reverted and put somewhere else. There's already a patch to reverse the order of join and grouping out there. How would this work with that. In general I think set_append_rel_size() or build_simple_rel() is not the place where we should decide whether the given relation will participate in a PWJ. Also we should not selectively add a ConvertRowtypeExpr on top of a whole-row Var of some a child relation. We should do it for all the child rels or shouldn't do it at all. An in-between state will produce a hell lot of confusion for any further optimization. Whenever we change the code around partition-wise operations in future, we will have to check whether or not a given child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I have mentioned earlier, I am also not comfortable with the targetlist of child relations being type inconsistent with that of the parent, which is a fundamental rule in inheritance. Worst keep them inconsistent during path creation and make them consistent at the time of creating plans. A child's whole-row Var has datatype of the child where as that of parent has datatype of parent. A ConvertRowtypeExpr is used to fix this inconsistency. That's why I chose to use pull_var_clause() as a place to fix the problem and not fix ConvertRowtypeExpr in targetlist itself. I am also not comfortable in just avoiding ConvertRowtypeExprs in targetlist and leave them as is in the conditions and ECs etc. This means searching a ConvertRowtypeExpr e.g. for creating pathkeys in targetlist will fail at the time of path creation but will succeed at the time of plan creation. This is turning more invasive that my approach of fixing it through PVC. > >> + /* No work if the child plan is an Append or MergeAppend */ >> + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) >> + return; >> >> Why? Probably it's related to the answer to the first question, But I >> don't see the connection. Given that partition-wise join will be >> applicable level by level, we need to recurse in >> adjust_subplan_tlist(). > > > I don't think so; I think if the child plan is an Append or MergeAppend, the > tlist for each subplan of the Append or MergeAppend would have already been > adjusted while create_plan_recurse before we are called here. Ok. Thanks for the clarification. I think we should add a comment. > >> + /* The child plan should be able to do projection */ >> + Assert(is_projection_capable_plan(subplan)); >> + >> Again why? A MergeAppend's subplan could be a Sort plan, which will >> not be projection capable. > > > IIUC, since we are called here right after create_plan_recurse, the child > plan would be a scan or join unless it's neither Append nor MergeAppend. So > it should be projection-capable. Maybe I'm missing something, though. That's not true. add_paths_to_append_rel() adds sort paths on scan if necessary and creates merge append path. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/02 18:46), Etsuro Fujita wrote: > (2018/06/22 23:58), Robert Haas wrote: >> And, in general, it seems to me that we want >> to produce the right outputs at the lowest possible level of the plan >> tree. For instance, suppose that one of the relations being scanned >> is not parallel-safe but the others are. Then, we could end up with a >> plan like this: >> >> Append >> -> Seq Scan on temp_rela >> -> Gather >> -> Parallel Seq Scan on thing1 >> -> Gather >> -> Parallel Seq Scan on thing2 >> >> If a projection is required to convert the row type expression, we >> certainly want that to get pushed below the Gather, not to happen at >> the Gather level itself. > > IIUC, we currently don't consider such a plan for partition-wise join; > we'll only consider gathering partial paths for the parent appendrel. While updating the patch, I noticed that I'm wrong here. IIUC, apply_scanjoin_target_to_paths would allow us to consider such an Append for a partitioned relation when scanjoin_target_parallel_safe=false, as it generates new Append paths by recursively adjusting all partitions for which we call generate_gather_paths in that case as shown below: /* * If the scan/join target is not parallel-safe, partial paths cannot * generate it. */ if (!scanjoin_target_parallel_safe) { /* * Since we can't generate the final scan/join target, this is our * last opportunity to use any partial paths that exist. We don't do * this if the case where the target is parallel-safe, since we will * be able to generate superior paths by doing it after the final * scan/join target has been applied. * * Note that this may invalidate rel->cheapest_total_path, so we must * not rely on it after this point without first calling set_cheapest. */ generate_gather_paths(root, rel, false); /* Can't use parallel query above this level. */ rel->partial_pathlist = NIL; rel->consider_parallel = false; } I don't produce a test case where the plan is an Append with Gather subplans, but I'm not sure that it's a good thing to allow us to consider such a plan because in that plan, each Gather would try to grab its own pool of workers. Am I missing something? Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > I don't produce a test case where the plan is an Append with Gather > subplans, but I'm not sure that it's a good thing to allow us to consider > such a plan because in that plan, each Gather would try to grab its own pool > of workers. Am I missing something? If the overall join can not use parallelism, having individual child joins use parallel query might turn out to be efficient even if done one child at a time. Parallel append drastically reduces the cases where something like could be useful, but I don't think we can theoretically eliminate the need for such a plan. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/04 19:04), Ashutosh Bapat wrote: > On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/06/22 22:54), Ashutosh Bapat wrote: >>> + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) >>> + { >>> + build_childrel_tlist(root, rel, childrel, 1,&appinfo); >>> + } >>> >>> Why do we need rel->top_parent_is_partitioned? If a relation is >>> partitioned (if (rel->part_scheme), it's either the top parent or is >>> partition of some other partitioned table. In either case this >>> condition will be true. >> >> >> This would be needed to avoid unnecessarily applying build_childrel_tlist to >> child rels of a partitioned table for which we don't consider partitionwise >> join. Consider: >> >> postgres=# create table lpt (c1 int, c2 text) partition by list (c1); >> CREATE TABLE >> postgres=# create table lpt_p1 partition of lpt for values in (1); >> CREATE TABLE >> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); >> CREATE TABLE >> postgres=# create table test (c1 int, c2 text); >> CREATE TABLE >> postgres=# explain verbose select * from (select * from lpt union all select >> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); > > --- plan clipped > >> >> In this example, the top parent is not a partitioned table and we don't need >> to consider partitionwise join for that, so we don't need to apply that to >> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). > > FWIW, the test is not sufficient. In the above example, a simple > select * from lpt inner join test where lpt.c1 = test.c1 would not use > partition-wise join technique. Why not to avoid build_childrel_tlist() > in that case as well? I might misunderstand your words, but in the above example the patch doesn't apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is because we can avoid adjusting the tlists for the corresponding subplans at plan creation time so that whole-row Vars in the tlists are transformed into CREs. I think the overhead of the adjustment is not that big, but not zero, so it would be worth avoiding applying build_childrel_tlist to partitions if the top parent won't participate in a partitionwise-join at all. > Worst we can change the criteria to use > partition-wise join in future e.g. above case would use partition-wise > join by 1. merging lpt_p1 into corresponding partition of lpt so that > ss is partitioned and 2. repartitioning test or joining test with each > partition of lpt separately. In those cases the changes you are doing > here need to be reverted and put somewhere else. There's already a > patch to reverse the order of join and grouping out there. How would > this work with that. Interesting, but that seems like a matter of PG12+. > In general I think set_append_rel_size() or build_simple_rel() is not > the place where we should decide whether the given relation will > participate in a PWJ. Also we should not selectively add a > ConvertRowtypeExpr on top of a whole-row Var of some a child relation. > We should do it for all the child rels or shouldn't do it at all. One thing I thought was to apply build_childrel_tlist for all the child rels whether its top parent is a partitioned table or not, but as I mentioned above, that would incur needless overhead for adjusting the tlists for the child rels that don't involve in a partitionwise join such as lpt_p1 and lpt_p2, which I think is not good. > An > in-between state will produce a hell lot of confusion for any further > optimization. Whenever we change the code around partition-wise > operations in future, we will have to check whether or not a given > child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I > have mentioned earlier, I am also not comfortable with the targetlist > of child relations being type inconsistent with that of the parent, > which is a fundamental rule in inheritance. Worst keep them > inconsistent during path creation and make them consistent at the time > of creating plans. A child's whole-row Var has datatype of the child > where as that of parent has datatype of parent. I don't see any critical issue here. Could you elaborate a bit more on that point? > A ConvertRowtypeExpr > is used to fix this inconsistency. That's why I chose to use > pull_var_clause() as a place to fix the problem and not fix > ConvertRowtypeExpr in targetlist itself. I think the biggest issue in the original patch you proposed is that we spend extra cycles where partitioning is not involved, which is the biggest reason why I think the original patch isn't the right way to go. > I am also not comfortable in just avoiding ConvertRowtypeExprs in > targetlist and leave them as is in the conditions and ECs etc. This > means searching a ConvertRowtypeExpr e.g. for creating pathkeys in > targetlist will fail at the time of path creation but will succeed at > the time of plan creation. > > This is turning more invasive that my approach of fixing it through PVC. Sorry, I don't understand this. Could you show me places where the problem happens? >>> + /* No work if the child plan is an Append or MergeAppend */ >>> + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) >>> + return; >>> >>> Why? Probably it's related to the answer to the first question, But I >>> don't see the connection. Given that partition-wise join will be >>> applicable level by level, we need to recurse in >>> adjust_subplan_tlist(). >> >> >> I don't think so; I think if the child plan is an Append or MergeAppend, the >> tlist for each subplan of the Append or MergeAppend would have already been >> adjusted while create_plan_recurse before we are called here. > > Ok. Thanks for the clarification. I think we should add a comment. Will do. >>> + /* The child plan should be able to do projection */ >>> + Assert(is_projection_capable_plan(subplan)); >>> + >>> Again why? A MergeAppend's subplan could be a Sort plan, which will >>> not be projection capable. >> >> >> IIUC, since we are called here right after create_plan_recurse, the child >> plan would be a scan or join unless it's neither Append nor MergeAppend. So >> it should be projection-capable. Maybe I'm missing something, though. > > That's not true. add_paths_to_append_rel() adds sort paths on scan if > necessary and creates merge append path. Really? I think create_merge_append_path called from generate_mergeappend_paths called from add_paths_to_append_rel uses a dummy sort path just to compute the cost, but I don't think create_merge_append_path (or generate_mergeappend_paths or add_paths_to_append_rel) insert a sort path to a scan (or join) path. Thanks for the comments! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/07/04 19:04), Ashutosh Bapat wrote: >> >> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> >>> (2018/06/22 22:54), Ashutosh Bapat wrote: >>>> >>>> + if (enable_partitionwise_join&& >>>> rel->top_parent_is_partitioned) >>>> + { >>>> + build_childrel_tlist(root, rel, childrel, 1,&appinfo); >>>> + } >>>> >>>> Why do we need rel->top_parent_is_partitioned? If a relation is >>>> partitioned (if (rel->part_scheme), it's either the top parent or is >>>> partition of some other partitioned table. In either case this >>>> condition will be true. >>> >>> >>> >>> This would be needed to avoid unnecessarily applying build_childrel_tlist >>> to >>> child rels of a partitioned table for which we don't consider >>> partitionwise >>> join. Consider: >>> >>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1); >>> CREATE TABLE >>> postgres=# create table lpt_p1 partition of lpt for values in (1); >>> CREATE TABLE >>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); >>> CREATE TABLE >>> postgres=# create table test (c1 int, c2 text); >>> CREATE TABLE >>> postgres=# explain verbose select * from (select * from lpt union all >>> select >>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); >> >> >> --- plan clipped >> >>> >>> In this example, the top parent is not a partitioned table and we don't >>> need >>> to consider partitionwise join for that, so we don't need to apply that >>> to >>> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). >> >> >> FWIW, the test is not sufficient. In the above example, a simple >> select * from lpt inner join test where lpt.c1 = test.c1 would not use >> partition-wise join technique. Why not to avoid build_childrel_tlist() >> in that case as well? > > > I might misunderstand your words, but in the above example the patch doesn't > apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is > because we can avoid adjusting the tlists for the corresponding subplans at > plan creation time so that whole-row Vars in the tlists are transformed into > CREs. I think the overhead of the adjustment is not that big, but not zero, > so it would be worth avoiding applying build_childrel_tlist to partitions if > the top parent won't participate in a partitionwise-join at all. I don't think that answers my question. When we join lpt with test, your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even when join between lpt and test is not going to use partition-wise join. Why? As per your explanation, the condition "if (enable_partitionwise_join && rel->top_parent_is_partitioned)" is used to avoid applying build_childrel_tlist() when partition-wise join won't be possible. But it's not covering all the cases. > >> Worst we can change the criteria to use >> partition-wise join in future e.g. above case would use partition-wise >> join by 1. merging lpt_p1 into corresponding partition of lpt so that >> ss is partitioned and 2. repartitioning test or joining test with each >> partition of lpt separately. In those cases the changes you are doing >> here need to be reverted and put somewhere else. There's already a >> patch to reverse the order of join and grouping out there. How would >> this work with that. > > > Interesting, but that seems like a matter of PG12+. Yes, and I don't think it's a good idea to change this fix for PG12+ again. > >> In general I think set_append_rel_size() or build_simple_rel() is not >> the place where we should decide whether the given relation will >> participate in a PWJ. Also we should not selectively add a >> ConvertRowtypeExpr on top of a whole-row Var of some a child relation. >> We should do it for all the child rels or shouldn't do it at all. > > > One thing I thought was to apply build_childrel_tlist for all the child rels > whether its top parent is a partitioned table or not, but as I mentioned > above, that would incur needless overhead for adjusting the tlists for the > child rels that don't involve in a partitionwise join such as lpt_p1 and > lpt_p2, which I think is not good. > >> An >> in-between state will produce a hell lot of confusion for any further >> optimization. Whenever we change the code around partition-wise >> operations in future, we will have to check whether or not a given >> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I >> have mentioned earlier, I am also not comfortable with the targetlist >> of child relations being type inconsistent with that of the parent, >> which is a fundamental rule in inheritance. Worst keep them >> inconsistent during path creation and make them consistent at the time >> of creating plans. A child's whole-row Var has datatype of the child >> where as that of parent has datatype of parent. > > > I don't see any critical issue here. Could you elaborate a bit more on that > point? I think breaking a fundamental rule like this itself is critical. But interestingly I am not able to find a case where it becomes a problem. But may be I haven't tried enough. And the question is if it's not required to have the targetlists type consistent, why even bother with ConvertRowtypeExpr addition there? We can use your approach of adding ConvertRowtypeExpr at the end in all the cases. > >> A ConvertRowtypeExpr >> is used to fix this inconsistency. That's why I chose to use >> pull_var_clause() as a place to fix the problem and not fix >> ConvertRowtypeExpr in targetlist itself. > > > I think the biggest issue in the original patch you proposed is that we > spend extra cycles where partitioning is not involved, which is the biggest > reason why I think the original patch isn't the right way to go. When there are no partitions involved, there won't be any ConvertRowtypeExprs there which means the function is_converted_whole_row_reference() would just return from the first line checking IsA() and nullness of node. pull_var_clause() is structured so that it always incurs IsA() cost whether or not the corresponding type of node is present in the query or not. E.g. even if there are no aggregates in the query, we will still incur some cycles checking IsA(Aggref, node) when that function is called. We could restructure the condition as IsA(node, ConvertRowtypeExpr) && is_converted_whole_row_reference(node) to avoid even the extra cycles spent in function call. > >> I am also not comfortable in just avoiding ConvertRowtypeExprs in >> targetlist and leave them as is in the conditions and ECs etc. This >> means searching a ConvertRowtypeExpr e.g. for creating pathkeys in >> targetlist will fail at the time of path creation but will succeed at >> the time of plan creation. >> >> This is turning more invasive that my approach of fixing it through PVC. > > > Sorry, I don't understand this. Could you show me places where the problem > happens? I think breaking a fundamental rule like this itself is invasive. > >>>> + /* The child plan should be able to do projection */ >>>> + Assert(is_projection_capable_plan(subplan)); >>>> + >>>> Again why? A MergeAppend's subplan could be a Sort plan, which will >>>> not be projection capable. >>> >>> >>> >>> IIUC, since we are called here right after create_plan_recurse, the child >>> plan would be a scan or join unless it's neither Append nor MergeAppend. >>> So >>> it should be projection-capable. Maybe I'm missing something, though. >> >> >> That's not true. add_paths_to_append_rel() adds sort paths on scan if >> necessary and creates merge append path. > > > Really? I think create_merge_append_path called from > generate_mergeappend_paths called from add_paths_to_append_rel uses a dummy > sort path just to compute the cost, but I don't think > create_merge_append_path (or generate_mergeappend_paths or > add_paths_to_append_rel) insert a sort path to a scan (or join) path. You are right. We add sort after adjust_subplan_tlist() has been called. So, we are fine there. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/04 19:11), Ashutosh Bapat wrote: > On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> >> I don't produce a test case where the plan is an Append with Gather >> subplans, but I'm not sure that it's a good thing to allow us to consider >> such a plan because in that plan, each Gather would try to grab its own pool >> of workers. Am I missing something? > > If the overall join can not use parallelism, having individual child > joins use parallel query might turn out to be efficient even if done > one child at a time. Parallel append drastically reduces the cases > where something like could be useful, but I don't think we can > theoretically eliminate the need for such a plan. In the case where scanjoin_target_parallel_safe=false, we actually will consider such parallel paths using the existing partial paths for the parent appendrel in the code path shown in a previous email (note: we would already have done generate_partitionwise_join_paths or set_append_rel_pathlist for the parent appendrel in query_planner.) For such a parallel path, we might need to do a projection in the Gather node for generating the SRF-free scan/join target, but I think that would be still efficient. So, I'm not sure that we really need to create child Gathers and generate an Append with those Gathers, as you mentioned. Another thing I noticed is: actually, we don't produce an Append with child Gathers in apply_scanjoin_target_to_paths, which I thought we would do that in the case of scanjoin_target_parallel_safe=false, but I noticed I was wrong. Sorry for that. The reason is because in that case, even if we create new partial Append paths with child Gathers, we don't run generate_gatehr_paths for the newly created partial paths at the end of that function shown below, since the parent's consider_parallel flag is set to false in that case: /* * Consider generating Gather or Gather Merge paths. We must only do this * if the relation is parallel safe, and we don't do it for child rels to * avoid creating multiple Gather nodes within the same plan. We must do * this after all paths have been generated and before set_cheapest, since * one of the generated paths may turn out to be the cheapest one. */ if (rel->consider_parallel && !IS_OTHER_REL(rel)) generate_gather_paths(root, rel, false); Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/05 20:15), Etsuro Fujita wrote: > (2018/07/04 19:11), Ashutosh Bapat wrote: >> On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> >>> I don't produce a test case where the plan is an Append with Gather >>> subplans, but I'm not sure that it's a good thing to allow us to >>> consider >>> such a plan because in that plan, each Gather would try to grab its >>> own pool >>> of workers. Am I missing something? >> >> If the overall join can not use parallelism, having individual child >> joins use parallel query might turn out to be efficient even if done >> one child at a time. Parallel append drastically reduces the cases >> where something like could be useful, but I don't think we can >> theoretically eliminate the need for such a plan. > > In the case where scanjoin_target_parallel_safe=false, we actually will > consider such parallel paths using the existing partial paths for the > parent appendrel in the code path shown in a previous email (note: we > would already have done generate_partitionwise_join_paths or > set_append_rel_pathlist for the parent appendrel in query_planner.) For > such a parallel path, we might need to do a projection in the Gather > node for generating the SRF-free scan/join target, but I think that > would be still efficient. So, I'm not sure that we really need to create > child Gathers and generate an Append with those Gathers, as you mentioned. > > Another thing I noticed is: actually, we don't produce an Append with > child Gathers in apply_scanjoin_target_to_paths, which I thought we > would do that in the case of scanjoin_target_parallel_safe=false, but I > noticed I was wrong. Sorry for that. The reason is because in that case, > even if we create new partial Append paths with child Gathers, we don't > run generate_gatehr_paths for the newly created partial paths at the end > of that function shown below, since the parent's consider_parallel flag > is set to false in that case: > > /* > * Consider generating Gather or Gather Merge paths. We must only do this > * if the relation is parallel safe, and we don't do it for child rels to > * avoid creating multiple Gather nodes within the same plan. We must do > * this after all paths have been generated and before set_cheapest, since > * one of the generated paths may turn out to be the cheapest one. > */ > if (rel->consider_parallel && !IS_OTHER_REL(rel)) > generate_gather_paths(root, rel, false); One thing to add is: ISTM we need some cleanup for apply_scanjoin_target_to_paths. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Thu, Jul 5, 2018 at 4:45 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > > In the case where scanjoin_target_parallel_safe=false, we actually will > consider such parallel paths using the existing partial paths for the parent > appendrel in the code path shown in a previous email (note: we would already > have done generate_partitionwise_join_paths or set_append_rel_pathlist for > the parent appendrel in query_planner.) For such a parallel path, we might > need to do a projection in the Gather node for generating the SRF-free > scan/join target, but I think that would be still efficient. So, I'm not > sure that we really need to create child Gathers and generate an Append with > those Gathers, as you mentioned. I don't think it will be Append of Gathers, but Append where one of the subplans is Gather. I don't think that possibility is completely eliminated. > > Another thing I noticed is: actually, we don't produce an Append with child > Gathers in apply_scanjoin_target_to_paths, which I thought we would do that > in the case of scanjoin_target_parallel_safe=false, but I noticed I was > wrong. Sorry for that. The reason is because in that case, even if we > create new partial Append paths with child Gathers, we don't run > generate_gatehr_paths for the newly created partial paths at the end of that > function shown below, since the parent's consider_parallel flag is set to > false in that case: > > /* > * Consider generating Gather or Gather Merge paths. We must only do > this > * if the relation is parallel safe, and we don't do it for child rels > to > * avoid creating multiple Gather nodes within the same plan. We must do > * this after all paths have been generated and before set_cheapest, > since > * one of the generated paths may turn out to be the cheapest one. > */ > if (rel->consider_parallel && !IS_OTHER_REL(rel)) > generate_gather_paths(root, rel, false); Hmm. I don't think that's a great idea since such a plan might turn out cheaper esp. when there are very few children which could use parallel query and parallel append is possible at the top parent. But anyway, that's what it is today. But I think, we shouldn't write code assuming that an Append will never see a Gather below it. We might see some plans like that in future and need to change your code. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/05 22:04), Ashutosh Bapat wrote: > On Thu, Jul 5, 2018 at 4:45 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Another thing I noticed is: actually, we don't produce an Append with child >> Gathers in apply_scanjoin_target_to_paths, which I thought we would do that >> in the case of scanjoin_target_parallel_safe=false, but I noticed I was >> wrong. Sorry for that. The reason is because in that case, even if we >> create new partial Append paths with child Gathers, we don't run >> generate_gatehr_paths for the newly created partial paths at the end of that >> function shown below, since the parent's consider_parallel flag is set to >> false in that case: > Hmm. I don't think that's a great idea since such a plan might turn > out cheaper esp. when there are very few children which could use > parallel query and parallel append is possible at the top parent. Actually, I don't think we can create such a Parallel Append, because in the case where we have scanjoin_target_parallel_safe=false, child paths would not be parallel-safe anymore as those paths are rewritten to have parallel-unsafe targets in the recursion of apply_scanjoin_target_to_paths. > anyway, that's what it is today. But I think, we shouldn't write code > assuming that an Append will never see a Gather below it. We might see > some plans like that in future and need to change your code. I agree that we might allow such an Append in future, but I'm not sure we should write code for that in PG11, because 1) that would make code complicated than necessary and 2) that might cause overlooking of unexpected behavior of the planner; if the planner created such an Append erroneously, we would run that by that code without noticing that. To reduce the maintenance burden and the overlooking risk, I think it's better to extend code for such a plan when we allow that plan for a partitionwise join. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/04 21:37), Ashutosh Bapat wrote: > On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/07/04 19:04), Ashutosh Bapat wrote: >>> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> (2018/06/22 22:54), Ashutosh Bapat wrote: >>>>> + if (enable_partitionwise_join&& >>>>> rel->top_parent_is_partitioned) >>>>> + { >>>>> + build_childrel_tlist(root, rel, childrel, 1,&appinfo); >>>>> + } >>>>> >>>>> Why do we need rel->top_parent_is_partitioned? If a relation is >>>>> partitioned (if (rel->part_scheme), it's either the top parent or is >>>>> partition of some other partitioned table. In either case this >>>>> condition will be true. >>>> This would be needed to avoid unnecessarily applying build_childrel_tlist >>>> to >>>> child rels of a partitioned table for which we don't consider >>>> partitionwise >>>> join. Consider: >>>> >>>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1); >>>> CREATE TABLE >>>> postgres=# create table lpt_p1 partition of lpt for values in (1); >>>> CREATE TABLE >>>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); >>>> CREATE TABLE >>>> postgres=# create table test (c1 int, c2 text); >>>> CREATE TABLE >>>> postgres=# explain verbose select * from (select * from lpt union all >>>> select >>>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); >> I might misunderstand your words, but in the above example the patch doesn't >> apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is >> because we can avoid adjusting the tlists for the corresponding subplans at >> plan creation time so that whole-row Vars in the tlists are transformed into >> CREs. I think the overhead of the adjustment is not that big, but not zero, >> so it would be worth avoiding applying build_childrel_tlist to partitions if >> the top parent won't participate in a partitionwise-join at all. > > I don't think that answers my question. When we join lpt with test, > your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even > when join between lpt and test is not going to use partition-wise > join. Why? Maybe my explanation including the example was not good. Sorry about that, but my patch will *not* apply build_childrel_tlist to lpt_p1 and lpt_p2 since the top parent of lpt_p1 and lpt_p2 is the UNION ALL subquery and hence not a partitioned table (ie, we have rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2). > As per your explanation, the condition "if > (enable_partitionwise_join&& rel->top_parent_is_partitioned)" is > used to avoid applying build_childrel_tlist() when partition-wise join > won't be possible. But it's not covering all the cases. Let me explain about that: 1) my patch won't apply that function to a child if its top parent is an appendrel built from a UNION ALL subquery, even though the child is a partition of a partitioned table pulled up from a leaf subquery into the parent query, like lpt_p1, and 2) my patch will apply that function to a child if its top parent is a partitioned table, whether or not the partitioned table is involved in a partitionwise join. By #1, we avoid the adjustment work at plan creation time, as explained above. It might be worth trying to be smarter about #2 (for example, in the case of a join of a partitioned table and a non-partitioned table, since we don't consider a partitionwise join for that join, it's better to not apply that function to partitions of the partitioned table, to avoid the adjustment work at plan creation time), but ISTM we don't have enough information to be smarter. >>> An >>> in-between state will produce a hell lot of confusion for any further >>> optimization. Whenever we change the code around partition-wise >>> operations in future, we will have to check whether or not a given >>> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I >>> have mentioned earlier, I am also not comfortable with the targetlist >>> of child relations being type inconsistent with that of the parent, >>> which is a fundamental rule in inheritance. Worst keep them >>> inconsistent during path creation and make them consistent at the time >>> of creating plans. A child's whole-row Var has datatype of the child >>> where as that of parent has datatype of parent. >> >> >> I don't see any critical issue here. Could you elaborate a bit more on that >> point? > > I think breaking a fundamental rule like this itself is critical. But > interestingly I am not able to find a case where it becomes a problem. > But may be I haven't tried enough. And the question is if it's not > required to have the targetlists type consistent, why even bother with > ConvertRowtypeExpr addition there? We can use your approach of adding > ConvertRowtypeExpr at the end in all the cases. I think that the tlist of a (not-the-topmost) child relation doesn't need to be type-consistent with that of the parent; it has only to include all Vars that are needed for higher joinquals and final output to the parent appendrel. (In other words, I think we can build the tlist in the same manner as we build tlists of base or join relations in the main join tree.) On the other hand, other expressions such as WHERE quals need to be type-consistent with those of the parent; else we would create a plan that produces the wrong result. So, with my patch we carry out special handling for the tlist; it includes a child whole-row Var, if needed, instead of a CRE converting the Var to the parent rowtype. That allows us to create/process child-join plans using the existing planner functions as-is. >>> A ConvertRowtypeExpr >>> is used to fix this inconsistency. That's why I chose to use >>> pull_var_clause() as a place to fix the problem and not fix >>> ConvertRowtypeExpr in targetlist itself. >> >> >> I think the biggest issue in the original patch you proposed is that we >> spend extra cycles where partitioning is not involved, which is the biggest >> reason why I think the original patch isn't the right way to go. > > When there are no partitions involved, there won't be any > ConvertRowtypeExprs there which means the function > is_converted_whole_row_reference() would just return from the first > line checking IsA() and nullness of node. Really? IIRC, with the original patch we would spend extra cycles in a non-partitioned inheritance processing [1]. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5AFC0865.8050802%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Fri, Jul 6, 2018 at 4:29 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/07/04 21:37), Ashutosh Bapat wrote: >> >> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> >>> (2018/07/04 19:04), Ashutosh Bapat wrote: >>>> >>>> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita >>>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>>> >>>>> (2018/06/22 22:54), Ashutosh Bapat wrote: >>>>>> >>>>>> + if (enable_partitionwise_join&& >>>>>> rel->top_parent_is_partitioned) >>>>>> + { >>>>>> + build_childrel_tlist(root, rel, childrel, 1,&appinfo); >>>>>> + } >>>>>> >>>>>> Why do we need rel->top_parent_is_partitioned? If a relation is >>>>>> partitioned (if (rel->part_scheme), it's either the top parent or is >>>>>> partition of some other partitioned table. In either case this >>>>>> condition will be true. > > >>>>> This would be needed to avoid unnecessarily applying >>>>> build_childrel_tlist >>>>> to >>>>> child rels of a partitioned table for which we don't consider >>>>> partitionwise >>>>> join. Consider: >>>>> >>>>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1); >>>>> CREATE TABLE >>>>> postgres=# create table lpt_p1 partition of lpt for values in (1); >>>>> CREATE TABLE >>>>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); >>>>> CREATE TABLE >>>>> postgres=# create table test (c1 int, c2 text); >>>>> CREATE TABLE >>>>> postgres=# explain verbose select * from (select * from lpt union all >>>>> select >>>>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); > > >>> I might misunderstand your words, but in the above example the patch >>> doesn't >>> apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is >>> because we can avoid adjusting the tlists for the corresponding subplans >>> at >>> plan creation time so that whole-row Vars in the tlists are transformed >>> into >>> CREs. I think the overhead of the adjustment is not that big, but not >>> zero, >>> so it would be worth avoiding applying build_childrel_tlist to partitions >>> if >>> the top parent won't participate in a partitionwise-join at all. >> >> >> I don't think that answers my question. When we join lpt with test, >> your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even >> when join between lpt and test is not going to use partition-wise >> join. Why? > > > Maybe my explanation including the example was not good. Sorry about that, > but my patch will *not* apply build_childrel_tlist to lpt_p1 and lpt_p2 > since the top parent of lpt_p1 and lpt_p2 is the UNION ALL subquery and > hence not a partitioned table (ie, we have > rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2). > >> As per your explanation, the condition "if >> (enable_partitionwise_join&& rel->top_parent_is_partitioned)" is >> used to avoid applying build_childrel_tlist() when partition-wise join >> won't be possible. But it's not covering all the cases. > > > Let me explain about that: 1) my patch won't apply that function to a child > if its top parent is an appendrel built from a UNION ALL subquery, even > though the child is a partition of a partitioned table pulled up from a leaf > subquery into the parent query, like lpt_p1, and 2) my patch will apply that > function to a child if its top parent is a partitioned table, whether or not > the partitioned table is involved in a partitionwise join. By #1, we avoid > the adjustment work at plan creation time, as explained above. It might be > worth trying to be smarter about #2 (for example, in the case of a join of a > partitioned table and a non-partitioned table, since we don't consider a > partitionwise join for that join, it's better to not apply that function to > partitions of the partitioned table, to avoid the adjustment work at plan > creation time), but ISTM we don't have enough information to be smarter. That looks like a kludge to me rather than a proper fix. It's not clear to me as to when somebody can expect ConvertRowtypeExpr in the targetlist and when don't while creating paths and to an extent plans. For example, inside add_paths_to_append_rel() or in apply_scanjoin_target_to_paths() or for that matter any path creation or plan creation function, we will sometimes get targetlists with ConvertRowtypeExpr() and sometime not. How do we know which is when. > >>>> A ConvertRowtypeExpr >>>> is used to fix this inconsistency. That's why I chose to use >>>> pull_var_clause() as a place to fix the problem and not fix >>>> ConvertRowtypeExpr in targetlist itself. >>> >>> >>> >>> I think the biggest issue in the original patch you proposed is that we >>> spend extra cycles where partitioning is not involved, which is the >>> biggest >>> reason why I think the original patch isn't the right way to go. >> >> >> When there are no partitions involved, there won't be any >> ConvertRowtypeExprs there which means the function >> is_converted_whole_row_reference() would just return from the first >> line checking IsA() and nullness of node. > > > Really? IIRC, with the original patch we would spend extra cycles in a > non-partitioned inheritance processing [1]. As I said, we do spend cycles in that function testing whether a node is Aggref or not even when the query doesn't have aggregates or grouping OR spend cycles in testing whether a node is a PlaceHolderVar when the query doesn't produce any. So, I don't see any problem with spending a few cycles testing whether a node is ConvertRowtypeExpr or not when a ConvertRowtypeExpr is not in the query or command. That's not a huge performance trouble. I would be happy to change my mind, if you show me performance different with and without this patch in planning. I haven't seen any. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/06 20:20), Ashutosh Bapat wrote: > On Fri, Jul 6, 2018 at 4:29 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/07/04 21:37), Ashutosh Bapat wrote: >>> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >> Let me explain about that: 1) my patch won't apply that function to a child >> if its top parent is an appendrel built from a UNION ALL subquery, even >> though the child is a partition of a partitioned table pulled up from a leaf >> subquery into the parent query, like lpt_p1, and 2) my patch will apply that >> function to a child if its top parent is a partitioned table, whether or not >> the partitioned table is involved in a partitionwise join. By #1, we avoid >> the adjustment work at plan creation time, as explained above. It might be >> worth trying to be smarter about #2 (for example, in the case of a join of a >> partitioned table and a non-partitioned table, since we don't consider a >> partitionwise join for that join, it's better to not apply that function to >> partitions of the partitioned table, to avoid the adjustment work at plan >> creation time), but ISTM we don't have enough information to be smarter. > > That looks like a kludge to me rather than a proper fix. It's not > clear to me as to when somebody can expect ConvertRowtypeExpr in the > targetlist and when don't while creating paths and to an extent plans. > For example, inside add_paths_to_append_rel() or in > apply_scanjoin_target_to_paths() or for that matter any path creation > or plan creation function, we will sometimes get targetlists with > ConvertRowtypeExpr() and sometime not. How do we know which is when. That is known from a new flag "has_child_wholerow" added to RelOptInfo to indicate whether an other relation's reltarget has child wholerow Vars instead of ConvertRowtypeExprs. That flag is initialized to false and only set to true in build_childrel_tlist if it adds child wholerow Vars to the reltarget of a given other relation. Though, I don't think we need to be conscious about whether the reltarget has child wholerow Vars or ConvertRowtypeExprs when creating paths; we would just create paths based on the reltarget. What we need to be careful about is when creating an Append/MergeAppend plan; we have to adjust the subplan's tlist so child wholerow Vars in that tlist are converted to the corresponding ConvertRowtypeExprs, if that tlist has those Vars, which is also known from the new flag. >>>> I think the biggest issue in the original patch you proposed is that we >>>> spend extra cycles where partitioning is not involved, which is the >>>> biggest >>>> reason why I think the original patch isn't the right way to go. >>> >>> >>> When there are no partitions involved, there won't be any >>> ConvertRowtypeExprs there which means the function >>> is_converted_whole_row_reference() would just return from the first >>> line checking IsA() and nullness of node. >> >> >> Really? IIRC, with the original patch we would spend extra cycles in a >> non-partitioned inheritance processing [1]. > > As I said, we do spend cycles in that function testing whether a node > is Aggref or not even when the query doesn't have aggregates or > grouping OR spend cycles in testing whether a node is a PlaceHolderVar > when the query doesn't produce any. So, I don't see any problem with > spending a few cycles testing whether a node is ConvertRowtypeExpr or > not when a ConvertRowtypeExpr is not in the query or command. That's > not a huge performance trouble. I would be happy to change my mind, if > you show me performance different with and without this patch in > planning. I haven't seen any. I have to admit that the case in [1] wouldn't affect the performance, but my concern is that there might be some cases where the test affects performance. In contrast, in the approach I proposed, we wouldn't need to worry about that because the approach does not modify code that is not related to partitioning. I think that's a good thing. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Mon, Jul 9, 2018 at 4:33 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >> >> >> As I said, we do spend cycles in that function testing whether a node >> is Aggref or not even when the query doesn't have aggregates or >> grouping OR spend cycles in testing whether a node is a PlaceHolderVar >> when the query doesn't produce any. So, I don't see any problem with >> spending a few cycles testing whether a node is ConvertRowtypeExpr or >> not when a ConvertRowtypeExpr is not in the query or command. That's >> not a huge performance trouble. I would be happy to change my mind, if >> you show me performance different with and without this patch in >> planning. I haven't seen any. > > > I have to admit that the case in [1] wouldn't affect the performance, but my > concern is that there might be some cases where the test affects > performance. What are those cases? Can you please provide any numbers supporting your claim? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/09 20:06), Ashutosh Bapat wrote: > On Mon, Jul 9, 2018 at 4:33 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >>> As I said, we do spend cycles in that function testing whether a node >>> is Aggref or not even when the query doesn't have aggregates or >>> grouping OR spend cycles in testing whether a node is a PlaceHolderVar >>> when the query doesn't produce any. So, I don't see any problem with >>> spending a few cycles testing whether a node is ConvertRowtypeExpr or >>> not when a ConvertRowtypeExpr is not in the query or command. That's >>> not a huge performance trouble. I would be happy to change my mind, if >>> you show me performance different with and without this patch in >>> planning. I haven't seen any. >> >> >> I have to admit that the case in [1] wouldn't affect the performance, but my >> concern is that there might be some cases where the test affects >> performance. > > What are those cases? Can you please provide any numbers supporting your claim? I don't have any numbers right now, so that is nothing but a concern. But as I said in a previous email, in the approach I proposed, we don't need to spend extra cycles where partitioning is not involved. I think that is a good thing in itself. No? Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
> > I don't have any numbers right now, so that is nothing but a concern. But as > I said in a previous email, in the approach I proposed, we don't need to > spend extra cycles where partitioning is not involved. I think that is a > good thing in itself. No? At the cost of having targetlist being type inconsistent. I don't have any testcase either to show that that's a problem in practice. So, it's a trade-off of a concern vs concern. Apart from that, in your approach there are extra cycles spent in traversing the targetlist to add ConvertRowtypeExpr, albeit only when there is a whole-row expression in the targetlist, when creating plans. That's not there in my patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/09 20:43), Ashutosh Bapat wrote: >> >> I don't have any numbers right now, so that is nothing but a concern. But as >> I said in a previous email, in the approach I proposed, we don't need to >> spend extra cycles where partitioning is not involved. I think that is a >> good thing in itself. No? > > At the cost of having targetlist being type inconsistent. I don't have > any testcase either to show that that's a problem in practice. So, > it's a trade-off of a concern vs concern. As I said before, I don't see any issue in having such a targetlist, but I might be missing something, so I'd like to discuss a bit more about that. Could you tell me the logic/place in the PG code where you think the problem might occur. (IIRC, you mentioned something about that before (pathkeys? or index-only scans?), but sorry, I don't understand that.) > Apart from that, in your approach there are extra cycles spent in > traversing the targetlist to add ConvertRowtypeExpr, albeit only when > there is a whole-row expression in the targetlist, when creating > plans. That's not there in my patch. Right. That's unfortunate, but I think that that would be still better than needing to spent extra cycles where partitioning is not involved. And ISTM that the traversing cost is not that large compared to the cost we pay before that for query planning for a partitionwise join. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Tue, Jul 10, 2018 at 9:08 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/07/09 20:43), Ashutosh Bapat wrote: >>> >>> >>> I don't have any numbers right now, so that is nothing but a concern. But >>> as >>> I said in a previous email, in the approach I proposed, we don't need to >>> spend extra cycles where partitioning is not involved. I think that is a >>> good thing in itself. No? >> >> >> At the cost of having targetlist being type inconsistent. I don't have >> any testcase either to show that that's a problem in practice. So, >> it's a trade-off of a concern vs concern. > > > As I said before, I don't see any issue in having such a targetlist, but I > might be missing something, so I'd like to discuss a bit more about that. > Could you tell me the logic/place in the PG code where you think the problem > might occur. (IIRC, you mentioned something about that before (pathkeys? or > index-only scans?), but sorry, I don't understand that.) IIUC, index-only scan will be used when the all the required columns are covered by an index. If there is an index on the whole-row reference of the parent, it will be translated into a ConvertRowtypeExpr of the child when an index on the child is created. If the targetlist doesn't have ConvertRowtypeExpr, as your patch does, the planner won't be able to use such an index on the child table. But I couldn't create an index with a whole-row reference in it. So, I think this isn't possible right now. But in future if we allow creating index on whole-row reference or Pathkey points to an equivalence class, which contains equivalence members. A parent equivalence class member containing a whole-row reference gets translated into a child equivalence member containing a ConvertRowtypeExpr. At places in planner we match equivalence members to the targetlist entries. This matching will fail unexpectedly when ConvertRowtypeExpr is removed from a child's targetlist. But again I couldn't reproduce a problem when such a mismatch arises. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/10 19:58), Ashutosh Bapat wrote: > On Tue, Jul 10, 2018 at 9:08 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/07/09 20:43), Ashutosh Bapat wrote: >>> At the cost of having targetlist being type inconsistent. I don't have >>> any testcase either to show that that's a problem in practice. >> As I said before, I don't see any issue in having such a targetlist, but I >> might be missing something, so I'd like to discuss a bit more about that. >> Could you tell me the logic/place in the PG code where you think the problem >> might occur. (IIRC, you mentioned something about that before (pathkeys? or >> index-only scans?), but sorry, I don't understand that.) > > IIUC, index-only scan will be used when the all the required columns > are covered by an index. If there is an index on the whole-row > reference of the parent, it will be translated into a > ConvertRowtypeExpr of the child when an index on the child is created. I think so too. > If the targetlist doesn't have ConvertRowtypeExpr, as your patch does, > the planner won't be able to use such an index on the child table. But > I couldn't create an index with a whole-row reference in it. So, I > think this isn't possible right now. Actually, even if we could create such an index on the child table and the targetlist had the ConvertRowtypeExpr, the planner would still not be able to use an index-only scan with that index; because check_index_only would not consider that an index-only scan is possible for that index because that index is an expression index and that function currently does not consider that index expressions are able to be returned back in an index-only scan. That behavior of the planner might be improved in future, though. > Pathkey points to an equivalence class, which contains equivalence > members. A parent equivalence class member containing a whole-row > reference gets translated into a child equivalence member containing a > ConvertRowtypeExpr. I think so too. > At places in planner we match equivalence members > to the targetlist entries. This matching will fail unexpectedly when > ConvertRowtypeExpr is removed from a child's targetlist. But again I > couldn't reproduce a problem when such a mismatch arises. IIUC, I don't think the planner assumes that for an equivalence member there is an matching entry for that member in the targetlist; what I think the planner assumes is: an equivalence member is able to be computed from expressions in the targetlist. So, I think it is safe to have whole-row Vars instead of ConvertRowtypeExprs in the targetlist. Thanks for the explanation! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > > Actually, even if we could create such an index on the child table and the > targetlist had the ConvertRowtypeExpr, the planner would still not be able > to use an index-only scan with that index; because check_index_only would > not consider that an index-only scan is possible for that index because that > index is an expression index and that function currently does not consider > that index expressions are able to be returned back in an index-only scan. > That behavior of the planner might be improved in future, though. > >> Pathkey points to an equivalence class, which contains equivalence >> members. A parent equivalence class member containing a whole-row >> reference gets translated into a child equivalence member containing a >> ConvertRowtypeExpr. Right and when we do so, not having ConvertRowtypeExpr in the targetlist will be a problem. > > > I think so too. > >> At places in planner we match equivalence members >> to the targetlist entries. This matching will fail unexpectedly when >> ConvertRowtypeExpr is removed from a child's targetlist. But again I >> couldn't reproduce a problem when such a mismatch arises. > > > IIUC, I don't think the planner assumes that for an equivalence member there > is an matching entry for that member in the targetlist; what I think the > planner assumes is: an equivalence member is able to be computed from > expressions in the targetlist. This is true. However, > So, I think it is safe to have whole-row > Vars instead of ConvertRowtypeExprs in the targetlist. when it's looking for an expression, it finds a whole-row expression so it think it needs to add a ConvertRowtypeExpr on that. But when the plan is created, there is ConvertRowtypeExpr already, but there is no way to know that a new ConvertRowtypeExpr is not needed anymore. So, we may have two ConvertRowtypeExprs giving wrong results. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/11 20:02), Ashutosh Bapat wrote: > On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Actually, even if we could create such an index on the child table and the >> targetlist had the ConvertRowtypeExpr, the planner would still not be able >> to use an index-only scan with that index; because check_index_only would >> not consider that an index-only scan is possible for that index because that >> index is an expression index and that function currently does not consider >> that index expressions are able to be returned back in an index-only scan. >> That behavior of the planner might be improved in future, though. > Right and when we do so, not having ConvertRowtypeExpr in the > targetlist will be a problem. Yeah, but I don't think that that's unsolvable; because in that case the CRE as an index expression could be converted back to the whole-row Var's rowtype by adding another CRE to the index expression for that conversion, I suspect that that special handling could allow us to support an index-only scan even when having the whole-row Var instead of the CRE in the targetlist. (Having said that, I'm not 100% sure we need to solve that problem when we improve the planner, because there doesn't seem to me to be enough use-case to justify making the code complicated for that.) Anyway, I think that that would be a matter of future versions of PG. >>> At places in planner we match equivalence members >>> to the targetlist entries. This matching will fail unexpectedly when >>> ConvertRowtypeExpr is removed from a child's targetlist. But again I >>> couldn't reproduce a problem when such a mismatch arises. >> >> >> IIUC, I don't think the planner assumes that for an equivalence member there >> is an matching entry for that member in the targetlist; what I think the >> planner assumes is: an equivalence member is able to be computed from >> expressions in the targetlist. > > This is true. However, > >> So, I think it is safe to have whole-row >> Vars instead of ConvertRowtypeExprs in the targetlist. > > when it's looking for an expression, it finds a whole-row expression > so it think it needs to add a ConvertRowtypeExpr on that. But when the > plan is created, there is ConvertRowtypeExpr already, but there is no > way to know that a new ConvertRowtypeExpr is not needed anymore. So, > we may have two ConvertRowtypeExprs giving wrong results. Maybe I'm missing something, but I don't think that we need to worry about that, because in the approach I proposed, we only add CREs above whole-row Vars in the targetlists for subplans of an Append/MergeAppend for a partitioned relation at plan creation time. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Thu, Jul 12, 2018 at 9:02 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/07/11 20:02), Ashutosh Bapat wrote: >> >> On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> >>> Actually, even if we could create such an index on the child table and >>> the >>> targetlist had the ConvertRowtypeExpr, the planner would still not be >>> able >>> to use an index-only scan with that index; because check_index_only would >>> not consider that an index-only scan is possible for that index because >>> that >>> index is an expression index and that function currently does not >>> consider >>> that index expressions are able to be returned back in an index-only >>> scan. >>> That behavior of the planner might be improved in future, though. > > >> Right and when we do so, not having ConvertRowtypeExpr in the >> targetlist will be a problem. > > > Yeah, but I don't think that that's unsolvable; because in that case the CRE > as an index expression could be converted back to the whole-row Var's > rowtype by adding another CRE to the index expression for that conversion, I > suspect that that special handling could allow us to support an index-only > scan even when having the whole-row Var instead of the CRE in the > targetlist. I am not able to understand this. Can you please provide an example? > (Having said that, I'm not 100% sure we need to solve that > problem when we improve the planner, because there doesn't seem to me to be > enough use-case to justify making the code complicated for that.) Anyway, I > think that that would be a matter of future versions of PG. I am afraid that a fix which just works only till we change the other parts of the system is useful only till that time. At that time, it needs to be replaced with a different fix. If that time is long enough, that's ok. In this case, I agree that if we haven't heard from users till now that they need to create indexes on whole-row expressions, there's chance that we will never implement this feature. > >>>> At places in planner we match equivalence members >>>> to the targetlist entries. This matching will fail unexpectedly when >>>> ConvertRowtypeExpr is removed from a child's targetlist. But again I >>>> couldn't reproduce a problem when such a mismatch arises. >>> >>> >>> >>> IIUC, I don't think the planner assumes that for an equivalence member >>> there >>> is an matching entry for that member in the targetlist; what I think the >>> planner assumes is: an equivalence member is able to be computed from >>> expressions in the targetlist. >> >> >> This is true. However, >> >>> So, I think it is safe to have whole-row >>> Vars instead of ConvertRowtypeExprs in the targetlist. >> >> >> when it's looking for an expression, it finds a whole-row expression >> so it think it needs to add a ConvertRowtypeExpr on that. But when the >> plan is created, there is ConvertRowtypeExpr already, but there is no >> way to know that a new ConvertRowtypeExpr is not needed anymore. So, >> we may have two ConvertRowtypeExprs giving wrong results. > > > Maybe I'm missing something, but I don't think that we need to worry about > that, because in the approach I proposed, we only add CREs above whole-row > Vars in the targetlists for subplans of an Append/MergeAppend for a > partitioned relation at plan creation time. There's a patch in an adjacent thread started by David Rowley to rip out Append/MergeAppend when there is only one subplan. So, your solution won't work there. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/12 13:38), Ashutosh Bapat wrote: > On Thu, Jul 12, 2018 at 9:02 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/07/11 20:02), Ashutosh Bapat wrote: >>> On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> Actually, even if we could create such an index on the child table and >>>> the >>>> targetlist had the ConvertRowtypeExpr, the planner would still not be >>>> able >>>> to use an index-only scan with that index; because check_index_only would >>>> not consider that an index-only scan is possible for that index because >>>> that >>>> index is an expression index and that function currently does not >>>> consider >>>> that index expressions are able to be returned back in an index-only >>>> scan. >>>> That behavior of the planner might be improved in future, though. >> >> >>> Right and when we do so, not having ConvertRowtypeExpr in the >>> targetlist will be a problem. >> >> >> Yeah, but I don't think that that's unsolvable; because in that case the CRE >> as an index expression could be converted back to the whole-row Var's >> rowtype by adding another CRE to the index expression for that conversion, I >> suspect that that special handling could allow us to support an index-only >> scan even when having the whole-row Var instead of the CRE in the >> targetlist. > > I am not able to understand this. Can you please provide an example? Sorry, my explanation was not good. Let me try. We can't create an inherited index on the whole-row reference in a partitioned table, but actually, we can directly create the corresponding index on a child table: postgres=# create table pt (a int, b text) partition by list (a); CREATE TABLE postgres=# create table ptp1 (b text, a int check (a in (1))); CREATE TABLE postgres=# alter table pt attach partition ptp1 for values in (1); ALTER TABLE postgres=# create index ptp1_conv_wr_idx on ptp1 ((ptp1::pt)); CREATE INDEX postgres=# insert into pt values (1, 'foo'); INSERT 0 1 postgres=# select *, ptp1 as wr, ptp1::pt as conv_wr from ptp1; b | a | wr | conv_wr -----+---+---------+--------- foo | 1 | (foo,1) | (1,foo) (1 row) In this example, the value of the whole-row reference to the child table ptp1 for that record is ('foo',1), and that of the index expression for that record is (1,'foo'). Those have different column orders, but the latter could be mapped to the former by a technique like do_convert_tuple. So, what I suspect is: by producing the value of the whole-row reference from that of the index expression like that, we could support index-only scans with such an index in the case where we have the whole-row reference in the targetlist, not the index expression itself. >> (Having said that, I'm not 100% sure we need to solve that >> problem when we improve the planner, because there doesn't seem to me to be >> enough use-case to justify making the code complicated for that.) Anyway, I >> think that that would be a matter of future versions of PG. > > I am afraid that a fix which just works only till we change the other > parts of the system is useful only till that time. At that time, it > needs to be replaced with a different fix. I agree on that point. > If that time is long > enough, that's ok. In this case, I agree that if we haven't heard from > users till now that they need to create indexes on whole-row > expressions, there's chance that we will never implement this feature. I don't object to adding that feature to future versions of PG if required. >>>> So, I think it is safe to have whole-row >>>> Vars instead of ConvertRowtypeExprs in the targetlist. >>> >>> >>> when it's looking for an expression, it finds a whole-row expression >>> so it think it needs to add a ConvertRowtypeExpr on that. But when the >>> plan is created, there is ConvertRowtypeExpr already, but there is no >>> way to know that a new ConvertRowtypeExpr is not needed anymore. So, >>> we may have two ConvertRowtypeExprs giving wrong results. >> >> >> Maybe I'm missing something, but I don't think that we need to worry about >> that, because in the approach I proposed, we only add CREs above whole-row >> Vars in the targetlists for subplans of an Append/MergeAppend for a >> partitioned relation at plan creation time. > > There's a patch in an adjacent thread started by David Rowley to rip > out Append/MergeAppend when there is only one subplan. So, your > solution won't work there. Thanks for sharing that information! I skimmed the thread. I haven't yet caught up with all the discussions there, so I think I'm missing something, but it looks like that we haven't yet reached any consensus on the way to go. In my opinion, I like the approach mentioned in [1]. And if we go that way, my patch seems to fit into that, because in that approach the Append/MergeAppend could be removed after adjusting the targetlists for its subplans in create_append_plan/create_merge_append_plan. Anyway, I'd like to join in that work for PG12. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/1867.1509032831%40sss.pgh.pa.us
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Thu, Jul 12, 2018 at 4:32 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > In this example, the value of the whole-row reference to the child table > ptp1 for that record is ('foo',1), and that of the index expression for that > record is (1,'foo'). Those have different column orders, but the latter > could be mapped to the former by a technique like do_convert_tuple. So, > what I suspect is: by producing the value of the whole-row reference from > that of the index expression like that, The expression in this case would look like ptp1::pt::ptp1 which won't match targetlist expression ptp1. I am also doubtful that the planner will be able to deduce that it need to apply an inverse function of ::pt and what exactly such an inverse function is. So index only scan won't be picked. > we could support index-only scans > with such an index in the case where we have the whole-row reference in the > targetlist, not the index expression itself. Can you please show an index only scan path being created in this case? > >> If that time is long >> enough, that's ok. In this case, I agree that if we haven't heard from >> users till now that they need to create indexes on whole-row >> expressions, there's chance that we will never implement this feature. > > > I don't object to adding that feature to future versions of PG if required. > >>>>> So, I think it is safe to have whole-row >>>>> Vars instead of ConvertRowtypeExprs in the targetlist. >>>> >>>> >>>> >>>> when it's looking for an expression, it finds a whole-row expression >>>> so it think it needs to add a ConvertRowtypeExpr on that. But when the >>>> plan is created, there is ConvertRowtypeExpr already, but there is no >>>> way to know that a new ConvertRowtypeExpr is not needed anymore. So, >>>> we may have two ConvertRowtypeExprs giving wrong results. >>> >>> >>> >>> Maybe I'm missing something, but I don't think that we need to worry >>> about >>> that, because in the approach I proposed, we only add CREs above >>> whole-row >>> Vars in the targetlists for subplans of an Append/MergeAppend for a >>> partitioned relation at plan creation time. >> >> >> There's a patch in an adjacent thread started by David Rowley to rip >> out Append/MergeAppend when there is only one subplan. So, your >> solution won't work there. > > > Thanks for sharing that information! I skimmed the thread. I haven't yet > caught up with all the discussions there, so I think I'm missing something, > but it looks like that we haven't yet reached any consensus on the way to > go. In my opinion, I like the approach mentioned in [1]. And if we go that > way, my patch seems to fit into that, because in that approach the > Append/MergeAppend could be removed after adjusting the targetlists for its > subplans in create_append_plan/create_merge_append_plan. Anyway, I'd like > to join in that work for PG12. Whatever may be the outcome of that work, I think what we fix here shouldn't require to reverted in a few months from now, just so that that patch works. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/04 11:00), Etsuro Fujita wrote: > (2018/07/04 1:35), Andres Freund wrote: >> What's the plan forward here? > > I still think that this approach would be the right way to go, so I plan > to polish the patch. The approach would not require extra cycles where partitioning is not involved as discussed before, and makes code much simpler, which would make maintenance and the FDW author's life easy, so I still think that the approach would be the direction to go in. So, I updated the patch. Here are changes: * I assumed that the child plan passed to adjust_subplan_tlist was a scan or join unless it was Append or MergeAppend, but I noticed I was wrong; if the topmost scan/join relation is partitioned, apply_scanjoin_target_to_paths might have pushed down ProjectSet or Result to each of partitions. So, in that case the child's plan would have ProjectSet or Result atop the scan or join plan. (I could not produce such an example, though.) To address this, I modified build_childrel_tlist so that we just apply adjust_appendrel_attrs to the parent's targetlist to get the child's if the parent is the topmost scan/join relation. So, we don't need to care about that in adjust_subplan_tlist. I think this change would also increase the efficiency for that case. I added regression test cases for that as well. * I fixed an oversight in outfuncs.c. * I renamed member variables I added to RelOptInfo before, and revised code/comments a bit. Attached is an updated version of the patch. Best regards, Etsuro Fujita
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/13 23:05), Ashutosh Bapat wrote: > On Thu, Jul 12, 2018 at 4:32 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> In this example, the value of the whole-row reference to the child table >> ptp1 for that record is ('foo',1), and that of the index expression for that >> record is (1,'foo'). Those have different column orders, but the latter >> could be mapped to the former by a technique like do_convert_tuple. > The expression in this case would look like ptp1::pt::ptp1 which won't > match targetlist expression ptp1. I am also doubtful that the planner > will be able to deduce that it need to apply an inverse function of > ::pt and what exactly such an inverse function is. So index only scan > won't be picked. >> we could support index-only scans >> with such an index in the case where we have the whole-row reference in the >> targetlist, not the index expression itself. > > Can you please show an index only scan path being created in this case? We currently don't consider index-only scan with index expressions, so I haven't thought in detail yet about how the planner would work. But once we have that index-only scan, I think we could extend that to the case mentioned above, by adding this to the planner: if the index expression is of the form var::parenttype, consider that (not only the expression itself but) var can be returned from the index. I think the expression like ptp1::pt::ptp1 would be useful to get the value of ptp1 from the index at execution time. >>> There's a patch in an adjacent thread started by David Rowley to rip >>> out Append/MergeAppend when there is only one subplan. So, your >>> solution won't work there. >> >> >> Thanks for sharing that information! I skimmed the thread. I haven't yet >> caught up with all the discussions there, so I think I'm missing something, >> but it looks like that we haven't yet reached any consensus on the way to >> go. In my opinion, I like the approach mentioned in [1]. And if we go that >> way, my patch seems to fit into that, because in that approach the >> Append/MergeAppend could be removed after adjusting the targetlists for its >> subplans in create_append_plan/create_merge_append_plan. Anyway, I'd like >> to join in that work for PG12. > > Whatever may be the outcome of that work, I think what we fix here > shouldn't require to reverted in a few months from now, just so that > that patch works. I think we could add that optimization without reverting this change because the essential part of this change is to make create_plan adjust the tlists of the subplans based on the instruction stored into the subplans' RelOptInfos (ie, need_adjust_tlist in the second version of the patch). I think this technique could be extended even to the case where we have that optimization. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Wed, Jul 18, 2018 at 8:00 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > [ new patch ] + /* + * If the child plan is an Append or MergeAppend, the targetlists of its + * subplans would have already been adjusted before we get here, so need + * to work here. + */ + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) + return; + + /* The child plan should be a scan or join */ + Assert(is_projection_capable_plan(subplan)); This looks like a terrible design to me. If somebody in future invents a new plan type that is not projection-capable, then this could fail an assertion here and there won't be any simple fix. And if you reach here and the child targetlists somehow haven't been adjusted, then you won't even fail an assertion, you'll just crash (or maybe return wrong answers). I'm going to study this some more now, but I really think this is going in the wrong direction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas <robertmhaas@gmail.com> wrote: > This looks like a terrible design to me. If somebody in future > invents a new plan type that is not projection-capable, then this > could fail an assertion here and there won't be any simple fix. And > if you reach here and the child targetlists somehow haven't been > adjusted, then you won't even fail an assertion, you'll just crash (or > maybe return wrong answers). To elaborate on this thought a bit, it is currently the case that all scan and join types are projection-capable, and most likely we would make any such node types added in the future projection-capable as well, but it still doesn't seem like a good idea to me to have tangentially-related parts of the system depend on that in subtle ways. Also, even today, this would fail if the subplan happened to be a Sort, and it's not very obvious why that couldn't happen. If it can't happen today, it's not obvious why a future code change couldn't introduce cases where it can happen. More broadly, the central idea of this patch is that we can set the target list for a child rel to something other than the target list that we expect it will eventually produce, and then at plan creation time fix it. I think that's a bad idea. The target list affects lots of things, like costing. If we don't insert a ConvertRowTypeExpr into the child's target list, the costing will be wrong to the extent that ConvertRowTypeExpr has any cost, which it does. Any other current or future property that is computed based on the target list -- parallel-safety, width, security-level, whatever -- could also potentially end up with a wrong value if the target list as written does not match the target list as will actually be produced. It's true that, in all of these areas, the ConvertRowTypeExpr isn't likely to have a lot of impact; it probably won't have a big effect on costing and may not actually cause a problem for the other things that I mentioned. On the other hand, if it does cause a serious problem, what options would we have for fixing it other than to back out your entire patch and solve the problem some other way? The idea that derived properties won't be correct unless they are based on the real target list is pretty fundamental, and I think deviating from the idea that we get the correct target list first and then compute the derived properties afterward is likely to cause real headaches for future developers. In short, plan creation time is just the wrong time to change the plan. It's just a time to translate the plan from the form needed by the planner to the form needed by the executor. It would be fine if the change you were making were only cosmetic, but it's not. After looking over both patches, I think Ashutosh Bapat has basically the right idea. What he is proposing is a localized fix that doesn't seem to make any changes to the way things work overall. I do think his patches need to be fixed up a bit to avoid conflating ConvertRowtypeExpr with "converted whole-row reference." The two are certainly not equivalent; the latter is a narrow special case. Some of the comments could use some more wordsmithing, too, I think. Apart from those gripes, though, I think it's solving the problem the correct way: don't build the wrong plan and try to fix it, just build the right plan from the beginning. There are definitely some things not to like about this approach. In particular, I definitely agree that treating a converted whole-row reference specially is not very nice. It feels like it might be substantially cleaner to be able to represent this idea as a single node rather than a combination of ConvertRowtypeExpr and var with varattno = 0. Perhaps in the future we ought to consider either (a) trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b) inventing a new WholeRowExpr node that stores two RTIs, one for the relation that's generating the whole-row reference and the other for the relation that is controlling the column ordering of the result or (c) allowing a Var to represent a whole-row expression that has to produce its outputs according to the ordering of some other RTE. But I don't think it's wise or necessary to whack that around just to fix this bug; it is refactoring or improvement work best left to a future release. Also, it is definitely a real shame that we have to build attr_needed data for each child separately. Right now, we've got to build a whole lot of things for each child individually, and that's slow and inefficient. We're not going to scale nicely to large partitioning hierarchies unless we can figure out some way of minimizing the work we do for each partition. So, the fact that Fujii-san's approach avoids needing to compute attr_needed in some cases is very appealing. However, in my opinion, it's nowhere near appealing enough to justify trying to do surgery on the target list at plan-creation time. I think we need to leave optimizing this to a future effort. Partition-wise join/aggregate, and partitioning in general, need a lot of optimization in a lot of places, and fortunately there are people working on that, but our goal here should just be to fix things up well enough that we can ship it. I don't see anything in Ashutosh's patch that is so ugly that we can't live with it for the time being. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/21 0:26), Robert Haas wrote: > On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas<robertmhaas@gmail.com> wrote: >> This looks like a terrible design to me. If somebody in future >> invents a new plan type that is not projection-capable, then this >> could fail an assertion here and there won't be any simple fix. And >> if you reach here and the child targetlists somehow haven't been >> adjusted, then you won't even fail an assertion, you'll just crash (or >> maybe return wrong answers). > > To elaborate on this thought a bit, it is currently the case that all > scan and join types are projection-capable, and most likely we would > make any such node types added in the future projection-capable as > well, but it still doesn't seem like a good idea to me to have > tangentially-related parts of the system depend on that in subtle > ways. I have to admit that that is not a good idea. So, I'll update the patch so that we don't assume the projection capability of the subplan anymore, if we go this way. > Also, even today, this would fail if the subplan happened to be > a Sort, and it's not very obvious why that couldn't happen. You mean the MergeAppend case? In that case, the patch will adjust the subplan before adding a Sort above the subplan, so that could not happen. > More broadly, the central idea of this patch is that we can set the > target list for a child rel to something other than the target list > that we expect it will eventually produce, and then at plan creation > time fix it. Yeah, the patch would fix the the final output to the appendrel parent at plan creation time if necessary, but it would produces the tlist of an intermediate child scan/join node as written in the child relation's reltarget at that time. > I think that's a bad idea. The target list affects lots > of things, like costing. If we don't insert a ConvertRowTypeExpr into > the child's target list, the costing will be wrong to the extent that > ConvertRowTypeExpr has any cost, which it does. Actually, this is not true at least currently, because set_append_rel_size doesn't do anything about the costing: * NB: the resulting childrel->reltarget->exprs may contain arbitrary * expressions, which otherwise would not occur in a rel's targetlist. * Code that might be looking at an appendrel child must cope with * such. (Normally, a rel's targetlist would only include Vars and * PlaceHolderVars.) XXX we do not bother to update the cost or width * fields of childrel->reltarget; not clear if that would be useful. > Any other current or > future property that is computed based on the target list -- > parallel-safety, width, security-level, whatever -- could also > potentially end up with a wrong value if the target list as written > does not match the target list as will actually be produced. It's > true that, in all of these areas, the ConvertRowTypeExpr isn't likely > to have a lot of impact; it probably won't have a big effect on > costing and may not actually cause a problem for the other things that > I mentioned. On the other hand, if it does cause a serious problem, > what options would we have for fixing it other than to back out your > entire patch and solve the problem some other way? The idea that > derived properties won't be correct unless they are based on the real > target list is pretty fundamental, and I think deviating from the idea > that we get the correct target list first and then compute the derived > properties afterward is likely to cause real headaches for future > developers. > > In short, plan creation time is just the wrong time to change the > plan. It's just a time to translate the plan from the form needed by > the planner to the form needed by the executor. It would be fine if > the change you were making were only cosmetic, but it's not. Some createplan.c routines already change the tlists of their nodes. For example, create_merge_append_plan would add sort columns to each of its subplans if necessary. I think it would be similar to that to add a ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist. > After looking over both patches, I think Ashutosh Bapat has basically > the right idea. What he is proposing is a localized fix that doesn't > seem to make any changes to the way things work overall. I reviewed his patch, and thought that that would fix the issue, but this is about the current design on the child tlist handling in partitionise join rather than that patch: it deviates from the assumption that we had for the scan/join planning till PG10 that "a rel's targetlist would only include Vars and PlaceHolderVars", and turns out that there are a lot of places where we need to modify code to suppress that deviation, as partly shown in that patch. So, ISTM that the current design in itself is not that localized. > Partition-wise join/aggregate, and partitioning in general, need a lot > of optimization in a lot of places, and fortunately there are people > working on that, but our goal here should just be to fix things up > well enough that we can ship it. I agree on that point. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Fri, Jul 20, 2018 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote: [ ... clipped portion ...] > > In short, plan creation time is just the wrong time to change the > plan. It's just a time to translate the plan from the form needed by > the planner to the form needed by the executor. It would be fine if > the change you were making were only cosmetic, but it's not. > Agree with all that, including the clipped paras. > After looking over both patches, I think Ashutosh Bapat has basically > the right idea. What he is proposing is a localized fix that doesn't > seem to make any changes to the way things work overall. I do think > his patches need to be fixed up a bit to avoid conflating > ConvertRowtypeExpr with "converted whole-row reference." The two are > certainly not equivalent; the latter is a narrow special case. Some > of the comments could use some more wordsmithing, too, I think. Apart > from those gripes, though, I think it's solving the problem the > correct way: don't build the wrong plan and try to fix it, just build > the right plan from the beginning. I will work on that if everyone agrees that that's the right way to go. Fujita-san seems to have some arguments still. I have argued on the same lines as yours but your explanation is better. I don't have anything more to add. I will wait for that to be resolved. > > There are definitely some things not to like about this approach. In > particular, I definitely agree that treating a converted whole-row > reference specially is not very nice. It feels like it might be > substantially cleaner to be able to represent this idea as a single > node rather than a combination of ConvertRowtypeExpr and var with > varattno = 0. Perhaps in the future we ought to consider either (a) > trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b) > inventing a new WholeRowExpr node that stores two RTIs, one for the > relation that's generating the whole-row reference and the other for > the relation that is controlling the column ordering of the result or > (c) allowing a Var to represent a whole-row expression that has to > produce its outputs according to the ordering of some other RTE. I never liked representing a whole-row expression as a Var worst with varattno = 0. We should have always casted it as RowExpr(set of vars, one var per column). This problem wouldn't arise then. Many other problems wouldn't arise then, I think. I think we do that so that we can convert a tuple from buffer into a datum and put it in place of Var with varattno = 0. Given that I prefer (a) or (b) in all the cases. If not that, then c. But I agree that we have to avoid ConvertRowtypeExpr being used in this case. > But > I don't think it's wise or necessary to whack that around just to fix > this bug; it is refactoring or improvement work best left to a future > release. > > Also, it is definitely a real shame that we have to build attr_needed > data for each child separately. Right now, we've got to build a whole > lot of things for each child individually, and that's slow and > inefficient. We're not going to scale nicely to large partitioning > hierarchies unless we can figure out some way of minimizing the work > we do for each partition. So, the fact that Fujii-san's approach > avoids needing to compute attr_needed in some cases is very appealing. > However, in my opinion, it's nowhere near appealing enough to justify > trying to do surgery on the target list at plan-creation time. I > think we need to leave optimizing this to a future effort. > Partition-wise join/aggregate, and partitioning in general, need a lot > of optimization in a lot of places, and fortunately there are people > working on that, but our goal here should just be to fix things up > well enough that we can ship it. I agree. > I don't see anything in Ashutosh's > patch that is so ugly that we can't live with it for the time being. Thanks. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Mon, Jul 23, 2018 at 3:49 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > I have to admit that that is not a good idea. So, I'll update the patch so > that we don't assume the projection capability of the subplan anymore, if we > go this way. Isn't that assumption fundamental to your whole approach? >> Also, even today, this would fail if the subplan happened to be >> a Sort, and it's not very obvious why that couldn't happen. > > You mean the MergeAppend case? In that case, the patch will adjust the > subplan before adding a Sort above the subplan, so that could not happen. That would only help if create_merge_append_plan() itself inserted a Sort node. It wouldn't help if the Sort node came from a child path manufactured by create_sort_path(). >> I think that's a bad idea. The target list affects lots >> of things, like costing. If we don't insert a ConvertRowTypeExpr into >> the child's target list, the costing will be wrong to the extent that >> ConvertRowTypeExpr has any cost, which it does. > > Actually, this is not true at least currently, because set_append_rel_size > doesn't do anything about the costing: Why would it? Append can't project, so the cost of any expressions that appear in its target list is irrelevant. What is affected is the cost of the scans below the Append -- see e.g. cost_seqscan(), which uses the data produced by set_pathtarget_cost_width(). > Some createplan.c routines already change the tlists of their nodes. For > example, create_merge_append_plan would add sort columns to each of its > subplans if necessary. I think it would be similar to that to add a > ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist. You have a point, but I think that code is actually not a very good idea, and I'd like to see us do less of that sort of thing, not more. Any case in which we change the plan while creating it has many of the same problems that I discussed in the previous email. For example, create_merge_append_path() has to know that a Sort node might get inserted and set the costing accordingly. If the callers guaranteed that the necessary sort path had already been inserted, then we wouldn't need that special handling. Also, that code is adding additional columns, computed from the columns we have available, so that we can sort. Those extra columns then get discarded at the next level of the Plan tree. What you're trying to do is different. Perhaps this is too harsh a judgement, but it looks to me like you're using a deliberately-wrong representation of the value that you ultimately want to produce and then patching it up after the fact. That seems quite a bit worse than what the existing code is doing. > I reviewed his patch, and thought that that would fix the issue, but this is > about the current design on the child tlist handling in partitionise join > rather than that patch: it deviates from the assumption that we had for the > scan/join planning till PG10 that "a rel's targetlist would only include > Vars and PlaceHolderVars", and turns out that there are a lot of places > where we need to modify code to suppress that deviation, as partly shown in > that patch. So, ISTM that the current design in itself is not that > localized. I used to think that was the assumption, too, but it seems to me that Tom told me a while back that it was never really true, and that assumption was in my head. Unfortunately, I don't have a link to that email handy. Either way, I think the solution is to get the tlist right from the start and cope with the consequences downstream, not start with the wrong thing and try to fix it later. To see an example of why I think that's a valuable approach, see 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5, especially the changes in the regression test outputs. The old code discovered after it had generated an Append that it had the wrong tlist, but since the Append can't project, it had to add a Result node. With the new code, we get the children of the Append to produce the right output from the start, and then the Append just needs to concatenate all that output, so no Result node is needed. As noted in the commit message, it also made the costing more accurate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/24 3:09), Robert Haas wrote: > On Mon, Jul 23, 2018 at 3:49 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> I have to admit that that is not a good idea. So, I'll update the patch so >> that we don't assume the projection capability of the subplan anymore, if we >> go this way. > > Isn't that assumption fundamental to your whole approach? I don't think so. What I mean here is: currently the subplan would be a scan/join node, but in future we might have eg, a Sort node atop the scan/join node, so it would be better to update the patch to handle such a case as well. >>> Also, even today, this would fail if the subplan happened to be >>> a Sort, and it's not very obvious why that couldn't happen. >> >> You mean the MergeAppend case? In that case, the patch will adjust the >> subplan before adding a Sort above the subplan, so that could not happen. > > That would only help if create_merge_append_plan() itself inserted a > Sort node. It wouldn't help if the Sort node came from a child path > manufactured by create_sort_path(). As I mentioned above, I think we could handle such a case as well. >>> I think that's a bad idea. The target list affects lots >>> of things, like costing. If we don't insert a ConvertRowTypeExpr into >>> the child's target list, the costing will be wrong to the extent that >>> ConvertRowTypeExpr has any cost, which it does. >> >> Actually, this is not true at least currently, because set_append_rel_size >> doesn't do anything about the costing: > > Why would it? Append can't project, so the cost of any expressions > that appear in its target list is irrelevant. What is affected is the > cost of the scans below the Append -- see e.g. cost_seqscan(), which > uses the data produced by set_pathtarget_cost_width(). By set_rel_size()? >> Some createplan.c routines already change the tlists of their nodes. For >> example, create_merge_append_plan would add sort columns to each of its >> subplans if necessary. I think it would be similar to that to add a >> ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist. > > You have a point, but I think that code is actually not a very good > idea, and I'd like to see us do less of that sort of thing, not more. > Any case in which we change the plan while creating it has many of the > same problems that I discussed in the previous email. For example, > create_merge_append_path() has to know that a Sort node might get > inserted and set the costing accordingly. If the callers guaranteed > that the necessary sort path had already been inserted, then we > wouldn't need that special handling. I'm not sure that's a good idea, because I think we have a trade-off relation; the more we make create_plan simple, the more we need to make earlier states of the planner complicated. And it looks to me like the partitionwise join code is making earlier (and later) stages of the planner too complicated, to make create_plan simple. > Also, that code is adding additional columns, computed from the > columns we have available, so that we can sort. Those extra columns > then get discarded at the next level of the Plan tree. What you're > trying to do is different. Perhaps this is too harsh a judgement, but > it looks to me like you're using a deliberately-wrong representation > of the value that you ultimately want to produce and then patching it > up after the fact. When considering paritionwise joining, it would make things complicated to have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed upthread, it deviates from the planner's assumption that a rel's targetlist would only include Vars and PHVs. So, I propose to include a child whole-row Var in the targetlist instead, in which case, we need to fix the Var after the fact, but can avoid making many other parts of the planner complicated. > That seems quite a bit worse than what the > existing code is doing. One idea to address that concern would be to adjust the pathtarget of each path created in generate_partitionwise_join_paths accordingly. I'm missing something, though. >> I reviewed his patch, and thought that that would fix the issue, but this is >> about the current design on the child tlist handling in partitionise join >> rather than that patch: it deviates from the assumption that we had for the >> scan/join planning till PG10 that "a rel's targetlist would only include >> Vars and PlaceHolderVars", and turns out that there are a lot of places >> where we need to modify code to suppress that deviation, as partly shown in >> that patch. So, ISTM that the current design in itself is not that >> localized. > > I used to think that was the assumption, too, but it seems to me that > Tom told me a while back that it was never really true, and that > assumption was in my head. Unfortunately, I don't have a link to that > email handy. Either way, I think the solution is to get the tlist > right from the start and cope with the consequences downstream, not > start with the wrong thing and try to fix it later. I'm not fully convinced that's really the right solution, but I think you know better about partitioning (as well as many other things) than me, so I think it's better for you to decide what to do for this issue. Thanks for working on this! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >> Isn't that assumption fundamental to your whole approach? > > I don't think so. What I mean here is: currently the subplan would be a > scan/join node, but in future we might have eg, a Sort node atop the > scan/join node, so it would be better to update the patch to handle such a > case as well. But how would you do that? >>>> I think that's a bad idea. The target list affects lots >>>> of things, like costing. If we don't insert a ConvertRowTypeExpr into >>>> the child's target list, the costing will be wrong to the extent that >>>> ConvertRowTypeExpr has any cost, which it does. >>> >>> >>> Actually, this is not true at least currently, because >>> set_append_rel_size >>> doesn't do anything about the costing: >> >> >> Why would it? Append can't project, so the cost of any expressions >> that appear in its target list is irrelevant. What is affected is the >> cost of the scans below the Append -- see e.g. cost_seqscan(), which >> uses the data produced by set_pathtarget_cost_width(). > > By set_rel_size()? Sorry, I don't understand what you mean by this. > I'm not sure that's a good idea, because I think we have a trade-off > relation; the more we make create_plan simple, the more we need to make > earlier states of the planner complicated. > > And it looks to me like the partitionwise join code is making earlier (and > later) stages of the planner too complicated, to make create_plan simple. I think that create_plan is *supposed* to be simple. Its purpose is to prune away data that's only needed during planning and add things that can be computed at the last minute which are needed at execution time. Making it do anything else is, in my opinion, not good. > When considering paritionwise joining, it would make things complicated to > have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed > upthread, it deviates from the planner's assumption that a rel's targetlist > would only include Vars and PHVs. So, I propose to include a child > whole-row Var in the targetlist instead, in which case, we need to fix the > Var after the fact, but can avoid making many other parts of the planner > complicated. Well, I could have the wrong idea here, but I tend to think allowing for ConvertRowTypeExpr elsewhere won't be that bad. Does anyone else want to weigh in on this issue? Tom, perhaps? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> I'm not sure that's a good idea, because I think we have a trade-off >> relation; the more we make create_plan simple, the more we need to make >> earlier states of the planner complicated. >> >> And it looks to me like the partitionwise join code is making earlier (and >> later) stages of the planner too complicated, to make create_plan simple. > I think that create_plan is *supposed* to be simple. Its purpose is > to prune away data that's only needed during planning and add things > that can be computed at the last minute which are needed at execution > time. Making it do anything else is, in my opinion, not good. I tend to agree with Robert on this. In particular, if you put anything into create_plan or later that affects cost estimates, you're really doing it wrong, because changing those estimates might've changed the plan entirely. (As I recall, we have a couple of cheats like that now, associated with dropping no-op projection and subquery scan nodes. But let's not add more.) TBH, I think this entire discussion is proving what sheer folly it was to allow partitions to have rowtypes not identical to their parent. But I suppose we're stuck with that now. regards, tom lane
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Ashutosh Bapat
Date:
On Thu, Jul 26, 2018 at 2:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> I'm not sure that's a good idea, because I think we have a trade-off >>> relation; the more we make create_plan simple, the more we need to make >>> earlier states of the planner complicated. >>> >>> And it looks to me like the partitionwise join code is making earlier (and >>> later) stages of the planner too complicated, to make create_plan simple. > >> I think that create_plan is *supposed* to be simple. Its purpose is >> to prune away data that's only needed during planning and add things >> that can be computed at the last minute which are needed at execution >> time. Making it do anything else is, in my opinion, not good. > > I tend to agree with Robert on this. In particular, if you put anything > into create_plan or later that affects cost estimates, you're really doing > it wrong, because changing those estimates might've changed the plan > entirely. (As I recall, we have a couple of cheats like that now, > associated with dropping no-op projection and subquery scan nodes. > But let's not add more.) +1 > > TBH, I think this entire discussion is proving what sheer folly it was > to allow partitions to have rowtypes not identical to their parent. > But I suppose we're stuck with that now. > Every table created has a different rowtype and when the partition's rowtype is different from that of the partitioned table, we add a ConvertRowtypeExpr node on top of the whole-row expression. 2153 if (appinfo->parent_reltype != appinfo->child_reltype) 2154 { 2155 ConvertRowtypeExpr *r = makeNode(ConvertRowtypeExpr); We may be able to set rowtype of a partition to be same as the rowtype of a partitioned table when a partition table is created using an ALTER TABLE syntax. But are you suggesting that we change the reltype of a partition being attached? What happens when we detach a partition and let the table live independent of the partitioned table? Do we create a new reltype? I am not even going into the problem when we try to attach a partition with different tuple layout. I do believe that such usecases arise and rewriting the table being attached is a viable option. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/26 5:27), Robert Haas wrote: > On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >>> Isn't that assumption fundamental to your whole approach? >> >> I don't think so. What I mean here is: currently the subplan would be a >> scan/join node, but in future we might have eg, a Sort node atop the >> scan/join node, so it would be better to update the patch to handle such a >> case as well. > > But how would you do that? What I had in mind was to insert a Rusult node with inject_projection_plan and adjust the tlist of the Result, as done for adding sort columns to a tlist in prepare_sort_from_pathkeys. >>>>> I think that's a bad idea. The target list affects lots >>>>> of things, like costing. If we don't insert a ConvertRowTypeExpr into >>>>> the child's target list, the costing will be wrong to the extent that >>>>> ConvertRowTypeExpr has any cost, which it does. >>>> >>>> >>>> Actually, this is not true at least currently, because >>>> set_append_rel_size >>>> doesn't do anything about the costing: >>> >>> >>> Why would it? Append can't project, so the cost of any expressions >>> that appear in its target list is irrelevant. What is affected is the >>> cost of the scans below the Append -- see e.g. cost_seqscan(), which >>> uses the data produced by set_pathtarget_cost_width(). >> >> By set_rel_size()? > > Sorry, I don't understand what you mean by this. I think the data used by such a costing function is computed by set_rel_size in set_append_rel_size, not set_pathtarget_cost_width; in the case of a plain partition, for example, set_rel_size would call set_plain_rel_size, and then set_plain_rel_size would eventually call set_rel_width, which sets reltarget->cost, which I think would be used by e.g., cost_seqscan. cost_qual_eval_node, which is called in set_rel_width for computing the cost of ConvertRowTypeExpr, ignores that expression, so currently, we don't charge any cost for it to the partition's reltarget->cost, and to the cost of a scan below the Append. >> I'm not sure that's a good idea, because I think we have a trade-off >> relation; the more we make create_plan simple, the more we need to make >> earlier states of the planner complicated. >> >> And it looks to me like the partitionwise join code is making earlier (and >> later) stages of the planner too complicated, to make create_plan simple. > > I think that create_plan is *supposed* to be simple. Its purpose is > to prune away data that's only needed during planning and add things > that can be computed at the last minute which are needed at execution > time. Making it do anything else is, in my opinion, not good. I agree on that point. >> When considering paritionwise joining, it would make things complicated to >> have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed >> upthread, it deviates from the planner's assumption that a rel's targetlist >> would only include Vars and PHVs. So, I propose to include a child >> whole-row Var in the targetlist instead, in which case, we need to fix the >> Var after the fact, but can avoid making many other parts of the planner >> complicated. > > Well, I could have the wrong idea here, but I tend to think allowing > for ConvertRowTypeExpr elsewhere won't be that bad. I still don't like that because in my opinion, changes needed for that would not be localized, and that would make code complicated more than necessary. As I mentioned in a previous email, another idea to avoid that would be to adjust tlists for children at path creation time, not plan creation time; we could adjust the tlist for each of subpaths accumulated for an Append/MergeAppend path in add_paths_to_append_rel called from set_append_rel_pathlist or generate_partitionwise_join_paths, with create_projection_path adding ConvertRowTypeExpr. It seems unlikely that performing create_projection_path to such a subpath would change its property of being the cheapest, so it would be safe to adjust the tlists that way. This would not require making create_plan complicated anymore. I might be missing something, though. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Andres Freund
Date:
Hi, On 2018-07-20 08:38:09 -0400, Robert Haas wrote: > I'm going to study this some more now, but I really think this is > going in the wrong direction. We're going to have to get somewhere on this topic soon. This thread has been open for nearly half a year, and we're late in the beta phase now. What can we do to get to an agreement soon? Greetings, Andres Freund
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/31 4:06), Andres Freund wrote: > On 2018-07-20 08:38:09 -0400, Robert Haas wrote: >> I'm going to study this some more now, but I really think this is >> going in the wrong direction. > > We're going to have to get somewhere on this topic soon. This thread has > been open for nearly half a year, and we're late in the beta phase now. > What can we do to get to an agreement soon? I'd like to propose an updated patch as proposed in [1]. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5B59BA55.30200%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Alvaro Herrera
Date:
On 2018-Jul-31, Etsuro Fujita wrote: > (2018/07/31 4:06), Andres Freund wrote: > > On 2018-07-20 08:38:09 -0400, Robert Haas wrote: > > > I'm going to study this some more now, but I really think this is > > > going in the wrong direction. > > > > We're going to have to get somewhere on this topic soon. This thread has > > been open for nearly half a year, and we're late in the beta phase now. > > What can we do to get to an agreement soon? > > I'd like to propose an updated patch as proposed in [1]. But there is no patch there, and neither there is agreement from the other party that the approach described there is sound. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/07/26 21:11), Etsuro Fujita wrote: > (2018/07/26 5:27), Robert Haas wrote: >> Well, I could have the wrong idea here, but I tend to think allowing >> for ConvertRowTypeExpr elsewhere won't be that bad. > > I still don't like that because in my opinion, changes needed for that > would not be localized, and that would make code complicated more than > necessary. > > As I mentioned in a previous email, another idea to avoid that would be > to adjust tlists for children at path creation time, not plan creation > time; we could adjust the tlist for each of subpaths accumulated for an > Append/MergeAppend path in add_paths_to_append_rel called from > set_append_rel_pathlist or generate_partitionwise_join_paths, with > create_projection_path adding ConvertRowTypeExpr. It seems unlikely that > performing create_projection_path to such a subpath would change its > property of being the cheapest, so it would be safe to adjust the tlists > that way. This would not require making create_plan complicated anymore. > I might be missing something, though. I updated the patch that way. Updated patch attached. I fixed a bug and did a bit of cleanups as well. Best regards, Etsuro Fujita
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/01 5:31), Alvaro Herrera wrote: > On 2018-Jul-31, Etsuro Fujita wrote: >> (2018/07/31 4:06), Andres Freund wrote: >>> On 2018-07-20 08:38:09 -0400, Robert Haas wrote: >>>> I'm going to study this some more now, but I really think this is >>>> going in the wrong direction. >>> >>> We're going to have to get somewhere on this topic soon. This thread has >>> been open for nearly half a year, and we're late in the beta phase now. >>> What can we do to get to an agreement soon? >> >> I'd like to propose an updated patch as proposed in [1]. > > But there is no patch there, and neither there is agreement from the > other party that the approach described there is sound. I posted the updated patch [1]. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5B619D23.4010000%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Wed, Aug 1, 2018 at 7:46 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > I posted the updated patch [1]. Etsuro-san: I really think we should just go with what Ashutosh had proposed. Tom, Ashutosh, and I all seem to agree that we shouldn't try to re-jigger things at create-plan time. I think that a 3-1 consensus against your proposal is sufficient to say we shouldn't go that way. Now, here you've got a new approach which avoids that, which I have not yet reviewed. I'll try to spend some time on it this afternoon, but really, I think it's too late for this. This bug was reported in February, and we're supposed to be pushing 11 final out the door in not much more than a month. Proposing a new approach in August is not good. Can't we just do what Ashutosh proposed for now and revisit this for v12? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Wed, Aug 1, 2018 at 7:44 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > I updated the patch that way. Updated patch attached. I fixed a bug and > did a bit of cleanups as well. Looking this over from a technical point of view, I think it's better than what you proposed before but I still don't agree with the approach. I don't think there is any getting around the fact that converted whole-row vars are going to result in some warts. Ashutosh inserted most of the necessary warts in the original partition-wise join patch, but missed some cases in postgres_fdw; his approach is, basically, to add the matching warts there. Your approach is to rip out all of those warts and insert different ones. You've simplified build_tlist_index_other_vars() and add_placeholders_to_joinrel() and build_joinrel_tlist() to basically what they were before partition-wise join went in. On the other hand, RelOptInfo grows three new fields, set_append_rel_size() ends up more complicated than it was before when you include the node code added in the build_childrel_tlist function it calls, build_child_joinrel() has a different set of complications, and most importantly create_append_path() and create_merge_append_path() now do surgery on their input paths that knows specifically about the converted-whole-row var case. I would be glad to be rid of the stuff that you're ripping out, but in terms of complexity, I don't think we really come out ahead with the stuff you're adding. I'm also still concerned about the design. In your old approach, you were creating the paths with with the wrong target list and then fixing it when you turned the paths into a plan. In your new approach, you're still creating the paths with the wrong target list, but now you're fixing it when you put an Append or MergeAppend path on top of them. I still think it's a bad idea to have anything other than the correct paths in the target list from the beginning. For one thing, what happens if no Append or MergeAppend is ever put on top of them? One way that can happen today is partition-wise aggregate, although I haven't been able to demonstrate a bug, so maybe that's covered somehow. But in general, with your approach, any code that looks at the tlist for a child-join has to know that the tlist is to be used as-is *except that* ConvertRowtypeExpr may need to be inserted. I think that special case could be easy to overlook, and it seems pretty arbitrary. A tlist is a list of arbitrary expressions to be produced; with this patch, we'd end up with the tlist being the list of expressions to be produced in every case *except for* child-joins containing whole-row-vars. I just can't get behind a strange exception like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/02 2:44), Robert Haas wrote: > Tom, Ashutosh, and I all seem to agree that we shouldn't try to > re-jigger things at create-plan time. I think that a 3-1 consensus > against your proposal is sufficient to say we shouldn't go that way. Agreed. > Now, here you've got a new approach which avoids that, which I have > not yet reviewed. I'll try to spend some time on it this afternoon, > but really, I think it's too late for this. This bug was reported in > February, and we're supposed to be pushing 11 final out the door in > not much more than a month. Proposing a new approach in August is not > good. Agreed. > Can't we just do what Ashutosh proposed for now and revisit > this for v12? I think that may be possible. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/02 4:30), Robert Haas wrote: > On Wed, Aug 1, 2018 at 7:44 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> I updated the patch that way. Updated patch attached. I fixed a bug and >> did a bit of cleanups as well. > > Looking this over from a technical point of view, I think it's better > than what you proposed before but I still don't agree with the > approach. I don't think there is any getting around the fact that > converted whole-row vars are going to result in some warts. Ashutosh > inserted most of the necessary warts in the original partition-wise > join patch, but missed some cases in postgres_fdw; his approach is, > basically, to add the matching warts there. Your approach is to rip > out all of those warts and insert different ones. You've simplified > build_tlist_index_other_vars() and add_placeholders_to_joinrel() and > build_joinrel_tlist() to basically what they were before > partition-wise join went in. On the other hand, RelOptInfo grows > three new fields, Sorry, I forgot to remove one of the fields that isn't needed anymore. RelOptInfo needs two new fields, in the new approach. > set_append_rel_size() ends up more complicated than > it was before when you include the node code added in the > build_childrel_tlist function it calls, build_child_joinrel() has a > different set of complications, and most importantly > create_append_path() and create_merge_append_path() now do surgery on > their input paths that knows specifically about the > converted-whole-row var case. I would be glad to be rid of the stuff > that you're ripping out, but in terms of complexity, I don't think we > really come out ahead with the stuff you're adding. The new approach seems to me more localized than what Ashutosh had proposed. One obvious advantage of the new approach is that we no longer need changes to postgres_fdw for it to work for partitionwise joins, which would apply for any other FDWs, making the FDW authors' life easy. > I'm also still concerned about the design. In your old approach, you > were creating the paths with with the wrong target list and then > fixing it when you turned the paths into a plan. In your new > approach, you're still creating the paths with the wrong target list, > but now you're fixing it when you put an Append or MergeAppend path on > top of them. I still think it's a bad idea to have anything other > than the correct paths in the target list from the beginning. I don't think so; actually, the child's targetlist would not have to be the same as the parent's until the child gets in under the parent's Append or MergeAppend. So, I modified the child's target list so as to make it easy to join the child relations. > For one > thing, what happens if no Append or MergeAppend is ever put on top of > them? One way that can happen today is partition-wise aggregate, > although I haven't been able to demonstrate a bug, so maybe that's > covered somehow. Right. In the new approach, the targetlist for the topmost child scan/join is guaranteed to be the same as that for the topmost parent scan/join by the planner work that I added. > But in general, with your approach, any code that > looks at the tlist for a child-join has to know that the tlist is to > be used as-is *except that* ConvertRowtypeExpr may need to be > inserted. I think that special case could be easy to overlook, and it > seems pretty arbitrary. I think we can check to see whether the conversion is needed, from the flags added to RelOptInfo. > A tlist is a list of arbitrary expressions to > be produced; with this patch, we'd end up with the tlist being the > list of expressions to be produced in every case *except for* > child-joins containing whole-row-vars. I just can't get behind a > strange exception like that. I have to admit that it's weird to adjust the child's targetlist in that case when putting the child under the parent's Append/MergeAppend, but IMHO I think that would be much localized. Thanks for the comments! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Thu, Aug 2, 2018 at 7:20 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > The new approach seems to me more localized than what Ashutosh had proposed. > One obvious advantage of the new approach is that we no longer need changes > to postgres_fdw for it to work for partitionwise joins, which would apply > for any other FDWs, making the FDW authors' life easy. Well, I don't know what to say here expect that I don't agree. I think Ashutosh's approach is a pretty natural extension of the system that we have now. It involves needing to handle converted whole row vars in some new places, most of which were handled in the original patch, but postgres_fdw was missed. Your approach involves changing the meaning of the target list, but only in narrow corner cases. I completely disagree that we can say it's less invasive. It may indeed be less work for extension authors, though, though perhaps at the expense of moving the conversion from the remote server to the local one. >> But in general, with your approach, any code that >> looks at the tlist for a child-join has to know that the tlist is to >> be used as-is *except that* ConvertRowtypeExpr may need to be >> inserted. I think that special case could be easy to overlook, and it >> seems pretty arbitrary. > > I think we can check to see whether the conversion is needed, from the flags > added to RelOptInfo. Sure, I don't dispute that it can be made to work. I just think it's an ugly kludge. > I have to admit that it's weird to adjust the child's targetlist in that > case when putting the child under the parent's Append/MergeAppend, but IMHO > I think that would be much localized. Yeah, I just don't agree at all. Does anyone else want to weigh in on this? It seems to me that there are several people who are quite willing to complain about the fact that this hasn't been fixed, but not willing to express an opinion about the shape of the fix. Either the RMT needs to take executive action, or we need some more votes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Andres Freund
Date:
Hi, On 2018-08-02 15:19:49 -0400, Robert Haas wrote: > Does anyone else want to weigh in on this? It seems to me that there > are several people who are quite willing to complain about the fact > that this hasn't been fixed, but not willing to express an opinion > about the shape of the fix. Either the RMT needs to take executive > action, or we need some more votes. (taking my RMT hat off, as it's not been coordinated) Well, I think it's enough in the weeds that not too many people are going to have an informed opinion on this, and I'm not sure uninformed opinions help. But I do think the fact that one of the authors and the committer are on one side, and Tom seems to tentatively agree, weighs quite a bit here. As far as I can tell, it's "just" Etsuro on the other side - obviously his opinion counts, but it doesn't outweigh others. I think if you feel confident, you should be ok just commit the approach you favor, and I read you as being confident that that's the better approach. Greetings, Andres Freund
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > Does anyone else want to weigh in on this? It seems to me that there > are several people who are quite willing to complain about the fact > that this hasn't been fixed, but not willing to express an opinion > about the shape of the fix. Either the RMT needs to take executive > action, or we need some more votes. [ reads thread ... ] Well, my vote is that I've got zero faith in either of these patches. I do not trust Ashutosh's patch because of the point you noted that it will kick in on ConvertRowtypeExprs that are not related to partitioning. It's also got a rather fundamental conceptual (or at least documentation) error in that it says it's making pull_var_clause do certain things to all ConvertRowtypeExprs, but that's not what the code actually does. I think the tension around that has a lot to do with the fact that the patch went through so many revisions, and I don't have any faith that it's right even yet. As you mentioned upthread, this might be insoluble without some representational change for converted whole-row Vars. As for Etsuro-san's patch, I don't like that for the same reasons you gave. Also, I note that back towards the beginning of July it was said that that patch depends on the assumption of no Gather below an Append. That's broken *now*, not in some hypothetical future. (See the nearby "FailedAssertion on partprune" thread, wherein we're trying to fix the planner's handling of exactly such plans.) What I'm thinking might be a more appropriate thing, at least for getting v11 out the door, is to refuse to generate partitionwise joins when any whole-row vars are involved. (Perhaps that's not enough to dodge all the problems, though?) Now, that's a bit of a problem for postgres_fdw, because it seems to insist on injecting WRVs even when the query text does not require any. Why is that, and can't we get rid of it? It's horrid for performance even without any of these other considerations. But if we fail to get rid of that in time for v11, it would mean that postgres_fdw can't participate in PWJ, which isn't great but I think we could live with it for now. BTW, what exactly do we think the production status of PWJ is, anyway? I notice that upthread it was stated that enable_partitionwise_join defaults to off, as indeed it still does, and we'd turn it on later when we'd gotten rid of some memory-hogging problems. If that hasn't happened yet (and I don't see any open item about considering enabling PWJ by default for v11), then I have exactly no hesitation about lobotomizing PWJ as hard as we need to to mask this problem for v11. I'd certainly prefer a solution along those lines to either of these patches. regards, tom lane
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/03 5:25), Tom Lane wrote: > What I'm thinking might be a more appropriate thing, at least for > getting v11 out the door, is to refuse to generate partitionwise > joins when any whole-row vars are involved. Agreed. > (Perhaps that's not > enough to dodge all the problems, though?) > > Now, that's a bit of a problem for postgres_fdw, because it seems to > insist on injecting WRVs even when the query text does not require any. > Why is that, and can't we get rid of it? It's horrid for performance > even without any of these other considerations. But if we fail to get > rid of that in time for v11, it would mean that postgres_fdw can't > participate in PWJ, which isn't great but I think we could live with it > for now. Sorry, I don't understand this. Could you elaborate on that a bit more? > BTW, what exactly do we think the production status of PWJ is, anyway? > I notice that upthread it was stated that enable_partitionwise_join > defaults to off, as indeed it still does, and we'd turn it on later > when we'd gotten rid of some memory-hogging problems. If that hasn't > happened yet (and I don't see any open item about considering enabling > PWJ by default for v11), then I have exactly no hesitation about > lobotomizing PWJ as hard as we need to to mask this problem for v11. That hasn't happened yet; I think we left that for PG12, IIRC. Here is a patch for refusing to generate PWJ paths when WRVs are involved: * We no longer need to handle WRVs, so I've simplified build_joinrel_tlist() and setrefs.c to what they were before partition-wise join went in, as in the previous patch. * attr_needed data for each child is used for building child-joins' tlists, but I think we can build those by applying adjust_appendrel_attrs to the parent's tlists, without attr_needed. So, I've also removed that as in the previous patch. Maybe I'm missing something, though. Best regards, Etsuro Fujita
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/03 22:18), Etsuro Fujita wrote: > Here is a patch for refusing to generate PWJ paths when WRVs are involved: > > * We no longer need to handle WRVs, so I've simplified > build_joinrel_tlist() and setrefs.c to what they were before > partition-wise join went in, as in the previous patch. > > * attr_needed data for each child is used for building child-joins' > tlists, but I think we can build those by applying > adjust_appendrel_attrs to the parent's tlists, without attr_needed. So, > I've also removed that as in the previous patch. One thing to add: as for the latter, we don't need the changes to placeholder.c either, so I've also simplified that to what they were before partition-wise join went in. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I do not trust Ashutosh's patch because of the point you noted that it > will kick in on ConvertRowtypeExprs that are not related to partitioning. I don't remember making that point -- can you clarify? > It's also got a rather fundamental conceptual (or at least documentation) > error in that it says it's making pull_var_clause do certain things to > all ConvertRowtypeExprs, but that's not what the code actually does. > I think the tension around that has a lot to do with the fact that the > patch went through so many revisions, and I don't have any faith that > it's right even yet. As you mentioned upthread, this might be insoluble > without some representational change for converted whole-row Vars. Insoluble is a strong word.... > What I'm thinking might be a more appropriate thing, at least for > getting v11 out the door, is to refuse to generate partitionwise > joins when any whole-row vars are involved. (Perhaps that's not > enough to dodge all the problems, though?) It's enough to dodge the problem being discussed on this thread. > Now, that's a bit of a problem for postgres_fdw, because it seems to > insist on injecting WRVs even when the query text does not require any. > Why is that, and can't we get rid of it? It's horrid for performance > even without any of these other considerations. But if we fail to get > rid of that in time for v11, it would mean that postgres_fdw can't > participate in PWJ, which isn't great but I think we could live with it > for now. I don't quite know what you mean here -- postgres_fdw does use whole-row vars for EPQ handling, which may be what you're thinking about. > BTW, what exactly do we think the production status of PWJ is, anyway? > I notice that upthread it was stated that enable_partitionwise_join > defaults to off, as indeed it still does, and we'd turn it on later > when we'd gotten rid of some memory-hogging problems. If that hasn't > happened yet (and I don't see any open item about considering enabling > PWJ by default for v11), then I have exactly no hesitation about > lobotomizing PWJ as hard as we need to to mask this problem for v11. > I'd certainly prefer a solution along those lines to either of these > patches. Yeah, that hasn't been done yet. Partition-wise join probably needs a good bit of work in a number of areas to do all the things that people will want it to do -- see the thread on advanced partition-matching, which is an improvement but still doesn't cover every case that could come up. In terms of memory usage, I think that we need some discussion of the approach that should be taken. I see a couple of possible things we could do, not necessarily mutually exclusive. 1. We could try to avoid translating all of the parent's data structures for every child RelOptInfo, either by rejiggering things so that the child doesn't need all of that data, or by making it able to use the parent's copy of the data. 2. We could try to recycle memory more quickly. For example, if we're planning a partition-wise join of A-B-C-D, first plan paths for A1-B1-C1-D1 (and all proper subsets of those rels), then throw away most of the memory, then plan paths for A2-B2-C2-D2, etc. 3. We could generate paths for on "representative" children (e.g. the biggest ones) and use that to estimate the cost of the partition-wise plan. If that plan is selected, then go back and figure out real paths for the other children. All of these things help in different ways, and Ashutosh had code for some version of all of them at various points, but the problem is quite difficult. If you have three tables with 1000 partitions each, then without partition-wise join you need 3000 (partitions) + 3 (baserels) + 3 (level-2 joinrels) + 1 (level-3 joinrel) RelOptInfos. If you do a partition-wise join, then you get 3000 level-2 child-joinrels and 1000 level-3 child joinrels. Those RelOptInfo structures, and the paths attached to them, cannot help but take up memory. Perhaps that's OK, and we ought to just say "well, if you want better plans, you're going to have to allow more memory for planning". If not, we have to decide how hard we want to work in which areas and how good the result needs to be in order to have this turned on by difficult. Honestly, I'm pretty impressed that we have added not one but two members to the RelOptKind enum without as little collateral damage as there has been. This issue is annoying and the discussion has gone on longer than anyone would probably prefer, but it's really pretty minor in the grand scheme of things, at least IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Now, that's a bit of a problem for postgres_fdw, because it seems to >> insist on injecting WRVs even when the query text does not require any. >> Why is that, and can't we get rid of it? > I don't quite know what you mean here -- postgres_fdw does use > whole-row vars for EPQ handling, which may be what you're thinking > about. As far as I can see from the example that started this thread, postgres_fdw injects WRVs into a PWJ whether or not the query involves FOR UPDATE; that's why this bug is reproducible in a query without FOR UPDATE. But we shouldn't need any EPQ support in that case. > Honestly, I'm pretty impressed that we have added not one but two > members to the RelOptKind enum without as little collateral damage as > there has been. Color me a bit more skeptical about the bug density in that, given that enable_partitionwise_join is off by default; that means you're not getting a lot of testing. regards, tom lane
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Fri, Aug 3, 2018 at 10:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Now, that's a bit of a problem for postgres_fdw, because it seems to >>> insist on injecting WRVs even when the query text does not require any. >>> Why is that, and can't we get rid of it? > >> I don't quite know what you mean here -- postgres_fdw does use >> whole-row vars for EPQ handling, which may be what you're thinking >> about. > > As far as I can see from the example that started this thread, > postgres_fdw injects WRVs into a PWJ whether or not the query involves > FOR UPDATE; that's why this bug is reproducible in a query without FOR > UPDATE. But we shouldn't need any EPQ support in that case. I might be missing something, but the original reproducer involves a FOR UPDATE clause, and the expected regression test output in postgres_fdw.out appears to show a whole-row var in the foreign scan's target list only in those examples where a locking clause is present. >> Honestly, I'm pretty impressed that we have added not one but two >> members to the RelOptKind enum without as little collateral damage as >> there has been. > > Color me a bit more skeptical about the bug density in that, given > that enable_partitionwise_join is off by default; that means you're > not getting a lot of testing. You're hard to please. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Aug 3, 2018 at 10:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> As far as I can see from the example that started this thread, >> postgres_fdw injects WRVs into a PWJ whether or not the query involves >> FOR UPDATE; that's why this bug is reproducible in a query without FOR >> UPDATE. But we shouldn't need any EPQ support in that case. > I might be missing something, but the original reproducer involves a > FOR UPDATE clause, and the expected regression test output in > postgres_fdw.out appears to show a whole-row var in the foreign scan's > target list only in those examples where a locking clause is present. Oh, sigh, never mind that --- I was thinking that only the first of the two example queries involved FOR UPDATE, but they both do. So no mystery there after all. Anyway, what I'm basically suggesting is that we just disable support for PWJ in the problematic cases in v11. As long as PWJ isn't even on by default, that's not much of a loss. Obviously we'll want to fix it in the future, but the hour grows late for v11, and I think either of these patches would need quite a bit more work to be committable. BTW, I forgot to respond to this: >> I do not trust Ashutosh's patch because of the point you noted that it >> will kick in on ConvertRowtypeExprs that are not related to partitioning. > I don't remember making that point -- can you clarify? I'm looking at https://www.postgresql.org/message-id/CA%2BTgmoZSaKq-fYALn5jf6c_X3%3D%3DRb2s8eqLDwGpV%3DLNNhTXYwg%40mail.gmail.com wherein you said > There are definitely some things not to like about this approach. In > particular, I definitely agree that treating a converted whole-row > reference specially is not very nice. It feels like it might be > substantially cleaner to be able to represent this idea as a single > node rather than a combination of ConvertRowtypeExpr and var with > varattno = 0. Perhaps in the future we ought to consider either (a) > trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b) > inventing a new WholeRowExpr node that stores two RTIs, one for the > relation that's generating the whole-row reference and the other for > the relation that is controlling the column ordering of the result or > (c) allowing a Var to represent a whole-row expression that has to > produce its outputs according to the ordering of some other RTE. But > I don't think it's wise or necessary to whack that around just to fix > this bug; it is refactoring or improvement work best left to a future > release. I agree with all of that except your conclusion that it's unnecessary to do it to fix this bug. I find that entirely unsupported and over- optimistic, especially in view of the number of iterations Ashutosh went through trying to fix the fallout from not making a clear distinction. regards, tom lane
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Fri, Aug 3, 2018 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Anyway, what I'm basically suggesting is that we just disable support for > PWJ in the problematic cases in v11. As long as PWJ isn't even on by > default, that's not much of a loss. Obviously we'll want to fix it in the > future, but the hour grows late for v11, and I think either of these > patches would need quite a bit more work to be committable. That's not my first choice, but I'm OK with it. >> There are definitely some things not to like about this approach. In >> particular, I definitely agree that treating a converted whole-row >> reference specially is not very nice. It feels like it might be >> substantially cleaner to be able to represent this idea as a single >> node rather than a combination of ConvertRowtypeExpr and var with >> varattno = 0. Perhaps in the future we ought to consider either (a) >> trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b) >> inventing a new WholeRowExpr node that stores two RTIs, one for the >> relation that's generating the whole-row reference and the other for >> the relation that is controlling the column ordering of the result or >> (c) allowing a Var to represent a whole-row expression that has to >> produce its outputs according to the ordering of some other RTE. But >> I don't think it's wise or necessary to whack that around just to fix >> this bug; it is refactoring or improvement work best left to a future >> release. > > I agree with all of that except your conclusion that it's unnecessary > to do it to fix this bug. I find that entirely unsupported and over- > optimistic, especially in view of the number of iterations Ashutosh > went through trying to fix the fallout from not making a clear > distinction. I don't think that the question of how a converted whole-row expr can really make a difference between being able to fix this bug and not being able to fix this bug. It's just a representational choice. If we decide that a converted whole-row expr is represented as ConvertRowTypeExpr on top of a Var, then we have to check for that. If we revise the representation to use a Var node directly, or to use a new WholeRowVar node, then we'll just need to check for those things instead. I think that the shape of the fix itself does not change. Moreover, it's pretty much a 1:1 mapping. It's not like, say, replacing plans with paths in the whole upper half of the planner, where the representation is different enough that things that would have been very difficult to manage in the old representation become relatively simple in the new one. It's basically just a question of how you spell it. Writing a test for ConvertRowTypeExpr atop Var and using it a bunch of places is not beautiful, and it's certainly going to be marginally slower than a direct test for WholeRowVar or some other representation that bundles the whole thing into one node, but I think it should work just fine, and the representation can be revised later in a separate patch once we agree on an approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/03 22:28), Etsuro Fujita wrote: > (2018/08/03 22:18), Etsuro Fujita wrote: >> Here is a patch for refusing to generate PWJ paths when WRVs are >> involved: >> >> * We no longer need to handle WRVs, so I've simplified >> build_joinrel_tlist() and setrefs.c to what they were before >> partition-wise join went in, as in the previous patch. >> >> * attr_needed data for each child is used for building child-joins' >> tlists, but I think we can build those by applying >> adjust_appendrel_attrs to the parent's tlists, without attr_needed. So, >> I've also removed that as in the previous patch. > > One thing to add: as for the latter, we don't need the changes to > placeholder.c either, so I've also simplified that to what they were > before partition-wise join went in. *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *************** *** 3960,3965 **** create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, --- 3960,3967 ---- */ if (extra->patype != PARTITIONWISE_AGGREGATE_NONE && input_rel->part_scheme && input_rel->part_rels && + (input_rel->reloptkind == RELOPT_BASEREL || + input_rel->consider_partitionwise_join) && !IS_DUMMY_REL(input_rel)) { /* *************** *** 6913,6919 **** apply_scanjoin_target_to_paths(PlannerInfo *root, * projection-capable, that might save a separate Result node, and it also * is important for partitionwise aggregate. */ ! if (rel->part_scheme && rel->part_rels) { int partition_idx; List *live_children = NIL; --- 6915,6923 ---- * projection-capable, that might save a separate Result node, and it also * is important for partitionwise aggregate. */ ! if (rel->part_scheme && rel->part_rels && ! (rel->reloptkind == RELOPT_BASEREL || ! rel->consider_partitionwise_join)) { int partition_idx; List *live_children = NIL; In the above I used the test whether the relation's reloptkind is RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a multi-level partitioned table. So I fixed that and added regression test cases for that. I also revised comments a bit. Attached is an updated version of the patch. Best regards, Etsuro Fujita
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > In the above I used the test whether the relation's reloptkind is > RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a > multi-level partitioned table. So I fixed that and added regression test > cases for that. I also revised comments a bit. Attached is an updated > version of the patch. + /* If so, consider partitionwise joins for that join. */ + if (IS_PARTITIONED_REL(joinrel)) + joinrel->consider_partitionwise_join = true; Maybe this should assert that the inner and outer rels have consider_partitionwise_join set. There is an Assert quite a bit earlier in the function that the parent join have it set, but I think it might make sense to check the children have it set whenever we set the flag. Aside from that I don't really have any suggestions on this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/13 11:57), Robert Haas wrote: > On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> In the above I used the test whether the relation's reloptkind is >> RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a >> multi-level partitioned table. So I fixed that and added regression test >> cases for that. I also revised comments a bit. Attached is an updated >> version of the patch. > > + /* If so, consider partitionwise joins for that join. */ > + if (IS_PARTITIONED_REL(joinrel)) > + joinrel->consider_partitionwise_join = true; > > Maybe this should assert that the inner and outer rels have > consider_partitionwise_join set. There is an Assert quite a bit > earlier in the function that the parent join have it set, but I think > it might make sense to check the children have it set whenever we set > the flag. Agreed. Done. One thing I noticed might be an improvement is to skip build_joinrel_partition_info if the given joinrel will be to have consider_partitionwise_join=false; in the previous patch, that function created the joinrel's partition info such as part_scheme and part_rels if the joinrel is considered as partitioned, independently of the flag consider_partitionwise_join for it, but if that flag is false, we don't generate PWJ paths for the joinrel, so we would not need to create that partition info at all. This would not only avoid unnecessary processing in that function, but also make unnecessary the changes I made to try_partitionwise_join, generate_partitionwise_join_paths, apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I updated the patch that way. Please find attached an updated version of the patch. Thanks for the review! Best regards, Etsuro Fujita
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > One thing I noticed might be an improvement is to skip > build_joinrel_partition_info if the given joinrel will be to have > consider_partitionwise_join=false; in the previous patch, that function > created the joinrel's partition info such as part_scheme and part_rels if > the joinrel is considered as partitioned, independently of the flag > consider_partitionwise_join for it, but if that flag is false, we don't > generate PWJ paths for the joinrel, so we would not need to create that > partition info at all. This would not only avoid unnecessary processing in > that function, but also make unnecessary the changes I made to > try_partitionwise_join, generate_partitionwise_join_paths, > apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I > updated the patch that way. Please find attached an updated version of the > patch. I guess the question is whether there are (or might be in the future) other dependencies on part_scheme. For example, it looks like partition pruning uses it. I'm not sure whether partition pruning supports a plan like: Append -> Nested Loop -> Seq Scan on p1 -> Index Scan on q1 <repeat the above for p2/q2 etc.> If it doesn't, that's just an implementation restriction; somebody might want to fix things so it works, if it doesn't already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
"Jonathan S. Katz"
Date:
> On Aug 14, 2018, at 11:51 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> One thing I noticed might be an improvement is to skip >> build_joinrel_partition_info if the given joinrel will be to have >> consider_partitionwise_join=false; in the previous patch, that function >> created the joinrel's partition info such as part_scheme and part_rels if >> the joinrel is considered as partitioned, independently of the flag >> consider_partitionwise_join for it, but if that flag is false, we don't >> generate PWJ paths for the joinrel, so we would not need to create that >> partition info at all. This would not only avoid unnecessary processing in >> that function, but also make unnecessary the changes I made to >> try_partitionwise_join, generate_partitionwise_join_paths, >> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I >> updated the patch that way. Please find attached an updated version of the >> patch. > > I guess the question is whether there are (or might be in the future) > other dependencies on part_scheme. For example, it looks like > partition pruning uses it. I'm not sure whether partition pruning > supports a plan like: > > Append > -> Nested Loop > -> Seq Scan on p1 > -> Index Scan on q1 > <repeat the above for p2/q2 etc.> > > If it doesn't, that's just an implementation restriction; somebody > might want to fix things so it works, if it doesn't already. While we (the RMT) are happy to see there has been progress on this thread, 11 Beta 3 was released containing this problem and the time to adequately test this feature prior to GA release is rapidly dwindling. The RMT would like to see this prioritized and resolved. At this point we have considered the option of having partitionwise joins completely disabled in PostgreSQL 11. This is preferably not the path we want to choose, but this option is now on the table and we will issue an ultimatum soon if we do not see further progress. In the previous discussion, the generally accepted solution for PostgreSQL 11 seemed to be to disable the problematic case and any further work would be for PostgreSQL 12 only. If something is different, please indicate why ASAP and work to resolve. The RMT will also conduct some additional technical analysis as well. Sincerely, Alvaro, Andres, Jonathan
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/15 0:51), Robert Haas wrote: > On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> One thing I noticed might be an improvement is to skip >> build_joinrel_partition_info if the given joinrel will be to have >> consider_partitionwise_join=false; in the previous patch, that function >> created the joinrel's partition info such as part_scheme and part_rels if >> the joinrel is considered as partitioned, independently of the flag >> consider_partitionwise_join for it, but if that flag is false, we don't >> generate PWJ paths for the joinrel, so we would not need to create that >> partition info at all. This would not only avoid unnecessary processing in >> that function, but also make unnecessary the changes I made to >> try_partitionwise_join, generate_partitionwise_join_paths, >> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I >> updated the patch that way. Please find attached an updated version of the >> patch. > > I guess the question is whether there are (or might be in the future) > other dependencies on part_scheme. For example, it looks like > partition pruning uses it. I'm not sure whether partition pruning > supports a plan like: > > Append > -> Nested Loop > -> Seq Scan on p1 > -> Index Scan on q1 > <repeat the above for p2/q2 etc.> I'm not sure that either, but if a join relation doesn't have part_scheme set, it means that that relation is considered as non-partitioned, as in the case when enable_partitionwise_join is off, so there would be no PWJ paths generated for it, to begin with. So in that case, ISTM that we don't need to worry about that at least for partition pruning. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Amit Langote
Date:
On 2018/08/15 12:25, Etsuro Fujita wrote: > (2018/08/15 0:51), Robert Haas wrote: >> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> One thing I noticed might be an improvement is to skip >>> build_joinrel_partition_info if the given joinrel will be to have >>> consider_partitionwise_join=false; in the previous patch, that function >>> created the joinrel's partition info such as part_scheme and part_rels if >>> the joinrel is considered as partitioned, independently of the flag >>> consider_partitionwise_join for it, but if that flag is false, we don't >>> generate PWJ paths for the joinrel, so we would not need to create that >>> partition info at all. This would not only avoid unnecessary >>> processing in >>> that function, but also make unnecessary the changes I made to >>> try_partitionwise_join, generate_partitionwise_join_paths, >>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I >>> updated the patch that way. Please find attached an updated version of >>> the >>> patch. >> >> I guess the question is whether there are (or might be in the future) >> other dependencies on part_scheme. For example, it looks like >> partition pruning uses it. I'm not sure whether partition pruning >> supports a plan like: >> >> Append >> -> Nested Loop >> -> Seq Scan on p1 >> -> Index Scan on q1 >> <repeat the above for p2/q2 etc.> > > I'm not sure that either, but if a join relation doesn't have part_scheme > set, it means that that relation is considered as non-partitioned, as in > the case when enable_partitionwise_join is off, so there would be no PWJ > paths generated for it, to begin with. So in that case, ISTM that we > don't need to worry about that at least for partition pruning. Fwiw, partition pruning works only for base rels, which applies to both planning-time pruning (pruning is performed only during base rel size estimation) and run-time pruning (we'll add pruning info to the Append plan only if the source AppendPath's parent rel is a base rel). Thanks, Amit
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/15 13:04), Amit Langote wrote: > On 2018/08/15 12:25, Etsuro Fujita wrote: >> (2018/08/15 0:51), Robert Haas wrote: >>> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> One thing I noticed might be an improvement is to skip >>>> build_joinrel_partition_info if the given joinrel will be to have >>>> consider_partitionwise_join=false; in the previous patch, that function >>>> created the joinrel's partition info such as part_scheme and part_rels if >>>> the joinrel is considered as partitioned, independently of the flag >>>> consider_partitionwise_join for it, but if that flag is false, we don't >>>> generate PWJ paths for the joinrel, so we would not need to create that >>>> partition info at all. This would not only avoid unnecessary >>>> processing in >>>> that function, but also make unnecessary the changes I made to >>>> try_partitionwise_join, generate_partitionwise_join_paths, >>>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I >>>> updated the patch that way. Please find attached an updated version of >>>> the >>>> patch. >>> >>> I guess the question is whether there are (or might be in the future) >>> other dependencies on part_scheme. For example, it looks like >>> partition pruning uses it. I'm not sure whether partition pruning >>> supports a plan like: >>> >>> Append >>> -> Nested Loop >>> -> Seq Scan on p1 >>> -> Index Scan on q1 >>> <repeat the above for p2/q2 etc.> >> >> I'm not sure that either, but if a join relation doesn't have part_scheme >> set, it means that that relation is considered as non-partitioned, as in >> the case when enable_partitionwise_join is off, so there would be no PWJ >> paths generated for it, to begin with. So in that case, ISTM that we >> don't need to worry about that at least for partition pruning. > > Fwiw, partition pruning works only for base rels, which applies to both > planning-time pruning (pruning is performed only during base rel size > estimation) and run-time pruning (we'll add pruning info to the Append > plan only if the source AppendPath's parent rel is a base rel). Thanks for that, Amit! I looked into the question for the join or upper rel case, but couldn't find any places in the PG11 code where we assume that a join or upper rel has non-NULL part_scheme, as described below: * Both try_partitionwise_join() and generate_partitionwise_join_paths() check whether a join rel to be considered has non-NULL part_scheme, through the IS_PARTITIONED_REL macro: #define IS_PARTITIONED_REL(rel) \ ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \ (rel)->part_rels && !(IS_DUMMY_REL(rel))) If IS_PARTITIONED_REL, the former adds paths to child-joins, and the latter builds PWJ paths; but both don't assume non-NULL part_scheme. * apply_scanjoin_target_to_paths() checks whether the topmost scan/join rel has non-NULL part_scheme, and if so, it recursively adjusts all partitions; it doesn't assume non-NULL part_scheme. * create_ordinary_grouping_paths() also checks whether the topmost scan/join rel adjusted by apply_scanjoin_target_to_paths() has non-NULL part_scheme, and if so, it considers PWA paths; it doesn't assume non-NULL part_scheme either. * add_paths_to_append_rel(), which is called from each of the above (ie, generate_partitionwise_join_paths(), apply_scanjoin_target_to_paths(), and create_partitionwise_grouping_paths() in create_ordinary_grouping_paths() for creating PWJ paths, adjusting PWJ paths, and creating PWA paths respectively) also checks whether a rel to be considered has non-NULL part_scheme, but never assumes that. And actually, if the rel's part_scheme is NULL, add_paths_to_append_rel() wouldn't be called for the rel, because of the reason described above. Thanks for the comments, Robert! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Robert Haas
Date:
On Wed, Aug 15, 2018 at 7:35 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > Thanks for the comments, Robert! Given the comments from the RMT, and also on general principle, it seems like we really need to get on with committing something here. It's my understanding you plan to do that, since it's your patch. When? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/15 23:40), Robert Haas wrote: > Given the comments from the RMT, and also on general principle, it > seems like we really need to get on with committing something here. > It's my understanding you plan to do that, since it's your patch. > When? I plan to do that late next week as I'll go on leave until next Tuesday. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/16 12:00), Etsuro Fujita wrote: > (2018/08/15 23:40), Robert Haas wrote: >> Given the comments from the RMT, and also on general principle, it >> seems like we really need to get on with committing something here. >> It's my understanding you plan to do that, since it's your patch. >> When? > > I plan to do that late next week as I'll go on leave until next Tuesday. I added the commit message. Please find attached an updated version of the patch. Does that make sense? If there are not objections, I'll push this tomorrow. Best regards, Etsuro Fujita
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/22 20:08), Etsuro Fujita wrote: > (2018/08/16 12:00), Etsuro Fujita wrote: >> (2018/08/15 23:40), Robert Haas wrote: >>> Given the comments from the RMT, and also on general principle, it >>> seems like we really need to get on with committing something here. >>> It's my understanding you plan to do that, since it's your patch. >>> When? >> >> I plan to do that late next week as I'll go on leave until next Tuesday. > > I added the commit message. Please find attached an updated version of > the patch. Does that make sense? If there are not objections, I'll push > this tomorrow. I tried this today, but doing git behind the corporate firewall doesn't work. I don't know the clear cause of that, so I'll investigate that tomorrow. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Michael Paquier
Date:
On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: > I tried this today, but doing git behind the corporate firewall doesn't > work. I don't know the clear cause of that, so I'll investigate that > tomorrow. You may be able to tweak that by using https as origin point or proper git proxy settings? -- Michael
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/24 11:47), Michael Paquier wrote: > On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >> I tried this today, but doing git behind the corporate firewall doesn't >> work. I don't know the clear cause of that, so I'll investigate that >> tomorrow. > > You may be able to tweak that by using https as origin point or proper > git proxy settings? Yeah, my proxy settings were not correct. With the help of my colleagues Horiguchi-san and Yamada-san, I corrected them but still can't clone the master repository. Running git with GIT_CURL_VERBOSE shows that there is another issue in my terminal environment, so I'm trying to resolve that. Thanks for the advice! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
"Jonathan S. Katz"
Date:
> On Aug 24, 2018, at 8:38 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > > (2018/08/24 11:47), Michael Paquier wrote: >> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >>> I tried this today, but doing git behind the corporate firewall doesn't >>> work. I don't know the clear cause of that, so I'll investigate that >>> tomorrow. >> >> You may be able to tweak that by using https as origin point or proper >> git proxy settings? > > Yeah, my proxy settings were not correct. With the help of my colleagues Horiguchi-san and Yamada-san, I corrected thembut still can't clone the master repository. Running git with GIT_CURL_VERBOSE shows that there is another issue inmy terminal environment, so I'm trying to resolve that. Are there any updates on getting this patch committed? Thanks, Jonathan
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/29 0:21), Jonathan S. Katz wrote: >> On Aug 24, 2018, at 8:38 AM, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/08/24 11:47), Michael Paquier wrote: >>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >>>> I tried this today, but doing git behind the corporate firewall doesn't >>>> work. I don't know the clear cause of that, so I'll investigate that >>>> tomorrow. >>> >>> You may be able to tweak that by using https as origin point or proper >>> git proxy settings? >> >> Yeah, my proxy settings were not correct. With the help of my colleagues Horiguchi-san and Yamada-san, I corrected thembut still can't clone the master repository. Running git with GIT_CURL_VERBOSE shows that there is another issue inmy terminal environment, so I'm trying to resolve that. > > Are there any updates on getting this patch committed? That investigation has shown that the cause is my company firewall, not my terminal environment; that firewall has to be configured to allow me to access to that repository. So, I'm asking my company about that. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/29 18:40), Etsuro Fujita wrote: > (2018/08/29 0:21), Jonathan S. Katz wrote: >>> On Aug 24, 2018, at 8:38 AM, Etsuro >>> Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >>> (2018/08/24 11:47), Michael Paquier wrote: >>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >>>>> I tried this today, but doing git behind the corporate firewall >>>>> doesn't >>>>> work. I don't know the clear cause of that, so I'll investigate that >>>>> tomorrow. >>>> >>>> You may be able to tweak that by using https as origin point or proper >>>> git proxy settings? >>> >>> Yeah, my proxy settings were not correct. With the help of my >>> colleagues Horiguchi-san and Yamada-san, I corrected them but still >>> can't clone the master repository. Running git with GIT_CURL_VERBOSE >>> shows that there is another issue in my terminal environment, so I'm >>> trying to resolve that. >> >> Are there any updates on getting this patch committed? > > That investigation has shown that the cause is my company firewall, not > my terminal environment; that firewall has to be configured to allow me > to access to that repository. So, I'm asking my company about that. I got the approval from my boss today, so I think that I can have that access soon. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/30 20:25), Etsuro Fujita wrote: > (2018/08/29 18:40), Etsuro Fujita wrote: >> (2018/08/29 0:21), Jonathan S. Katz wrote: >>>> On Aug 24, 2018, at 8:38 AM, Etsuro >>>> Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >>>> (2018/08/24 11:47), Michael Paquier wrote: >>>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >>>>>> I tried this today, but doing git behind the corporate firewall >>>>>> doesn't >>>>>> work. I don't know the clear cause of that, so I'll investigate that >>>>>> tomorrow. >>>>> >>>>> You may be able to tweak that by using https as origin point or proper >>>>> git proxy settings? >>>> >>>> Yeah, my proxy settings were not correct. With the help of my >>>> colleagues Horiguchi-san and Yamada-san, I corrected them but still >>>> can't clone the master repository. Running git with GIT_CURL_VERBOSE >>>> shows that there is another issue in my terminal environment, so I'm >>>> trying to resolve that. >>> >>> Are there any updates on getting this patch committed? >> >> That investigation has shown that the cause is my company firewall, not >> my terminal environment; that firewall has to be configured to allow me >> to access to that repository. So, I'm asking my company about that. > > I got the approval from my boss today, so I think that I can have that > access soon. I fixed typos in the commit message, which Alvaro pointed out off-list, and revised the message a little bit. Also, I reread the patch and noticed that the latest version includes now-unnecessary regression test cases; those were added to the previous version (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() work for cases where whole-row Vars are involved, because I modified those functions, but the latest version doesn't touch those functions anymore, so those test cases seems unnecessary. So I removed those (no other changes), and committed the patch. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5B683F60.2070806%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
"Jonathan S. Katz"
Date:
On 8/31/18 7:54 AM, Etsuro Fujita wrote: > (2018/08/30 20:25), Etsuro Fujita wrote: >> (2018/08/29 18:40), Etsuro Fujita wrote: >>> (2018/08/29 0:21), Jonathan S. Katz wrote: >>>>> On Aug 24, 2018, at 8:38 AM, Etsuro >>>>> Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >>>>> (2018/08/24 11:47), Michael Paquier wrote: >>>>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >>>>>>> I tried this today, but doing git behind the corporate firewall >>>>>>> doesn't >>>>>>> work. I don't know the clear cause of that, so I'll investigate >>>>>>> that >>>>>>> tomorrow. >>>>>> >>>>>> You may be able to tweak that by using https as origin point or >>>>>> proper >>>>>> git proxy settings? >>>>> >>>>> Yeah, my proxy settings were not correct. With the help of my >>>>> colleagues Horiguchi-san and Yamada-san, I corrected them but still >>>>> can't clone the master repository. Running git with GIT_CURL_VERBOSE >>>>> shows that there is another issue in my terminal environment, so I'm >>>>> trying to resolve that. >>>> >>>> Are there any updates on getting this patch committed? >>> >>> That investigation has shown that the cause is my company firewall, not >>> my terminal environment; that firewall has to be configured to allow me >>> to access to that repository. So, I'm asking my company about that. >> >> I got the approval from my boss today, so I think that I can have that >> access soon. > > I fixed typos in the commit message, which Alvaro pointed out > off-list, and revised the message a little bit. Also, I reread the > patch and noticed that the latest version includes now-unnecessary > regression test cases; those were added to the previous version > (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that > apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() > work for cases where whole-row Vars are involved, because I modified > those functions, but the latest version doesn't touch those functions > anymore, so those test cases seems unnecessary. So I removed those > (no other changes), and committed the patch. Thank you for taking care of that and for committing the patch. I have now closed this issues on the open items list. Jonathan
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/08/31 21:30), Jonathan S. Katz wrote: > On 8/31/18 7:54 AM, Etsuro Fujita wrote: >> (2018/08/30 20:25), Etsuro Fujita wrote: >>> (2018/08/29 18:40), Etsuro Fujita wrote: >>>> (2018/08/29 0:21), Jonathan S. Katz wrote: >>>>>> On Aug 24, 2018, at 8:38 AM, Etsuro >>>>>> Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >>>>>> (2018/08/24 11:47), Michael Paquier wrote: >>>>>>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >>>>>>>> I tried this today, but doing git behind the corporate firewall >>>>>>>> doesn't >>>>>>>> work. I don't know the clear cause of that, so I'll investigate >>>>>>>> that >>>>>>>> tomorrow. >>>>>>> >>>>>>> You may be able to tweak that by using https as origin point or >>>>>>> proper >>>>>>> git proxy settings? >>>>>> >>>>>> Yeah, my proxy settings were not correct. With the help of my >>>>>> colleagues Horiguchi-san and Yamada-san, I corrected them but still >>>>>> can't clone the master repository. Running git with GIT_CURL_VERBOSE >>>>>> shows that there is another issue in my terminal environment, so I'm >>>>>> trying to resolve that. >>>>> >>>>> Are there any updates on getting this patch committed? >>>> >>>> That investigation has shown that the cause is my company firewall, not >>>> my terminal environment; that firewall has to be configured to allow me >>>> to access to that repository. So, I'm asking my company about that. >>> >>> I got the approval from my boss today, so I think that I can have that >>> access soon. >> >> I fixed typos in the commit message, which Alvaro pointed out >> off-list, and revised the message a little bit. Also, I reread the >> patch and noticed that the latest version includes now-unnecessary >> regression test cases; those were added to the previous version >> (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that >> apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() >> work for cases where whole-row Vars are involved, because I modified >> those functions, but the latest version doesn't touch those functions >> anymore, so those test cases seems unnecessary. So I removed those >> (no other changes), and committed the patch. > > Thank you for taking care of that and for committing the patch. I have > now closed this issues on the open items list. Thanks! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Amit Langote
Date:
On 2018/08/31 21:40, Etsuro Fujita wrote: > (2018/08/31 21:30), Jonathan S. Katz wrote: >> Thank you for taking care of that and for committing the patch. I have >> now closed this issues on the open items list. > > Thanks! I noticed that the CF entry for this was not closed. As of this morning, it's been moved to the next 2018-11 CF: https://commitfest.postgresql.org/20/1554/ I tried to close it as being committed, but couldn't do so, because I can't find Fujita-san's name in the list of committers in the CF app's drop down box that lists all committers. Thanks, Amit
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
From
Michael Paquier
Date:
On Tue, Oct 02, 2018 at 04:11:54PM +0900, Amit Langote wrote: > I tried to close it as being committed, but couldn't do so, because I > can't find Fujita-san's name in the list of committers in the CF app's > drop down box that lists all committers. Indeed, Fujita-san has been added to the list. And I switched this CF entry at the same time. -- Michael
Attachment
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
From
Etsuro Fujita
Date:
(2018/10/02 16:45), Michael Paquier wrote: > On Tue, Oct 02, 2018 at 04:11:54PM +0900, Amit Langote wrote: >> I tried to close it as being committed, but couldn't do so, because I >> can't find Fujita-san's name in the list of committers in the CF app's >> drop down box that lists all committers. > > Indeed, Fujita-san has been added to the list. And I switched this CF > entry at the same time. Thank you guys! Best regards, Etsuro Fujita
Hello. I was looking at enabling partition-wise join with whole row vars. My main motivation was to enable push down DML with partition-wise join in postgres_fdw. The work is based on the earlier patches of Ashutosh Bapat [1]. Partition-wise joins are disabled when whole row vars from relations, participating in join, are required to be selected. Using whole row vars is not frequent by itself, but happens when FDW is used in DML or SELECT FOR UPDATE/ FOR SHARE queries. I'd like to remove this restriction, because this will make it possible to do direct modify operations DML joins in postgres_fdw. Currently there's an issue in PostgreSQL that when we want to get whole row var of child table, we should translate the record type of child table to the record type of the parent table. So, there appear additional ConvertRowtyeExpressions to cast types and different places in planner (and in postgres_fdw) now should know about this. Here's the series of patches, originally proposed by Ashutosh Bapat, rebased and updated to work with current master. What was done: 1) Reverting changes, that forbid PWJ when whole row vars are required. This also restores logic in setrefs.c, necessary to deal with ConvertRowtyeExpressions. 2) Applying modified patch from the original series to handle ConvertRowtypeExprs in pull_vars_clause(). Unlike original patch, default pull_var_clause_walker() behavior is unchanged. However, when PVC_INCLUDE_CONVERTROWTYPES flag is specified, converted whole row references are returned as is and are not recursed to. This is done in such a way to avoid modifying all function consumers. 3) Modified one of the original patches to handle ConvertRowtypeExpr in find_computable_ec_member(). 4) The next patch modifies logic to search for ConvertRowtypeExpr in search_indexed_tlist_for_non_var() - at least for rowid vars varnullingrels can be omitted, so we avoid looking at them when comparing different ConvertRowtypeExprs. It seems to be working as expected, but I'm not deadly sure about this. Perhaps, we also should consider context->nrm_match? 5) The next patch is the original one - pulling ConvertRowtypeExpr in build_tlist_to_deparse() and deparsing ConvertRowtypeExpr(). 6) The rest is fix for postgres_fdw to correctly deparse DML. The first tricky part here is that in UPDATE SET clause arbitrary relations can appear, and currently there's no API to handle this. So I've modified get_translated_update_targetlist() to do proper adjustments. The second is work with returning lists - we should set tableoids for the returned values. This is done by modifying returning filters to save this information and later to set it in returned tuples. What this gives us - is the posibility of partition-wise joins for foreign DML, like following: EXPLAIN (COSTS OFF, VERBOSE) DELETE FROM fprt1 USING fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 30 = 29; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Delete on public.fprt1 Foreign Delete on public.ftprt1_p1 fprt1_1 Foreign Delete on public.ftprt1_p2 fprt1_2 -> Append -> Foreign Delete Remote SQL: DELETE FROM public.fprt1_p1 r3 USING public.fprt2_p1 r5 WHERE ((r3.a = r5.b)) AND (((r5.a % 30) = 29)) -> Foreign Delete Remote SQL: DELETE FROM public.fprt1_p2 r4 USING public.fprt2_p2 r6 WHERE ((r4.a = r6.b)) AND (((r6.a % 30) = 29)) [1] https://www.postgresql.org/message-id/CAFjFpRc8ZoDm0%2Bzhx%2BMckwGyEqkOzWcpVqbvjaxwdGarZSNrmA%40mail.gmail.com -- Best regards, Alexander Pyhalov, Postgres Professional
Attachment
- 0006-postgres_fdw-fix-partition-wise-DML.patch
- 0005-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch
- 0004-Compare-converted-whole-row-vars-in-search_indexed_t.patch
- 0003-Handle-child-relation-s-ConvertRowtypeExpr-in-find_c.patch
- 0002-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch
- 0001-Allow-partition-wise-join-when-reltarget-contains-wh.patch
> On Tue, Oct 08, 2024 at 09:24:15AM GMT, Alexander Pyhalov wrote: > > Attaching rebased patches. Just to let you know, looks like CFBot tests are red again, but this time there are some unexpected differences in some test query plan.