Thread: [HACKERS] [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...
From: Andreas Seltenreich
Hi,

I see the above ERROR logged a lot when testing master at eef8c0069e with a
postgres_fdw around.  Below is a recipe to reproduce it on top of the
regression DB.

regards,
Andreas

create extension postgres_fdw;
create server myself foreign data wrapper postgres_fdw;
create schema fdw_postgres;
create user fdw login;
grant all on schema public to fdw;
grant all on all tables in schema public to fdw;
create user mapping for public server myself options (user 'fdw');
import foreign schema public from server myself into fdw_postgres;
set max_parallel_workers_per_gather = 8;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;

regression=> select (select max(result) from fdw_postgres.num_result) from tt0;
ERROR:  badly formatted node string "RESTRICTINFO :clause {NULLTEST :"...
CONTEXT:  parallel worker

regression=> explain select (select max(result) from fdw_postgres.num_result) from tt0;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Gather  (cost=100.06..122.51 rows=1947 width=32)
   Workers Planned: 1
   InitPlan 2 (returns $1)
     ->  Result  (cost=100.05..100.06 rows=1 width=32)
           InitPlan 1 (returns $0)
             ->  Limit  (cost=100.00..100.05 rows=1 width=74)
                   ->  Foreign Scan on num_result  (cost=100.00..138.64 rows=831 width=74)
   ->  Append  (cost=0.00..22.45 rows=1145 width=0)
         ->  Parallel Seq Scan on tt0  (cost=0.00..2.04 rows=104 width=0)
         ->  Parallel Seq Scan on tt6  (cost=0.00..20.41 rows=1041 width=0)
Andreas Seltenreich <seltenreich@gmx.de> writes:
> regression=> select (select max(result) from fdw_postgres.num_result) from tt0;
> ERROR: badly formatted node string "RESTRICTINFO :clause {NULLTEST :"...
> CONTEXT: parallel worker

Apparently, postgres_fdw is trying to store RestrictInfos in the fdw_private
field of a ForeignScan node.  That won't do; those aren't supposed to be
present in a finished plan tree, so there's no readfuncs.c support for them
(and we're not adding it).

Don't know if this is a new bug, or ancient but not previously reachable.
It seems to be nearly the inverse of the problem you found yesterday, in
which postgres_fdw was stripping RestrictInfos sooner than it really ought
to.  Apparently that whole business needs a fresh look.

			regards, tom lane
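For illustration, here is a minimal sketch of the distinction (the helper
name build_fdw_private is made up for the example; the calls it uses are
real API).  fdw_private travels with the finished plan through outfuncs.c
and readfuncs.c, so it may hold only node types those files support, which
means any RestrictInfo has to be reduced to its bare clause first:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/value.h"
#include "optimizer/restrictinfo.h"

/*
 * Hypothetical helper: build an fdw_private list for a ForeignScan.
 * Storing the RestrictInfos themselves here would make the plan
 * unserializable; extract_actual_clauses() strips the wrappers and
 * keeps just the clause expressions.
 */
static List *
build_fdw_private(char *sql, List *remote_conds,
                  List *retrieved_attrs, int fetch_size)
{
    List       *remote_exprs = extract_actual_clauses(remote_conds, false);

    return list_make4(makeString(sql),
                      remote_exprs,
                      retrieved_attrs,
                      makeInteger(fetch_size));
}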
I wrote: > Apparently, postgres_fdw is trying to store RestrictInfos in the > fdw_private field of a ForeignScan node. That won't do; those aren't > supposed to be present in a finished plan tree, so there's no readfuncs.c > support for them (and we're not adding it). > Don't know if this is a new bug, or ancient but not previously reachable. > It seems to be nearly the inverse of the problem you found yesterday, > in which postgres_fdw was stripping RestrictInfos sooner than it really > ought to. Apparently that whole business needs a fresh look. Attached is a proposed patch that cleans up the mess here --- and it was a mess. The comments for PgFdwRelationInfo claimed that the remote_conds and local_conds fields contained bare clauses, but actually they could contain either bare clauses or RestrictInfos or both, depending on where the clauses had come from. And there was some seriously obscure and undocumented logic around how the fdw_recheck_quals got set up for a foreign join node. (BTW, said logic is assuming that no EPQ recheck is needed for a foreign join. I commented it to that effect, but I'm not very sure that I believe it. If there's a bug there, though, it's independent of the immediate problem.) Anybody want to review this, or shall I just push it? (BTW, I've not yet looked to see if this needs to be back-ported.) regards, tom lane diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index c5149a0..1d5aa83 100644 *** a/contrib/postgres_fdw/deparse.c --- b/contrib/postgres_fdw/deparse.c *************** classifyConditions(PlannerInfo *root, *** 210,216 **** foreach(lc, input_conds) { ! RestrictInfo *ri = (RestrictInfo *) lfirst(lc); if (is_foreign_expr(root, baserel, ri->clause)) *remote_conds = lappend(*remote_conds, ri); --- 210,216 ---- foreach(lc, input_conds) { ! RestrictInfo *ri = lfirst_node(RestrictInfo, lc); if (is_foreign_expr(root, baserel, ri->clause)) *remote_conds = lappend(*remote_conds, ri); *************** build_tlist_to_deparse(RelOptInfo *forei *** 869,874 **** --- 869,875 ---- { List *tlist = NIL; PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private; + ListCell *lc; /* * For an upper relation, we have already built the target list while *************** build_tlist_to_deparse(RelOptInfo *forei *** 884,892 **** tlist = add_to_flat_tlist(tlist, pull_var_clause((Node *) foreignrel->reltarget->exprs, PVC_RECURSE_PLACEHOLDERS)); ! tlist = add_to_flat_tlist(tlist, ! pull_var_clause((Node *) fpinfo->local_conds, ! PVC_RECURSE_PLACEHOLDERS)); return tlist; } --- 885,898 ---- tlist = add_to_flat_tlist(tlist, pull_var_clause((Node *) foreignrel->reltarget->exprs, PVC_RECURSE_PLACEHOLDERS)); ! foreach(lc, fpinfo->local_conds) ! { ! RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); ! ! tlist = add_to_flat_tlist(tlist, ! pull_var_clause((Node *) rinfo->clause, ! PVC_RECURSE_PLACEHOLDERS)); ! } return tlist; } *************** deparseSelectSql(List *tlist, bool is_su *** 1049,1054 **** --- 1055,1061 ---- * "buf". * * quals is the list of clauses to be included in the WHERE clause. + * (These may or may not include RestrictInfo decoration.) */ static void deparseFromExpr(List *quals, deparse_expr_cxt *context) *************** deparseLockingClause(deparse_expr_cxt *c *** 1262,1267 **** --- 1269,1277 ---- * * The conditions in the list are assumed to be ANDed. This function is used to * deparse WHERE clauses, JOIN .. ON clauses and HAVING clauses. 
+ * + * Depending on the caller, the list elements might be either RestrictInfos + * or bare clauses. */ static void appendConditions(List *exprs, deparse_expr_cxt *context) *************** appendConditions(List *exprs, deparse_ex *** 1278,1293 **** { Expr *expr = (Expr *) lfirst(lc); ! /* ! * Extract clause from RestrictInfo, if required. See comments in ! * declaration of PgFdwRelationInfo for details. ! */ if (IsA(expr, RestrictInfo)) ! { ! RestrictInfo *ri = (RestrictInfo *) expr; ! ! expr = ri->clause; ! } /* Connect expressions with "AND" and parenthesize each condition. */ if (!is_first) --- 1288,1296 ---- { Expr *expr = (Expr *) lfirst(lc); ! /* Extract clause from RestrictInfo, if required */ if (IsA(expr, RestrictInfo)) ! expr = ((RestrictInfo *) expr)->clause; /* Connect expressions with "AND" and parenthesize each condition. */ if (!is_first) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 6670df5..28a945f 100644 *** a/contrib/postgres_fdw/postgres_fdw.c --- b/contrib/postgres_fdw/postgres_fdw.c *************** enum FdwScanPrivateIndex *** 64,70 **** { /* SQL statement to execute remotely (as a String node) */ FdwScanPrivateSelectSql, ! /* List of restriction clauses that can be executed remotely */ FdwScanPrivateRemoteConds, /* Integer list of attribute numbers retrieved by the SELECT */ FdwScanPrivateRetrievedAttrs, --- 64,71 ---- { /* SQL statement to execute remotely (as a String node) */ FdwScanPrivateSelectSql, ! /* List of qual clauses that can be executed remotely */ ! /* (DO NOT try to use these at runtime; see postgresGetForeignPlan) */ FdwScanPrivateRemoteConds, /* Integer list of attribute numbers retrieved by the SELECT */ FdwScanPrivateRetrievedAttrs, *************** postgresGetForeignRelSize(PlannerInfo *r *** 576,582 **** &fpinfo->attrs_used); foreach(lc, fpinfo->local_conds) { ! RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc); pull_varattnos((Node *) rinfo->clause, baserel->relid, &fpinfo->attrs_used); --- 577,583 ---- &fpinfo->attrs_used); foreach(lc, fpinfo->local_conds) { ! RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); pull_varattnos((Node *) rinfo->clause, baserel->relid, &fpinfo->attrs_used); *************** postgresGetForeignPlan(PlannerInfo *root *** 1119,1132 **** PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private; Index scan_relid; List *fdw_private; - List *remote_conds = NIL; List *remote_exprs = NIL; List *local_exprs = NIL; List *params_list = NIL; List *retrieved_attrs; StringInfoData sql; ListCell *lc; - List *fdw_scan_tlist = NIL; /* * For base relations, set scan_relid as the relid of the relation. For --- 1120,1133 ---- PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private; Index scan_relid; List *fdw_private; List *remote_exprs = NIL; List *local_exprs = NIL; List *params_list = NIL; + List *fdw_scan_tlist = NIL; + List *fdw_recheck_quals = NIL; List *retrieved_attrs; StringInfoData sql; ListCell *lc; /* * For base relations, set scan_relid as the relid of the relation. For *************** postgresGetForeignPlan(PlannerInfo *root *** 1163,1171 **** * * This code must match "extract_actual_clauses(scan_clauses, false)" * except for the additional decision about remote versus local execution. - * Note however that we don't strip the RestrictInfo nodes from the - * remote_conds list, since appendWhereClause expects a list of - * RestrictInfos. 
*/ foreach(lc, scan_clauses) { --- 1164,1169 ---- *************** postgresGetForeignPlan(PlannerInfo *root *** 1176,1201 **** continue; if (list_member_ptr(fpinfo->remote_conds, rinfo)) - { - remote_conds = lappend(remote_conds, rinfo); remote_exprs = lappend(remote_exprs, rinfo->clause); - } else if (list_member_ptr(fpinfo->local_conds, rinfo)) local_exprs = lappend(local_exprs, rinfo->clause); else if (is_foreign_expr(root, foreignrel, rinfo->clause)) - { - remote_conds = lappend(remote_conds, rinfo); remote_exprs = lappend(remote_exprs, rinfo->clause); - } else local_exprs = lappend(local_exprs, rinfo->clause); } if (IS_JOIN_REL(foreignrel) || IS_UPPER_REL(foreignrel)) { ! /* For a join relation, get the conditions from fdw_private structure */ ! remote_conds = fpinfo->remote_conds; ! local_exprs = fpinfo->local_conds; /* Build the list of columns to be fetched from the foreign server. */ fdw_scan_tlist = build_tlist_to_deparse(foreignrel); --- 1174,1198 ---- continue; if (list_member_ptr(fpinfo->remote_conds, rinfo)) remote_exprs = lappend(remote_exprs, rinfo->clause); else if (list_member_ptr(fpinfo->local_conds, rinfo)) local_exprs = lappend(local_exprs, rinfo->clause); else if (is_foreign_expr(root, foreignrel, rinfo->clause)) remote_exprs = lappend(remote_exprs, rinfo->clause); else local_exprs = lappend(local_exprs, rinfo->clause); } if (IS_JOIN_REL(foreignrel) || IS_UPPER_REL(foreignrel)) { ! /* ! * For a join relation, get the conditions from fdw_private structure. ! * Note that we leave fdw_recheck_quals empty in this case, since ! * there is no need for EPQ recheck at a join (and Vars or Aggrefs in ! * the qual might not be available locally anyway). ! */ ! remote_exprs = extract_actual_clauses(fpinfo->remote_conds, false); ! local_exprs = extract_actual_clauses(fpinfo->local_conds, false); /* Build the list of columns to be fetched from the foreign server. */ fdw_scan_tlist = build_tlist_to_deparse(foreignrel); *************** postgresGetForeignPlan(PlannerInfo *root *** 1238,1243 **** --- 1235,1248 ---- } } } + else + { + /* + * For a base-relation scan, we have to support EPQ recheck, which + * should recheck all remote quals + */ + fdw_recheck_quals = remote_exprs; + } /* * Build the query string to be sent for execution, and identify *************** postgresGetForeignPlan(PlannerInfo *root *** 1245,1251 **** */ initStringInfo(&sql); deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist, ! remote_conds, best_path->path.pathkeys, false, &retrieved_attrs, ¶ms_list); /* --- 1250,1256 ---- */ initStringInfo(&sql); deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist, ! remote_exprs, best_path->path.pathkeys, false, &retrieved_attrs, ¶ms_list); /* *************** postgresGetForeignPlan(PlannerInfo *root *** 1253,1259 **** * Items in the list must match order in enum FdwScanPrivateIndex. */ fdw_private = list_make4(makeString(sql.data), ! remote_conds, retrieved_attrs, makeInteger(fpinfo->fetch_size)); if (IS_JOIN_REL(foreignrel) || IS_UPPER_REL(foreignrel)) --- 1258,1264 ---- * Items in the list must match order in enum FdwScanPrivateIndex. */ fdw_private = list_make4(makeString(sql.data), ! 
remote_exprs, retrieved_attrs, makeInteger(fpinfo->fetch_size)); if (IS_JOIN_REL(foreignrel) || IS_UPPER_REL(foreignrel)) *************** postgresGetForeignPlan(PlannerInfo *root *** 1266,1271 **** --- 1271,1283 ---- * Note that the remote parameter expressions are stored in the fdw_exprs * field of the finished plan node; we can't keep them in private state * because then they wouldn't be subject to later planner processing. + * + * We have some foreign qual conditions hidden away within fdw_private's + * FdwScanPrivateRemoteConds item, which would be unsafe per the above + * consideration. But those will only be used by postgresPlanDirectModify, + * which may extract them to use in a rewritten plan. We assume that + * nothing will be done between here and there that would need to modify + * those expressions. */ return make_foreignscan(tlist, local_exprs, *************** postgresGetForeignPlan(PlannerInfo *root *** 1273,1279 **** params_list, fdw_private, fdw_scan_tlist, ! remote_exprs, outer_plan); } --- 1285,1291 ---- params_list, fdw_private, fdw_scan_tlist, ! fdw_recheck_quals, outer_plan); } *************** postgresPlanDirectModify(PlannerInfo *ro *** 2199,2205 **** rel = heap_open(rte->relid, NoLock); /* ! * Extract the baserestrictinfo clauses that can be evaluated remotely. */ remote_conds = (List *) list_nth(fscan->fdw_private, FdwScanPrivateRemoteConds); --- 2211,2219 ---- rel = heap_open(rte->relid, NoLock); /* ! * Extract the qual clauses that can be evaluated remotely. (These are ! * bare clauses not RestrictInfos, but deparse.c's appendConditions() ! * doesn't care.) */ remote_conds = (List *) list_nth(fscan->fdw_private, FdwScanPrivateRemoteConds); *************** foreign_join_ok(PlannerInfo *root, RelOp *** 4094,4102 **** if (fpinfo_o->local_conds || fpinfo_i->local_conds) return false; ! /* Separate restrict list into join quals and quals on join relation */ if (IS_OUTER_JOIN(jointype)) ! extract_actual_join_clauses(extra->restrictlist, &joinclauses, &otherclauses); else { /* --- 4108,4128 ---- if (fpinfo_o->local_conds || fpinfo_i->local_conds) return false; ! /* Separate restrict list into join quals and pushed-down (other) quals */ if (IS_OUTER_JOIN(jointype)) ! { ! /* Grovel through the clauses to separate into two lists */ ! joinclauses = otherclauses = NIL; ! foreach(lc, extra->restrictlist) ! { ! RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); ! ! if (rinfo->is_pushed_down) ! otherclauses = lappend(otherclauses, rinfo); ! else ! joinclauses = lappend(joinclauses, rinfo); ! } ! } else { /* *************** foreign_join_ok(PlannerInfo *root, RelOp *** 4106,4121 **** * a join down to the foreign server even if some of its join quals * are not safe to pushdown. */ ! otherclauses = extract_actual_clauses(extra->restrictlist, false); joinclauses = NIL; } /* Join quals must be safe to push down. */ foreach(lc, joinclauses) { ! Expr *expr = (Expr *) lfirst(lc); ! if (!is_foreign_expr(root, joinrel, expr)) return false; } --- 4132,4147 ---- * a join down to the foreign server even if some of its join quals * are not safe to pushdown. */ ! otherclauses = extra->restrictlist; joinclauses = NIL; } /* Join quals must be safe to push down. */ foreach(lc, joinclauses) { ! RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); ! if (!is_foreign_expr(root, joinrel, rinfo->clause)) return false; } *************** foreign_join_ok(PlannerInfo *root, RelOp *** 4152,4163 **** */ foreach(lc, otherclauses) { ! Expr *expr = (Expr *) lfirst(lc); ! 
if (!is_foreign_expr(root, joinrel, expr)) ! fpinfo->local_conds = lappend(fpinfo->local_conds, expr); else ! fpinfo->remote_conds = lappend(fpinfo->remote_conds, expr); } fpinfo->outerrel = outerrel; --- 4178,4189 ---- */ foreach(lc, otherclauses) { ! RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); ! if (is_foreign_expr(root, joinrel, rinfo->clause)) ! fpinfo->remote_conds = lappend(fpinfo->remote_conds, rinfo); else ! fpinfo->local_conds = lappend(fpinfo->local_conds, rinfo); } fpinfo->outerrel = outerrel; *************** foreign_grouping_ok(PlannerInfo *root, R *** 4631,4641 **** foreach(lc, (List *) query->havingQual) { Expr *expr = (Expr *) lfirst(lc); ! if (!is_foreign_expr(root, grouped_rel, expr)) ! fpinfo->local_conds = lappend(fpinfo->local_conds, expr); else ! fpinfo->remote_conds = lappend(fpinfo->remote_conds, expr); } } --- 4657,4681 ---- foreach(lc, (List *) query->havingQual) { Expr *expr = (Expr *) lfirst(lc); + RestrictInfo *rinfo; ! /* ! * Currently, the core code doesn't wrap havingQuals in ! * RestrictInfos, so we must make our own. ! */ ! Assert(!IsA(expr, RestrictInfo)); ! rinfo = make_restrictinfo(expr, ! true, ! false, ! false, ! root->qual_security_level, ! grouped_rel->relids, ! NULL, ! NULL); ! if (is_foreign_expr(root, grouped_rel, expr)) ! fpinfo->remote_conds = lappend(fpinfo->remote_conds, rinfo); else ! fpinfo->local_conds = lappend(fpinfo->local_conds, rinfo); } } *************** foreign_grouping_ok(PlannerInfo *root, R *** 4645,4653 **** */ if (fpinfo->local_conds) { ListCell *lc; ! List *aggvars = pull_var_clause((Node *) fpinfo->local_conds, ! PVC_INCLUDE_AGGREGATES); foreach(lc, aggvars) { --- 4685,4701 ---- */ if (fpinfo->local_conds) { + List *aggvars = NIL; ListCell *lc; ! ! foreach(lc, fpinfo->local_conds) ! { ! RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); ! ! aggvars = list_concat(aggvars, ! pull_var_clause((Node *) rinfo->clause, ! PVC_INCLUDE_AGGREGATES)); ! } foreach(lc, aggvars) { *************** foreign_grouping_ok(PlannerInfo *root, R *** 4664,4670 **** if (!is_foreign_expr(root, grouped_rel, expr)) return false; ! tlist = add_to_flat_tlist(tlist, aggvars); } } } --- 4712,4718 ---- if (!is_foreign_expr(root, grouped_rel, expr)) return false; ! tlist = add_to_flat_tlist(tlist, list_make1(expr)); } } } diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h index 57dbb79..df75518 100644 *** a/contrib/postgres_fdw/postgres_fdw.h --- b/contrib/postgres_fdw/postgres_fdw.h *************** typedef struct PgFdwRelationInfo *** 34,49 **** /* * Restriction clauses, divided into safe and unsafe to pushdown subsets. ! * ! * For a base foreign relation this is a list of clauses along-with ! * RestrictInfo wrapper. Keeping RestrictInfo wrapper helps while dividing ! * scan_clauses in postgresGetForeignPlan into safe and unsafe subsets. ! * Also it helps in estimating costs since RestrictInfo caches the ! * selectivity and qual cost for the clause in it. ! * ! * For a join relation, however, they are part of otherclause list ! * obtained from extract_actual_join_clauses, which strips RestrictInfo ! * construct. So, for a join relation they are list of bare clauses. */ List *remote_conds; List *local_conds; --- 34,41 ---- /* * Restriction clauses, divided into safe and unsafe to pushdown subsets. ! * All entries in these lists should have RestrictInfo wrappers; that ! * improves efficiency of selectivity and cost estimation. 
*/ List *remote_conds; List *local_conds; *************** typedef struct PgFdwRelationInfo *** 91,97 **** RelOptInfo *outerrel; RelOptInfo *innerrel; JoinType jointype; ! List *joinclauses; /* Grouping information */ List *grouped_tlist; --- 83,89 ---- RelOptInfo *outerrel; RelOptInfo *innerrel; JoinType jointype; ! List *joinclauses; /* List of RestrictInfo */ /* Grouping information */ List *grouped_tlist; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
> (BTW, I've not yet looked to see if this needs to be back-ported.)

postgres_fdw will definitely include RestrictInfos in its fdw_private list
in 9.6.  However, I've been unable to provoke a visible failure.  After some
rooting around, the reason seems to be that:

1. postgres_fdw doesn't mark its plans as parallel-safe --- it doesn't even
have an IsForeignScanParallelSafe method.  So you'd think that one of its
plans could never be transmitted to a parallel worker ... but:

2. Andreas' test query doesn't have the foreign scan in the main query tree,
it's in an InitPlan.  9.6 did not transmit any subplans or initplans to
parallel workers, but HEAD does.  So HEAD sends the illegal structure to the
worker, which spits up on trying to read a RestrictInfo.

I think the fact that we see this problem at all may indicate an oversight
in commit 5e6d8d2bbbcace304450b309e79366c0da4063e4 ("Allow parallel workers
to execute subplans").  If the worker were to actually run the initplan, bad
things would happen (the worker would create its own remote connection,
which we don't want).  Now, in the problem plan the InitPlan is actually
attached to the topmost Gather, which I think is safe because it'll be run
by the master, but I wonder if we're being careful enough about
non-parallel-safe plans for initplans/subplans.  I do not think the planner
tracks the locations of references to initplan outputs carefully enough to
be very sure about what it's doing here.

Also, even if the worker never actually executes the plan node, just doing
ExecInitNode on it in a parallel worker might be more than a
non-parallel-safe FDW is prepared to cope with.

Anyway, at present it doesn't look like we need this patch in 9.6.

			regards, tom lane
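For context, parallel safety of a foreign scan is something an FDW has to
claim explicitly through an FdwRoutine callback; a hypothetical FDW whose
scans were safe to run inside a worker would provide something like the
sketch below (the function name is made up; postgres_fdw supplies no such
callback, which is why its plans are not supposed to reach workers at all):

#include "postgres.h"
#include "foreign/fdwapi.h"

/*
 * Hypothetical callback: return true if a scan of this foreign relation
 * can safely run inside a parallel worker.  Without this callback the
 * planner treats the FDW's scans as parallel-restricted.
 */
static bool
exampleIsForeignScanParallelSafe(PlannerInfo *root, RelOptInfo *rel,
                                 RangeTblEntry *rte)
{
    return true;
}

/* ... wired up in the FDW's handler function: */
/*     routine->IsForeignScanParallelSafe = exampleIsForeignScanParallelSafe; */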
On Tue, Apr 11, 2017 at 5:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> (BTW, I've not yet looked to see if this needs to be back-ported.) > > postgres_fdw will definitely include RestrictInfos in its fdw_private > list in 9.6. However, I've been unable to provoke a visible failure. > After some rooting around, the reason seems to be that: > > 1. postgres_fdw doesn't mark its plans as parallel-safe --- it doesn't > even have a IsForeignScanParallelSafe method. So you'd think that one > of its plans could never be transmitted to a parallel worker ... but: > > 2. Andreas' test query doesn't have the foreign scan in the main query > tree, it's in an InitPlan. 9.6 did not transmit any subplans or initplans > to parallel workers, but HEAD does. So HEAD sends the illegal structure > to the worker which spits up on trying to read a RestrictInfo. > > I think the fact that we see this problem at all may indicate an > oversight in commit 5e6d8d2bbbcace304450b309e79366c0da4063e4 ("Allow > parallel workers to execute subplans"). If the worker were to actually > run the initplan, bad things would happen (the worker would create its > own remote connection, which we don't want). Now, in the problem plan > the InitPlan is actually attached to the topmost Gather, which I think > is safe because it'll be run by the master, but I wonder if we're being > careful enough about non-parallel-safe plans for initplans/subplans. > Initplans are never marked parallel safe, only subplans that are generated for parallel safe paths are marked as parallel safe. > Also, even if the worker never actually executes the plan node, just > doing ExecInitNode on it in a parallel worker might be more than a > non-parallel-safe FDW is prepared to cope with. > I think there is a possibility of doing ExecInitNode in a parallel worker for a parallel-unsafe subplan, because we pass a list of all the sublans stored in planned statement. However, the worker will never execute such a plan as we don't generate a plan where unsafe sublan/initplan is referenced in the node passed to the worker. If we want to avoid passing parallel-unsafe subplans to workers, then I think we can maintain a list of parallel safe subplans along with subplans in PlannerGlobal and PlannedStmt or maybe keep a parallel safe flag in Plan so that we can pass only parallel safe plans to workers. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
From: Ashutosh Bapat
On Tue, Apr 11, 2017 at 3:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Apparently, postgres_fdw is trying to store RestrictInfos in the >> fdw_private field of a ForeignScan node. That won't do; those aren't >> supposed to be present in a finished plan tree, so there's no readfuncs.c >> support for them (and we're not adding it). > >> Don't know if this is a new bug, or ancient but not previously reachable. >> It seems to be nearly the inverse of the problem you found yesterday, >> in which postgres_fdw was stripping RestrictInfos sooner than it really >> ought to. Apparently that whole business needs a fresh look. > > Attached is a proposed patch that cleans up the mess here --- and it was > a mess. The comments for PgFdwRelationInfo claimed that the remote_conds > and local_conds fields contained bare clauses, but actually they could > contain either bare clauses or RestrictInfos or both, depending on where > the clauses had come from. And there was some seriously obscure and > undocumented logic around how the fdw_recheck_quals got set up for a > foreign join node. (BTW, said logic is assuming that no EPQ recheck is > needed for a foreign join. I commented it to that effect, but I'm not > very sure that I believe it. If there's a bug there, though, it's > independent of the immediate problem.) > > Anybody want to review this, or shall I just push it? > Will be reviewing it in a couple of hours. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
From: Ashutosh Bapat
On Tue, Apr 11, 2017 at 3:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Apparently, postgres_fdw is trying to store RestrictInfos in the >> fdw_private field of a ForeignScan node. That won't do; those aren't >> supposed to be present in a finished plan tree, so there's no readfuncs.c >> support for them (and we're not adding it). > >> Don't know if this is a new bug, or ancient but not previously reachable. >> It seems to be nearly the inverse of the problem you found yesterday, >> in which postgres_fdw was stripping RestrictInfos sooner than it really >> ought to. Apparently that whole business needs a fresh look. > > Attached is a proposed patch that cleans up the mess here --- and it was > a mess. The comments for PgFdwRelationInfo claimed that the remote_conds > and local_conds fields contained bare clauses, but actually they could > contain either bare clauses or RestrictInfos or both, depending on where > the clauses had come from. And there was some seriously obscure and > undocumented logic around how the fdw_recheck_quals got set up for a > foreign join node. (BTW, said logic is assuming that no EPQ recheck is > needed for a foreign join. I commented it to that effect, but I'm not > very sure that I believe it. If there's a bug there, though, it's > independent of the immediate problem.) > > Anybody want to review this, or shall I just push it? + * there is no need for EPQ recheck at a join (and Vars or Aggrefs in + * the qual might not be available locally anyway). I am not sure whether EPQ checks make sense for an upper relation, esp. a grouped relation. So mentioning Aggref can be confusing here. For joins, we execute EPQ by executing the (so called) outer plan created from fdw_outerpath. For this, we fetch whole-row references for the joining relations and build the output row by executing the local outer plan attached to the ForeignScanPlan. This whole-row references has values for all Vars, so even though Vars are not available, corresponding column values are available. So mentioning Vars is also confusing here. Attached patch has those comments modified, please check if that looks ok. - extract_actual_join_clauses(extra->restrictlist, &joinclauses, &otherclauses); + { + /* Grovel through the clauses to separate into two lists */ + joinclauses = otherclauses = NIL; + foreach(lc, extra->restrictlist) + { + RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); + + if (rinfo->is_pushed_down) + otherclauses = lappend(otherclauses, rinfo); + else + joinclauses = lappend(joinclauses, rinfo); + } + } We are duplicating the logic to separate join and other clauses here. But then we are already using is_pushed_down to separate the clauses at various places e.g. compute_semi_anti_join_factors(), so probably this change is fine. We can club the code to separate other and join clauses, checking that all join clauses are shippable and separating other clauses into local and remote clauses in a single list traversal as done in the attached patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
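The single-traversal classification suggested above would look roughly like
this inside foreign_join_ok() (a sketch of the suggestion rather than a
literal excerpt; root, joinrel, jointype, extra, fpinfo, joinclauses and lc
are that function's usual parameters and locals):

	/*
	 * Classify each clause in one pass.  For an outer join, clauses that
	 * are not pushed down are join quals and every one of them must be
	 * shippable, or the join cannot be pushed down; all other clauses are
	 * sorted into remote vs. local conditions.
	 */
	foreach(lc, extra->restrictlist)
	{
		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
		bool		is_remote = is_foreign_expr(root, joinrel, rinfo->clause);

		if (IS_OUTER_JOIN(jointype) && !rinfo->is_pushed_down)
		{
			if (!is_remote)
				return false;
			joinclauses = lappend(joinclauses, rinfo);
		}
		else if (is_remote)
			fpinfo->remote_conds = lappend(fpinfo->remote_conds, rinfo);
		else
			fpinfo->local_conds = lappend(fpinfo->local_conds, rinfo);
	}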
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> On Tue, Apr 11, 2017 at 3:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> + * there is no need for EPQ recheck at a join (and Vars or Aggrefs in
> + * the qual might not be available locally anyway).

> I am not sure whether EPQ checks make sense for an upper relation, esp. a
> grouped relation. So mentioning Aggref can be confusing here. For joins, we
> execute EPQ by executing the (so called) outer plan created from fdw_outerpath.
> For this, we fetch whole-row references for the joining relations and build the
> output row by executing the local outer plan attached to the ForeignScanPlan.
> This whole-row references has values for all Vars, so even though Vars are not
> available, corresponding column values are available. So mentioning Vars is
> also confusing here.

Well, my first attempt at fixing this merged remote_conds and remote_exprs
together, which in the previous version of the code resulted in always
passing the remote conditions as fdw_recheck_quals too.  And what happened
was that I got "variable not found in subplan target list" errors for Vars
used inside Aggrefs in pushed-to-the-remote HAVING clauses.  Which is
unsurprising -- it'd be impossible to return such a Var if the grouping is
being done remotely.  So I think it's important for this comment to explain
that we *can't* put upperrel quals into fdw_recheck_quals, not just that
"there's no need to".  But pointing out that at a join, there's a different
mechanism that's responsible for EPQ checks is good.  I'll reword this
again.

> We can club the code to separate other and join clauses, checking that all join
> clauses are shippable and separating other clauses into local and remote
> clauses in a single list traversal as done in the attached patch.

OK.  I was trying to be noninvasive, but this does look more sensible.
I think this might read better if we inverted the test (so that the
outer-join-joinclause-must-be-pushable case would come first).

If we're going for a more-than-minimal patch, I'm inclined to also move the
first loop in postgresGetForeignPlan into the "else" branch, so that it
doesn't get executed in the join/upperrel case.  I realize that it's going
to iterate zero times in that case, but it's just confusing to have it there
when we don't expect it to do anything and we would only throw away the
results if it did do anything.

			regards, tom lane
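The resulting rule for fdw_recheck_quals in postgresGetForeignPlan can be
summarized in a few lines (a paraphrase of the behaviour discussed above
using the patch's variable names, not a literal excerpt):

	/*
	 * fdw_recheck_quals drives the local EPQ recheck of rows fetched from
	 * the remote side.  Only a base-relation scan can use it: a foreign
	 * join does its recheck by executing the local plan hung on
	 * fdw_outerpath, and upper-rel quals (e.g. remote HAVING) may contain
	 * Aggrefs whose Vars simply cannot be produced locally.
	 */
	if (IS_JOIN_REL(foreignrel) || IS_UPPER_REL(foreignrel))
		fdw_recheck_quals = NIL;
	else
		fdw_recheck_quals = remote_exprs;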
Andreas Seltenreich <seltenreich@gmx.de> writes:
> I see the above ERROR logged a lot when testing master at eef8c0069e
> with a postgres_fdw around. Below is a recipe to reproduce it on top of
> the regression DB.

I've pushed a fix that should get you past that problem, although I suspect
we still have some issues with treatment of parallel-unsafe subplans.

			regards, tom lane
From: Ashutosh Bapat
On Tue, Apr 11, 2017 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: >> On Tue, Apr 11, 2017 at 3:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> + * there is no need for EPQ recheck at a join (and Vars or Aggrefs in >> + * the qual might not be available locally anyway). > >> I am not sure whether EPQ checks make sense for an upper relation, esp. a >> grouped relation. So mentioning Aggref can be confusing here. For joins, we >> execute EPQ by executing the (so called) outer plan created from fdw_outerpath. >> For this, we fetch whole-row references for the joining relations and build the >> output row by executing the local outer plan attached to the ForeignScanPlan. >> This whole-row references has values for all Vars, so even though Vars are not >> available, corresponding column values are available. So mentioning Vars is >> also confusing here. > > Well, my first attempt at fixing this merged remote_conds and remote_exprs > together, which in the previous version of the code resulted in always > passing the remote conditions as fdw_recheck_quals too. And what happened > was that I got "variable not found in subplan target list" errors for Vars > used inside Aggrefs in pushed-to-the-remote HAVING clauses. Which is > unsurprising -- it'd be impossible to return such a Var if the grouping is > being done remotely. So I think it's important for this comment to > explain that we *can't* put upperrel quals into fdw_recheck_quals, not just > that "there's no need to". The comments in the committed versions are good. > But pointing out that at a join, there's a > different mechanism that's responsible for EPQ checks is good. I'll > reword this again. > >> We can club the code to separate other and join clauses, checking that all join >> clauses are shippable and separating other clauses into local and remote >> clauses in a single list traversal as done in the attached patch. > > OK. I was trying to be noninvasive, but this does look more sensible. > I think this might read better if we inverted the test (so that > the outer-join-joinclause-must-be-pushable case would come first). Ok. In fact, thinking more about it, we should probably test JOIN_INNER explicitly instead of !IS_OUTER_JOIN() since SEMI_JOINS are not considered as outer joins and I am not sure how would we be treating the quals for those. > > If we're going for a more-than-minimal patch, I'm inclined to also > move the first loop in postgresGetForeignPlan into the "else" branch, > so that it doesn't get executed in the join/upperrel case. I realize > that it's going to iterate zero times in that case, but it's just > confusing to have it there when we don't expect it to do anything > and we would only throw away the results if it did do anything. > Committed version looks quite clear. I noticed that you committed the patch, while I was writing this mail. Sending it anyway. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> On Tue, Apr 11, 2017 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> OK. I was trying to be noninvasive, but this does look more sensible.
>> I think this might read better if we inverted the test (so that
>> the outer-join-joinclause-must-be-pushable case would come first).

> Ok. In fact, thinking more about it, we should probably test
> JOIN_INNER explicitly instead of !IS_OUTER_JOIN() since SEMI_JOINS are
> not considered as outer joins and I am not sure how would we be
> treating the quals for those.

No, that's correct as-is --- or at least, if it's not correct, there are a
bunch of other places that are also not correct.

Thinking about this further, though, it seems like a more straightforward
solution to the original problem is to not store quals in a Plan's
fdw_private list in the first place.  Putting them there is at best useless
overhead that the finished plan will have to drag around; and it's not
terribly hard to imagine future changes that would make having
never-processed-by-setrefs.c qual trees in a plan be broken in other ways.
Can't we use a field in the rel's PgFdwRelationInfo to carry the
remote_exprs forward from postgresGetForeignPlan to
postgresPlanDirectModify?

			regards, tom lane
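A minimal sketch of that idea: give PgFdwRelationInfo one more field (the
name final_remote_exprs is illustrative) that postgresGetForeignPlan fills
in and postgresPlanDirectModify later reads, so that nothing qual-shaped has
to ride along in fdw_private at all:

	typedef struct PgFdwRelationInfo
	{
		/* ... existing fields ... */
		List	   *remote_conds;	/* RestrictInfos shippable to the remote */
		List	   *local_conds;	/* RestrictInfos to check locally */

		/*
		 * Bare clauses actually deparsed into the remote SELECT, stashed
		 * here by postgresGetForeignPlan so postgresPlanDirectModify can
		 * pick them up again without going through the plan tree.
		 */
		List	   *final_remote_exprs;

		/* ... */
	} PgFdwRelationInfo;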
Amit Kapila <amit.kapila16@gmail.com> writes: > On Tue, Apr 11, 2017 at 5:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think the fact that we see this problem at all may indicate an >> oversight in commit 5e6d8d2bbbcace304450b309e79366c0da4063e4 ("Allow >> parallel workers to execute subplans"). If the worker were to actually >> run the initplan, bad things would happen (the worker would create its >> own remote connection, which we don't want). Now, in the problem plan >> the InitPlan is actually attached to the topmost Gather, which I think >> is safe because it'll be run by the master, but I wonder if we're being >> careful enough about non-parallel-safe plans for initplans/subplans. > Initplans are never marked parallel safe, only subplans that are > generated for parallel safe paths are marked as parallel safe. OK ... >> Also, even if the worker never actually executes the plan node, just >> doing ExecInitNode on it in a parallel worker might be more than a >> non-parallel-safe FDW is prepared to cope with. > I think there is a possibility of doing ExecInitNode in a parallel > worker for a parallel-unsafe subplan, because we pass a list of all > the sublans stored in planned statement. It's more than a possibility. If you run Andreas' test case against HEAD, now that the can't-transmit-RestrictInfo failure is fixed, you will find that the parallel worker actually creates and immediately destroys an FDW remote connection. (The easiest way to prove that is to turn on log_connections/log_disconnections. BTW, is there any convenient way to attach a debugger to a parallel worker process as it's being launched? I couldn't manage to catch the worker in the act.) So I think this is clearly a Bad Thing and we'd better do something about it. The consequences for postgres_fdw aren't so awful perhaps, but other non-parallel-safe FDWs might have bigger problems with this behavior. > However, the worker will > never execute such a plan as we don't generate a plan where unsafe > sublan/initplan is referenced in the node passed to the worker. If we > want to avoid passing parallel-unsafe subplans to workers, then I > think we can maintain a list of parallel safe subplans along with > subplans in PlannerGlobal and PlannedStmt or maybe keep a parallel > safe flag in Plan so that we can pass only parallel safe plans to > workers. Right, we could, say, leave a hole in the subplan list corresponding to any subplan that's not parallel-safe. That seems like a good idea anyway because right now there's clearly no cross-check preventing a worker from trying to run such a subplan. Anyone want to draft a patch for this? regards, tom lane
On Wed, Apr 12, 2017 at 12:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: > >> I think there is a possibility of doing ExecInitNode in a parallel >> worker for a parallel-unsafe subplan, because we pass a list of all >> the sublans stored in planned statement. > > It's more than a possibility. If you run Andreas' test case against > HEAD, now that the can't-transmit-RestrictInfo failure is fixed, > you will find that the parallel worker actually creates and immediately > destroys an FDW remote connection. (The easiest way to prove that is > to turn on log_connections/log_disconnections. BTW, is there any > convenient way to attach a debugger to a parallel worker process as it's > being launched? > What I generally do to debug parallel worker case is to add a "while (1) {}" kind of loop in the beginning of ParallelQueryMain() or in ParallelWorkerMain() depending on area I want to debug, like in this case it would be okay to add such a loop in ParallelQueryMain(). Then as the worker will be waiting there attach the debugger to that process and resume debugging. It is easier to identify the worker process if we add an infinite loop, otherwise one can use sleep or some other form of wait or debug break mechanism as well. > I couldn't manage to catch the worker in the act.) > > So I think this is clearly a Bad Thing and we'd better do something > about it. The consequences for postgres_fdw aren't so awful perhaps, > but other non-parallel-safe FDWs might have bigger problems with this > behavior. > >> However, the worker will >> never execute such a plan as we don't generate a plan where unsafe >> sublan/initplan is referenced in the node passed to the worker. If we >> want to avoid passing parallel-unsafe subplans to workers, then I >> think we can maintain a list of parallel safe subplans along with >> subplans in PlannerGlobal and PlannedStmt or maybe keep a parallel >> safe flag in Plan so that we can pass only parallel safe plans to >> workers. > > Right, we could, say, leave a hole in the subplan list corresponding > to any subplan that's not parallel-safe. That seems like a good idea > anyway because right now there's clearly no cross-check preventing > a worker from trying to run such a subplan. > > Anyone want to draft a patch for this? > Yes, I will work on it in this week and possibly today or tomorrow and either produce a patch or if I face any problems, then will update about them here. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
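Spelled out, the trick is just a parking loop at the top of
ParallelQueryMain() (or ParallelWorkerMain()).  The variant below uses a
volatile flag plus a short sleep so the worker can be released from the
debugger; a bare while (1) {} works the same way.  This is a throwaway
debugging aid, not something to commit:

	static void
	ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
	{
		volatile bool hold = true;

		/*
		 * The worker parks here.  Find its pid (e.g. in the process list),
		 * attach with "gdb -p <pid>", then run "set var hold = 0" and
		 * "continue" to let it proceed.
		 */
		while (hold)
			pg_usleep(100000L);		/* 100ms, to avoid spinning a CPU */

		/* ... normal ParallelQueryMain body follows ... */
	}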
From: Ashutosh Bapat
On Tue, Apr 11, 2017 at 10:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: >> On Tue, Apr 11, 2017 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> OK. I was trying to be noninvasive, but this does look more sensible. >>> I think this might read better if we inverted the test (so that >>> the outer-join-joinclause-must-be-pushable case would come first). > >> Ok. In fact, thinking more about it, we should probably test >> JOIN_INNER explicitly instead of !IS_OUTER_JOIN() since SEMI_JOINS are >> not considered as outer joins and I am not sure how would we be >> treating the quals for those. > > No, that's correct as-is --- or at least, if it's not correct, there > are a bunch of other places that are also not correct. Hmm, when we support SEMI and ANTI join push-down, we will have a bunch of other places to change. This is one of them. > > Thinking about this further, though, it seems like a more straightforward > solution to the original problem is to not store quals in a Plan's > fdw_private list in the first place. Putting them there is at best > useless overhead that the finished plan will have to drag around; > and it's not terribly hard to imagine future changes that would make > having never-processed-by-setrefs.c qual trees in a plan be broken in > other ways. Can't we use a field in the rel's PgFdwRelationInfo to > carry the remote_exprs forward from postgresGetForeignPlan to > postgresPlanDirectModify? > I was thinking of using fdw_recheck_quals by renaming it. We don't push DML with joins down to the foreign server. So, it's ok to set fdw_recheck_quals (or whatever name we choose) to be NIL in join and upper relation case as we do today, without changing anything else. When we support DMLs with join pushdown, we will have to use subquery for the scan and thus will not require fdw_recheck_quals to be set. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Wed, Apr 12, 2017 at 12:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>
>> However, the worker will
>> never execute such a plan as we don't generate a plan where unsafe
>> sublan/initplan is referenced in the node passed to the worker. If we
>> want to avoid passing parallel-unsafe subplans to workers, then I
>> think we can maintain a list of parallel safe subplans along with
>> subplans in PlannerGlobal and PlannedStmt or maybe keep a parallel
>> safe flag in Plan so that we can pass only parallel safe plans to
>> workers.
>
> Right, we could, say, leave a hole in the subplan list corresponding
> to any subplan that's not parallel-safe.  That seems like a good idea
> anyway because right now there's clearly no cross-check preventing
> a worker from trying to run such a subplan.
>
> Anyone want to draft a patch for this?

Please find patch attached based on above discussion.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Wed, Apr 12, 2017 at 12:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyone want to draft a patch for this?

> Please find patch attached based on above discussion.

Thanks, I'll look at this later today.

			regards, tom lane
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Wed, Apr 12, 2017 at 12:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyone want to draft a patch for this?

> Please find patch attached based on above discussion.

This patch seems fairly incomplete: you can't just whack around major data
structures like PlannedStmt and PlannerGlobal without doing the janitorial
work of cleaning up support logic such as outfuncs/readfuncs.

Also, when you think about what will happen when doing a copyObject() on a
completed plan, it seems like a pretty bad idea to link subplans into two
independent lists.  We'll end up with two separate copies of those subtrees
in places like the plan cache.

I'm inclined to think the other approach of adding a parallel_safe flag to
struct Plan is a better answer in the long run.  It's basically free to
have it there because of alignment considerations, and it's hard to believe
that we're not going to need that labeling at some point in the future
anyway.

I had been a bit concerned about having to have some magic in outfuncs
and/or readfuncs to avoid transferring the unsafe subplan(s), but I see from
your patch there's a better way: we can have ExecSerializePlan modify the
subplan list as it transfers it into its private PlannedStmt struct.  But I
think iterating over the list and examining each subplan's parallel_safe
marking is a better way to do that.

Will work on this approach.

			regards, tom lane
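In code, that amounts to a short loop in ExecSerializePlan, roughly like the
following (a sketch that assumes the proposed parallel_safe flag on struct
Plan; pstmt, estate and lc are the function's locals, and this is not a
verbatim copy of any committed change):

	/*
	 * Build the subplan list for the PlannedStmt shipped to workers.  Keep
	 * the list the same length so subplan indexes stay valid, but leave a
	 * hole (NULL) wherever a subplan is not parallel-safe, so a worker can
	 * neither deserialize nor initialize it.
	 */
	pstmt->subplans = NIL;
	foreach(lc, estate->es_plannedstmt->subplans)
	{
		Plan	   *subplan = (Plan *) lfirst(lc);

		if (subplan && !subplan->parallel_safe)
			subplan = NULL;
		pstmt->subplans = lappend(pstmt->subplans, subplan);
	}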
On Wed, Apr 12, 2017 at 10:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Wed, Apr 12, 2017 at 12:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Anyone want to draft a patch for this? > >> Please find patch attached based on above discussion. > > This patch seems fairly incomplete: you can't just whack around major data > structures like PlannedStmt and PlannerGlobal without doing the janitorial > work of cleaning up support logic such as outfuncs/readfuncs. > Oops, missed it. > Also, when you think about what will happen when doing a copyObject() > on a completed plan, it seems like a pretty bad idea to link subplans > into two independent lists. We'll end up with two separate copies of > those subtrees in places like the plan cache. > > I'm inclined to think the other approach of adding a parallel_safe > flag to struct Plan is a better answer in the long run. It's basically > free to have it there because of alignment considerations, and it's > hard to believe that we're not going to need that labeling at some > point in the future anyway. > > I had been a bit concerned about having to have some magic in outfuncs > and/or readfuncs to avoid transferring the unsafe subplan(s), but I see > from your patch there's a better way: we can have ExecSerializePlan modify > the subplan list as it transfers it into its private PlannedStmt struct. > But I think iterating over the list and examining each subplan's > parallel_safe marking is a better way to do that. > > Will work on this approach. > Thanks, I see that you have committed patch on those lines. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com