Thread: Bogus lateral-reference-propagation logic in create_lateral_join_info

Bogus lateral-reference-propagation logic in create_lateral_join_info

From
Tom Lane
Date:
While poking at bug #15613 (in which FDWs are failing to mark created
Paths with correct outer-reference sets), I thought it'd be a good idea
to add some asserts to Path creation saying that every Path should be
parameterized by at least whatever the relation's required LATERAL
references are.  I did that as per the added assertions in relnode.c
below.  I didn't expect any failures in the existing regression tests,
since we know those don't exercise bug #15613, but darn if the addition
to get_appendrel_parampathinfo didn't blow up in the core tests.

A bit of excavation later, it turns out that this is a bug since day 1
in create_lateral_join_info.  It needs to propagate lateral_relids
and related fields from appendrel parents to children, but it was not,
in the original code, accounting for the possibility of grandchildren.
Those need to get the lateral fields propagated from their topmost
ancestor too, but they weren't.  This leads to having an intermediate
appendrel that is marked with some lateral_relids when its children
are not, causing add_paths_to_append_rel to believe that unparameterized
paths can be built, triggering the new assertion.  I'm not sure if there
are any worse consequences; the regression test case that's triggering
the Assert seems to work otherwise, so we might be accidentally failing
to fail.  But it's not supposed to be like that.

Commit 0a480502b hacked this code up to deal with grandchildren for
the case of partitioned tables, but the regression test that's falling
over involves nested UNION ALL subqueries, which are not that.  Rather
than add another RTE-kind exception, though, I think we ought to rewrite
it completely and get rid of the nested loops in favor of one traversal
of the append_rel_list.  This does require assuming that the
append_rel_list has ancestor entries before descendant entries, but
that's okay because of the way the list is built.  (I note that 0a480502b
is effectively assuming that ancestors appear before children in the RTE
list, which is no safer an assumption.)

Also, I'd really like to know why I had to put in the exception seen
in the loop for AppendRelInfos that do not point to a valid parent.
It seems to me that that is almost certainly working around a bug in
the partitioning logic.  (Without it, the partition_prune regression test
crashes.)  Or would somebody like to own up to having created that state
of affairs intentionally?  If so why?

Anyway, I propose to commit and back-patch the initsplan.c part of the
attached.  The added asserts in relnode.c should probably not go in
until we have a bug #15613 fix that will prevent postgres_fdw from
triggering them, so I'll do that later.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d6ffa78..22a6da3 100644
*** a/src/backend/optimizer/plan/initsplan.c
--- b/src/backend/optimizer/plan/initsplan.c
*************** void
*** 419,424 ****
--- 419,425 ----
  create_lateral_join_info(PlannerInfo *root)
  {
      bool        found_laterals = false;
+     Relids        prev_parents PG_USED_FOR_ASSERTS_ONLY = NULL;
      Index        rti;
      ListCell   *lc;

*************** create_lateral_join_info(PlannerInfo *ro
*** 627,679 ****
       * every child anyway, and there's no value in forcing extra
       * reparameterize_path() calls.  Similarly, a lateral reference to the
       * parent prevents use of otherwise-movable join rels for each child.
       */
!     for (rti = 1; rti < root->simple_rel_array_size; rti++)
      {
!         RelOptInfo *brel = root->simple_rel_array[rti];
!         RangeTblEntry *brte = root->simple_rte_array[rti];
!
!         /*
!          * Skip empty slots. Also skip non-simple relations i.e. dead
!          * relations.
!          */
!         if (brel == NULL || !IS_SIMPLE_REL(brel))
!             continue;

          /*
!          * In the case of table inheritance, the parent RTE is directly linked
!          * to every child table via an AppendRelInfo.  In the case of table
!          * partitioning, the inheritance hierarchy is expanded one level at a
!          * time rather than flattened.  Therefore, an other member rel that is
!          * a partitioned table may have children of its own, and must
!          * therefore be marked with the appropriate lateral info so that those
!          * children eventually get marked also.
           */
!         Assert(brte);
!         if (brel->reloptkind == RELOPT_OTHER_MEMBER_REL &&
!             (brte->rtekind != RTE_RELATION ||
!              brte->relkind != RELKIND_PARTITIONED_TABLE))
              continue;

!         if (brte->inh)
!         {
!             foreach(lc, root->append_rel_list)
!             {
!                 AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
!                 RelOptInfo *childrel;

!                 if (appinfo->parent_relid != rti)
!                     continue;
!                 childrel = root->simple_rel_array[appinfo->child_relid];
!                 Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
!                 Assert(childrel->direct_lateral_relids == NULL);
!                 childrel->direct_lateral_relids = brel->direct_lateral_relids;
!                 Assert(childrel->lateral_relids == NULL);
!                 childrel->lateral_relids = brel->lateral_relids;
!                 Assert(childrel->lateral_referencers == NULL);
!                 childrel->lateral_referencers = brel->lateral_referencers;
!             }
!         }
      }
  }

--- 628,667 ----
       * every child anyway, and there's no value in forcing extra
       * reparameterize_path() calls.  Similarly, a lateral reference to the
       * parent prevents use of otherwise-movable join rels for each child.
+      *
+      * It's possible for child rels to have their own children, in which case
+      * the topmost parent's lateral info must be propagated all the way down.
+      * This code handles that case correctly so long as append_rel_list has
+      * entries for child relationships before grandchild relationships, which
+      * is an okay assumption right now, but we'll need to be careful to
+      * preserve it.  The assertions below check for incorrect ordering.
       */
!     foreach(lc, root->append_rel_list)
      {
!         AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
!         RelOptInfo *parentrel = root->simple_rel_array[appinfo->parent_relid];
!         RelOptInfo *childrel = root->simple_rel_array[appinfo->child_relid];

          /*
!          * Apparently append_rel_list can contain bogus parent rels nowadays
           */
!         if (parentrel == NULL)
              continue;

!         /* Verify that children are processed before grandchildren */
! #ifdef USE_ASSERT_CHECKING
!         prev_parents = bms_add_member(prev_parents, appinfo->parent_relid);
!         Assert(!bms_is_member(appinfo->child_relid, prev_parents));
! #endif

!         /* OK, propagate info down */
!         Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
!         Assert(childrel->direct_lateral_relids == NULL);
!         childrel->direct_lateral_relids = parentrel->direct_lateral_relids;
!         Assert(childrel->lateral_relids == NULL);
!         childrel->lateral_relids = parentrel->lateral_relids;
!         Assert(childrel->lateral_referencers == NULL);
!         childrel->lateral_referencers = parentrel->lateral_referencers;
      }
  }

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f04c6b7..4130514 100644
*** a/src/backend/optimizer/util/relnode.c
--- b/src/backend/optimizer/util/relnode.c
*************** get_baserel_parampathinfo(PlannerInfo *r
*** 1225,1230 ****
--- 1225,1233 ----
      double        rows;
      ListCell   *lc;

+     /* If rel has LATERAL refs, every path for it should account for them */
+     Assert(bms_is_subset(baserel->lateral_relids, required_outer));
+
      /* Unparameterized paths have no ParamPathInfo */
      if (bms_is_empty(required_outer))
          return NULL;
*************** get_joinrel_parampathinfo(PlannerInfo *r
*** 1320,1325 ****
--- 1323,1331 ----
      double        rows;
      ListCell   *lc;

+     /* If rel has LATERAL refs, every path for it should account for them */
+     Assert(bms_is_subset(joinrel->lateral_relids, required_outer));
+
      /* Unparameterized paths have no ParamPathInfo or extra join clauses */
      if (bms_is_empty(required_outer))
          return NULL;
*************** get_appendrel_parampathinfo(RelOptInfo *
*** 1511,1516 ****
--- 1517,1525 ----
  {
      ParamPathInfo *ppi;

+     /* If rel has LATERAL refs, every path for it should account for them */
+     Assert(bms_is_subset(appendrel->lateral_relids, required_outer));
+
      /* Unparameterized paths have no ParamPathInfo */
      if (bms_is_empty(required_outer))
          return NULL;

Re: Bogus lateral-reference-propagation logic in create_lateral_join_info

From
David Rowley
Date:
On Wed, 6 Feb 2019 at 10:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, I'd really like to know why I had to put in the exception seen
> in the loop for AppendRelInfos that do not point to a valid parent.
> It seems to me that that is almost certainly working around a bug in
> the partitioning logic.  (Without it, the partition_prune regression test
> crashes.)  Or would somebody like to own up to having created that state
> of affairs intentionally?  If so why?

Sounds pretty strange to me.   What exactly do you mean by not
pointing to a valid parent? Do you mean the parent_relid index is not
a valid RelOptInfo?

Can you point to the regression test that's doing this?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Bogus lateral-reference-propagation logic in create_lateral_join_info

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Wed, 6 Feb 2019 at 10:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, I'd really like to know why I had to put in the exception seen
>> in the loop for AppendRelInfos that do not point to a valid parent.
>> It seems to me that that is almost certainly working around a bug in
>> the partitioning logic.  (Without it, the partition_prune regression test
>> crashes.)  Or would somebody like to own up to having created that state
>> of affairs intentionally?  If so why?

> Sounds pretty strange to me.   What exactly do you mean by not
> pointing to a valid parent? Do you mean the parent_relid index is not
> a valid RelOptInfo?

Exactly --- the parent_relid index does not have a RelOptInfo in
simple_rel_array[].

> Can you point to the regression test that's doing this?

Yeah, I misspoke above, it's in partition_join not partition_prune,
specifically

DELETE FROM prt1_l
WHERE EXISTS (
  SELECT 1
    FROM int4_tbl,
         LATERAL (SELECT int4_tbl.f1 FROM int8_tbl LIMIT 2) ss
    WHERE prt1_l.c IS NULL);

I didn't run this totally to bottom yet, but what seems to be
happening is that inheritance_planner is creating a partition-specific
subplan for the DELETE, and it's allowing AppendRelInfos from the
parent query to propagate into the subquery even though they have
nothing to do with that subquery.

We could just decide that it's okay for code dealing with the subquery
to ignore the irrelevant AppendRelInfos (which is basically what my
draft patch did), but I find that to be an uncomfortable answer: it
seems *way* too likely to result in code that can mask real bugs.
I'd be happier fixing things so that inheritance_planner doesn't
propagate anything into the subquery that doesn't make sense in the
subquery's context.  But perhaps that's unreasonably hard?  Not enough
data yet.

            regards, tom lane


Re: Bogus lateral-reference-propagation logic increate_lateral_join_info

From
Amit Langote
Date:
On 2019/02/06 12:11, Tom Lane wrote:
> Yeah, I misspoke above, it's in partition_join not partition_prune,
> specifically
> 
> DELETE FROM prt1_l
> WHERE EXISTS (
>   SELECT 1
>     FROM int4_tbl,
>          LATERAL (SELECT int4_tbl.f1 FROM int8_tbl LIMIT 2) ss
>     WHERE prt1_l.c IS NULL);
> 
> I didn't run this totally to bottom yet, but what seems to be
> happening is that inheritance_planner is creating a partition-specific
> subplan for the DELETE, and it's allowing AppendRelInfos from the
> parent query to propagate into the subquery even though they have
> nothing to do with that subquery.
>
> We could just decide that it's okay for code dealing with the subquery
> to ignore the irrelevant AppendRelInfos (which is basically what my
> draft patch did), but I find that to be an uncomfortable answer: it
> seems *way* too likely to result in code that can mask real bugs.
> I'd be happier fixing things so that inheritance_planner doesn't
> propagate anything into the subquery that doesn't make sense in the
> subquery's context.  But perhaps that's unreasonably hard?  Not enough
> data yet.

The target-relation specific entries in the append_rel_list of the
original PlannerInfo are *only* for inheritance_planner to use, so it
seems OK to ignore them during child target planning in any way possible,
which in your patch's case is by ignoring the AppendRelInfos whose
parent_relid fetches a NULL base rel.  The reason for them to be NULL, as
might be clear to you, is that reference to the parent target relation in
the child query's jointree gets replaced by the reference to child target
relation.  Maybe we could document that in the comment, instead of this
done by your patch:

+         * Apparently append_rel_list can contain bogus parent rels nowadays
+         *
+        if (parentrel == NULL)
+            continue;*/

Note that a RelOptInfo is never built for the *original* target relation
of the query in the inheritance case (only children, including parent in
its role as child in the regular inheritance case), so there's no
possibility of LATERAL info (or anything that's initialized by
query_planner for that matter) ever being associated with that relation.

Thanks,
Amit



Re: Bogus lateral-reference-propagation logic in create_lateral_join_info

From
Tom Lane
Date:
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> On 2019/02/06 12:11, Tom Lane wrote:
>> I didn't run this totally to bottom yet, but what seems to be
>> happening is that inheritance_planner is creating a partition-specific
>> subplan for the DELETE, and it's allowing AppendRelInfos from the
>> parent query to propagate into the subquery even though they have
>> nothing to do with that subquery.
>> 
>> We could just decide that it's okay for code dealing with the subquery
>> to ignore the irrelevant AppendRelInfos (which is basically what my
>> draft patch did), but I find that to be an uncomfortable answer: it
>> seems *way* too likely to result in code that can mask real bugs.
>> I'd be happier fixing things so that inheritance_planner doesn't
>> propagate anything into the subquery that doesn't make sense in the
>> subquery's context.  But perhaps that's unreasonably hard?  Not enough
>> data yet.

> The target-relation specific entries in the append_rel_list of the
> original PlannerInfo are *only* for inheritance_planner to use, so it
> seems OK to ignore them during child target planning in any way possible,
> which in your patch's case is by ignoring the AppendRelInfos whose
> parent_relid fetches a NULL base rel.

I experimented with having inheritance_planner remove AppendRelInfos that
aren't relevant to the current child query.  While it's quite easy to get
rid of everything at or above the current child, as per the attached,
that's not enough to stop initsplan.c from seeing irrelevant entries:
there might still be some linking siblings of the current child to their
children, or descendants of those.  So I think we'll have to put up with
using a test-for-NULL hack in create_lateral_join_info.  I'll adjust the
comment to explain why we need it.

I'm posting the attached mainly because I'm wondering if we should
apply it despite it not being able to remove every irrelevant entry.
In many cases (particularly with wide partition trees) it'd greatly
reduce the length of the append_rel_list passed down to each
subquery, and maybe that'd save enough cycles to make it worth doing
just on performance grounds.  I've not attempted to measure though.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b223972..fc0f08e 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** inheritance_planner(PlannerInfo *root)
*** 1307,1312 ****
--- 1307,1313 ----
          RangeTblEntry *child_rte;
          RelOptInfo *sub_final_rel;
          Path       *subpath;
+         ListCell   *lc2;

          /* append_rel_list contains all append rels; ignore others */
          if (!bms_is_member(appinfo->parent_relid, parent_relids))
*************** inheritance_planner(PlannerInfo *root)
*** 1416,1439 ****
           * want to copy items that actually contain such references; the rest
           * can just get linked into the subroot's append_rel_list.
           *
!          * If we know there are no such references, we can just use the outer
!          * append_rel_list unmodified.
           */
!         if (modifiableARIindexes != NULL)
          {
!             ListCell   *lc2;

!             subroot->append_rel_list = NIL;
!             foreach(lc2, parent_root->append_rel_list)
!             {
!                 AppendRelInfo *appinfo2 = lfirst_node(AppendRelInfo, lc2);

!                 if (bms_is_member(appinfo2->child_relid, modifiableARIindexes))
!                     appinfo2 = copyObject(appinfo2);

!                 subroot->append_rel_list = lappend(subroot->append_rel_list,
!                                                    appinfo2);
!             }
          }

          /*
--- 1417,1441 ----
           * want to copy items that actually contain such references; the rest
           * can just get linked into the subroot's append_rel_list.
           *
!          * While we're at it, drop the AppendRelInfo items that link the
!          * current parent to its children from the subquery's append_rel_list.
!          * Those items are irrelevant for the subquery, so keeping them just
!          * wastes cycles during subquery planning, and could confuse code
!          * elsewhere.
           */
!         subroot->append_rel_list = NIL;
!         foreach(lc2, parent_root->append_rel_list)
          {
!             AppendRelInfo *appinfo2 = lfirst_node(AppendRelInfo, lc2);

!             if (appinfo2->parent_relid == appinfo->parent_relid)
!                 continue;

!             if (bms_is_member(appinfo2->child_relid, modifiableARIindexes))
!                 appinfo2 = copyObject(appinfo2);

!             subroot->append_rel_list = lappend(subroot->append_rel_list,
!                                                appinfo2);
          }

          /*