Re: Fwd: Core dump with nested CREATE TEMP TABLE - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Fwd: Core dump with nested CREATE TEMP TABLE
Date
Msg-id 7541.1441317080@sss.pgh.pa.us
Whole thread Raw
In response to Re: Fwd: Core dump with nested CREATE TEMP TABLE  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fwd: Core dump with nested CREATE TEMP TABLE
List pgsql-hackers
I wrote:
> I'm inclined to think that the only real fix for this type of problem
> is that at subtransaction abort, we have to prevent further execution
> of any Portals that have executed at all within the subxact (ie force
> them into PORTAL_FAILED state and clean out their resources, see
> MarkPortalFailed).  It's not good enough to kill the one(s) that are
> actively executing at the instant of failure, because anything that's
> run at all since the subxact started might be holding a reference to an
> object we're about to delete.

> I'm a little worried that that will break usage patterns that work today,
> though.

I experimented with the attached patch.  It seems to work to stop the
crash Michael exhibited (I've not tried to duplicate Jim's original
example, though).  However, it also causes a regression test failure,
because transactions.sql does this:

BEGIN;
    DECLARE c CURSOR FOR SELECT unique2 FROM tenk1 ORDER BY unique2;
    SAVEPOINT one;
        FETCH 10 FROM c;
    ROLLBACK TO SAVEPOINT one;
        FETCH 10 FROM c;

which is exactly the case we're trying to reject now.  So that fills
me with fear that this approach would break existing applications.
On the other hand, I do not really see a good alternative :-(.

I thought about trying to detect whether the Portal actually had any
references to "new in subtransaction" objects to decide whether to
kill it or not, but that seems awfully fragile.

Ideas?

            regards, tom lane

diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 2794537..327b2a5 100644
*** a/src/backend/commands/portalcmds.c
--- b/src/backend/commands/portalcmds.c
*************** PersistHoldablePortal(Portal portal)
*** 335,345 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 335,341 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9c14e8a..0df86a2 100644
*** a/src/backend/tcop/pquery.c
--- b/src/backend/tcop/pquery.c
*************** PortalRun(Portal portal, long count, boo
*** 734,744 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 734,740 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
*************** PortalRunFetch(Portal portal,
*** 1398,1408 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 1394,1400 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7530498..74f5089 100644
*** a/src/backend/utils/mmgr/portalmem.c
--- b/src/backend/utils/mmgr/portalmem.c
*************** CreatePortal(const char *name, bool allo
*** 232,237 ****
--- 232,238 ----
      portal->status = PORTAL_NEW;
      portal->cleanup = PortalCleanup;
      portal->createSubid = GetCurrentSubTransactionId();
+     portal->activeSubid = portal->createSubid;
      portal->strategy = PORTAL_MULTI_QUERY;
      portal->cursorOptions = CURSOR_OPT_NO_SCROLL;
      portal->atStart = true;
*************** UnpinPortal(Portal portal)
*** 403,408 ****
--- 404,428 ----
  }

  /*
+  * MarkPortalActive
+  *        Transition a portal from READY to ACTIVE state.
+  *
+  * NOTE: never set portal->status = PORTAL_ACTIVE directly; call this instead.
+  */
+ void
+ MarkPortalActive(Portal portal)
+ {
+     /* For safety, this is a runtime test not just an Assert */
+     if (portal->status != PORTAL_READY)
+         ereport(ERROR,
+                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                  errmsg("portal \"%s\" cannot be run", portal->name)));
+     /* Perform the state transition */
+     portal->status = PORTAL_ACTIVE;
+     portal->activeSubid = GetCurrentSubTransactionId();
+ }
+
+ /*
   * MarkPortalDone
   *        Transition a portal from ACTIVE to DONE state.
   *
*************** PreCommit_Portals(bool isPrepare)
*** 690,695 ****
--- 710,716 ----
               * not belonging to this transaction.
               */
              portal->createSubid = InvalidSubTransactionId;
+             portal->activeSubid = InvalidSubTransactionId;

              /* Report we changed state */
              result = true;
*************** AtCleanup_Portals(void)
*** 836,843 ****
  /*
   * Pre-subcommit processing for portals.
   *
!  * Reassign the portals created in the current subtransaction to the parent
!  * subtransaction.
   */
  void
  AtSubCommit_Portals(SubTransactionId mySubid,
--- 857,864 ----
  /*
   * Pre-subcommit processing for portals.
   *
!  * Reassign portals created or used in the current subtransaction to the
!  * parent subtransaction.
   */
  void
  AtSubCommit_Portals(SubTransactionId mySubid,
*************** AtSubCommit_Portals(SubTransactionId myS
*** 859,872 ****
              if (portal->resowner)
                  ResourceOwnerNewParent(portal->resowner, parentXactOwner);
          }
      }
  }

  /*
   * Subtransaction abort handling for portals.
   *
!  * Deactivate portals created during the failed subtransaction.
!  * Note that per AtSubCommit_Portals, this will catch portals created
   * in descendants of the subtransaction too.
   *
   * We don't destroy any portals here; that's done in AtSubCleanup_Portals.
--- 880,895 ----
              if (portal->resowner)
                  ResourceOwnerNewParent(portal->resowner, parentXactOwner);
          }
+         if (portal->activeSubid == mySubid)
+             portal->activeSubid = parentSubid;
      }
  }

  /*
   * Subtransaction abort handling for portals.
   *
!  * Deactivate portals created or used during the failed subtransaction.
!  * Note that per AtSubCommit_Portals, this will catch portals created/used
   * in descendants of the subtransaction too.
   *
   * We don't destroy any portals here; that's done in AtSubCleanup_Portals.
*************** AtSubAbort_Portals(SubTransactionId mySu
*** 885,900 ****
      {
          Portal        portal = hentry->portal;

          if (portal->createSubid != mySubid)
              continue;

          /*
           * Force any live portals of my own subtransaction into FAILED state.
           * We have to do this because they might refer to objects created or
           * changed in the failed subtransaction, leading to crashes if
           * execution is resumed, or even if we just try to run ExecutorEnd.
-          * (Note we do NOT do this to upper-level portals, since they cannot
-          * have such references and hence may be able to continue.)
           */
          if (portal->status == PORTAL_READY ||
              portal->status == PORTAL_ACTIVE)
--- 908,956 ----
      {
          Portal        portal = hentry->portal;

+         /* Was it created in this subtransaction? */
          if (portal->createSubid != mySubid)
+         {
+             /* No, but maybe it was used in this subtransaction? */
+             if (portal->activeSubid == mySubid)
+             {
+                 /* Maintain activeSubid until the portal is removed */
+                 portal->activeSubid = parentSubid;
+
+                 /*
+                  * Upper-level portals that were used in this subxact must be
+                  * forced into FAILED state, for the same reasons discussed
+                  * below.
+                  */
+                 if (portal->status == PORTAL_READY ||
+                     portal->status == PORTAL_ACTIVE)
+                     MarkPortalFailed(portal);
+
+                 /*
+                  * Also, if we failed it during the current subxact (either
+                  * just above, or earlier), reattach its resowner to the
+                  * current subtransaction's resowner so that its resources get
+                  * cleaned up while we abort this subtransaction.  This is
+                  * needed to avoid getting Assert failures or worse when we
+                  * clean up objects created in the subtransaction.
+                  */
+                 if (portal->status == PORTAL_FAILED &&
+                     portal->resowner)
+                 {
+                     ResourceOwnerNewParent(portal->resowner,
+                                            CurrentResourceOwner);
+                     portal->resowner = NULL;
+                 }
+             }
+             /* If it wasn't created in this subxact, nothing more to do */
              continue;
+         }

          /*
           * Force any live portals of my own subtransaction into FAILED state.
           * We have to do this because they might refer to objects created or
           * changed in the failed subtransaction, leading to crashes if
           * execution is resumed, or even if we just try to run ExecutorEnd.
           */
          if (portal->status == PORTAL_READY ||
              portal->status == PORTAL_ACTIVE)
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 1267b9e..7215e46 100644
*** a/src/include/utils/portal.h
--- b/src/include/utils/portal.h
*************** typedef struct PortalData
*** 119,130 ****
      MemoryContext heap;            /* subsidiary memory for portal */
      ResourceOwner resowner;        /* resources owned by portal */
      void        (*cleanup) (Portal portal);        /* cleanup hook */
-     SubTransactionId createSubid;        /* the ID of the creating subxact */

      /*
!      * if createSubid is InvalidSubTransactionId, the portal is held over from
!      * a previous transaction
       */

      /* The query or queries the portal will execute */
      const char *sourceText;        /* text of query (as of 8.4, never NULL) */
--- 119,136 ----
      MemoryContext heap;            /* subsidiary memory for portal */
      ResourceOwner resowner;        /* resources owned by portal */
      void        (*cleanup) (Portal portal);        /* cleanup hook */

      /*
!      * State data for remembering which subtransaction(s) the portal was
!      * created or used in.  If the portal is held over from a previous
!      * transaction, both subxids are InvalidSubTransactionId.  Otherwise,
!      * createSubid is the creating subxact and activeSubid is the last subxact
!      * in which we ran the portal.  We must deactivate the portal if either of
!      * these subxacts becomes aborted, since it might hold references to
!      * objects created in said subxact.
       */
+     SubTransactionId createSubid;        /* the creating subxact */
+     SubTransactionId activeSubid;        /* the last subxact with activity */

      /* The query or queries the portal will execute */
      const char *sourceText;        /* text of query (as of 8.4, never NULL) */
*************** extern Portal CreatePortal(const char *n
*** 207,212 ****
--- 213,219 ----
  extern Portal CreateNewPortal(void);
  extern void PinPortal(Portal portal);
  extern void UnpinPortal(Portal portal);
+ extern void MarkPortalActive(Portal portal);
  extern void MarkPortalDone(Portal portal);
  extern void MarkPortalFailed(Portal portal);
  extern void PortalDrop(Portal portal, bool isTopCommit);

pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: Freeze avoidance of very large table.
Next
From: Corey Huinker
Date:
Subject: Re: [POC] FETCH limited by bytes.