Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)" - Mailing list pgsql-hackers

From Tom Lane
Subject Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"
Date
Msg-id 4993.1531773411@sss.pgh.pa.us
Whole thread Raw
In response to Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"
List pgsql-hackers
I wrote:
> So I said I didn't want to do extra work on this, but I am looking into
> fixing it by having these aux process types run a ResourceOwner that can
> be told to clean up any open buffer pins at exit.  We could be sure the
> coverage is complete by dint of removing the special-case code in
> resowner.c that allows buffer pins to be taken with no active resowner.
> Then CheckForBufferLeaks can be left as-is, ie something we do only
> in assert builds.

That turned out to be a larger can of worms than I'd anticipated, as it
soon emerged that we'd acquired a whole lot of cargo-cult programming
around ResourceOwners.  Having a ResourceOwner in itself does nothing
for you: there has to be code someplace that ensures we'll call
ResourceOwnerRelease at an appropriate time.  There was basically noplace
outside xact.c and portalmem.c that got this completely right.  bgwriter.c
and a couple of other places at least tried a little, by doing a
ResourceOwnerRelease in their sigsetjmp-catching stanzas, but that didn't
account for the process-exit code path.  Other places just created a
ResourceOwner and did *nothing* about cleaning it up.

I decided that the most expedient way of dealing with this was to create
a self-contained facility in resowner.c that would create a standalone
ResourceOwner and register a shmem-exit callback to clean it up.
That way it's easier (less code) to do it right than to do it wrong.
That led to the attached, which passes check-world as well as Michael's
full-disk test script.

I'm mostly pretty happy with this, but I think there are a couple of
loose ends in logicalfuncs.c and slotfuncs.c: those are creating
non-standalone ResourceOwners (children of whatever the active
ResourceOwner is) and doing nothing much to clean them up.  That seems
pretty bogus.  It's not a permanent resource leak, because sooner or
later the parent ResourceOwner will get cleaned up and that will
recurse to the child ... but then why bother with a child ResourceOwner
at all?  I added asserts to these calls to verify that there was a
parent resowner (if there isn't, the code is just broken), but I think
that we should either add more code to clean up the child resowner
promptly, or just not bother with a child at all.

Not very sure what to do with this.  I don't think we should try to
backpatch it, because it's essentially changing the API requirements
around buffer access in auxiliary processes --- but it might not be
too late to squeeze it into v11.  It is a bug fix, in that the current
code can leak buffer pins in some code paths and we won't notice in
non-assert builds; but we've not really seen any field reports of that
happening, so I'm not sure how important this is.

            regards, tom lane

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 4e32cff..3a5cd0e 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1221,8 +1221,7 @@ ParallelWorkerMain(Datum main_arg)
     memcpy(&ParallelWorkerNumber, MyBgworkerEntry->bgw_extra, sizeof(int));

     /* Set up a memory context and resource owner. */
-    Assert(CurrentResourceOwner == NULL);
-    CurrentResourceOwner = ResourceOwnerCreate(NULL, "parallel toplevel");
+    CreateAuxProcessResourceOwner();
     CurrentMemoryContext = AllocSetContextCreate(TopMemoryContext,
                                                  "Parallel worker",
                                                  ALLOCSET_DEFAULT_SIZES);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4049deb..5d01edd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6356,6 +6356,15 @@ StartupXLOG(void)
     struct stat st;

     /*
+     * We should have an aux process resource owner to use, and we should not
+     * be in a transaction that's installed some other resowner.
+     */
+    Assert(AuxProcessResourceOwner != NULL);
+    Assert(CurrentResourceOwner == NULL ||
+           CurrentResourceOwner == AuxProcessResourceOwner);
+    CurrentResourceOwner = AuxProcessResourceOwner;
+
+    /*
      * Verify XLOG status looks valid.
      */
     if (ControlFile->state < DB_SHUTDOWNED ||
@@ -8467,6 +8476,15 @@ GetNextXidAndEpoch(TransactionId *xid, uint32 *epoch)
 void
 ShutdownXLOG(int code, Datum arg)
 {
+    /*
+     * We should have an aux process resource owner to use, and we should not
+     * be in a transaction that's installed some other resowner.
+     */
+    Assert(AuxProcessResourceOwner != NULL);
+    Assert(CurrentResourceOwner == NULL ||
+           CurrentResourceOwner == AuxProcessResourceOwner);
+    CurrentResourceOwner = AuxProcessResourceOwner;
+
     /* Don't be chatty in standalone mode */
     ereport(IsPostmasterEnvironment ? LOG : NOTICE,
             (errmsg("shutting down")));
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 7e34bee..cdd71a9 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -403,6 +403,13 @@ AuxiliaryProcessMain(int argc, char *argv[])
         /* finish setting up bufmgr.c */
         InitBufferPoolBackend();

+        /*
+         * Auxiliary processes don't run transactions, but they may need a
+         * resource owner anyway to manage buffer pins acquired outside
+         * transactions (and, perhaps, other things in future).
+         */
+        CreateAuxProcessResourceOwner();
+
         /* Initialize backend status information */
         pgstat_initialize();
         pgstat_bestart();
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 78e4c85..1d9cfc6 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -522,13 +522,9 @@ AutoVacLauncherMain(int argc, char *argv[])
         pgstat_report_wait_end();
         AbortBufferIO();
         UnlockBuffers();
-        if (CurrentResourceOwner)
-        {
-            ResourceOwnerRelease(CurrentResourceOwner,
-                                 RESOURCE_RELEASE_BEFORE_LOCKS,
-                                 false, true);
-            /* we needn't bother with the other ResourceOwnerRelease phases */
-        }
+        /* this is probably dead code, but let's be safe: */
+        if (AuxProcessResourceOwner)
+            ReleaseAuxProcessResources(false);
         AtEOXact_Buffers(false);
         AtEOXact_SMgr();
         AtEOXact_Files(false);
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index d5ce685..960d3de 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -142,12 +142,6 @@ BackgroundWriterMain(void)
     sigdelset(&BlockSig, SIGQUIT);

     /*
-     * Create a resource owner to keep track of our resources (currently only
-     * buffer pins).
-     */
-    CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
-
-    /*
      * We just started, assume there has been either a shutdown or
      * end-of-recovery snapshot.
      */
@@ -191,11 +185,7 @@ BackgroundWriterMain(void)
         ConditionVariableCancelSleep();
         AbortBufferIO();
         UnlockBuffers();
-        /* buffer pins are released here: */
-        ResourceOwnerRelease(CurrentResourceOwner,
-                             RESOURCE_RELEASE_BEFORE_LOCKS,
-                             false, true);
-        /* we needn't bother with the other ResourceOwnerRelease phases */
+        ReleaseAuxProcessResources(false);
         AtEOXact_Buffers(false);
         AtEOXact_SMgr();
         AtEOXact_Files(false);
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 0950ada..de1b22d 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -232,12 +232,6 @@ CheckpointerMain(void)
     last_checkpoint_time = last_xlog_switch_time = (pg_time_t) time(NULL);

     /*
-     * Create a resource owner to keep track of our resources (currently only
-     * buffer pins).
-     */
-    CurrentResourceOwner = ResourceOwnerCreate(NULL, "Checkpointer");
-
-    /*
      * Create a memory context that we will do all our work in.  We do this so
      * that we can reset the context during error recovery and thereby avoid
      * possible memory leaks.  Formerly this code just ran in
@@ -275,11 +269,7 @@ CheckpointerMain(void)
         pgstat_report_wait_end();
         AbortBufferIO();
         UnlockBuffers();
-        /* buffer pins are released here: */
-        ResourceOwnerRelease(CurrentResourceOwner,
-                             RESOURCE_RELEASE_BEFORE_LOCKS,
-                             false, true);
-        /* we needn't bother with the other ResourceOwnerRelease phases */
+        ReleaseAuxProcessResources(false);
         AtEOXact_Buffers(false);
         AtEOXact_SMgr();
         AtEOXact_Files(false);
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 688d5b5..eceed1b 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -130,12 +130,6 @@ WalWriterMain(void)
     sigdelset(&BlockSig, SIGQUIT);

     /*
-     * Create a resource owner to keep track of our resources (not clear that
-     * we need this, but may as well have one).
-     */
-    CurrentResourceOwner = ResourceOwnerCreate(NULL, "Wal Writer");
-
-    /*
      * Create a memory context that we will do all our work in.  We do this so
      * that we can reset the context during error recovery and thereby avoid
      * possible memory leaks.  Formerly this code just ran in
@@ -172,11 +166,7 @@ WalWriterMain(void)
         pgstat_report_wait_end();
         AbortBufferIO();
         UnlockBuffers();
-        /* buffer pins are released here: */
-        ResourceOwnerRelease(CurrentResourceOwner,
-                             RESOURCE_RELEASE_BEFORE_LOCKS,
-                             false, true);
-        /* we needn't bother with the other ResourceOwnerRelease phases */
+        ReleaseAuxProcessResources(false);
         AtEOXact_Buffers(false);
         AtEOXact_SMgr();
         AtEOXact_Files(false);
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 54c25f1..6fac37b 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -279,6 +279,10 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
          */
         startptr = MyReplicationSlot->data.restart_lsn;

+        /*
+         * Record our resource usage in a child of the current resowner.
+         */
+        Assert(CurrentResourceOwner != NULL);
         CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding");

         /* invalidate non-timetravel entries */
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 0d2b795..62c7d66 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1602,9 +1602,8 @@ ApplyWorkerMain(Datum main_arg)
     /* Load the libpq-specific functions */
     load_file("libpqwalreceiver", false);

-    Assert(CurrentResourceOwner == NULL);
-    CurrentResourceOwner = ResourceOwnerCreate(NULL,
-                                               "logical replication apply");
+    /* Set up resource owner */
+    CreateAuxProcessResourceOwner();

     /* Run as replica session replication role. */
     SetConfigOption("session_replication_role", "replica",
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 450f737..f2985ef 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -365,6 +365,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
                                     logical_read_local_xlog_page,
                                     NULL, NULL, NULL);

+        /*
+         * Record our resource usage in a child of the current resowner.
+         */
+        Assert(CurrentResourceOwner != NULL);
         CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner,
                                                    "logical decoding");

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 987bb84..7c292d8 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -292,12 +292,6 @@ WalReceiverMain(void)
     if (WalReceiverFunctions == NULL)
         elog(ERROR, "libpqwalreceiver didn't initialize correctly");

-    /*
-     * Create a resource owner to keep track of our resources (not clear that
-     * we need this, but may as well have one).
-     */
-    CurrentResourceOwner = ResourceOwnerCreate(NULL, "Wal Receiver");
-
     /* Unblock signals (they were blocked when the postmaster forked us) */
     PG_SETMASK(&UnBlockSig);

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3a0106b..8e9e47b 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -265,7 +265,7 @@ InitWalSender(void)
     InitWalSenderSlot();

     /* Set up resource owner */
-    CurrentResourceOwner = ResourceOwnerCreate(NULL, "walsender top-level resource owner");
+    CreateAuxProcessResourceOwner();

     /*
      * Let postmaster know that we're a WAL sender. Once we've declared us as
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 09e0df2..5ef6315 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -632,8 +632,19 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
          * We are either a bootstrap process or a standalone backend. Either
          * way, start up the XLOG machinery, and register to have it closed
          * down at exit.
+         *
+         * We don't yet have an aux-process resource owner, but StartupXLOG
+         * and ShutdownXLOG will need one.  Hence, create said resource owner
+         * (and register a callback to clean it up after ShutdownXLOG runs).
          */
+        CreateAuxProcessResourceOwner();
+
         StartupXLOG();
+        /* Release (and warn about) any buffer pins leaked in StartupXLOG */
+        ReleaseAuxProcessResources(true);
+        /* Reset CurrentResourceOwner to nothing for the moment */
+        CurrentResourceOwner = NULL;
+
         on_shmem_exit(ShutdownXLOG, 0);
     }

diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index bce021e..c708ebd 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -22,6 +22,7 @@

 #include "access/hash.h"
 #include "jit/jit.h"
+#include "storage/ipc.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
@@ -140,6 +141,7 @@ typedef struct ResourceOwnerData
 ResourceOwner CurrentResourceOwner = NULL;
 ResourceOwner CurTransactionResourceOwner = NULL;
 ResourceOwner TopTransactionResourceOwner = NULL;
+ResourceOwner AuxProcessResourceOwner = NULL;

 /*
  * List of add-on callbacks for resource releasing
@@ -165,6 +167,7 @@ static void ResourceOwnerReleaseInternal(ResourceOwner owner,
                              ResourceReleasePhase phase,
                              bool isCommit,
                              bool isTopLevel);
+static void ReleaseAuxProcessResourcesCallback(int code, Datum arg);
 static void PrintRelCacheLeakWarning(Relation rel);
 static void PrintPlanCacheLeakWarning(CachedPlan *plan);
 static void PrintTupleDescLeakWarning(TupleDesc tupdesc);
@@ -823,6 +826,60 @@ UnregisterResourceReleaseCallback(ResourceReleaseCallback callback, void *arg)
     }
 }

+/*
+ * Establish an AuxProcessResourceOwner for the current process.
+ */
+void
+CreateAuxProcessResourceOwner(void)
+{
+    Assert(AuxProcessResourceOwner == NULL);
+    Assert(CurrentResourceOwner == NULL);
+    AuxProcessResourceOwner = ResourceOwnerCreate(NULL, "AuxiliaryProcess");
+    CurrentResourceOwner = AuxProcessResourceOwner;
+
+    /*
+     * Register a shmem-exit callback for cleanup of aux-process resource
+     * owner.  (This needs to run after, e.g., ShutdownXLOG.)
+     */
+    on_shmem_exit(ReleaseAuxProcessResourcesCallback, 0);
+
+}
+
+/*
+ * Convenience routine to release all resources tracked in
+ * AuxProcessResourceOwner (but that resowner is not destroyed here).
+ * Warn about leaked resources if isCommit is true.
+ */
+void
+ReleaseAuxProcessResources(bool isCommit)
+{
+    /*
+     * At this writing, the only thing that could actually get released is
+     * buffer pins; but we may as well do the full release protocol.
+     */
+    ResourceOwnerRelease(AuxProcessResourceOwner,
+                         RESOURCE_RELEASE_BEFORE_LOCKS,
+                         isCommit, true);
+    ResourceOwnerRelease(AuxProcessResourceOwner,
+                         RESOURCE_RELEASE_LOCKS,
+                         isCommit, true);
+    ResourceOwnerRelease(AuxProcessResourceOwner,
+                         RESOURCE_RELEASE_AFTER_LOCKS,
+                         isCommit, true);
+}
+
+/*
+ * Shmem-exit callback for the same.
+ * Warn about leaked resources if process exit code is zero (ie normal).
+ */
+static void
+ReleaseAuxProcessResourcesCallback(int code, Datum arg)
+{
+    bool        isCommit = (code == 0);
+
+    ReleaseAuxProcessResources(isCommit);
+}
+

 /*
  * Make sure there is room for at least one more entry in a ResourceOwner's
@@ -830,15 +887,10 @@ UnregisterResourceReleaseCallback(ResourceReleaseCallback callback, void *arg)
  *
  * This is separate from actually inserting an entry because if we run out
  * of memory, it's critical to do so *before* acquiring the resource.
- *
- * We allow the case owner == NULL because the bufmgr is sometimes invoked
- * outside any transaction (for example, during WAL recovery).
  */
 void
 ResourceOwnerEnlargeBuffers(ResourceOwner owner)
 {
-    if (owner == NULL)
-        return;
     ResourceArrayEnlarge(&(owner->bufferarr));
 }

@@ -846,29 +898,19 @@ ResourceOwnerEnlargeBuffers(ResourceOwner owner)
  * Remember that a buffer pin is owned by a ResourceOwner
  *
  * Caller must have previously done ResourceOwnerEnlargeBuffers()
- *
- * We allow the case owner == NULL because the bufmgr is sometimes invoked
- * outside any transaction (for example, during WAL recovery).
  */
 void
 ResourceOwnerRememberBuffer(ResourceOwner owner, Buffer buffer)
 {
-    if (owner == NULL)
-        return;
     ResourceArrayAdd(&(owner->bufferarr), BufferGetDatum(buffer));
 }

 /*
  * Forget that a buffer pin is owned by a ResourceOwner
- *
- * We allow the case owner == NULL because the bufmgr is sometimes invoked
- * outside any transaction (for example, during WAL recovery).
  */
 void
 ResourceOwnerForgetBuffer(ResourceOwner owner, Buffer buffer)
 {
-    if (owner == NULL)
-        return;
     if (!ResourceArrayRemove(&(owner->bufferarr), BufferGetDatum(buffer)))
         elog(ERROR, "buffer %d is not owned by resource owner %s",
              buffer, owner->name);
diff --git a/src/include/utils/resowner.h b/src/include/utils/resowner.h
index fe7f491..fa03942 100644
--- a/src/include/utils/resowner.h
+++ b/src/include/utils/resowner.h
@@ -33,6 +33,7 @@ typedef struct ResourceOwnerData *ResourceOwner;
 extern PGDLLIMPORT ResourceOwner CurrentResourceOwner;
 extern PGDLLIMPORT ResourceOwner CurTransactionResourceOwner;
 extern PGDLLIMPORT ResourceOwner TopTransactionResourceOwner;
+extern PGDLLIMPORT ResourceOwner AuxProcessResourceOwner;

 /*
  * Resource releasing is done in three phases: pre-locks, locks, and
@@ -78,5 +79,7 @@ extern void RegisterResourceReleaseCallback(ResourceReleaseCallback callback,
                                 void *arg);
 extern void UnregisterResourceReleaseCallback(ResourceReleaseCallback callback,
                                   void *arg);
+extern void CreateAuxProcessResourceOwner(void);
+extern void ReleaseAuxProcessResources(bool isCommit);

 #endif                            /* RESOWNER_H */
diff --git a/src/test/modules/test_shm_mq/worker.c b/src/test/modules/test_shm_mq/worker.c
index bcb992e..e9bc30d 100644
--- a/src/test/modules/test_shm_mq/worker.c
+++ b/src/test/modules/test_shm_mq/worker.c
@@ -75,7 +75,7 @@ test_shm_mq_main(Datum main_arg)
      * of contents so we can locate the various data structures we'll need to
      * find within the segment.
      */
-    CurrentResourceOwner = ResourceOwnerCreate(NULL, "test_shm_mq worker");
+    CreateAuxProcessResourceOwner();
     seg = dsm_attach(DatumGetInt32(main_arg));
     if (seg == NULL)
         ereport(ERROR,

pgsql-hackers by date:

Previous
From: Adrien Nayrat
Date:
Subject: Re: New GUC to sample log queries
Next
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists