Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)" - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)" |
Date | |
Msg-id | 4993.1531773411@sss.pgh.pa.us Whole thread Raw |
In response to | Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)" (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"
|
List | pgsql-hackers |
I wrote: > So I said I didn't want to do extra work on this, but I am looking into > fixing it by having these aux process types run a ResourceOwner that can > be told to clean up any open buffer pins at exit. We could be sure the > coverage is complete by dint of removing the special-case code in > resowner.c that allows buffer pins to be taken with no active resowner. > Then CheckForBufferLeaks can be left as-is, ie something we do only > in assert builds. That turned out to be a larger can of worms than I'd anticipated, as it soon emerged that we'd acquired a whole lot of cargo-cult programming around ResourceOwners. Having a ResourceOwner in itself does nothing for you: there has to be code someplace that ensures we'll call ResourceOwnerRelease at an appropriate time. There was basically noplace outside xact.c and portalmem.c that got this completely right. bgwriter.c and a couple of other places at least tried a little, by doing a ResourceOwnerRelease in their sigsetjmp-catching stanzas, but that didn't account for the process-exit code path. Other places just created a ResourceOwner and did *nothing* about cleaning it up. I decided that the most expedient way of dealing with this was to create a self-contained facility in resowner.c that would create a standalone ResourceOwner and register a shmem-exit callback to clean it up. That way it's easier (less code) to do it right than to do it wrong. That led to the attached, which passes check-world as well as Michael's full-disk test script. I'm mostly pretty happy with this, but I think there are a couple of loose ends in logicalfuncs.c and slotfuncs.c: those are creating non-standalone ResourceOwners (children of whatever the active ResourceOwner is) and doing nothing much to clean them up. That seems pretty bogus. It's not a permanent resource leak, because sooner or later the parent ResourceOwner will get cleaned up and that will recurse to the child ... but then why bother with a child ResourceOwner at all? I added asserts to these calls to verify that there was a parent resowner (if there isn't, the code is just broken), but I think that we should either add more code to clean up the child resowner promptly, or just not bother with a child at all. Not very sure what to do with this. I don't think we should try to backpatch it, because it's essentially changing the API requirements around buffer access in auxiliary processes --- but it might not be too late to squeeze it into v11. It is a bug fix, in that the current code can leak buffer pins in some code paths and we won't notice in non-assert builds; but we've not really seen any field reports of that happening, so I'm not sure how important this is. regards, tom lane diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index 4e32cff..3a5cd0e 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -1221,8 +1221,7 @@ ParallelWorkerMain(Datum main_arg) memcpy(&ParallelWorkerNumber, MyBgworkerEntry->bgw_extra, sizeof(int)); /* Set up a memory context and resource owner. */ - Assert(CurrentResourceOwner == NULL); - CurrentResourceOwner = ResourceOwnerCreate(NULL, "parallel toplevel"); + CreateAuxProcessResourceOwner(); CurrentMemoryContext = AllocSetContextCreate(TopMemoryContext, "Parallel worker", ALLOCSET_DEFAULT_SIZES); diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4049deb..5d01edd 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6356,6 +6356,15 @@ StartupXLOG(void) struct stat st; /* + * We should have an aux process resource owner to use, and we should not + * be in a transaction that's installed some other resowner. + */ + Assert(AuxProcessResourceOwner != NULL); + Assert(CurrentResourceOwner == NULL || + CurrentResourceOwner == AuxProcessResourceOwner); + CurrentResourceOwner = AuxProcessResourceOwner; + + /* * Verify XLOG status looks valid. */ if (ControlFile->state < DB_SHUTDOWNED || @@ -8467,6 +8476,15 @@ GetNextXidAndEpoch(TransactionId *xid, uint32 *epoch) void ShutdownXLOG(int code, Datum arg) { + /* + * We should have an aux process resource owner to use, and we should not + * be in a transaction that's installed some other resowner. + */ + Assert(AuxProcessResourceOwner != NULL); + Assert(CurrentResourceOwner == NULL || + CurrentResourceOwner == AuxProcessResourceOwner); + CurrentResourceOwner = AuxProcessResourceOwner; + /* Don't be chatty in standalone mode */ ereport(IsPostmasterEnvironment ? LOG : NOTICE, (errmsg("shutting down"))); diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index 7e34bee..cdd71a9 100644 --- a/src/backend/bootstrap/bootstrap.c +++ b/src/backend/bootstrap/bootstrap.c @@ -403,6 +403,13 @@ AuxiliaryProcessMain(int argc, char *argv[]) /* finish setting up bufmgr.c */ InitBufferPoolBackend(); + /* + * Auxiliary processes don't run transactions, but they may need a + * resource owner anyway to manage buffer pins acquired outside + * transactions (and, perhaps, other things in future). + */ + CreateAuxProcessResourceOwner(); + /* Initialize backend status information */ pgstat_initialize(); pgstat_bestart(); diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 78e4c85..1d9cfc6 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -522,13 +522,9 @@ AutoVacLauncherMain(int argc, char *argv[]) pgstat_report_wait_end(); AbortBufferIO(); UnlockBuffers(); - if (CurrentResourceOwner) - { - ResourceOwnerRelease(CurrentResourceOwner, - RESOURCE_RELEASE_BEFORE_LOCKS, - false, true); - /* we needn't bother with the other ResourceOwnerRelease phases */ - } + /* this is probably dead code, but let's be safe: */ + if (AuxProcessResourceOwner) + ReleaseAuxProcessResources(false); AtEOXact_Buffers(false); AtEOXact_SMgr(); AtEOXact_Files(false); diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c index d5ce685..960d3de 100644 --- a/src/backend/postmaster/bgwriter.c +++ b/src/backend/postmaster/bgwriter.c @@ -142,12 +142,6 @@ BackgroundWriterMain(void) sigdelset(&BlockSig, SIGQUIT); /* - * Create a resource owner to keep track of our resources (currently only - * buffer pins). - */ - CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer"); - - /* * We just started, assume there has been either a shutdown or * end-of-recovery snapshot. */ @@ -191,11 +185,7 @@ BackgroundWriterMain(void) ConditionVariableCancelSleep(); AbortBufferIO(); UnlockBuffers(); - /* buffer pins are released here: */ - ResourceOwnerRelease(CurrentResourceOwner, - RESOURCE_RELEASE_BEFORE_LOCKS, - false, true); - /* we needn't bother with the other ResourceOwnerRelease phases */ + ReleaseAuxProcessResources(false); AtEOXact_Buffers(false); AtEOXact_SMgr(); AtEOXact_Files(false); diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 0950ada..de1b22d 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -232,12 +232,6 @@ CheckpointerMain(void) last_checkpoint_time = last_xlog_switch_time = (pg_time_t) time(NULL); /* - * Create a resource owner to keep track of our resources (currently only - * buffer pins). - */ - CurrentResourceOwner = ResourceOwnerCreate(NULL, "Checkpointer"); - - /* * Create a memory context that we will do all our work in. We do this so * that we can reset the context during error recovery and thereby avoid * possible memory leaks. Formerly this code just ran in @@ -275,11 +269,7 @@ CheckpointerMain(void) pgstat_report_wait_end(); AbortBufferIO(); UnlockBuffers(); - /* buffer pins are released here: */ - ResourceOwnerRelease(CurrentResourceOwner, - RESOURCE_RELEASE_BEFORE_LOCKS, - false, true); - /* we needn't bother with the other ResourceOwnerRelease phases */ + ReleaseAuxProcessResources(false); AtEOXact_Buffers(false); AtEOXact_SMgr(); AtEOXact_Files(false); diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c index 688d5b5..eceed1b 100644 --- a/src/backend/postmaster/walwriter.c +++ b/src/backend/postmaster/walwriter.c @@ -130,12 +130,6 @@ WalWriterMain(void) sigdelset(&BlockSig, SIGQUIT); /* - * Create a resource owner to keep track of our resources (not clear that - * we need this, but may as well have one). - */ - CurrentResourceOwner = ResourceOwnerCreate(NULL, "Wal Writer"); - - /* * Create a memory context that we will do all our work in. We do this so * that we can reset the context during error recovery and thereby avoid * possible memory leaks. Formerly this code just ran in @@ -172,11 +166,7 @@ WalWriterMain(void) pgstat_report_wait_end(); AbortBufferIO(); UnlockBuffers(); - /* buffer pins are released here: */ - ResourceOwnerRelease(CurrentResourceOwner, - RESOURCE_RELEASE_BEFORE_LOCKS, - false, true); - /* we needn't bother with the other ResourceOwnerRelease phases */ + ReleaseAuxProcessResources(false); AtEOXact_Buffers(false); AtEOXact_SMgr(); AtEOXact_Files(false); diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c index 54c25f1..6fac37b 100644 --- a/src/backend/replication/logical/logicalfuncs.c +++ b/src/backend/replication/logical/logicalfuncs.c @@ -279,6 +279,10 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin */ startptr = MyReplicationSlot->data.restart_lsn; + /* + * Record our resource usage in a child of the current resowner. + */ + Assert(CurrentResourceOwner != NULL); CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding"); /* invalidate non-timetravel entries */ diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 0d2b795..62c7d66 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -1602,9 +1602,8 @@ ApplyWorkerMain(Datum main_arg) /* Load the libpq-specific functions */ load_file("libpqwalreceiver", false); - Assert(CurrentResourceOwner == NULL); - CurrentResourceOwner = ResourceOwnerCreate(NULL, - "logical replication apply"); + /* Set up resource owner */ + CreateAuxProcessResourceOwner(); /* Run as replica session replication role. */ SetConfigOption("session_replication_role", "replica", diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 450f737..f2985ef 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -365,6 +365,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto) logical_read_local_xlog_page, NULL, NULL, NULL); + /* + * Record our resource usage in a child of the current resowner. + */ + Assert(CurrentResourceOwner != NULL); CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding"); diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 987bb84..7c292d8 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -292,12 +292,6 @@ WalReceiverMain(void) if (WalReceiverFunctions == NULL) elog(ERROR, "libpqwalreceiver didn't initialize correctly"); - /* - * Create a resource owner to keep track of our resources (not clear that - * we need this, but may as well have one). - */ - CurrentResourceOwner = ResourceOwnerCreate(NULL, "Wal Receiver"); - /* Unblock signals (they were blocked when the postmaster forked us) */ PG_SETMASK(&UnBlockSig); diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 3a0106b..8e9e47b 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -265,7 +265,7 @@ InitWalSender(void) InitWalSenderSlot(); /* Set up resource owner */ - CurrentResourceOwner = ResourceOwnerCreate(NULL, "walsender top-level resource owner"); + CreateAuxProcessResourceOwner(); /* * Let postmaster know that we're a WAL sender. Once we've declared us as diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 09e0df2..5ef6315 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -632,8 +632,19 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, * We are either a bootstrap process or a standalone backend. Either * way, start up the XLOG machinery, and register to have it closed * down at exit. + * + * We don't yet have an aux-process resource owner, but StartupXLOG + * and ShutdownXLOG will need one. Hence, create said resource owner + * (and register a callback to clean it up after ShutdownXLOG runs). */ + CreateAuxProcessResourceOwner(); + StartupXLOG(); + /* Release (and warn about) any buffer pins leaked in StartupXLOG */ + ReleaseAuxProcessResources(true); + /* Reset CurrentResourceOwner to nothing for the moment */ + CurrentResourceOwner = NULL; + on_shmem_exit(ShutdownXLOG, 0); } diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c index bce021e..c708ebd 100644 --- a/src/backend/utils/resowner/resowner.c +++ b/src/backend/utils/resowner/resowner.c @@ -22,6 +22,7 @@ #include "access/hash.h" #include "jit/jit.h" +#include "storage/ipc.h" #include "storage/predicate.h" #include "storage/proc.h" #include "utils/memutils.h" @@ -140,6 +141,7 @@ typedef struct ResourceOwnerData ResourceOwner CurrentResourceOwner = NULL; ResourceOwner CurTransactionResourceOwner = NULL; ResourceOwner TopTransactionResourceOwner = NULL; +ResourceOwner AuxProcessResourceOwner = NULL; /* * List of add-on callbacks for resource releasing @@ -165,6 +167,7 @@ static void ResourceOwnerReleaseInternal(ResourceOwner owner, ResourceReleasePhase phase, bool isCommit, bool isTopLevel); +static void ReleaseAuxProcessResourcesCallback(int code, Datum arg); static void PrintRelCacheLeakWarning(Relation rel); static void PrintPlanCacheLeakWarning(CachedPlan *plan); static void PrintTupleDescLeakWarning(TupleDesc tupdesc); @@ -823,6 +826,60 @@ UnregisterResourceReleaseCallback(ResourceReleaseCallback callback, void *arg) } } +/* + * Establish an AuxProcessResourceOwner for the current process. + */ +void +CreateAuxProcessResourceOwner(void) +{ + Assert(AuxProcessResourceOwner == NULL); + Assert(CurrentResourceOwner == NULL); + AuxProcessResourceOwner = ResourceOwnerCreate(NULL, "AuxiliaryProcess"); + CurrentResourceOwner = AuxProcessResourceOwner; + + /* + * Register a shmem-exit callback for cleanup of aux-process resource + * owner. (This needs to run after, e.g., ShutdownXLOG.) + */ + on_shmem_exit(ReleaseAuxProcessResourcesCallback, 0); + +} + +/* + * Convenience routine to release all resources tracked in + * AuxProcessResourceOwner (but that resowner is not destroyed here). + * Warn about leaked resources if isCommit is true. + */ +void +ReleaseAuxProcessResources(bool isCommit) +{ + /* + * At this writing, the only thing that could actually get released is + * buffer pins; but we may as well do the full release protocol. + */ + ResourceOwnerRelease(AuxProcessResourceOwner, + RESOURCE_RELEASE_BEFORE_LOCKS, + isCommit, true); + ResourceOwnerRelease(AuxProcessResourceOwner, + RESOURCE_RELEASE_LOCKS, + isCommit, true); + ResourceOwnerRelease(AuxProcessResourceOwner, + RESOURCE_RELEASE_AFTER_LOCKS, + isCommit, true); +} + +/* + * Shmem-exit callback for the same. + * Warn about leaked resources if process exit code is zero (ie normal). + */ +static void +ReleaseAuxProcessResourcesCallback(int code, Datum arg) +{ + bool isCommit = (code == 0); + + ReleaseAuxProcessResources(isCommit); +} + /* * Make sure there is room for at least one more entry in a ResourceOwner's @@ -830,15 +887,10 @@ UnregisterResourceReleaseCallback(ResourceReleaseCallback callback, void *arg) * * This is separate from actually inserting an entry because if we run out * of memory, it's critical to do so *before* acquiring the resource. - * - * We allow the case owner == NULL because the bufmgr is sometimes invoked - * outside any transaction (for example, during WAL recovery). */ void ResourceOwnerEnlargeBuffers(ResourceOwner owner) { - if (owner == NULL) - return; ResourceArrayEnlarge(&(owner->bufferarr)); } @@ -846,29 +898,19 @@ ResourceOwnerEnlargeBuffers(ResourceOwner owner) * Remember that a buffer pin is owned by a ResourceOwner * * Caller must have previously done ResourceOwnerEnlargeBuffers() - * - * We allow the case owner == NULL because the bufmgr is sometimes invoked - * outside any transaction (for example, during WAL recovery). */ void ResourceOwnerRememberBuffer(ResourceOwner owner, Buffer buffer) { - if (owner == NULL) - return; ResourceArrayAdd(&(owner->bufferarr), BufferGetDatum(buffer)); } /* * Forget that a buffer pin is owned by a ResourceOwner - * - * We allow the case owner == NULL because the bufmgr is sometimes invoked - * outside any transaction (for example, during WAL recovery). */ void ResourceOwnerForgetBuffer(ResourceOwner owner, Buffer buffer) { - if (owner == NULL) - return; if (!ResourceArrayRemove(&(owner->bufferarr), BufferGetDatum(buffer))) elog(ERROR, "buffer %d is not owned by resource owner %s", buffer, owner->name); diff --git a/src/include/utils/resowner.h b/src/include/utils/resowner.h index fe7f491..fa03942 100644 --- a/src/include/utils/resowner.h +++ b/src/include/utils/resowner.h @@ -33,6 +33,7 @@ typedef struct ResourceOwnerData *ResourceOwner; extern PGDLLIMPORT ResourceOwner CurrentResourceOwner; extern PGDLLIMPORT ResourceOwner CurTransactionResourceOwner; extern PGDLLIMPORT ResourceOwner TopTransactionResourceOwner; +extern PGDLLIMPORT ResourceOwner AuxProcessResourceOwner; /* * Resource releasing is done in three phases: pre-locks, locks, and @@ -78,5 +79,7 @@ extern void RegisterResourceReleaseCallback(ResourceReleaseCallback callback, void *arg); extern void UnregisterResourceReleaseCallback(ResourceReleaseCallback callback, void *arg); +extern void CreateAuxProcessResourceOwner(void); +extern void ReleaseAuxProcessResources(bool isCommit); #endif /* RESOWNER_H */ diff --git a/src/test/modules/test_shm_mq/worker.c b/src/test/modules/test_shm_mq/worker.c index bcb992e..e9bc30d 100644 --- a/src/test/modules/test_shm_mq/worker.c +++ b/src/test/modules/test_shm_mq/worker.c @@ -75,7 +75,7 @@ test_shm_mq_main(Datum main_arg) * of contents so we can locate the various data structures we'll need to * find within the segment. */ - CurrentResourceOwner = ResourceOwnerCreate(NULL, "test_shm_mq worker"); + CreateAuxProcessResourceOwner(); seg = dsm_attach(DatumGetInt32(main_arg)); if (seg == NULL) ereport(ERROR,
pgsql-hackers by date: