Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records |
Date | |
Msg-id | 2138765.1664924048@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
|
List | pgsql-hackers |
[ redirecting to -hackers because patch attached ] David Rowley <dgrowleyml@gmail.com> writes: > So that confirms there were 950k relations in the xl_standby_locks. > The contents of that message seem to be produced by standby_desc(). > That should be the same WAL record that's processed by standby_redo() > which adds the 950k locks to the RecoveryLockListsEntry. > I'm not seeing why 950k becomes 134m. I figured out what the problem is. The standby's startup process retains knowledge of all these locks in standby.c's RecoveryLockLists data structure, which *has no de-duplication capability*. It'll add another entry to the per-XID list any time it's told about a given exclusive lock. And checkpoints cause us to regurgitate the entire set of currently-held exclusive locks into the WAL. So if you have a process holding a lot of exclusive locks, and sitting on them across multiple checkpoints, standby startup processes will bloat. It's not a true leak, in that we know where the memory is and we'll release it whenever we see that XID commit/abort. And I doubt that this is a common usage pattern, which probably explains the lack of previous complaints. Still, bloat bad. PFA a quick-hack fix that solves this issue by making per-transaction subsidiary hash tables. That's overkill perhaps; I'm a little worried about whether this slows down normal cases more than it's worth. But we ought to do something about this, because aside from the duplication aspect the current storage of these lists seems mighty space-inefficient. regards, tom lane diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 9dab931990..14b86cd76a 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -42,7 +42,26 @@ int max_standby_archive_delay = 30 * 1000; int max_standby_streaming_delay = 30 * 1000; bool log_recovery_conflict_waits = false; -static HTAB *RecoveryLockLists; +/* + * Keep track of all the exclusive locks owned by original transactions. + * For each original transaction that is known to have any such locks, + * there is a RecoveryLockHashEntry in the RecoveryLockHash hash table, + * pointing to a subsidiary hash table containing RecoveryLockEntry items. + */ +typedef struct RecoveryLockHashEntry +{ + TransactionId xid; /* hash key -- must be first */ + HTAB *hashtab; /* table of locks belonging to xid */ +} RecoveryLockHashEntry; + +typedef struct RecoveryLockEntry +{ + /* Both Oids are included in the hash key; there is no payload per se */ + Oid dbOid; /* DB containing table */ + Oid relOid; /* OID of table */ +} RecoveryLockEntry; + +static HTAB *RecoveryLockHash = NULL; /* Flags set by timeout handlers */ static volatile sig_atomic_t got_standby_deadlock_timeout = false; @@ -58,15 +77,6 @@ static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts); static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks); static const char *get_recovery_conflict_desc(ProcSignalReason reason); -/* - * Keep track of all the locks owned by a given transaction. - */ -typedef struct RecoveryLockListsEntry -{ - TransactionId xid; - List *locks; -} RecoveryLockListsEntry; - /* * InitRecoveryTransactionEnvironment * Initialize tracking of our primary's in-progress transactions. @@ -85,16 +95,18 @@ InitRecoveryTransactionEnvironment(void) VirtualTransactionId vxid; HASHCTL hash_ctl; + Assert(RecoveryLockHash == NULL); /* don't run this twice */ + /* * Initialize the hash table for tracking the list of locks held by each * transaction. */ hash_ctl.keysize = sizeof(TransactionId); - hash_ctl.entrysize = sizeof(RecoveryLockListsEntry); - RecoveryLockLists = hash_create("RecoveryLockLists", - 64, - &hash_ctl, - HASH_ELEM | HASH_BLOBS); + hash_ctl.entrysize = sizeof(RecoveryLockHashEntry); + RecoveryLockHash = hash_create("RecoveryLockHash", + 64, + &hash_ctl, + HASH_ELEM | HASH_BLOBS); /* * Initialize shared invalidation management for Startup process, being @@ -140,12 +152,12 @@ void ShutdownRecoveryTransactionEnvironment(void) { /* - * Do nothing if RecoveryLockLists is NULL because which means that - * transaction tracking has not been yet initialized or has been already - * shutdowned. This prevents transaction tracking from being shutdowned - * unexpectedly more than once. + * Do nothing if RecoveryLockHash is NULL because that means that + * transaction tracking has not yet been initialized or has already been + * shut down. This makes it safe to have possibly-redundant calls of this + * function during process exit. */ - if (RecoveryLockLists == NULL) + if (RecoveryLockHash == NULL) return; /* Mark all tracked in-progress transactions as finished. */ @@ -155,8 +167,8 @@ ShutdownRecoveryTransactionEnvironment(void) StandbyReleaseAllLocks(); /* Destroy the hash table of locks. */ - hash_destroy(RecoveryLockLists); - RecoveryLockLists = NULL; + hash_destroy(RecoveryLockHash); + RecoveryLockHash = NULL; /* Cleanup our VirtualTransaction */ VirtualXactLockTableCleanup(); @@ -932,12 +944,13 @@ StandbyLockTimeoutHandler(void) * We only keep track of AccessExclusiveLocks, which are only ever held by * one transaction on one relation. * - * We keep a hash table of lists of locks in local memory keyed by xid, - * RecoveryLockLists, so we can keep track of the various entries made by - * the Startup process's virtual xid in the shared lock table. - * - * List elements use type xl_standby_lock, since the WAL record type exactly - * matches the information that we need to keep track of. + * We keep a hash table of hashes of locks in local memory keyed by xid, + * RecoveryLockHash, so we can keep track of the various entries made by + * the Startup process's virtual xid in the shared lock table. This data + * structure allows us to efficiently find all the locks held by a given + * original transaction. It also lets us efficiently de-duplicate locks, + * which is important because checkpoints will re-report the same locks + * already held. * * We use session locks rather than normal locks so we don't need * ResourceOwners. @@ -947,9 +960,8 @@ StandbyLockTimeoutHandler(void) void StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid) { - RecoveryLockListsEntry *entry; - xl_standby_lock *newlock; - LOCKTAG locktag; + RecoveryLockHashEntry *entry; + RecoveryLockEntry lockent; bool found; /* Already processed? */ @@ -964,62 +976,84 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid) /* dbOid is InvalidOid when we are locking a shared relation. */ Assert(OidIsValid(relOid)); - /* Create a new list for this xid, if we don't have one already. */ - entry = hash_search(RecoveryLockLists, &xid, HASH_ENTER, &found); + /* Create a new hash table for this xid, if we don't have one already. */ + entry = hash_search(RecoveryLockHash, &xid, HASH_ENTER, &found); if (!found) { - entry->xid = xid; - entry->locks = NIL; + HASHCTL hash_ctl; + char hash_label[64]; + + Assert(entry->xid == xid); /* dynahash should have set this */ + + hash_ctl.keysize = sizeof(RecoveryLockEntry); + hash_ctl.entrysize = sizeof(RecoveryLockEntry); + snprintf(hash_label, sizeof(hash_label), "RecoveryLockHash_%u", xid); + entry->hashtab = hash_create(hash_label, + 64, + &hash_ctl, + HASH_ELEM | HASH_BLOBS); } - newlock = palloc(sizeof(xl_standby_lock)); - newlock->xid = xid; - newlock->dbOid = dbOid; - newlock->relOid = relOid; - entry->locks = lappend(entry->locks, newlock); + /* Create a new hash entry for this lock, unless we have one already. */ + lockent.dbOid = dbOid; + lockent.relOid = relOid; + (void) hash_search(entry->hashtab, &lockent, HASH_ENTER, &found); + if (!found) + { + /* It's new, so acquire the lock during replay. */ + LOCKTAG locktag; - SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid); + SET_LOCKTAG_RELATION(locktag, dbOid, relOid); - (void) LockAcquire(&locktag, AccessExclusiveLock, true, false); + (void) LockAcquire(&locktag, AccessExclusiveLock, true, false); + } } +/* + * Release all the locks recorded in this RecoveryLockHashEntry, + * and destroy the subsidiary hash table. + */ static void -StandbyReleaseLockList(List *locks) +StandbyReleaseLockHash(RecoveryLockHashEntry *entry) { - ListCell *lc; + HASH_SEQ_STATUS status; + RecoveryLockEntry *lockent; - foreach(lc, locks) + hash_seq_init(&status, entry->hashtab); + while ((lockent = hash_seq_search(&status))) { - xl_standby_lock *lock = (xl_standby_lock *) lfirst(lc); LOCKTAG locktag; elog(trace_recovery(DEBUG4), "releasing recovery lock: xid %u db %u rel %u", - lock->xid, lock->dbOid, lock->relOid); - SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid); + entry->xid, lockent->dbOid, lockent->relOid); + SET_LOCKTAG_RELATION(locktag, lockent->dbOid, lockent->relOid); if (!LockRelease(&locktag, AccessExclusiveLock, true)) { elog(LOG, - "RecoveryLockLists contains entry for lock no longer recorded by lock manager: xid %u database %u relation%u", - lock->xid, lock->dbOid, lock->relOid); + "RecoveryLockHash contains entry for lock no longer recorded by lock manager: xid %u database %u relation%u", + entry->xid, lockent->dbOid, lockent->relOid); Assert(false); } } - list_free_deep(locks); + hash_destroy(entry->hashtab); } +/* + * Release locks for specific XID, or all locks if it's InvalidXid. + */ static void StandbyReleaseLocks(TransactionId xid) { - RecoveryLockListsEntry *entry; + RecoveryLockHashEntry *entry; if (TransactionIdIsValid(xid)) { - if ((entry = hash_search(RecoveryLockLists, &xid, HASH_FIND, NULL))) + if ((entry = hash_search(RecoveryLockHash, &xid, HASH_FIND, NULL))) { - StandbyReleaseLockList(entry->locks); - hash_search(RecoveryLockLists, entry, HASH_REMOVE, NULL); + StandbyReleaseLockHash(entry); + hash_search(RecoveryLockHash, entry, HASH_REMOVE, NULL); } } else @@ -1028,7 +1062,7 @@ StandbyReleaseLocks(TransactionId xid) /* * Release locks for a transaction tree, starting at xid down, from - * RecoveryLockLists. + * RecoveryLockHash. * * Called during WAL replay of COMMIT/ROLLBACK when in hot standby mode, * to remove any AccessExclusiveLocks requested by a transaction. @@ -1051,15 +1085,15 @@ void StandbyReleaseAllLocks(void) { HASH_SEQ_STATUS status; - RecoveryLockListsEntry *entry; + RecoveryLockHashEntry *entry; elog(trace_recovery(DEBUG2), "release all standby locks"); - hash_seq_init(&status, RecoveryLockLists); + hash_seq_init(&status, RecoveryLockHash); while ((entry = hash_seq_search(&status))) { - StandbyReleaseLockList(entry->locks); - hash_search(RecoveryLockLists, entry, HASH_REMOVE, NULL); + StandbyReleaseLockHash(entry); + hash_search(RecoveryLockHash, entry, HASH_REMOVE, NULL); } } @@ -1072,9 +1106,9 @@ void StandbyReleaseOldLocks(TransactionId oldxid) { HASH_SEQ_STATUS status; - RecoveryLockListsEntry *entry; + RecoveryLockHashEntry *entry; - hash_seq_init(&status, RecoveryLockLists); + hash_seq_init(&status, RecoveryLockHash); while ((entry = hash_seq_search(&status))) { Assert(TransactionIdIsValid(entry->xid)); @@ -1088,8 +1122,8 @@ StandbyReleaseOldLocks(TransactionId oldxid) continue; /* Remove all locks and hash table entry. */ - StandbyReleaseLockList(entry->locks); - hash_search(RecoveryLockLists, entry, HASH_REMOVE, NULL); + StandbyReleaseLockHash(entry); + hash_search(RecoveryLockHash, entry, HASH_REMOVE, NULL); } }
pgsql-hackers by date: