Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock - Mailing list pgsql-hackers
From | Andres Freund
Subject | Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date |
Msg-id | 20220810052830.g7vf2jktxix22wmt@awork3.anarazel.de
In response to | Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses | Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
List | pgsql-hackers
Hi,

On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> >
> > The relevant code triggering it:
> >
> >     newbuf = XLogInitBufferForRedo(record, 1);
> >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> >                   xlrec->new_bucket_flag, true);
> >     if (!IsBufferCleanupOK(newbuf))
> >         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> >
> > Why do we just crash if we don't already have a cleanup lock? That can't be
> > right. Or is there supposed to be a guarantee this can't happen?
>
> Perhaps the code assumes that when the xl_hash_split_allocate_page record was
> written, the new_bucket field referred to an unused page, and so during
> replay it should also refer to an unused page, and that, being unused, nobody
> will have it pinned.  But at least in heap we sometimes pin unused pages
> just long enough to examine them and to see that they are unused.  Maybe
> something like that is happening here?

I don't think it's a safe assumption that nobody would hold a pin on such a
page during recovery. While not the case here, somebody else could have used
pg_prewarm to read it in.

The checkpointer or bgwriter could also have it temporarily pinned in order to
write it out, and another backend could likewise pin it transiently while
trying to write it out as a victim buffer:

static int
SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
{
...
	/*
	 * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
	 * buffer is clean by the time we've locked it.)
	 */
	PinBuffer_Locked(bufHdr);
	LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);

As you can see, we acquire a pin without holding a lock on the page (and that
can't be changed!).

I assume the hash code is trying to defend against some sort of deadlock by not
actually getting a cleanup lock (i.e. by not passing get_cleanup_lock = true to
XLogReadBufferForRedoExtended()).

I don't think it's possible to rely on a dirty page never being pinned by
another backend. All you can rely on with a cleanup lock is that there are no
*prior* references to the buffer, and thus that it's safe to reorganize the
buffer, because the pin-holder hasn't yet gotten a lock on the page.

> I'd be curious to see the count returned by
> BUF_STATE_GET_REFCOUNT(LockBufHdr(newbuf)) right before this panic.  If it's
> just 1, then it's not another backend, but our own, and we'd want to debug
> why we're pinning the same page twice (or more) while replaying wal.

This was the first time I have seen this in a couple hundred runs, so I don't
think it's that easily debuggable for me.

> Otherwise, maybe it's a race condition with some other process that
> transiently pins a buffer and occasionally causes this code to panic?

As pointed out above, it's legal to have a transient pin on a page, so this
just looks like a bad assumption in the hash code to me.

Greetings,

Andres Freund
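For concreteness, below is a minimal sketch of the alternative hinted at in the
mail: having the redo routine wait for a genuine cleanup lock on the new bucket
page rather than PANICking when IsBufferCleanupOK() fails. This is not code
from the thread or from the tree; the helper name hash_redo_init_new_bucket is
made up, and whether waiting here is actually deadlock-free is exactly the open
question raised above. It only uses existing machinery:
XLogReadBufferForRedoExtended() with RBM_ZERO_AND_CLEANUP_LOCK zero-fills the
page and waits until the caller holds the only pin, so a transient pin from the
bgwriter, checkpointer or pg_prewarm would merely delay replay briefly.

#include "postgres.h"

#include "access/hash.h"
#include "access/hash_xlog.h"
#include "access/xlogutils.h"
#include "storage/bufmgr.h"

/*
 * Hypothetical variant of the new-bucket initialization in
 * hash_xlog_split_allocate_page(): instead of XLogInitBufferForRedo() plus a
 * PANIC when the cleanup lock isn't already held, ask for a cleanup lock up
 * front and wait for concurrent pins to go away.
 */
static Buffer
hash_redo_init_new_bucket(XLogReaderState *record,
						  xl_hash_split_allocate_page *xlrec)
{
	Buffer		newbuf;

	/* Block 1 is the new bucket's primary page; wait for a cleanup lock. */
	XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_AND_CLEANUP_LOCK,
								  true, &newbuf);

	/* Same initialization the existing redo code performs. */
	_hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
				  xlrec->new_bucket_flag, true);
	MarkBufferDirty(newbuf);

	return newbuf;
}

Whether an IsBufferCleanupOK() cross-check should be kept, and whether the
other pages touched by this record need the same treatment, is left open here.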
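Separately, Mark's suggestion of looking at the reference count right before
the PANIC could look roughly like the fragment below. This is only a debugging
sketch, assumed to run inside hash_xlog_split_allocate_page() with newbuf
pinned and locked; GetBufferDescriptor(), LockBufHdr(), UnlockBufHdr() and
BUF_STATE_GET_REFCOUNT() are the existing buffer-manager primitives, while the
extended message text is invented for illustration.

#include "storage/buf_internals.h"

	...
	if (!IsBufferCleanupOK(newbuf))
	{
		/* Peek at the shared refcount so the PANIC says who else is pinning. */
		BufferDesc *bufHdr = GetBufferDescriptor(newbuf - 1);
		uint32		buf_state = LockBufHdr(bufHdr);
		uint32		refcount = BUF_STATE_GET_REFCOUNT(buf_state);

		UnlockBufHdr(bufHdr, buf_state);

		/*
		 * refcount == 1 would mean only our own pin, pointing at a bug in the
		 * replay code itself; refcount > 1 means some other backend holds a
		 * transient pin, as discussed above.
		 */
		elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock (refcount %u)",
			 refcount);
	}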