Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date:
Msg-id: 20220810052830.g7vf2jktxix22wmt@awork3.anarazel.de
In response to: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock (Mark Dilger <mark.dilger@enterprisedb.com>)
List: pgsql-hackers

Hi,

On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> >
> > The relevant code triggering it:
> >
> >     newbuf = XLogInitBufferForRedo(record, 1);
> >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> >                   xlrec->new_bucket_flag, true);
> >     if (!IsBufferCleanupOK(newbuf))
> >         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> >
> > Why do we just crash if we don't already have a cleanup lock? That can't be
> > right. Or is there supposed to be a guarantee this can't happen?
>
> Perhaps the code assumes that when the xl_hash_split_allocate_page record
> was written, the new_bucket field referred to an unused page, and so
> during replay it should also refer to an unused page which, being unused,
> nobody will have pinned.  But at least in heap we sometimes pin unused
> pages just long enough to examine them and see that they are unused.
> Maybe something like that is happening here?

I don't think it's a safe assumption that nobody would hold a pin on such a
page during recovery. While not the case here, somebody else could have used
pg_prewarm to read it in.

But also, the checkpointer or bgwriter could have it temporarily pinned in
order to write it out, or another backend could have it temporarily pinned
while trying to write it out as a victim buffer.


static int
SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
{
...
    /*
     * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
     * buffer is clean by the time we've locked it.)
     */
    PinBuffer_Locked(bufHdr);
    LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);


As you can see, we acquire a pin without holding a lock on the page (and
that can't be changed!).


I assume this is trying to defend against some sort of deadlock by not
actually getting a cleanup lock (i.e., by not passing get_cleanup_lock =
true to XLogReadBufferForRedoExtended(), which would wait for one).
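
If waiting is in fact safe here, the obvious alternative would be to take a
real cleanup lock during replay instead of PANICing. A minimal sketch
(untested; it assumes waiting during redo can't deadlock, and relies on the
existing RBM_ZERO_AND_CLEANUP_LOCK mode and get_cleanup_lock parameter):

    Buffer      newbuf;

    /*
     * Like XLogInitBufferForRedo(), but wait for a real cleanup lock:
     * get_cleanup_lock = true makes redo block until every transient pin
     * (bgwriter, checkpointer, a victim-buffer write, ...) is released.
     * The return value is ignored, as XLogInitBufferForRedo() ignores it.
     */
    XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_AND_CLEANUP_LOCK,
                                  true, &newbuf);
    _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
                  xlrec->new_bucket_flag, true);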

I don't think it's possible to rely on a dirty page never being pinned by
another backend. All you can rely on with a cleanup lock is that there are
no *prior* references to the buffer, and thus it's safe to reorganize the
buffer, because any concurrent pin-holder can't yet have gotten a lock on
the page.
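
For contrast, the way other code copes with transient pins is to pin the
page normally, *try* for the cleanup lock, and back off if somebody else
still has the page pinned. A rough sketch of that pattern (rel and blkno
are hypothetical variables, untested):

    buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, NULL);
    if (!ConditionalLockBufferForCleanup(buf))
    {
        /* somebody else holds a pin; retry later instead of PANICing */
        ReleaseBuffer(buf);
        return;
    }
    /* here we hold the exclusive content lock and the only pin */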


> I'd be curious to see the count returned by
> BUF_STATE_GET_REFCOUNT(LockBufHdr(newbuf)) right before this panic.  If it's
> just 1, then it's not another backend, but our own, and we'd want to debug
> why we're pinning the same page twice (or more) while replaying wal.

This was the first time I have seen this in a couple hundred runs, so I
don't think it will be that easy for me to debug.
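
If it does reproduce, something along these lines could capture the count
you're asking about right before the crash (a sketch using bufmgr
internals; the bufHdr / buf_state locals are mine):

    if (!IsBufferCleanupOK(newbuf))
    {
        BufferDesc *bufHdr = GetBufferDescriptor(newbuf - 1);
        uint32      buf_state = LockBufHdr(bufHdr);

        /* total pin count across all backends, our own included */
        elog(LOG, "newbuf refcount: %u", BUF_STATE_GET_REFCOUNT(buf_state));
        UnlockBufHdr(bufHdr, buf_state);
        elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
    }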


> Otherwise, maybe it's a race condition with some other process that
> transiently pins a buffer and occasionally causes this code to panic?

As pointed out above, it's legal to have a transient pin on a page, so this
just looks like a bad assumption in the hash code to me.

Greetings,

Andres Freund


