Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date
Msg-id CAA4eK1LA3_vs8s4dtSNq-=tOvp6wgTnrZJOpSxJX3i5SLC5iOw@mail.gmail.com
In response to Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Aug 10, 2022 at 10:58 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > The relevant code triggering it:
> > >
> > >     newbuf = XLogInitBufferForRedo(record, 1);
> > >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> > >                               xlrec->new_bucket_flag, true);
> > >     if (!IsBufferCleanupOK(newbuf))
> > >             elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> > >
> > > Why do we just crash if we don't already have a cleanup lock? That can't be
> > > right. Or is there supposed to be a guarantee this can't happen?
> >
> > Perhaps the code assumes that when xl_hash_split_allocate_page record was
> > written, the new_bucket field referred to an unused page, and so during
> > replay it should also refer to an unused page, and being unused, that nobody
> > will have it pinned.  But at least in heap we sometimes pin unused pages
> > just long enough to examine them and to see that they are unused.  Maybe
> > something like that is happening here?
>
> I don't think it's a safe assumption that nobody would hold a pin on such a
> page during recovery. While not the case here, somebody else could have used
> pg_prewarm to read it in.
>
> But also, the checkpointer or bgwriter could have it temporarily pinned, to
> write it out, or another backend could try to write it out as a victim buffer
> and have it temporarily pinned.
>
>
> static int
> SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
> {
> ...
>         /*
>          * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
>          * buffer is clean by the time we've locked it.)
>          */
>         PinBuffer_Locked(bufHdr);
>         LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
>
>
> As you can see we acquire a pin without holding a lock on the page (and that
> can't be changed!).
>

I think this could be the reason for the failure, though I haven't
tried to debug/reproduce it yet. AFAIU, this is possible during
recovery/replay of the WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE
because, via XLogReadBufferForRedoExtended, we can mark the buffer
dirty while restoring it from a full-page image. OTOH, during normal
operation we don't mark the page dirty at that point, so SyncOneBuffer
would have skipped it due to the check
(if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))).
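To make the race concrete, here is a tiny standalone C model (the
struct and function names are invented for illustration; the real
descriptors pack the refcount into buf_state in buf_internals.h). It
only captures the one rule that matters here: a cleanup lock requires
being the sole pinner, so a transient pin taken by the checkpointer or
bgwriter, as in SyncOneBuffer, is enough to make the
IsBufferCleanupOK() check fail:

```c
#include <stdbool.h>

/* Toy buffer header: only the shared pin count matters for this race. */
typedef struct ToyBuffer
{
    int         refcount;       /* number of processes holding a pin */
} ToyBuffer;

/* A cleanup lock requires being the *only* pinner (cf. IsBufferCleanupOK). */
static bool
toy_is_cleanup_ok(const ToyBuffer *buf)
{
    return buf->refcount == 1;
}

/*
 * Simulate the redo path: the startup process pins the new bucket page,
 * then checks whether a cleanup lock is obtainable.  If the checkpointer
 * concurrently holds a transient pin (PinBuffer_Locked in SyncOneBuffer),
 * the check fails -- that is the elog(PANIC) path in
 * hash_xlog_split_allocate_page.
 */
static bool
redo_cleanup_check(bool checkpointer_pins_concurrently)
{
    ToyBuffer   buf = {.refcount = 0};

    buf.refcount++;             /* startup process pins the page for redo */
    if (checkpointer_pins_concurrently)
        buf.refcount++;         /* transient pin taken to write the buffer out */

    return toy_is_cleanup_ok(&buf);
}
```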

>
> I assume this is trying to defend against some sort of deadlock by not
> actually getting a cleanup lock (by passing get_cleanup_lock = true to
> XLogReadBufferForRedoExtended()).
>

IIRC, this is just following what we do during normal operation, based
on the theory that the meta-page has not been updated yet, so no
backend will access the new bucket page. I think we can do what you
wrote unless there is some other reason behind this failure.
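For discussion, your suggestion could look something like the
following (an untested fragment, not a patch; it assumes
RBM_ZERO_AND_CLEANUP_LOCK is appropriate for this record): instead of
XLogInitBufferForRedo() followed by the IsBufferCleanupOK() check, ask
for the cleanup lock up front and let it wait for transient pins to
drain:

```c
/* Untested sketch: take a real cleanup lock instead of PANICking. */
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_AND_CLEANUP_LOCK,
                              true /* get_cleanup_lock */, &newbuf);
_hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
              xlrec->new_bucket_flag, true);
```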

-- 
With Regards,
Amit Kapila.


