Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock - Mailing list pgsql-hackers
| From | vignesh C |
|---|---|
| Subject | Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock |
| Msg-id | CALDaNm2Avar8KWTx-DxxOGW1pE9VtYrYMcue7gQPj5pJ-5ttbw@mail.gmail.com |
| In response to | Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock (Amit Kapila <amit.kapila16@gmail.com>) |
| List | pgsql-hackers |
On Wed, Aug 10, 2022 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 10, 2022 at 10:58 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > > > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > The relevant code triggering it:
> > > >
> > > >     newbuf = XLogInitBufferForRedo(record, 1);
> > > >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> > > >                   xlrec->new_bucket_flag, true);
> > > >     if (!IsBufferCleanupOK(newbuf))
> > > >         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> > > >
> > > > Why do we just crash if we don't already have a cleanup lock? That can't be
> > > > right. Or is there supposed to be a guarantee this can't happen?
> > >
> > > Perhaps the code assumes that when the xl_hash_split_allocate_page record was
> > > written, the new_bucket field referred to an unused page, and so during
> > > replay it should also refer to an unused page, and being unused, that nobody
> > > will have it pinned. But at least in heap we sometimes pin unused pages
> > > just long enough to examine them and to see that they are unused. Maybe
> > > something like that is happening here?
> >
> > I don't think it's a safe assumption that nobody would hold a pin on such a
> > page during recovery. While not the case here, somebody else could have used
> > pg_prewarm to read it in.
> >
> > But also, the checkpointer or bgwriter could have it temporarily pinned, to
> > write it out, or another backend could try to write it out as a victim buffer
> > and have it temporarily pinned.
> >
> > static int
> > SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
> > {
> > ...
> >     /*
> >      * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
> >      * buffer is clean by the time we've locked it.)
> >      */
> >     PinBuffer_Locked(bufHdr);
> >     LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
> >
> > As you can see we acquire a pin without holding a lock on the page (and that
> > can't be changed!).
>
> I think this could be the probable reason for the failure, though I haven't
> tried to debug/reproduce it yet. AFAIU, this is possible during
> recovery/replay of the WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE because, via
> XLogReadBufferForRedoExtended, we can mark the buffer dirty while
> restoring it from a full-page image. OTOH, because during normal operation
> we didn't mark the page dirty, SyncOneBuffer would have skipped it due
> to the check (if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))).

I'm trying to simulate the scenario in streaming replication using the below:

CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,1000) i;

With the above statements, the standby will replay the allocation of a page
for the split operation. I will slightly change the above statements and try
to debug and see if we can make the background writer process pin this buffer
and simulate the scenario. I will post my findings once I'm done with the
analysis.

Regards,
Vignesh