Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock - Mailing list pgsql-hackers

From Andres Freund
Subject Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date
Msg-id 20220811211246.fyalckv3y6tizfwj@awork3.anarazel.de
In response to Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

On 2022-08-10 14:52:36 +0530, Amit Kapila wrote:
> I think this could be the probable reason for failure though I didn't
> try to debug/reproduce this yet. AFAIU, this is possible during
> recovery/replay of WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE as via
> XLogReadBufferForRedoExtended, we can mark the buffer dirty while
> restoring from full page image. OTOH, because during normal operation
> we didn't mark the page dirty SyncOneBuffer would have skipped it due
> to check (if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))).

I think there might still be short-lived references from other paths, even if
the buffer isn't marked dirty, but it isn't really important.
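
For anyone following along: the skip Amit is referring to is the dirty-buffer
test in SyncOneBuffer() in bufmgr.c, and the dirtying during replay happens in
XLogReadBufferForRedoExtended() when it restores a full page image. Roughly,
paraphrased rather than quoted verbatim, with surrounding details elided:

    /* bufmgr.c, SyncOneBuffer(): a buffer that isn't both valid and dirty
     * is skipped, so a page that was never dirtied during normal operation
     * is never written out from here. */
    buf_state = LockBufHdr(bufHdr);
    if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))
        result |= BUF_REUSABLE;

    /* xlogutils.c, XLogReadBufferForRedoExtended(): restoring a full page
     * image during recovery marks the buffer dirty, even though the same
     * page was never dirtied on the primary. */
    if (XLogRecBlockImageApply(record, block_id))
    {
        if (!RestoreBlockImage(record, block_id, page))
            elog(ERROR, "failed to restore block image");
        MarkBufferDirty(*buf);
        return BLK_RESTORED;
    }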


> > I assume this is trying to defend against some sort of deadlock by not
> > actually getting a cleanup lock (by passing get_cleanup_lock = true to
> > XLogReadBufferForRedoExtended()).
> >
> 
> IIRC, this is just following what we do during normal operation and
> based on the theory that the meta-page is not updated yet so no
> backend will access it. I think we can do what you wrote unless there
> is some other reason behind this failure.

Well, it's not really the same if you silently continue in normal operation
but PANIC during recovery... If it's an optional operation, the tiny race
around not getting the cleanup lock is fine, but it's a totally different
story during recovery.
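
To spell out the asymmetry: on the primary, _hash_expandtable() merely tries
for the cleanup lock (via _hash_getbuf_with_condlock_cleanup(), i.e.
ConditionalLockBufferForCleanup() underneath) and silently abandons the split
if it can't get it, whereas the redo routine treats the same condition as
fatal. Roughly, paraphrased from hashpage.c and hash_xlog.c from memory,
variable names approximate:

    /* hashpage.c, _hash_expandtable(): normal operation gives up on the
     * split, without raising any error, if the cleanup lock isn't free. */
    buf_oblkno = _hash_getbuf_with_condlock_cleanup(rel, start_oblkno,
                                                    LH_BUCKET_PAGE);
    if (!buf_oblkno)
        goto fail;

    /* hash_xlog.c, hash_xlog_split_allocate_page(): recovery insists on
     * the cleanup lock and PANICs if someone else still holds a pin. */
    if (!IsBufferCleanupOK(newbuf))
        elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");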

Greetings,

Andres Freund


