Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock - Mailing list pgsql-hackers

From Andres Freund
Subject Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date
Msg-id 20220817183623.w3fsoerpaunt7exe@awork3.anarazel.de
In response to Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

On 2022-08-17 10:18:14 +0530, Amit Kapila wrote:
> > Looking at the non-recovery code makes me even more suspicious:
> >
> >         /*
> >          * Physically allocate the new bucket's primary page.  We want to do this
> >          * before changing the metapage's mapping info, in case we can't get the
> >          * disk space.  Ideally, we don't need to check for cleanup lock on new
> >          * bucket as no other backend could find this bucket unless meta page is
> >          * updated.  However, it is good to be consistent with old bucket locking.
> >          */
> >         buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> >         if (!IsBufferCleanupOK(buf_nblkno))
> >         {
> >                 _hash_relbuf(rel, buf_oblkno);
> >                 _hash_relbuf(rel, buf_nblkno);
> >                 goto fail;
> >         }
> >
> >
> > _hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
> > memset(0)s the whole page. What does it even mean to check whether you
> > effectively have a cleanup lock after you zeroed out the page?
> >
> > Reading the README and the comment above makes me wonder if this whole cleanup
> > lock business here is just cargo culting and could be dropped?
> >
> 
> I think it is okay to not acquire a clean-up lock on the new bucket
> page both in recovery and non-recovery paths. It is primarily required
> on the old bucket page to avoid concurrent scans/inserts. As mentioned
> in the comments, and as far as my memory serves, it is mainly for keeping
> it consistent with old bucket locking.

Zeroing out a page before checking for a cleanup lock is not "keeping it
consistent with old bucket locking"; for the old bucket, at least, the cleanup
lock is (hopefully) taken before the page contents are touched. This code is
just broken on multiple fronts, and consistency isn't a defense.
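
To make that concrete, here is a rough sketch (not a tested patch) of what
dropping the check in the non-recovery path could look like, taking at face
value the argument that no other backend can find the new bucket before the
metapage is updated; the variable names are the ones from _hash_expandtable():

        buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);

        /*
         * _hash_getnewbuf() returns the buffer pinned and exclusively
         * locked, and has already zeroed the page via _hash_pageinit(),
         * so testing IsBufferCleanupOK() here can't tell us anything
         * useful.  If the expectation is that nobody else can even have
         * a pin yet, an assertion documents that without pretending to
         * enforce it:
         */
        Assert(IsBufferCleanupOK(buf_nblkno));

If the check were actually needed, it would have to happen before the page is
initialized, which would mean restructuring _hash_getnewbuf() rather than
testing after the fact.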

Greetings,

Andres Freund


