Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock - Mailing list pgsql-hackers

From Robert Haas
Subject Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date
Msg-id CA+TgmoYruVb7Nh5TUt47sTyYui2zE8Ke9T3DcHeB1wSkb=uSCw@mail.gmail.com
Whole thread Raw
In response to Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock  (Andres Freund <andres@anarazel.de>)
Responses Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
List pgsql-hackers
On Fri, Oct 14, 2022 at 2:21 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-10-14 10:40:11 +0530, Amit Kapila wrote:
> > On Fri, Oct 14, 2022 at 2:25 AM Andres Freund <andres@anarazel.de> wrote:
> > Fair point. How about something like: "XXX Do we really need to check
> > for cleanup lock on the new bucket? Here, we initialize the page, so
> > ideally we don't need to perform any operation that requires such a
> > check."?.
>
> This still seems to omit that the code is quite broken.

I don't think it's the job of a commit which is trying to fix a
certain bug to document all the other bugs it isn't fixing. If that
were the standard, we'd never be able to commit any bug fixes, which
is exactly what's happening on this thread. The first hunk of 0001
demonstrably fixes a real bug that has real consequences and, as far
as I can see, makes nothing worse in any way. We should commit that
hunk - and only that hunk - back-patch it all the way, and be happy to
have gotten something done. We've known what the correct fix is here
for 2 months and we're not doing anything about it because there's
some other problem that we're trying to worry about at the same time.
Let's stop doing that.

To put that another way, we don't need to fix or document anything
else to have a conditional cleanup lock acquisition shouldn't be
conditional. If it shouldn't be a cleanup lock either, well, that's a
separate patch.

Alternatively, if we all agree that 0001 and 0002 are both safe and
correct, then let's just merge the two patches together and commit it
with an explanatory message like:

===
Don't require a cleanup lock on the new page when splitting a hash index bucket.

The previous code took a cleanup lock conditionally and panicked if it
failed, which is wrong, because a process such as the background
writer or checkpointer can transiently pin pages on a standby. We
could make the cleanup lock acquisition unconditional, but it turns
out that it isn't needed at all, because no scan can examine the new
bucket before the metapage has been updated, and thus an exclusive
lock on the new bucket's primary page is sufficient. Note that we
still need a cleanup lock on the old bucket's primary page, because
that one is visible to scans.
===

> > Feel free to suggest something better.
>
> How about something like:
>
>   XXX: This code is wrong, we're overwriting the buffer before "acquiring" the
>   cleanup lock. Currently this is not known to have bad consequences because
>   XYZ and the fix seems a bit too risky for the backbranches.

I think here you're talking about the code that runs in
normal-running, not recovery, in _hash_expandtable, where we call
_hash_getnewbuf and then IsBufferCleanupOK. That code indeed seems
stupid, because as you say, there's no point in calling
_hash_getnewbuf() and thus overwriting the buffer and then only
afterwards checking IsBufferCleanupOK. By then the die is cast. But at
the same time, I think that it's not wrong in any way that matters to
the best of our current knowledge. That new buffer that we just went
and got might be pinned by somebody else, but they can't be doing
anything interesting with it, because we wouldn't be allocating it as
a new page if it were already in use for anything, and only one
process is allowed to be doing such an allocation at a time. That both
means that we can likely remove the cleanup lock acquisition, but it
also means that if we don't, there is no correctness problem here,
strictly speaking.

So I would suggest that if we feel we absolutely must put a comment
here, we could make it say something like "XXX. It doesn't make sense
to call _hash_getnewbuf() first, zeroing the buffer, and then only
afterwards check whether we have a cleanup lock. However, since no
scan can be accessing the new buffer yet, any concurrent accesses will
just be from processes like the bgwriter or checkpointer which don't
care about its contents, so it doesn't really matter."

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: PATCH: Using BRIN indexes for sorted output
Next
From: Andrew Dunstan
Date:
Subject: Re: ssl tests aren't concurrency safe due to get_free_port()