Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Msg-id: CA+TgmoYruVb7Nh5TUt47sTyYui2zE8Ke9T3DcHeB1wSkb=uSCw@mail.gmail.com
In response to: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock (Andres Freund <andres@anarazel.de>)
Responses: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
List: pgsql-hackers
On Fri, Oct 14, 2022 at 2:21 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-10-14 10:40:11 +0530, Amit Kapila wrote:
> > On Fri, Oct 14, 2022 at 2:25 AM Andres Freund <andres@anarazel.de> wrote:
> > Fair point. How about something like: "XXX Do we really need to check
> > for cleanup lock on the new bucket? Here, we initialize the page, so
> > ideally we don't need to perform any operation that requires such a
> > check."?
>
> This still seems to omit that the code is quite broken.

I don't think it's the job of a commit that is trying to fix a certain
bug to document all the other bugs it isn't fixing. If that were the
standard, we'd never be able to commit any bug fixes, which is exactly
what's happening on this thread. The first hunk of 0001 demonstrably
fixes a real bug that has real consequences and, as far as I can see,
makes nothing worse in any way. We should commit that hunk - and only
that hunk - back-patch it all the way, and be happy to have gotten
something done. We've known what the correct fix is here for 2 months
and we're not doing anything about it because there's some other
problem that we're trying to worry about at the same time. Let's stop
doing that.

To put that another way, we don't need to fix or document anything
else in order to fix the fact that a conditional cleanup lock
acquisition shouldn't be conditional. If it shouldn't be a cleanup
lock either, well, that's a separate patch.

Alternatively, if we all agree that 0001 and 0002 are both safe and
correct, then let's just merge the two patches together and commit it
with an explanatory message like:

===
Don't require a cleanup lock on the new page when splitting a hash
index bucket.

The previous code took a cleanup lock conditionally and panicked if it
failed, which is wrong, because a process such as the background
writer or checkpointer can transiently pin pages on a standby.
We could make the cleanup lock acquisition unconditional, but it turns
out that it isn't needed at all, because no scan can examine the new
bucket before the metapage has been updated, and thus an exclusive
lock on the new bucket's primary page is sufficient. Note that we
still need a cleanup lock on the old bucket's primary page, because
that one is visible to scans.
===

> > Feel free to suggest something better.
>
> How about something like:
>
> XXX: This code is wrong, we're overwriting the buffer before "acquiring" the
> cleanup lock. Currently this is not known to have bad consequences because
> XYZ and the fix seems a bit too risky for the backbranches.

I think here you're talking about the code that runs in normal
running, not recovery, in _hash_expandtable, where we call
_hash_getnewbuf and then IsBufferCleanupOK. That code indeed seems
stupid, because, as you say, there's no point in calling
_hash_getnewbuf() and thus overwriting the buffer and then only
afterwards checking IsBufferCleanupOK. By then the die is cast.

But at the same time, I think that it's not wrong in any way that
matters, to the best of our current knowledge. That new buffer that we
just went and got might be pinned by somebody else, but they can't be
doing anything interesting with it, because we wouldn't be allocating
it as a new page if it were already in use for anything, and only one
process is allowed to be doing such an allocation at a time. That
means both that we can likely remove the cleanup lock acquisition and
that, if we don't, there is no correctness problem here, strictly
speaking.

So I would suggest that if we feel we absolutely must put a comment
here, we could make it say something like: "XXX. It doesn't make sense
to call _hash_getnewbuf() first, zeroing the buffer, and then only
afterwards check whether we have a cleanup lock.
However, since no scan can be accessing the new buffer yet, any
concurrent accesses will just be from processes like the bgwriter or
checkpointer which don't care about its contents, so it doesn't really
matter."

--
Robert Haas
EDB: http://www.enterprisedb.com