Thread: hash_xlog_split_allocate_page: failed to acquire cleanup lock

hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

One CI run for the meson branch just failed in a way I hadn't seen before on
windows, when nothing had changed on windows:

https://cirrus-ci.com/task/6111743586861056

027_stream_regress.pl ended up failing due to a timeout. Which in turn was
caused by the standby crashing.

2022-08-10 01:46:20.731 GMT [2212][startup] PANIC:  hash_xlog_split_allocate_page: failed to acquire cleanup lock
2022-08-10 01:46:20.731 GMT [2212][startup] CONTEXT:  WAL redo at 0/7A6EED8 for Hash/SPLIT_ALLOCATE_PAGE: new_bucket 31, meta_page_masks_updated F, issplitpoint_changed F; blkref #0: rel 1663/16384/24210, blk 23; blkref #1: rel 1663/16384/24210, blk 45; blkref #2: rel 1663/16384/24210, blk 0

abort() has been called
2022-08-10 01:46:31.919 GMT [7560][checkpointer] LOG:  restartpoint starting: time
2022-08-10 01:46:32.430 GMT [8304][postmaster] LOG:  startup process (PID 2212) was terminated by exception 0xC0000354

stack dump:

https://api.cirrus-ci.com/v1/artifact/task/6111743586861056/crashlog/crashlog-postgres.exe_21c8_2022-08-10_01-46-28-215.txt

The relevant code triggering it:

    newbuf = XLogInitBufferForRedo(record, 1);
    _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
                  xlrec->new_bucket_flag, true);
    if (!IsBufferCleanupOK(newbuf))
        elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");

Why do we just crash if we don't already have a cleanup lock? That can't be
right. Or is there supposed to be a guarantee this can't happen?
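
For context, IsBufferCleanupOK() does not wait for competing pins to go away
the way LockBufferForCleanup() does; it only reports whether the caller's pin
is currently the only one, so a transient pin held by anyone else at that
instant is enough to trip the PANIC. Roughly, paraphrased from bufmgr.c
(shared-buffer path only, asserts elided):

    bool
    IsBufferCleanupOK(Buffer buffer)
    {
        BufferDesc *bufHdr;
        uint32      buf_state;

        /* the caller must hold exactly one local pin ... */
        if (GetPrivateRefCount(buffer) != 1)
            return false;

        bufHdr = GetBufferDescriptor(buffer - 1);
        buf_state = LockBufHdr(bufHdr);

        /* ... and it must be the only pin in the whole system */
        if (BUF_STATE_GET_REFCOUNT(buf_state) == 1)
        {
            UnlockBufHdr(bufHdr, buf_state);
            return true;
        }

        UnlockBufHdr(bufHdr, buf_state);
        return false;
    }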


Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Mark Dilger
Date:

> On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
>
> The relevant code triggering it:
>
>     newbuf = XLogInitBufferForRedo(record, 1);
>     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
>                   xlrec->new_bucket_flag, true);
>     if (!IsBufferCleanupOK(newbuf))
>         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
>
> Why do we just crash if we don't already have a cleanup lock? That can't be
> right. Or is there supposed to be a guarantee this can't happen?

Perhaps the code assumes that when xl_hash_split_allocate_page record was written, the new_bucket field referred to an
unused page, and so during replay it should also refer to an unused page, and being unused, that nobody will have it
pinned.  But at least in heap we sometimes pin unused pages just long enough to examine them and to see that they are
unused.  Maybe something like that is happening here?

I'd be curious to see the count returned by BUF_STATE_GET_REFCOUNT(LockBufHdr(newbuf)) right before this panic.  If
it's just 1, then it's not another backend, but our own, and we'd want to debug why we're pinning the same page twice
(or more) while replaying wal.  Otherwise, maybe it's a race condition with some other process that transiently pins a
buffer and occasionally causes this code to panic?
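
A throwaway way to capture that (hypothetical instrumentation, not in the
tree; it would also need storage/buf_internals.h in hash_xlog.c) would be to
log the pin count just before the PANIC in hash_xlog_split_allocate_page():

    if (!IsBufferCleanupOK(newbuf))
    {
        /* hypothetical debugging aid: report the pin count before dying */
        BufferDesc *bufHdr = GetBufferDescriptor(newbuf - 1);
        uint32      buf_state = LockBufHdr(bufHdr);

        elog(LOG, "newbuf refcount = %u", BUF_STATE_GET_REFCOUNT(buf_state));
        UnlockBufHdr(bufHdr, buf_state);

        elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
    }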

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Thomas Munro
Date:
On Wed, Aug 10, 2022 at 3:21 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> > The relevant code triggering it:
> >
> >       newbuf = XLogInitBufferForRedo(record, 1);
> >       _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> >                                 xlrec->new_bucket_flag, true);
> >       if (!IsBufferCleanupOK(newbuf))
> >               elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> >
> > Why do we just crash if we don't already have a cleanup lock? That can't be
> > right. Or is there supposed to be a guarantee this can't happen?
>
> Perhaps the code assumes that when xl_hash_split_allocate_page record was
> written, the new_bucket field referred to an unused page, and so during
> replay it should also refer to an unused page, and being unused, that nobody
> will have it pinned.  But at least in heap we sometimes pin unused pages just
> long enough to examine them and to see that they are unused.  Maybe
> something like that is happening here?

Here's an email about that:

https://www.postgresql.org/message-id/CAE9k0P=OXww6RQCGrmDNa8=L3EeB01SGbYuP23y-qZJ=4td38Q@mail.gmail.com

> I'd be curious to see the count returned by
> BUF_STATE_GET_REFCOUNT(LockBufHdr(newbuf)) right before this panic.  If it's
> just 1, then it's not another backend, but our own, and we'd want to debug
> why we're pinning the same page twice (or more) while replaying wal.
> Otherwise, maybe it's a race condition with some other process that
> transiently pins a buffer and occasionally causes this code to panic?

But which backend could that be?  We aren't starting any at that point
in the test.

Someone might wonder if it's the startup process itself via the new
WAL prefetching machinery, but that doesn't pin pages, it only probes
the buffer mapping table to see if future pages are cached already
(see bufmgr.c PrefetchSharedBuffer()).  (This is a topic I've thought
about a bit because I have another installment of recovery prefetching
in development using real AIO that *does* pin pages in advance, and
has to deal with code that wants cleanup locks like this...)

It's possible that git log src/backend/access/hash/ can explain a
behaviour change, as there were some recent changes there, but it's
not jumping out at me.  Maybe 4f1f5a7f "Remove fls(), use
pg_leftmost_one_pos32() instead." has a maths error, but I don't see
it.  Maybe e09d7a12 "Improve speed of hash index build." accidentally
reaches a new state and triggers a latent bug.  Maybe a latent bug
showed up now just because we started testing recovery not too long
ago... but all of that still needs another backend involved.  We can
see which blocks the startup process has pinned, 23 != 45.  Hmmm.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> >
> > The relevant code triggering it:
> >
> >     newbuf = XLogInitBufferForRedo(record, 1);
> >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> >                   xlrec->new_bucket_flag, true);
> >     if (!IsBufferCleanupOK(newbuf))
> >         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> >
> > Why do we just crash if we don't already have a cleanup lock? That can't be
> > right. Or is there supposed to be a guarantee this can't happen?
>
> Perhaps the code assumes that when xl_hash_split_allocate_page record was
> written, the new_bucket field referred to an unused page, and so during
> replay it should also refer to an unused page, and being unused, that nobody
> will have it pinned.  But at least in heap we sometimes pin unused pages
> just long enough to examine them and to see that they are unused.  Maybe
> something like that is happening here?

I don't think it's a safe assumption that nobody would hold a pin on such a
page during recovery. While not the case here, somebody else could have used
pg_prewarm to read it in.

But also, the checkpointer or bgwriter could have it temporarily pinned, to
write it out, or another backend could try to write it out as a victim buffer
and have it temporarily pinned.


static int
SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
{
...
    /*
     * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
     * buffer is clean by the time we've locked it.)
     */
    PinBuffer_Locked(bufHdr);
    LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);


As you can see we acquire a pin without holding a lock on the page (and that
can't be changed!).


I assume this is trying to defend against some sort of deadlock by not
actually getting a cleanup lock (by passing get_cleanup_lock = true to
XLogReadBufferForRedoExtended()).

I don't think it's possible to rely on a dirty page to never be pinned by
another backend. All you can rely on with a cleanup lock is that there's no
*prior* references to the buffer, and thus it's safe to reorganize the buffer,
because the pin-holder hasn't yet gotten a lock on the page.


> I'd be curious to see the count returned by
> BUF_STATE_GET_REFCOUNT(LockBufHdr(newbuf)) right before this panic.  If it's
> just 1, then it's not another backend, but our own, and we'd want to debug
> why we're pinning the same page twice (or more) while replaying wal.

This was the first time in a couple hundred runs that I have seen this, so
I don't think it's that easily debuggable for me.


> Otherwise, maybe it's a race condition with some other process that
> transiently pins a buffer and occasionally causes this code to panic?

As pointed out above, it's legal to have a transient pin on a page, so this
just looks like a bad assumption in the hash code to me.

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Thomas Munro
Date:
On Wed, Aug 10, 2022 at 5:28 PM Andres Freund <andres@anarazel.de> wrote:
> I don't think it's a safe assumption that nobody would hold a pin on such a
> page during recovery. While not the case here, somebody else could have used
> pg_prewarm to read it in.
>
> But also, the checkpointer or bgwriter could have it temporarily pinned, to
> write it out, or another backend could try to write it out as a victim buffer
> and have it temporarily pinned.

Right, of course.  So it's just that hash indexes didn't get xlog'd
until 2017, and still aren't very popular, and then recovery didn't
get regression tested until 2021, so nobody ever hit it.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Thomas Munro
Date:
On Wed, Aug 10, 2022 at 5:35 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> 2021

Or, rather, 14 days into 2022 :-)



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Wed, Aug 10, 2022 at 10:58 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > The relevant code triggering it:
> > >
> > >     newbuf = XLogInitBufferForRedo(record, 1);
> > >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> > >                               xlrec->new_bucket_flag, true);
> > >     if (!IsBufferCleanupOK(newbuf))
> > >             elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> > >
> > > Why do we just crash if we don't already have a cleanup lock? That can't be
> > > right. Or is there supposed to be a guarantee this can't happen?
> >
> > Perhaps the code assumes that when xl_hash_split_allocate_page record was
> > written, the new_bucket field referred to an unused page, and so during
> > replay it should also refer to an unused page, and being unused, that nobody
> > will have it pinned.  But at least in heap we sometimes pin unused pages
> > just long enough to examine them and to see that they are unused.  Maybe
> > something like that is happening here?
>
> I don't think it's a safe assumption that nobody would hold a pin on such a
> page during recovery. While not the case here, somebody else could have used
> pg_prewarm to read it in.
>
> But also, the checkpointer or bgwriter could have it temporarily pinned, to
> write it out, or another backend could try to write it out as a victim buffer
> and have it temporarily pinned.
>
>
> static int
> SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
> {
> ...
>         /*
>          * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
>          * buffer is clean by the time we've locked it.)
>          */
>         PinBuffer_Locked(bufHdr);
>         LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
>
>
> As you can see we acquire a pin without holding a lock on the page (and that
> can't be changed!).
>

I think this could be the probable reason for failure though I didn't
try to debug/reproduce this yet. AFAIU, this is possible during
recovery/replay of WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE as via
XLogReadBufferForRedoExtended, we can mark the buffer dirty while
restoring from full page image. OTOH, because during normal operation
we didn't mark the page dirty SyncOneBuffer would have skipped it due
to check (if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))).
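
The check being referred to sits near the top of SyncOneBuffer(), before the
buffer is ever pinned; roughly, paraphrased from bufmgr.c:

    buf_state = LockBufHdr(bufHdr);
    ...
    if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))
    {
        /* It's clean, so nothing to do */
        UnlockBufHdr(bufHdr, buf_state);
        return result;
    }

    /* only a valid, dirty buffer gets pinned and written out below */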

>
> I assume this is trying to defend against some sort of deadlock by not
> actually getting a cleanup lock (by passing get_cleanup_lock = true to
> XLogReadBufferForRedoExtended()).
>

IIRC, this is just following what we do during normal operation and
based on the theory that the meta-page is not updated yet so no
backend will access it. I think we can do what you wrote unless there
is some other reason behind this failure.

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Wed, Aug 10, 2022 at 12:39 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Here's an email about that:
>
> https://www.postgresql.org/message-id/CAE9k0P=OXww6RQCGrmDNa8=L3EeB01SGbYuP23y-qZJ=4td38Q@mail.gmail.com

Hmm. If I'm reading that email correctly, it indicates that I noticed
this problem before commit and asked for it to be changed, but then
for some reason it wasn't changed and I still committed it.

I can't immediately think of a reason why it wouldn't be safe to
insist on acquiring a cleanup lock there.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
vignesh C
Date:
On Wed, Aug 10, 2022 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 10, 2022 at 10:58 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > > > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > The relevant code triggering it:
> > > >
> > > >     newbuf = XLogInitBufferForRedo(record, 1);
> > > >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> > > >                               xlrec->new_bucket_flag, true);
> > > >     if (!IsBufferCleanupOK(newbuf))
> > > >             elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> > > >
> > > > Why do we just crash if we don't already have a cleanup lock? That can't be
> > > > right. Or is there supposed to be a guarantee this can't happen?
> > >
> > > Perhaps the code assumes that when xl_hash_split_allocate_page record was
> > > written, the new_bucket field referred to an unused page, and so during
> > > replay it should also refer to an unused page, and being unused, that nobody
> > > will have it pinned.  But at least in heap we sometimes pin unused pages
> > > just long enough to examine them and to see that they are unused.  Maybe
> > > something like that is happening here?
> >
> > I don't think it's a safe assumption that nobody would hold a pin on such a
> > page during recovery. While not the case here, somebody else could have used
> > pg_prewarm to read it in.
> >
> > But also, the checkpointer or bgwriter could have it temporarily pinned, to
> > write it out, or another backend could try to write it out as a victim buffer
> > and have it temporarily pinned.
> >
> >
> > static int
> > SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
> > {
> > ...
> >         /*
> >          * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
> >          * buffer is clean by the time we've locked it.)
> >          */
> >         PinBuffer_Locked(bufHdr);
> >         LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
> >
> >
> > As you can see we acquire a pin without holding a lock on the page (and that
> > can't be changed!).
> >
>
> I think this could be the probable reason for failure though I didn't
> try to debug/reproduce this yet. AFAIU, this is possible during
> recovery/replay of WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE as via
> XLogReadBufferForRedoExtended, we can mark the buffer dirty while
> restoring from full page image. OTOH, because during normal operation
> we didn't mark the page dirty SyncOneBuffer would have skipped it due
> to check (if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))).

I'm trying to simulate the scenario in streaming replication using the below:
CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;

With the above scenario, it will be able to replay the allocation of a page
for the split operation. I will slightly change the above statements and
try to debug and see if we can make the background writer process
pin this buffer and simulate the scenario. I will post my findings
once I'm done with the analysis.

Regards,
Vignesh



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-08-10 14:52:36 +0530, Amit Kapila wrote:
> I think this could be the probable reason for failure though I didn't
> try to debug/reproduce this yet. AFAIU, this is possible during
> recovery/replay of WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE as via
> XLogReadBufferForRedoExtended, we can mark the buffer dirty while
> restoring from full page image. OTOH, because during normal operation
> we didn't mark the page dirty SyncOneBuffer would have skipped it due
> to check (if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))).

I think there might still be short-lived references from other paths, even if
not marked dirty, but it isn't really important.


> > I assume this is trying to defend against some sort of deadlock by not
> > actually getting a cleanup lock (by passing get_cleanup_lock = true to
> > XLogReadBufferForRedoExtended()).
> >
> 
> IIRC, this is just following what we do during normal operation and
> based on the theory that the meta-page is not updated yet so no
> backend will access it. I think we can do what you wrote unless there
> is some other reason behind this failure.

Well, it's not really the same if you silently continue in normal operation
and PANIC during recovery... If it's an optional operation the tiny race
around not getting the cleanup lock is fine, but it's a totally different
story during recovery.

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Wed, Aug 10, 2022 at 1:28 AM Andres Freund <andres@anarazel.de> wrote:
> I assume this is trying to defend against some sort of deadlock by not
> actually getting a cleanup lock (by passing get_cleanup_lock = true to
> XLogReadBufferForRedoExtended()).

I had that thought too, but I don't *think* it's the case. This
function acquires a lock on the old bucket page, then on the new
bucket page. We could deadlock if someone who holds a pin on the new
bucket page tries to take a content lock on the old bucket page. But
who would do that? The new bucket page isn't yet linked from the
metapage at this point, so no scan should do that. There can be no
concurrent writers during replay. I think that if someone else has the
new page pinned they probably should not be taking content locks on
other buffers at the same time.

So maybe we can just apply something like the attached.
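
For illustration, a minimal sketch of that shape of fix (not necessarily what
the attached patch does) would be to zero the new bucket page while waiting
for a real cleanup lock, instead of checking for one after the page has
already been initialized:

    /* sketch: take the cleanup lock up front via RBM_ZERO_AND_CLEANUP_LOCK */
    (void) XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_AND_CLEANUP_LOCK,
                                         true, &newbuf);
    _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
                  xlrec->new_bucket_flag, true);

XLogReadBufferForRedoExtended() with get_cleanup_lock = true ends up in
LockBufferForCleanup(), which waits for competing pins to go away instead of
PANICing.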

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I had that thought too, but I don't *think* it's the case. This
> function acquires a lock on the old bucket page, then on the new
> bucket page. We could deadlock if someone who holds a pin on the new
> bucket page tries to take a content lock on the old bucket page. But
> who would do that? The new bucket page isn't yet linked from the
> metapage at this point, so no scan should do that. There can be no
> concurrent writers during replay. I think that if someone else has the
> new page pinned they probably should not be taking content locks on
> other buffers at the same time.

Agreed, the core code shouldn't do that, but somebody doing random stuff
with pageinspect functions could probably make a query do this.
See [1]; unless we're going to reject that bug with "don't do that",
I'm not too comfortable with this line of reasoning.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/17568-ef121b956ec1559c%40postgresql.org



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Tue, Aug 16, 2022 at 5:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I had that thought too, but I don't *think* it's the case. This
> > function acquires a lock on the old bucket page, then on the new
> > bucket page. We could deadlock if someone who holds a pin on the new
> > bucket page tries to take a content lock on the old bucket page. But
> > who would do that? The new bucket page isn't yet linked from the
> > metapage at this point, so no scan should do that. There can be no
> > concurrent writers during replay. I think that if someone else has the
> > new page pinned they probably should not be taking content locks on
> > other buffers at the same time.
>
> Agreed, the core code shouldn't do that, but somebody doing random stuff
> with pageinspect functions could probably make a query do this.
> See [1]; unless we're going to reject that bug with "don't do that",
> I'm not too comfortable with this line of reasoning.

I don't see the connection. The problem there has to do with bypassing
shared buffers, but this operation isn't bypassing shared buffers.

What sort of random things would someone do with pageinspect functions
that would hold buffer pins on one buffer while locking another one?
The functions in hashfuncs.c don't even seem like they would access
multiple buffers in total, let alone at overlapping times. And I don't
think that a query pageinspect could realistically be suspended while
holding a buffer pin either. If you wrapped it in a cursor it'd be
suspended before or after accessing any given buffer, not right in the
middle of that operation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> What sort of random things would someone do with pageinspect functions
> that would hold buffer pins on one buffer while locking another one?
> The functions in hashfuncs.c don't even seem like they would access
> multiple buffers in total, let alone at overlapping times. And I don't
> think that a query pageinspect could realistically be suspended while
> holding a buffer pin either. If you wrapped it in a cursor it'd be
> suspended before or after accessing any given buffer, not right in the
> middle of that operation.

pin != access.  Unless things have changed really drastically since
I last looked, a seqscan will sit on a buffer pin throughout the
series of fetches from a single page.

Admittedly, that's about *heap* page pins while indexscans have
different rules.  But I recall that btrees at least use persistent
pins as well.

It may be that there is indeed no way to make this happen with
available SQL tools.  But I wouldn't put a lot of money on that,
and even less that it'll stay true in the future.

Having said that, you're right that this is qualitatively different
from the other bug, in that this is a deadlock not apparent data
corruption.  However, IIUC it's an LWLock deadlock, which we don't
handle all that nicely.

            regards, tom lane



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-08-16 17:02:27 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I had that thought too, but I don't *think* it's the case. This
> > function acquires a lock on the old bucket page, then on the new
> > bucket page. We could deadlock if someone who holds a pin on the new
> > bucket page tries to take a content lock on the old bucket page. But
> > who would do that? The new bucket page isn't yet linked from the
> > metapage at this point, so no scan should do that. There can be no
> > concurrent writers during replay. I think that if someone else has the
> > new page pinned they probably should not be taking content locks on
> > other buffers at the same time.
> 
> Agreed, the core code shouldn't do that, but somebody doing random stuff
> with pageinspect functions could probably make a query do this.
> See [1]; unless we're going to reject that bug with "don't do that",
> I'm not too comfortable with this line of reasoning.

I don't think we can defend against lwlock deadlocks where somebody doesn't
follow the AM's deadlock avoidance strategy. I.e. it's fine to pin and lock
pages from some AM without knowing that AM's rules, as long as you only block
while holding a pin/lock of a single page. But it is *not* ok to block waiting
for an lwlock / pin while already holding an lwlock / pin on some other
buffer.  If we were concerned about this we'd have to basically throw many of
our multi-page operations that rely on lock order logic out.

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-08-16 19:55:18 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > What sort of random things would someone do with pageinspect functions
> > that would hold buffer pins on one buffer while locking another one?
> > The functions in hashfuncs.c don't even seem like they would access
> > multiple buffers in total, let alone at overlapping times. And I don't
> > think that a query pageinspect could realistically be suspended while
> > holding a buffer pin either. If you wrapped it in a cursor it'd be
> > suspended before or after accessing any given buffer, not right in the
> > middle of that operation.
> 
> pin != access.  Unless things have changed really drastically since
> I last looked, a seqscan will sit on a buffer pin throughout the
> series of fetches from a single page.

That's still the case. But for heap that shouldn't be a problem, because we'll
never try to take a cleanup lock while holding another page locked (nor even
another heap page pinned, I think).


I find it *highly* suspect that hash needs to acquire a cleanup lock while
holding another buffer locked. The recovery aspect alone makes that seem quite
unwise. Even if there's possibly no deadlock here for some reason or another.


Looking at the non-recovery code makes me even more suspicious:

    /*
     * Physically allocate the new bucket's primary page.  We want to do this
     * before changing the metapage's mapping info, in case we can't get the
     * disk space.  Ideally, we don't need to check for cleanup lock on new
     * bucket as no other backend could find this bucket unless meta page is
     * updated.  However, it is good to be consistent with old bucket locking.
     */
    buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
    if (!IsBufferCleanupOK(buf_nblkno))
    {
        _hash_relbuf(rel, buf_oblkno);
        _hash_relbuf(rel, buf_nblkno);
        goto fail;
    }


_hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
memset(0)s the whole page. What does it even mean to check whether you
effectively have a cleanup lock after you zeroed out the page?
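
For reference, PageInit() in bufpage.c is roughly the following; the MemSet
wipes the entire page regardless of who else has it pinned (asserts elided):

    void
    PageInit(Page page, Size pageSize, Size specialSize)
    {
        PageHeader  p = (PageHeader) page;

        specialSize = MAXALIGN(specialSize);

        /* Make sure all fields of page are zero, as well as unused space */
        MemSet(p, 0, pageSize);

        p->pd_lower = SizeOfPageHeaderData;
        p->pd_upper = pageSize - specialSize;
        p->pd_special = pageSize - specialSize;
        PageSetPageSizeAndVersion(page, pageSize, PG_PAGE_LAYOUT_VERSION);
    }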

Reading the README and the comment above makes me wonder if this whole cleanup
lock business here is just cargo culting and could be dropped?



> Admittedly, that's about *heap* page pins while indexscans have
> different rules.  But I recall that btrees at least use persistent
> pins as well.

I think that's been changed, although not in an unproblematic way.


> Having said that, you're right that this is qualitatively different
> from the other bug, in that this is a deadlock not apparent data
> corruption.  However, IIUC it's an LWLock deadlock, which we don't
> handle all that nicely.

Theoretically the startup side could be interrupted. Except that we don't
accept the startup process dying...

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Wed, Aug 17, 2022 at 6:27 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-08-16 19:55:18 -0400, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > What sort of random things would someone do with pageinspect functions
> > > that would hold buffer pins on one buffer while locking another one?
> > > The functions in hashfuncs.c don't even seem like they would access
> > > multiple buffers in total, let alone at overlapping times. And I don't
> > > think that a query pageinspect could realistically be suspended while
> > > holding a buffer pin either. If you wrapped it in a cursor it'd be
> > > suspended before or after accessing any given buffer, not right in the
> > > middle of that operation.
> >
> > pin != access.  Unless things have changed really drastically since
> > I last looked, a seqscan will sit on a buffer pin throughout the
> > series of fetches from a single page.
>
> That's still the case. But for heap that shouldn't be a problem, because we'll
> never try to take a cleanup lock while holding another page locked (nor even
> another heap page pinned, I think).
>
>
> I find it *highly* suspect that hash needs to acquire a cleanup lock while
> holding another buffer locked. The recovery aspect alone makes that seem quite
> unwise. Even if there's possibly no deadlock here for some reason or another.
>
>
> Looking at the non-recovery code makes me even more suspicious:
>
>         /*
>          * Physically allocate the new bucket's primary page.  We want to do this
>          * before changing the metapage's mapping info, in case we can't get the
>          * disk space.  Ideally, we don't need to check for cleanup lock on new
>          * bucket as no other backend could find this bucket unless meta page is
>          * updated.  However, it is good to be consistent with old bucket locking.
>          */
>         buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
>         if (!IsBufferCleanupOK(buf_nblkno))
>         {
>                 _hash_relbuf(rel, buf_oblkno);
>                 _hash_relbuf(rel, buf_nblkno);
>                 goto fail;
>         }
>
>
> _hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
> memset(0)s the whole page. What does it even mean to check whether you
> effectively have a cleanup lock after you zeroed out the page?
>
> Reading the README and the comment above makes me wonder if this whole cleanup
> lock business here is just cargo culting and could be dropped?
>

I think it is okay to not acquire a clean-up lock on the new bucket
page both in recovery and non-recovery paths. It is primarily required
on the old bucket page to avoid concurrent scans/inserts. As mentioned
in the comments and as far as my memory serves, it is mainly for keeping
it consistent with old bucket locking.

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Tue, Aug 16, 2022 at 8:38 PM Andres Freund <andres@anarazel.de> wrote:
> I don't think we can defend against lwlock deadlocks where somebody doesn't
> follow the AM's deadlock avoidance strategy.

That's a good way of putting it. Tom seems to be postulating that
maybe someone can use random tools that exist to take buffer locks and
pins in arbitrary order, and if that is true then you can make any AM
deadlock. I think it isn't true, though, and I think if it were true
the right fix would be to remove the tools that are letting people do
that.

There's also zero evidence that this was ever intended as a deadlock
avoidance maneuver. I think that we are only hypothesizing that it was
intended that way because the code looks weird. But I think the email
discussion shows that I thought it was wrong at the time it was
committed, and just missed the fact that the final version of the
patch hadn't fixed it. And if it *were* a deadlock avoidance maneuver
it would still be pretty broken, because it would make the startup
process error out and the whole system go down.

Regarding the question of whether we need a cleanup lock on the new
bucket I am not really seeing the advantage of going down that path.
Simply fixing this code to take a cleanup lock instead of hoping that
it always gets one by accident is low risk and should fix the observed
problem. Getting rid of the cleanup lock will be more invasive and I'd
like to see some evidence that it's a necessary step before we take
the risk of breaking things.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-08-17 10:18:14 +0530, Amit Kapila wrote:
> > Looking at the non-recovery code makes me even more suspicious:
> >
> >         /*
> >          * Physically allocate the new bucket's primary page.  We want to do this
> >          * before changing the metapage's mapping info, in case we can't get the
> >          * disk space.  Ideally, we don't need to check for cleanup lock on new
> >          * bucket as no other backend could find this bucket unless meta page is
> >          * updated.  However, it is good to be consistent with old bucket locking.
> >          */
> >         buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> >         if (!IsBufferCleanupOK(buf_nblkno))
> >         {
> >                 _hash_relbuf(rel, buf_oblkno);
> >                 _hash_relbuf(rel, buf_nblkno);
> >                 goto fail;
> >         }
> >
> >
> > _hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
> > memset(0)s the whole page. What does it even mean to check whether you
> > effectively have a cleanup lock after you zeroed out the page?
> >
> > Reading the README and the comment above makes me wonder if this whole cleanup
> > lock business here is just cargo culting and could be dropped?
> >
> 
> I think it is okay to not acquire a clean-up lock on the new bucket
> page both in recovery and non-recovery paths. It is primarily required
> on the old bucket page to avoid concurrent scans/inserts. As mentioned
> in the comments and as far as my memory serves, it is mainly for keeping
> it consistent with old bucket locking.

It's not keeping it consistent with bucket locking to zero out a page before
getting a cleanup lock, hopefully at least. This code is just broken on
multiple fronts, and consistency isn't a defense.

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-08-17 08:25:06 -0400, Robert Haas wrote:
> Regarding the question of whether we need a cleanup lock on the new
> bucket I am not really seeing the advantage of going down that path.
> Simply fixing this code to take a cleanup lock instead of hoping that
> it always gets one by accident is low risk and should fix the observed
> problem. Getting rid of the cleanup lock will be more invasive and I'd
> like to see some evidence that it's a necessary step before we take
> the risk of breaking things.

Given that the cleanup locks in question are "taken" *after* re-initializing
the page, I'm doubtful that's a sane path forward. It seems quite likely to
mislead somebody to rely on it working as a cleanup lock in the future.

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Wed, Aug 17, 2022 at 2:45 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-08-17 08:25:06 -0400, Robert Haas wrote:
> > Regarding the question of whether we need a cleanup lock on the new
> > bucket I am not really seeing the advantage of going down that path.
> > Simply fixing this code to take a cleanup lock instead of hoping that
> > it always gets one by accident is low risk and should fix the observed
> > problem. Getting rid of the cleanup lock will be more invasive and I'd
> > like to see some evidence that it's a necessary step before we take
> > the risk of breaking things.
>
> Given that the cleanup locks in question are "taken" *after* re-initializing
> the page, I'm doubtful that's a sane path forward. It seems quite likely to
> mislead somebody to rely on it working as a cleanup lock in the future.

There's not a horde of people lining up to work on the hash index
code, but if you feel like writing and testing the more invasive fix,
I'm not really going to fight you over it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-08-17 15:21:55 -0400, Robert Haas wrote:
> On Wed, Aug 17, 2022 at 2:45 PM Andres Freund <andres@anarazel.de> wrote:
> > Given that the cleanup locks in question are "taken" *after* re-initializing
> > the page, I'm doubtful that's a sane path forward. It seems quite likely to
> > mislead somebody to rely on it working as a cleanup lock in the future.
>
> There's not a horde of people lining up to work on the hash index
> code, but if you feel like writing and testing the more invasive fix,
> I'm not really going to fight you over it.

My problem is that the code right now is an outright lie. At the absolute very
least this code needs a big honking comment: "we check if we have a cleanup lock
here, but that's just for show, because WE ALREADY OVERWROTE THE WHOLE PAGE".

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Wed, Aug 17, 2022 at 5:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Regarding the question of whether we need a cleanup lock on the new
> bucket I am not really seeing the advantage of going down that path.
> Simply fixing this code to take a cleanup lock instead of hoping that
> it always gets one by accident is low risk and should fix the observed
> problem. Getting rid of the cleanup lock will be more invasive and I'd
> like to see some evidence that it's a necessary step before we take
> the risk of breaking things.
>

The patch proposed by you is sufficient to fix the observed issue.
BTW, we were able to reproduce the issue and your patch fixed it. The
idea is to ensure that the checkpointer only tries to sync the buffer for
the new bucket; otherwise, it will block while acquiring the lock on
the old bucket buffer in SyncOneBuffer, because the replay process
would already have it and we won't be able to hit the required condition.

To simulate it, we need to stop the replay before we acquire the lock
for the old bucket. Then, let the checkpointer advance buf_id beyond
the buffer which we will get for the old bucket (in the place where it
loops over all buffers and marks the ones that need to be written with
BM_CHECKPOINT_NEEDED). After that, let the replay process proceed till
the point where it checks for the clean-up lock on the new bucket.
Next, let the checkpointer advance to sync the buffer corresponding to
the new bucket buffer. This will reproduce the required condition.

We have tried many other combinations but couldn't hit it. For
example, we were not able to generate it via bgwriter because it
expects the buffer to have zero usage and ref count which is not
possible during the replay in hash_xlog_split_allocate_page() as we
already have increased the usage count for the new bucket buffer
before checking the cleanup lock on it.

I agree with you that getting rid of the clean-up lock on the new
bucket is a more invasive patch and should be done separately if
required. Yesterday, I have done a brief analysis and I think that is
possible but it doesn't seem to be a good idea to backpatch it.

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

This issue does occasionally happen in CI, as e.g. noted in this thread:
https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com

On 2022-08-18 15:17:47 +0530, Amit Kapila wrote:
> I agree with you that getting rid of the clean-up lock on the new
> bucket is a more invasive patch and should be done separately if
> required. Yesterday, I have done a brief analysis and I think that is
> possible but it doesn't seem to be a good idea to backpatch it.

My problem with this approach is that the whole cleanup lock is hugely
misleading as-is. As I noted in
https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
we take the cleanup lock *after* re-initializing the page. Thereby
completely breaking the properties that a cleanup lock normally tries to
guarantee.

Even if that were to achieve something useful (doubtful in this case),
it'd need a huge comment explaining what's going on.

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Sat, Oct 1, 2022 at 12:35 AM Andres Freund <andres@anarazel.de> wrote:
>
> This issue does occasionally happen in CI, as e.g. noted in this thread:
> https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com
>
> On 2022-08-18 15:17:47 +0530, Amit Kapila wrote:
> > I agree with you that getting rid of the clean-up lock on the new
> > bucket is a more invasive patch and should be done separately if
> > required. Yesterday, I have done a brief analysis and I think that is
> > possible but it doesn't seem to be a good idea to backpatch it.
>
> My problem with this approach is that the whole cleanup lock is hugely
> misleading as-is. As I noted in
> https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> we take the cleanup lock *after* re-initializing the page. Thereby
> completely breaking the properties that a cleanup lock normally tries to
> guarantee.
>
> Even if that were to achieve something useful (doubtful in this case),
> it'd need a huge comment explaining what's going on.
>

Attached are two patches. The first patch is what Robert has proposed
with some changes in comments to emphasize the fact that cleanup lock
on the new bucket is just to be consistent with the old bucket page
locking as we are initializing it just before checking for cleanup
lock. In the second patch, I removed the acquisition of cleanup lock
on the new bucket page and changed the comments/README accordingly.

I think we can backpatch the first patch and the second patch can be
just a HEAD-only patch. Does that sound reasonable to you?

-- 
With Regards,
Amit Kapila.

Attachment

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
vignesh C
Date:
On Thu, 6 Oct 2022 at 12:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 1, 2022 at 12:35 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > This issue does occasionally happen in CI, as e.g. noted in this thread:
> > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com
> >
> > On 2022-08-18 15:17:47 +0530, Amit Kapila wrote:
> > > I agree with you that getting rid of the clean-up lock on the new
> > > bucket is a more invasive patch and should be done separately if
> > > required. Yesterday, I have done a brief analysis and I think that is
> > > possible but it doesn't seem to be a good idea to backpatch it.
> >
> > My problem with this approach is that the whole cleanup lock is hugely
> > misleading as-is. As I noted in
> > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> > we take the cleanup lock *after* re-initializing the page. Thereby
> > completely breaking the properties that a cleanup lock normally tries to
> > guarantee.
> >
> > Even if that were to achieve something useful (doubtful in this case),
> > it'd need a huge comment explaining what's going on.
> >
>
> Attached are two patches. The first patch is what Robert has proposed
> with some changes in comments to emphasize the fact that cleanup lock
> on the new bucket is just to be consistent with the old bucket page
> locking as we are initializing it just before checking for cleanup
> lock. In the second patch, I removed the acquisition of cleanup lock
> on the new bucket page and changed the comments/README accordingly.
>
> I think we can backpatch the first patch and the second patch can be
> just a HEAD-only patch. Does that sound reasonable to you?

Thanks for the patches.
I have verified that the issue is fixed using a manual test up to the
REL_10_STABLE version and found it to be working fine.

I have added code to print the old buffer and new buffer values when
both the old buffer and the new buffer get dirtied. Then I executed
the following test and noted down the old buffer and new buffer values
from the log file:
CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
create table t1(c1 int);
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;

In my environment, the issue will occur when oldbuf is 38 and newbuf is 60.

Once we know the old buffer and new buffer values, we will have to
debug the checkpointer and recovery process to simulate the scenario.
I used the following steps to simulate the issue in my environment:
1) Create streaming replication setup with the following configurations:
wal_consistency_checking = all
shared_buffers = 128MB                  # min 128kB
bgwriter_lru_maxpages = 0               # max buffers written/round, 0 disables
checkpoint_timeout = 30s                # range 30s-1d
2) Execute the following in master node:
CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
3) Hold the checkpointer process of the standby instance at BufferSync while debugging.
4) Execute the following in master node:
create table t1(c1 int); -- This is required so that the old buffer
value is not dirty in the checkpointer process. (If the old buffer is dirty
then we will not be able to sync the new buffer, as the checkpointer will
wait while trying to acquire the lock on the old buffer.)
5) Make the checkpointer process check the buffers up to old buffer + 1.
In our case it should cross 38.
6) Hold the recovery process at
hash_xlog_split_allocate_page->IsBufferCleanupOK (approximately line
hash_xlog.c:357) while executing the following for the last insert in
the master node:
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
7) Continue the checkpointer process and make it proceed to
SyncOneBuffer with buf_id = 60 (the newbuf value noted from the
earlier execution) and let it proceed up to PinBuffer_Locked(bufHdr).
8) Continuing the recovery process will reproduce the PANIC scenario.

Regards,
Vignesh



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
vignesh C
Date:
On Wed, 12 Oct 2022 at 16:16, vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 6 Oct 2022 at 12:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > This issue does occasionally happen in CI, as e.g. noted in this thread:
> > > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com
> > >
> > > On 2022-08-18 15:17:47 +0530, Amit Kapila wrote:
> > > > I agree with you that getting rid of the clean-up lock on the new
> > > > bucket is a more invasive patch and should be done separately if
> > > > required. Yesterday, I have done a brief analysis and I think that is
> > > > possible but it doesn't seem to be a good idea to backpatch it.
> > >
> > > My problem with this approach is that the whole cleanup lock is hugely
> > > misleading as-is. As I noted in
> > > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> > > we take the cleanup lock *after* re-initializing the page. Thereby
> > > completely breaking the properties that a cleanup lock normally tries to
> > > guarantee.
> > >
> > > Even if that were to achieve something useful (doubtful in this case),
> > > it'd need a huge comment explaining what's going on.
> > >
> >
> > Attached are two patches. The first patch is what Robert has proposed
> > with some changes in comments to emphasize the fact that cleanup lock
> > on the new bucket is just to be consistent with the old bucket page
> > locking as we are initializing it just before checking for cleanup
> > lock. In the second patch, I removed the acquisition of cleanup lock
> > on the new bucket page and changed the comments/README accordingly.
> >
> > I think we can backpatch the first patch and the second patch can be
> > just a HEAD-only patch. Does that sound reasonable to you?
>
> Thanks for the patches.
> I have verified that the issue is fixed using a manual test up to the
> REL_10_STABLE version and found it to be working fine.

Just to clarify, I have verified that the first patch with the Head,
REL_15_STABLE, REL_14_STABLE, REL_13_STABLE, REL_12_STABLE,
REL_11_STABLE and REL_10_STABLE branches fixes the issue. Also verified
that the first and second patches with the Head branch fix the issue.

Regards,
Vignesh



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Wed, Oct 12, 2022 at 4:16 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 6 Oct 2022 at 12:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > This issue does occasionally happen in CI, as e.g. noted in this thread:
> > > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com
> > >
> > > On 2022-08-18 15:17:47 +0530, Amit Kapila wrote:
> > > > I agree with you that getting rid of the clean-up lock on the new
> > > > bucket is a more invasive patch and should be done separately if
> > > > required. Yesterday, I have done a brief analysis and I think that is
> > > > possible but it doesn't seem to be a good idea to backpatch it.
> > >
> > > My problem with this approach is that the whole cleanup lock is hugely
> > > misleading as-is. As I noted in
> > > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> > > we take the cleanup lock *after* re-initializing the page. Thereby
> > > completely breaking the properties that a cleanup lock normally tries to
> > > guarantee.
> > >
> > > Even if that were to achieve something useful (doubtful in this case),
> > > it'd need a huge comment explaining what's going on.
> > >
> >
> > Attached are two patches. The first patch is what Robert has proposed
> > with some changes in comments to emphasize the fact that cleanup lock
> > on the new bucket is just to be consistent with the old bucket page
> > locking as we are initializing it just before checking for cleanup
> > lock. In the second patch, I removed the acquisition of cleanup lock
> > on the new bucket page and changed the comments/README accordingly.
> >
> > I think we can backpatch the first patch and the second patch can be
> > just a HEAD-only patch. Does that sound reasonable to you?
>
> Thanks for the patches.
> I have verified that the issue is fixed using a manual test up to the
> REL_10_STABLE version and found it to be working fine.
>

Thanks for the verification. I am planning to push the first patch
(and backpatch it) next week (by next Tuesday) unless we have more
comments or Robert intends to push it.

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-10-06 12:44:24 +0530, Amit Kapila wrote:
> On Sat, Oct 1, 2022 at 12:35 AM Andres Freund <andres@anarazel.de> wrote:
> > My problem with this approach is that the whole cleanup lock is hugely
> > misleading as-is. As I noted in
> > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> > we take the cleanup lock *after* re-initializing the page. Thereby
> > completely breaking the properties that a cleanup lock normally tries to
> > guarantee.
> >
> > Even if that were to achieve something useful (doubtful in this case),
> > it'd need a huge comment explaining what's going on.
> >
>
> Attached are two patches. The first patch is what Robert has proposed
> with some changes in comments to emphasize the fact that cleanup lock
> on the new bucket is just to be consistent with the old bucket page
> locking as we are initializing it just before checking for cleanup
> lock. In the second patch, I removed the acquisition of cleanup lock
> on the new bucket page and changed the comments/README accordingly.
>
> I think we can backpatch the first patch and the second patch can be
> just a HEAD-only patch. Does that sound reasonable to you?

Not particularly, no. I don't understand how "overwrite a page and then get a
cleanup lock" can sensibly be described by this comment:

> +++ b/src/backend/access/hash/hashpage.c
> @@ -807,7 +807,8 @@ restart_expand:
>       * before changing the metapage's mapping info, in case we can't get the
>       * disk space.  Ideally, we don't need to check for cleanup lock on new
>       * bucket as no other backend could find this bucket unless meta page is
> -     * updated.  However, it is good to be consistent with old bucket locking.
> +     * updated and we initialize the page just before it.  However, it is just
> +     * to be consistent with old bucket locking.
>       */
>      buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
>      if (!IsBufferCleanupOK(buf_nblkno))

This is basically saying "I am breaking basic rules of locking just to be
consistent", no?

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Peter Geoghegan
Date:
On Fri, Sep 30, 2022 at 12:05 PM Andres Freund <andres@anarazel.de> wrote:
> My problem with this approach is that the whole cleanup lock is hugely
> misleading as-is.

While nbtree VACUUM does use cleanup locks, they don't protect the
index structure itself -- it actually functions as an interlock
against concurrent TID recycling, which might otherwise confuse
in-flight index scans. That's why we need cleanup locks for VACUUM,
but not for index deletions, even though the physical modifications
that are performed to physical leaf pages are identical (the WAL
records are almost identical). Clearly the use of cleanup locks is not
really about protecting the leaf page itself -- it's about using the
physical leaf page as a proxy for the heap TIDs contained therein. A
very narrow protocol with a very specific purpose.

More generally, cleanup locks exist to protect transient references
that point into a heap page. References held by one backend only. A
TID, or a HeapTuple C pointer, or something similar. Cleanup locks are
not intended to protect a physical data structure in the heap, either
-- just a reference/pointer that points to the structure. There are
implications for the physical page structure itself, of course, but
that seems secondary. The guarantees are often limited to "never allow
the backend holding the pin to become utterly confused".

I am skeptical of the idea of using cleanup locks for anything more
ambitious than this. Especially in index AM code. It seems
uncomfortably close to "a buffer lock, but somehow also not a buffer
lock".

-- 
Peter Geoghegan



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-10-13 17:46:25 -0700, Peter Geoghegan wrote:
> On Fri, Sep 30, 2022 at 12:05 PM Andres Freund <andres@anarazel.de> wrote:
> > My problem with this approach is that the whole cleanup lock is hugely
> > misleading as-is.
>
> While nbtree VACUUM does use cleanup locks, they don't protect the
> index structure itself -- it actually functions as an interlock
> against concurrent TID recycling, which might otherwise confuse
> in-flight index scans. That's why we need cleanup locks for VACUUM,
> but not for index deletions, even though the physical modifications
> that are performed to physical leaf pages are identical (the WAL
> records are almost identical). Clearly the use of cleanup locks is not
> really about protecting the leaf page itself -- it's about using the
> physical leaf page as a proxy for the heap TIDs contained therein. A
> very narrow protocol with a very specific purpose.
>
> More generally, cleanup locks exist to protect transient references
> that point into a heap page. References held by one backend only. A
> TID, or a HeapTuple C pointer, or something similar. Cleanup locks are
> not intended to protect a physical data structure in the heap, either
> -- just a reference/pointer that points to the structure. There are
> implications for the physical page structure itself, of course, but
> that seems secondary. The guarantees are often limited to "never allow
> the backend holding the pin to become utterly confused".
>
> I am skeptical of the idea of using cleanup locks for anything more
> ambitious than this. Especially in index AM code. It seems
> uncomfortably close to "a buffer lock, but somehow also not a buffer
> lock".

My point here is a lot more mundane. The code essentially does
_hash_pageinit(), overwriting the whole page, and *then* conditionally
acquires a cleanup lock.  It simply is bogus code.
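
Spelled out against the call sites quoted earlier in the thread, the
ordering in question is (comments mine, for illustration only):

    /* overwrites the page: RBM_ZERO_AND_LOCK plus _hash_pageinit() */
    buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);

    /* only now do we ask whether a cleanup lock is even attainable --
     * by this point the old contents are already gone */
    if (!IsBufferCleanupOK(buf_nblkno))
    {
        /* bail out, but the overwrite has already happened */
    }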

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Peter Geoghegan
Date:
On Thu, Oct 13, 2022 at 6:10 PM Andres Freund <andres@anarazel.de> wrote:
> My point here is a lot more mundane. The code essentially does
> _hash_pageinit(), overwriting the whole page, and *then* conditionally
> acquires a cleanup lock.  It simply is bogus code.

I understood that that was what you meant. It's easy to see why this
code is broken, but to me it seems related to having too much
confidence in what is possible while relying on cleanup locks. That's
just my take.

-- 
Peter Geoghegan



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Fri, Oct 14, 2022 at 2:25 AM Andres Freund <andres@anarazel.de> wrote:
>
> >
> > Attached are two patches. The first patch is what Robert has proposed
> > with some changes in comments to emphasize the fact that cleanup lock
> > on the new bucket is just to be consistent with the old bucket page
> > locking as we are initializing it just before checking for cleanup
> > lock. In the second patch, I removed the acquisition of cleanup lock
> > on the new bucket page and changed the comments/README accordingly.
> >
> > I think we can backpatch the first patch and the second patch can be
> > just a HEAD-only patch. Does that sound reasonable to you?
>
> Not particularly, no. I don't understand how "overwrite a page and then get a
> cleanup lock" can sensibly be described by this comment:
>
> > +++ b/src/backend/access/hash/hashpage.c
> > @@ -807,7 +807,8 @@ restart_expand:
> >        * before changing the metapage's mapping info, in case we can't get the
> >        * disk space.  Ideally, we don't need to check for cleanup lock on new
> >        * bucket as no other backend could find this bucket unless meta page is
> > -      * updated.  However, it is good to be consistent with old bucket locking.
> > +      * updated and we initialize the page just before it.  However, it is just
> > +      * to be consistent with old bucket locking.
> >        */
> >       buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> >       if (!IsBufferCleanupOK(buf_nblkno))
>
> This is basically saying "I am breaking basic rules of locking just to be
> consistent", no?
>

Fair point. How about something like: "XXX Do we really need to check
for cleanup lock on the new bucket? Here, we initialize the page, so
ideally we don't need to perform any operation that requires such a
check."?

Feel free to suggest something better.

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-10-14 10:40:11 +0530, Amit Kapila wrote:
> On Fri, Oct 14, 2022 at 2:25 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > >
> > > Attached are two patches. The first patch is what Robert has proposed
> > > with some changes in comments to emphasize the fact that cleanup lock
> > > on the new bucket is just to be consistent with the old bucket page
> > > locking as we are initializing it just before checking for cleanup
> > > lock. In the second patch, I removed the acquisition of cleanup lock
> > > on the new bucket page and changed the comments/README accordingly.
> > >
> > > I think we can backpatch the first patch and the second patch can be
> > > just a HEAD-only patch. Does that sound reasonable to you?
> >
> > Not particularly, no. I don't understand how "overwrite a page and then get a
> > cleanup lock" can sensibly be described by this comment:
> >
> > > +++ b/src/backend/access/hash/hashpage.c
> > > @@ -807,7 +807,8 @@ restart_expand:
> > >        * before changing the metapage's mapping info, in case we can't get the
> > >        * disk space.  Ideally, we don't need to check for cleanup lock on new
> > >        * bucket as no other backend could find this bucket unless meta page is
> > > -      * updated.  However, it is good to be consistent with old bucket locking.
> > > +      * updated and we initialize the page just before it.  However, it is just
> > > +      * to be consistent with old bucket locking.
> > >        */
> > >       buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> > >       if (!IsBufferCleanupOK(buf_nblkno))
> >
> > This is basically saying "I am breaking basic rules of locking just to be
> > consistent", no?
> >
> 
> Fair point. How about something like: "XXX Do we really need to check
> for cleanup lock on the new bucket? Here, we initialize the page, so
> ideally we don't need to perform any operation that requires such a
> check."?.

This still seems to omit that the code is quite broken.

> Feel free to suggest something better.

How about something like:

  XXX: This code is wrong, we're overwriting the buffer before "acquiring" the
  cleanup lock. Currently this is not known to have bad consequences because
  XYZ and the fix seems a bit too risky for the backbranches.

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Fri, Oct 14, 2022 at 11:51 PM Andres Freund <andres@anarazel.de> wrote:
>
> How about something like:
>
>   XXX: This code is wrong, we're overwriting the buffer before "acquiring" the
>   cleanup lock. Currently this is not known to have bad consequences because
>   XYZ and the fix seems a bit too risky for the backbranches.
>

It looks mostly good to me. I am slightly uncomfortable with the last
part of the sentence, "the fix seems a bit too risky for the
backbranches", because it will stay like that in the back-branch code
even after we fix it in HEAD. Instead, can we use a FIXME tag directly
in the comment, like: "FIXME: This code is wrong, we're overwriting
the buffer before "acquiring" the cleanup lock. Currently, this is not
known to have bad consequences because no other backend could find
this bucket unless the meta page is updated."? Then, in the commit
message, we can use that sentence, something like: "... While fixing
this issue, we have observed that a cleanup lock is not required on
the new bucket for the split operation as we're overwriting the buffer
before "acquiring" the cleanup lock. Currently, this is not known to
have bad consequences and the fix seems a bit too risky for the back
branches."

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Fri, Oct 14, 2022 at 2:21 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-10-14 10:40:11 +0530, Amit Kapila wrote:
> > On Fri, Oct 14, 2022 at 2:25 AM Andres Freund <andres@anarazel.de> wrote:
> > Fair point. How about something like: "XXX Do we really need to check
> > for cleanup lock on the new bucket? Here, we initialize the page, so
> > ideally we don't need to perform any operation that requires such a
> > check."?.
>
> This still seems to omit that the code is quite broken.

I don't think it's the job of a commit which is trying to fix a
certain bug to document all the other bugs it isn't fixing. If that
were the standard, we'd never be able to commit any bug fixes, which
is exactly what's happening on this thread. The first hunk of 0001
demonstrably fixes a real bug that has real consequences and, as far
as I can see, makes nothing worse in any way. We should commit that
hunk - and only that hunk - back-patch it all the way, and be happy to
have gotten something done. We've known what the correct fix is here
for 2 months and we're not doing anything about it because there's
some other problem that we're trying to worry about at the same time.
Let's stop doing that.

To put that another way, we don't need to fix or document anything
else to conclude that a conditional cleanup lock acquisition here
shouldn't be conditional. If it shouldn't be a cleanup lock either,
well, that's a separate patch.

Alternatively, if we all agree that 0001 and 0002 are both safe and
correct, then let's just merge the two patches together and commit it
with an explanatory message like:

===
Don't require a cleanup lock on the new page when splitting a hash index bucket.

The previous code took a cleanup lock conditionally and panicked if it
failed, which is wrong, because a process such as the background
writer or checkpointer can transiently pin pages on a standby. We
could make the cleanup lock acquisition unconditional, but it turns
out that it isn't needed at all, because no scan can examine the new
bucket before the metapage has been updated, and thus an exclusive
lock on the new bucket's primary page is sufficient. Note that we
still need a cleanup lock on the old bucket's primary page, because
that one is visible to scans.
===
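
For illustration only, the shape that message describes for the redo
side might look roughly like this (a sketch, not the actual 0001/0002
patches; record and xlrec are as the surrounding redo routine would
define them):

    Buffer      newbuf;

    /* zero the new bucket's primary page and take a plain exclusive
     * lock on it; that is sufficient because no scan can find the new
     * bucket until the metapage has been updated */
    XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_AND_LOCK,
                                  false, &newbuf);
    _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
                  xlrec->new_bucket_flag, true);
    /* ... PageSetLSN(), MarkBufferDirty(), as in any redo routine ... */
    /* no cleanup-lock check, hence nothing left to PANIC about */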

> > Feel free to suggest something better.
>
> How about something like:
>
>   XXX: This code is wrong, we're overwriting the buffer before "acquiring" the
>   cleanup lock. Currently this is not known to have bad consequences because
>   XYZ and the fix seems a bit too risky for the backbranches.

I think here you're talking about the code that runs in
normal-running, not recovery, in _hash_expandtable, where we call
_hash_getnewbuf and then IsBufferCleanupOK. That code indeed seems
stupid, because as you say, there's no point in calling
_hash_getnewbuf() and thus overwriting the buffer and then only
afterwards checking IsBufferCleanupOK. By then the die is cast. But at
the same time, I think that it's not wrong in any way that matters to
the best of our current knowledge. That new buffer that we just went
and got might be pinned by somebody else, but they can't be doing
anything interesting with it, because we wouldn't be allocating it as
a new page if it were already in use for anything, and only one
process is allowed to be doing such an allocation at a time. That both
means that we can likely remove the cleanup lock acquisition, but it
also means that if we don't, there is no correctness problem here,
strictly speaking.

So I would suggest that if we feel we absolutely must put a comment
here, we could make it say something like "XXX. It doesn't make sense
to call _hash_getnewbuf() first, zeroing the buffer, and then only
afterwards check whether we have a cleanup lock. However, since no
scan can be accessing the new buffer yet, any concurrent accesses will
just be from processes like the bgwriter or checkpointer which don't
care about its contents, so it doesn't really matter."

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
Hi,

On 2022-10-17 10:43:16 -0400, Robert Haas wrote:
> On Fri, Oct 14, 2022 at 2:21 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-10-14 10:40:11 +0530, Amit Kapila wrote:
> > > On Fri, Oct 14, 2022 at 2:25 AM Andres Freund <andres@anarazel.de> wrote:
> > > Fair point. How about something like: "XXX Do we really need to check
> > > for cleanup lock on the new bucket? Here, we initialize the page, so
> > > ideally we don't need to perform any operation that requires such a
> > > check."?.
> >
> > This still seems to omit that the code is quite broken.
>
> I don't think it's the job of a commit which is trying to fix a
> certain bug to document all the other bugs it isn't fixing.

That's true in general, but the case of fixing a bug in one place but not in
another nearby is a different story.


> If that were the standard, we'd never be able to commit any bug fixes

<eyeroll/>


> which is exactly what's happening on this thread.

What's been happening from my POV is that Amit and you didn't even acknowledge
the broken cleanup lock logic for weeks. I don't mind a reasoned decision to
not care about the non-recovery case, but until now I've not seen one. Then
again, I have a hard time keeping up with email, so I might have missed it.


> > > Feel free to suggest something better.
> >
> > How about something like:
> >
> >   XXX: This code is wrong, we're overwriting the buffer before "acquiring" the
> >   cleanup lock. Currently this is not known to have bad consequences because
> >   XYZ and the fix seems a bit too risky for the backbranches.
>
> I think here you're talking about the code that runs in
> normal-running, not recovery, in _hash_expandtable, where we call
> _hash_getnewbuf and then IsBufferCleanupOK.

Yes.


> That code indeed seems stupid, because as you say, there's no point in
> calling _hash_getnewbuf() and thus overwriting the buffer and then only
> afterwards checking IsBufferCleanupOK. By then the die is cast. But at the
> same time, I think that it's not wrong in any way that matters to the best
> of our current knowledge. That new buffer that we just went and got might be
> pinned by somebody else, but they can't be doing anything interesting with
> it, because we wouldn't be allocating it as a new page if it were already in
> use for anything, and only one process is allowed to be doing such an
> allocation at a time. That both means that we can likely remove the cleanup
> lock acquisition, but it also means that if we don't, there is no
> correctness problem here, strictly speaking.

If that's the case cool - I just don't know the locking protocol of hash
indexes well enough to judge this.


> So I would suggest that if we feel we absolutely must put a comment
> here, we could make it say something like "XXX. It doesn't make sense
> to call _hash_getnewbuf() first, zeroing the buffer, and then only
> afterwards check whether we have a cleanup lock. However, since no
> scan can be accessing the new buffer yet, any concurrent accesses will
> just be from processes like the bgwriter or checkpointer which don't
> care about its contents, so it doesn't really matter."

WFM. I'd probably lean to just fixing in the backbranches instead, but as long
as we make a conscious decision...

Greetings,

Andres Freund



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Mon, Oct 17, 2022 at 1:02 PM Andres Freund <andres@anarazel.de> wrote:
> That's true in general, but the case of fixing a bug in one place but not in
> another nearby is a different story.

I agree, but I still think we shouldn't let the perfect be the enemy
of the good.

> > That code indeed seems stupid, because as you say, there's no point in
> > calling _hash_getnewbuf() and thus overwriting the buffer and then only
> > afterwards checking IsBufferCleanupOK. By then the die is cast. But at the
> > same time, I think that it's not wrong in any way that matters to the best
> > of our current knowledge. That new buffer that we just went and got might be
> > pinned by somebody else, but they can't be doing anything interesting with
> > it, because we wouldn't be allocating it as a new page if it were already in
> > use for anything, and only one process is allowed to be doing such an
> > allocation at a time. That both means that we can likely remove the cleanup
> > lock acquisition, but it also means that if we don't, there is no
> > correctness problem here, strictly speaking.
>
> If that's the case cool - I just don't know the locking protocol of hash
> indexes well enough to judge this.

Darn, I was hoping you did, because I think this could certainly use
more than one pair of educated eyes.

> > So I would suggest that if we feel we absolutely must put a comment
> > here, we could make it say something like "XXX. It doesn't make sense
> > to call _hash_getnewbuf() first, zeroing the buffer, and then only
> > afterwards check whether we have a cleanup lock. However, since no
> > scan can be accessing the new buffer yet, any concurrent accesses will
> > just be from processes like the bgwriter or checkpointer which don't
> > care about its contents, so it doesn't really matter."
>
> WFM. I'd probably lean to just fixing in the backbranches instead, but as long
> as we make a conscious decision...

I am reasonably confident that just making the cleanup lock
acquisition unconditional will not break anything that isn't broken
already. Perhaps that confidence will turn out to be misplaced, but at
the moment I just don't see what can go wrong. Since it's a standby,
nobody else should be trying to get a cleanup lock, and even if they
did, err, so what? We can't deadlock because we don't hold any other
locks.
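
For what it's worth, during replay the unconditional version might be
as simple as asking for the cleanup lock while the block is read and
zeroed (a sketch under that assumption, not an actual patch from this
thread):

    Buffer      newbuf;

    /* wait for any transient pin (bgwriter, checkpointer) rather than
     * PANICking; no deadlock is possible since we hold no other locks */
    XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_AND_CLEANUP_LOCK,
                                  true, &newbuf);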

I don't feel quite as confident that not attempting a cleanup lock on
the new bucket's primary page is OK. I think it should be fine. The
existing comment even says it should be fine. But, that comment could
be wrong, and I'm not sure that I have my head around what all of the
possible interactions around that cleanup lock are. So changing it
makes me a little nervous.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
On 2022-10-17 13:34:02 -0400, Robert Haas wrote:
> I don't feel quite as confident that not attempting a cleanup lock on
> the new bucket's primary page is OK. I think it should be fine. The
> existing comment even says it should be fine. But, that comment could
> be wrong, and I'm not sure that I have my head around what all of the
> possible interactions around that cleanup lock are. So changing it
> makes me a little nervous.

If it's not OK, then the acquire-cleanuplock-after-reinit would be an
active bug though, right?



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Mon, Oct 17, 2022 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-10-17 13:34:02 -0400, Robert Haas wrote:
> > I don't feel quite as confident that not attempting a cleanup lock on
> > the new bucket's primary page is OK. I think it should be fine. The
> > existing comment even says it should be fine. But, that comment could
> > be wrong, and I'm not sure that I have my head around what all of the
> > possible interactions around that cleanup lock are. So changing it
> > makes me a little nervous.
>
> If it's not OK, then the acquire-cleanuplock-after-reinit would be an
> active bug though, right?

Yes, probably so.

Another approach here would be to have something like _hash_getnewbuf
that does not use RBM_ZERO_AND_LOCK or call _hash_pageinit, and then
call _hash_pageinit here, perhaps just before nopaque =
HashPageGetOpaque(npage), so that it's within the critical section.
But that doesn't feel very consistent with the rest of the code.
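
A rough sketch of that alternative (the helper name is hypothetical,
rel/start_nblkno are as in the surrounding function, and error
handling is abbreviated):

    Buffer      buf_nblkno;
    Page        npage;
    HashPageOpaque nopaque;

    /*
     * Hypothetical _hash_getnewbuf_nozero(): like _hash_getnewbuf(), but
     * it neither uses RBM_ZERO_AND_LOCK nor calls _hash_pageinit(); it
     * just returns the new block pinned and exclusively locked.
     */
    buf_nblkno = _hash_getnewbuf_nozero(rel, start_nblkno, MAIN_FORKNUM);
    if (!IsBufferCleanupOK(buf_nblkno))
    {
        /* nothing has been overwritten yet, so giving up is still safe */
        _hash_relbuf(rel, buf_nblkno);
        goto fail;              /* the function's existing bail-out path */
    }

    START_CRIT_SECTION();

    npage = BufferGetPage(buf_nblkno);
    _hash_pageinit(npage, BufferGetPageSize(buf_nblkno));
    nopaque = HashPageGetOpaque(npage);
    /* ... fill in nopaque, MarkBufferDirty(), XLogInsert(), etc. ... */

    END_CRIT_SECTION();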

Maybe just nuking the IsBufferCleanupOK call is best, I don't know. I
honestly doubt that it matters very much what we pick here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Tue, Oct 18, 2022 at 8:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Oct 17, 2022 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-10-17 13:34:02 -0400, Robert Haas wrote:
>
> Maybe just nuking the IsBufferCleanupOK call is best, I don't know. I
> honestly doubt that it matters very much what we pick here.
>

Agreed, I think the important point to decide is what to do for
back-branches. We have the next minor release in a few days' time and
this is the last release for v10. I see the following options based on
the discussion here.

a. Use the code change in 0001 from email [1] and a comment change
proposed by Robert in email [2] to fix the bug reported. This should
be backpatched till v10. Then separately, we can consider committing
something like 0002 from email [1] as a HEAD-only patch.
b. Use the code change in 0001 from email [1] to fix the bug reported.
This should be backpatched till v10. Then separately, we can consider
committing something like 0002 from email [1] as a HEAD-only patch.
c. Combine 0001 and 0002 from the email [1] and push them in all
branches till v10.

I prefer going with (a).

[1] - https://www.postgresql.org/message-id/CAA4eK1LekwAZU5yf2h%2BW1Ko_c85TZHuNLg6jVPD6KDXrYYFo1g%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CA%2BTgmoYruVb7Nh5TUt47sTyYui2zE8Ke9T3DcHeB1wSkb%3DuSCw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Mon, Oct 31, 2022 at 7:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Agreed, I think the important point to decide is what to do for
> back-branches. We have the next minor release in a few days' time and
> this is the last release for v10. I see the following options based on
> the discussion here.
>
> a. Use the code change in 0001 from email [1] and a comment change
> proposed by Robert in email [2] to fix the bug reported. This should
> be backpatched till v10. Then separately, we can consider committing
> something like 0002 from email [1] as a HEAD-only patch.
> b. Use the code change in 0001 from email [1] to fix the bug reported.
> This should be backpatched till v10. Then separately, we can consider
> committing something like 0002 from email [1] as a HEAD-only patch.
> c. Combine 0001 and 0002 from the email [1] and push them in all
> branches till v10.
>
> I prefer going with (a).

I vote for (a) or (b) for now, and we can consider what else to do
later. It might even include back-patching. But fixing things that are
causing problems we can see seems to me to have higher priority than
fixing things that are not.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Mon, Oct 31, 2022 at 10:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 7:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Agreed, I think the important point to decide is what to do for
> > back-branches. We have the next minor release in a few days' time and
> > this is the last release for v10. I see the following options based on
> > the discussion here.
> >
> > a. Use the code change in 0001 from email [1] and a comment change
> > proposed by Robert in email [2] to fix the bug reported. This should
> > be backpatched till v10. Then separately, we can consider committing
> > something like 0002 from email [1] as a HEAD-only patch.
> > b. Use the code change in 0001 from email [1] to fix the bug reported.
> > This should be backpatched till v10. Then separately, we can consider
> > committing something like 0002 from email [1] as a HEAD-only patch.
> > c. Combine 0001 and 0002 from the email [1] and push them in all
> > branches till v10.
> >
> > I prefer going with (a).
>
> I vote for (a) or (b) for now, and we can consider what else to do
> later.
>

I am fine with any of those. Would you like to commit or do you prefer
me to take care of this?

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Robert Haas
Date:
On Mon, Oct 31, 2022 at 11:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I am fine with any of those. Would you like to commit or do you prefer
> me to take care of this?

Sorry for not responding to this sooner. I think it's too late to do
anything about this for the current round of releases at this point,
but I am fine if you want to take care of it after that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Mon, Nov 7, 2022 at 11:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 11:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I am fine with any of those. Would you like to commit or do you prefer
> > me to take care of this?
>
> Sorry for not responding to this sooner. I think it's too late to do
> anything about this for the current round of releases at this point,
> but I am fine if you want to take care of it after that.
>

Okay, I'll take care of this either later this week after the release
work is finished or early next week.

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Tue, Nov 8, 2022 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 7, 2022 at 11:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Oct 31, 2022 at 11:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > I am fine with any of those. Would you like to commit or do you prefer
> > > me to take care of this?
> >
> > Sorry for not responding to this sooner. I think it's too late to do
> > anything about this for the current round of releases at this point,
> > but I am fine if you want to take care of it after that.
> >
>
> Okay, I'll take care of this either later this week after the release
> work is finished or early next week.
>

Pushed.

-- 
With Regards,
Amit Kapila.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Andres Freund
Date:
On 2022-11-14 16:06:27 +0530, Amit Kapila wrote:
> Pushed.

Thanks.



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From
Amit Kapila
Date:
On Mon, Nov 14, 2022 at 11:18 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-11-14 16:06:27 +0530, Amit Kapila wrote:
> > Pushed.
>
> Thanks.
>

Please find the attached patch to remove the buffer cleanup check on
the new bucket page. I think we should do this only on HEAD. Do you
have any suggestions or objections to this one?
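
Roughly, the idea is (a sketch only; the attached patch may differ in
details):

    /*
     * No scan can find the new bucket until the metapage mapping is
     * updated, so the exclusive lock returned by _hash_getnewbuf() is
     * enough on its own.
     */
    buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
    /* (the IsBufferCleanupOK() test on the new bucket page is gone) */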

-- 
With Regards,
Amit Kapila.

Attachment