Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From Andres Freund
Subject Re: BufferAlloc: don't take two simultaneous locks
Date
Msg-id 20220225170127.lxe66l2swx3empgu@alap3.anarazel.de
Whole thread Raw
In response to Re: BufferAlloc: don't take two simultaneous locks  (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses Re: BufferAlloc: don't take two simultaneous locks  (Yura Sokolov <y.sokolov@postgrespro.ru>)
List pgsql-hackers
Hi,

On 2022-02-25 12:51:22 +0300, Yura Sokolov wrote:
> > > +     * The usage_count starts out at 1 so that the buffer can survive one
> > > +     * clock-sweep pass.
> > > +     *
> > > +     * We use direct atomic OR instead of Lock+Unlock since no other backend
> > > +     * could be interested in the buffer. But StrategyGetBuffer,
> > > +     * Flush*Buffers, Drop*Buffers are scanning all buffers and locks them to
> > > +     * compare tag, and UnlockBufHdr does raw write to state. So we have to
> > > +     * spin if we found buffer locked.
> > 
> > So basically the first half of of the paragraph is wrong, because no, we
> > can't?
> 
> Logically, there are no backends that could be interesting in the buffer.
> Physically they do LockBufHdr/UnlockBufHdr just to check they are not interesting.

Yea, but that's still being interested in the buffer...


> > > +     * Note that we write tag unlocked. It is also safe since there is always
> > > +     * check for BM_VALID when tag is compared.
> > 
> > 
> > >       */
> > >      buf->tag = newTag;
> > > -    buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
> > > -                   BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
> > > -                   BUF_USAGECOUNT_MASK);
> > >      if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
> > > -        buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
> > > +        new_bits = BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
> > >      else
> > > -        buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;
> > > -
> > > -    UnlockBufHdr(buf, buf_state);
> > > +        new_bits = BM_TAG_VALID | BUF_USAGECOUNT_ONE;
> > >  
> > > -    if (oldPartitionLock != NULL)
> > > +    buf_state = pg_atomic_fetch_or_u32(&buf->state, new_bits);
> > > +    while (unlikely(buf_state & BM_LOCKED))
> > 
> > I don't think it's safe to atomic in arbitrary bits. If somebody else has
> > locked the buffer header in this moment, it'll lead to completely bogus
> > results, because unlocking overwrites concurrently written contents (which
> > there shouldn't be any, but here there are)...
> 
> That is why there is safety loop in the case buf->state were locked just
> after first optimistic atomic_fetch_or. 99.999% times this loop will not
> have a job. But in case other backend did lock buf->state, loop waits
> until it releases lock and retry atomic_fetch_or.

> > And or'ing contents in also doesn't make sense because we it doesn't work to
> > actually unset any contents?
> 
> Sorry, I didn't understand sentence :((


You're OR'ing multiple bits into buf->state. LockBufHdr() only ORs in
BM_LOCKED. ORing BM_LOCKED is fine:
Either the buffer is not already locked, in which case it just sets the
BM_LOCKED bit, acquiring the lock. Or it doesn't change anything, because
BM_LOCKED already was set.

But OR'ing in multiple bits is *not* fine, because it'll actually change the
contents of ->state while the buffer header is locked.


> > Why don't you just use LockBufHdr/UnlockBufHdr?
> 
> This pair makes two atomic writes to memory. Two writes are heavier than
> one write in this version (if optimistic case succeed).

UnlockBufHdr doesn't use a locked atomic op. It uses a write barrier and an
unlocked write.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Add parameter jit_warn_above_fraction
Next
From: Tom Lane
Date:
Subject: Re: Frontend error logging style