Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers
From | Yura Sokolov |
---|---|
Subject | Re: BufferAlloc: don't take two simultaneous locks |
Date | |
Msg-id | f0f952234146415a92892fcdb74077a602fd6f1f.camel@postgrespro.ru Whole thread Raw |
In response to | Re: BufferAlloc: don't take two simultaneous locks (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
В Пт, 25/02/2022 в 09:01 -0800, Andres Freund пишет: > Hi, > > On 2022-02-25 12:51:22 +0300, Yura Sokolov wrote: > > > > + * The usage_count starts out at 1 so that the buffer can survive one > > > > + * clock-sweep pass. > > > > + * > > > > + * We use direct atomic OR instead of Lock+Unlock since no other backend > > > > + * could be interested in the buffer. But StrategyGetBuffer, > > > > + * Flush*Buffers, Drop*Buffers are scanning all buffers and locks them to > > > > + * compare tag, and UnlockBufHdr does raw write to state. So we have to > > > > + * spin if we found buffer locked. > > > > > > So basically the first half of of the paragraph is wrong, because no, we > > > can't? > > > > Logically, there are no backends that could be interesting in the buffer. > > Physically they do LockBufHdr/UnlockBufHdr just to check they are not interesting. > > Yea, but that's still being interested in the buffer... > > > > > > + * Note that we write tag unlocked. It is also safe since there is always > > > > + * check for BM_VALID when tag is compared. > > > > */ > > > > buf->tag = newTag; > > > > - buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | > > > > - BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT | > > > > - BUF_USAGECOUNT_MASK); > > > > if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM) > > > > - buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE; > > > > + new_bits = BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE; > > > > else > > > > - buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE; > > > > - > > > > - UnlockBufHdr(buf, buf_state); > > > > + new_bits = BM_TAG_VALID | BUF_USAGECOUNT_ONE; > > > > > > > > - if (oldPartitionLock != NULL) > > > > + buf_state = pg_atomic_fetch_or_u32(&buf->state, new_bits); > > > > + while (unlikely(buf_state & BM_LOCKED)) > > > > > > I don't think it's safe to atomic in arbitrary bits. If somebody else has > > > locked the buffer header in this moment, it'll lead to completely bogus > > > results, because unlocking overwrites concurrently written contents (which > > > there shouldn't be any, but here there are)... > > > > That is why there is safety loop in the case buf->state were locked just > > after first optimistic atomic_fetch_or. 99.999% times this loop will not > > have a job. But in case other backend did lock buf->state, loop waits > > until it releases lock and retry atomic_fetch_or. > > > And or'ing contents in also doesn't make sense because we it doesn't work to > > > actually unset any contents? > > > > Sorry, I didn't understand sentence :(( > > You're OR'ing multiple bits into buf->state. LockBufHdr() only ORs in > BM_LOCKED. ORing BM_LOCKED is fine: > Either the buffer is not already locked, in which case it just sets the > BM_LOCKED bit, acquiring the lock. Or it doesn't change anything, because > BM_LOCKED already was set. > > But OR'ing in multiple bits is *not* fine, because it'll actually change the > contents of ->state while the buffer header is locked. First, both states are valid: before atomic_or and after. Second, there are no checks for buffer->state while buffer header is locked. All LockBufHdr users uses result of LockBufHdr. (I just checked that). > > > Why don't you just use LockBufHdr/UnlockBufHdr? > > > > This pair makes two atomic writes to memory. Two writes are heavier than > > one write in this version (if optimistic case succeed). > > UnlockBufHdr doesn't use a locked atomic op. It uses a write barrier and an > unlocked write. Write barrier is not free on any platform. Well, while I don't see problem with modifying buffer->state, there is problem with modifying buffer->tag: I missed Drop*Buffers doesn't check BM_TAG_VALID flag. Therefore either I had to add this check to those places, or return to LockBufHdr+UnlockBufHdr pair. For patch simplicity I'll return Lock+UnlockBufHdr pair. But it has measurable impact on low connection numbers on many-sockets. > > Greetings, > > Andres Freund
pgsql-hackers by date: