Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: BufferAlloc: don't take two simultaneous locks |
Date | |
Msg-id | 20220225080455.3zzxwfpvbizzcsfo@alap3.anarazel.de Whole thread Raw |
In response to | Re: BufferAlloc: don't take two simultaneous locks (Michail Nikolaev <michail.nikolaev@gmail.com>) |
Responses |
Re: BufferAlloc: don't take two simultaneous locks
Re: BufferAlloc: don't take two simultaneous locks |
List | pgsql-hackers |
Hi, On 2022-02-21 11:06:49 +0300, Yura Sokolov wrote: > From 04b07d0627ec65ba3327dc8338d59dbd15c405d8 Mon Sep 17 00:00:00 2001 > From: Yura Sokolov <y.sokolov@postgrespro.ru> > Date: Mon, 21 Feb 2022 08:49:03 +0300 > Subject: [PATCH v3] [PGPRO-5616] bufmgr: do not acquire two partition locks. > > Acquiring two partition locks leads to complex dependency chain that hurts > at high concurrency level. > > There is no need to hold both lock simultaneously. Buffer is pinned so > other processes could not select it for eviction. If tag is cleared and > buffer removed from old partition other processes will not find it. > Therefore it is safe to release old partition lock before acquiring > new partition lock. Yes, the current design is pretty nonsensical. It leads to really absurd stuff like holding the relation extension lock while we write out old buffer contents etc. > + * We have pinned buffer and we are single pinner at the moment so there > + * is no other pinners. Seems redundant. > We hold buffer header lock and exclusive partition > + * lock if tag is valid. Given these statements it is safe to clear tag > + * since no other process can inspect it to the moment. > + */ Could we share code with InvalidateBuffer here? It's not quite the same code, but nearly the same. > + * The usage_count starts out at 1 so that the buffer can survive one > + * clock-sweep pass. > + * > + * We use direct atomic OR instead of Lock+Unlock since no other backend > + * could be interested in the buffer. But StrategyGetBuffer, > + * Flush*Buffers, Drop*Buffers are scanning all buffers and locks them to > + * compare tag, and UnlockBufHdr does raw write to state. So we have to > + * spin if we found buffer locked. So basically the first half of of the paragraph is wrong, because no, we can't? > + * Note that we write tag unlocked. It is also safe since there is always > + * check for BM_VALID when tag is compared. > */ > buf->tag = newTag; > - buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | > - BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT | > - BUF_USAGECOUNT_MASK); > if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM) > - buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE; > + new_bits = BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE; > else > - buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE; > - > - UnlockBufHdr(buf, buf_state); > + new_bits = BM_TAG_VALID | BUF_USAGECOUNT_ONE; > > - if (oldPartitionLock != NULL) > + buf_state = pg_atomic_fetch_or_u32(&buf->state, new_bits); > + while (unlikely(buf_state & BM_LOCKED)) I don't think it's safe to atomic in arbitrary bits. If somebody else has locked the buffer header in this moment, it'll lead to completely bogus results, because unlocking overwrites concurrently written contents (which there shouldn't be any, but here there are)... And or'ing contents in also doesn't make sense because we it doesn't work to actually unset any contents? Why don't you just use LockBufHdr/UnlockBufHdr? Greetings, Andres Freund
pgsql-hackers by date: