Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: BufferAlloc: don't take two simultaneous locks
Msg-id c6f11bacdab7f7ef0891dec0823759807038a91f.camel@postgrespro.ru
In response to Re: BufferAlloc: don't take two simultaneous locks  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: BufferAlloc: don't take two simultaneous locks  (Yura Sokolov <y.sokolov@postgrespro.ru>)
List pgsql-hackers
On Thu, 17/03/2022 at 12:02 +0900, Kyotaro Horiguchi wrote:
> At Wed, 16 Mar 2022 14:11:58 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in 
> > On Wed, 16/03/2022 at 12:07 +0900, Kyotaro Horiguchi wrote:
> > > At Tue, 15 Mar 2022 13:47:17 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in 
> > > In v7, HASH_ENTER returns the element stored in DynaHashReuse using
> > > the freelist_idx of the new key.  v8 uses that of the old key (at the
> > > time of HASH_REUSE).  So in the "REUSE->ENTER (elem exists and
> > > returns the stashed)" case the stashed element is returned to its
> > > original partition.  But that is not what I mentioned.
> > > 
> > > On the other hand, once the stashed element is reused by HASH_ENTER,
> > > it gives the same resulting state as the HASH_REMOVE->HASH_ENTER
> > > (borrow from old partition) case.  I suspect that the frequent
> > > freelist starvation comes from the latter case.
> > 
> > Doubtful. By probability theory, a single partition is unlikely to
> > become badly overflowed. Hence the freelists.
> 
> Yeah.  I think so generally.
> 
> > But! With 128kB shared buffers there are just 32 buffers. With 32
> > entries for 32 freelist partitions, some freelist partition will
> > certainly have 0 entries even if all entries are in freelists.
> 
> Anyway, it's an extreme condition and the starvation happens only at a
> negligible rate.
> 
> > > RETURNED: 2
> > > ALLOCED: 0
> > > BORROWED: 435
> > > REUSED: 495444
> > > ASSIGNED: 495467 (-23)
> > > 
> > > Now "BORROWED" happens 0.8% of REUSED
> > 
> > 0.08% actually :)
> 
> Mmm.  Doesn't matter:p
> 
> > > > > > I lost access to Xeon 8354H, so returned to old Xeon X5675.
> > > > > ...
> > > > > > Strange thing: both master and patched versions have higher
> > > > > > peak tps on the X5675 at medium connection counts (17 or 27
> > > > > > clients) than the first October version [1]. But lower tps at
> > > > > > higher connection counts (>= 191 clients).
> > > > > > I'll try to bisect this unfortunate change on master.
> ...
> > I've checked. It looks like something changed on the server, since
> > the old master commit now behaves the same as the new one (and
> > differently from how it behaved in October).
> > I remember maintenance downtime of the server in November/December.
> > Probably the kernel was upgraded or some system settings were changed.
> 
> One thing I am a little concerned about is that the numbers show a
> steady 1-2% degradation for connection counts < 17.
> 
> I think there are two possible causes of the degradation.
> 
> 1. Additional branch by consolidating HASH_ASSIGN into HASH_ENTER.
>   This might cause degradation for memory-contended use.
> 
> 2. The nalloced bookkeeping might cause degradation on non-shared dynahashes?
>   I believe it doesn't, but I'm not sure.
> 
>   On a simple benchmark with pgbench on a laptop, dynahash
>   allocation (including shared and non-shared) happened about 50
>   times per second with 10 processes and 200 times with 100 processes.
> 
> > > I don't think nalloced needs to be the same width as long.  For
> > > platforms with a 32-bit long, any possible degradation from the
> > > 64-bit atomic there doesn't matter.  So why don't we always define
> > > the atomic as 64-bit and use the pg_atomic_* functions directly?
> > 
> > Some 32-bit platforms have no native 64-bit atomics, so those are
> > emulated with locks.
> > 
> > Besides, for a 32-bit platform a long is just enough. Why spend
> > another 4 bytes per dynahash?
> 
> I don't think the additional bytes matter, but emulated atomic
> operations can. However, I'm not sure which platforms use those
> fallback implementations.  (x86 seems to have had __sync_fetch_and_add()
> since the P4.)
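For reference, a lock-emulated 64-bit atomic add on such fallback platforms looks roughly like this (an illustrative sketch using a pthread mutex; PostgreSQL's actual fallback in port/atomics uses its own spinlocks, not pthreads):

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Sketch of how a 64-bit atomic is emulated with a lock on platforms
 * lacking native support.  Every "atomic" operation pays for a
 * lock/unlock round trip, which is why emulated atomics can hurt
 * performance in hot paths.
 */
typedef struct
{
    pthread_mutex_t lock;
    uint64_t        value;
} emulated_atomic_u64;

static uint64_t
emulated_fetch_add_u64(emulated_atomic_u64 *a, uint64_t add)
{
    pthread_mutex_lock(&a->lock);
    uint64_t old = a->value;    /* read old value under the lock */
    a->value = old + add;       /* write the sum back */
    pthread_mutex_unlock(&a->lock);
    return old;                 /* fetch-add returns the previous value */
}
```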
> 
> My opinion in the previous mail was that if that level of degradation
> caused by emulated atomic operations matters, we shouldn't use atomics
> there at all, since atomic operations are not free even on modern
> platforms.
> 
> In relation to 2 above, if we observe that the degradation disappears
> when we (tentatively) use non-atomic operations for nalloced, we
> should go back to the previous per-freelist nalloced.

Here is a version with nalloced being a union of the appropriate atomic
and long.
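A minimal sketch of that union idea (hypothetical names, for illustration only; the real patch would use pg_atomic_uint64 and pg_atomic_fetch_add_u64, with C11 stdatomic standing in here so the example is self-contained):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * nalloced as a union: hash tables in shared memory use the atomic
 * member, while backend-local tables use a plain long and skip the
 * atomic overhead entirely.
 */
typedef union
{
    long            lng;   /* non-shared (single-backend) tables */
    _Atomic int64_t atm;   /* tables living in shared memory */
} nalloced_count;

typedef struct
{
    bool           isshared;
    nalloced_count nalloced;
} toy_hash_header;

static void
count_allocated(toy_hash_header *hctl, long delta)
{
    if (hctl->isshared)
        atomic_fetch_add(&hctl->nalloced.atm, (int64_t) delta);
    else
        hctl->nalloced.lng += delta;    /* no atomic cost locally */
}

static long
get_allocated(toy_hash_header *hctl)
{
    if (hctl->isshared)
        return (long) atomic_load(&hctl->nalloced.atm);
    return hctl->nalloced.lng;
}
```

The branch on isshared is the price of unifying the two counters in one field; whether that branch is cheaper than always paying for an atomic is exactly the trade-off discussed above.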

------

regards
Yura Sokolov

