Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: BufferAlloc: don't take two simultaneous locks
Date
Msg-id 20220317.120248.1826629229756908827.horikyota.ntt@gmail.com
In response to Re: BufferAlloc: don't take two simultaneous locks  (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses Re: BufferAlloc: don't take two simultaneous locks  (Yura Sokolov <y.sokolov@postgrespro.ru>)
List pgsql-hackers
At Wed, 16 Mar 2022 14:11:58 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> В Ср, 16/03/2022 в 12:07 +0900, Kyotaro Horiguchi пишет:
> > At Tue, 15 Mar 2022 13:47:17 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> > In v7, HASH_ENTER returns the element stored in DynaHashReuse using
> > the freelist_idx of the new key.  v8 uses that of the old key (at the
> > time of HASH_REUSE).  So in the "REUSE->ENTER (elem exists and
> > returns the stashed)" case, the stashed element is returned to its
> > original partition.  But that is not what I mentioned.
> >
> > On the other hand, once the stashed element is reused by HASH_ENTER,
> > it gives the same resulting state as the HASH_REMOVE->HASH_ENTER
> > (borrow from old partition) case.  I suspect that the frequent
> > freelist starvation comes from the latter case.
>
> Doubtful. By probability theory, a single partition is unlikely to be
> too overflowed. Hence the freelists.

Yeah.  I think so generally.

> But! With 128kB of shared buffers there are just 32 buffers. With 32
> entries for 32 freelist partitions, some freelist partition will
> certainly have 0 entries even if all entries are on freelists.

Anyway, it's an extreme condition and the starvation happens only at a
negligible ratio.
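
To put a rough number on that extreme case, here is a back-of-the-envelope
sketch (standalone C, not PostgreSQL code), assuming entries land uniformly
over dynahash's NUM_FREELISTS = 32 freelists:

#include <stdio.h>
#include <math.h>

int
main(void)
{
    /*
     * With 32 entries hashed uniformly over 32 freelists, the chance that
     * a given freelist receives nothing is (31/32)^32, i.e. roughly a
     * third of the freelists are expected to start out empty.
     */
    const double nfreelists = 32.0;
    const double nentries = 32.0;
    double      p_empty = pow((nfreelists - 1.0) / nfreelists, nentries);

    printf("expected empty freelists: %.1f of %.0f\n",
           p_empty * nfreelists, nfreelists);
    return 0;
}

So at 128kB of shared buffers an empty freelist is expected rather than
exceptional, but the effect vanishes at realistic shared_buffers sizes.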

> > RETURNED: 2
> > ALLOCED: 0
> > BORROWED: 435
> > REUSED: 495444
> > ASSIGNED: 495467 (-23)
> >
> > Now "BORROWED" happens 0.8% of REUSED
>
> 0.08% actually :)

Mmm.  Doesn't matter:p

> > > > > I lost access to the Xeon 8354H, so I returned to the old Xeon X5675.
> > > > ...
> > > > > Strange thing: both master and the patched version have higher
> > > > > peak tps on the X5675 at medium connection counts (17 or 27
> > > > > clients) than in the first October version [1], but lower tps at
> > > > > higher connection counts (>= 191 clients).
> > > > > I'll try to bisect this unfortunate change on master.
...
> I've checked. Looks like something has changed on the server, since the
> old master commit now behaves the same as the new one (and differently
> from how it behaved in October).
> I remember maintenance downtime of the server in November/December.
> Probably the kernel was upgraded or some system settings were changed.

One thing I am a little concerned about is that the numbers show a
steady 1-2% degradation for connection counts < 17.

I think there are two possible causes of the degradation.

1. The additional branch from consolidating HASH_ASSIGN into HASH_ENTER.
  This might cause degradation for memory-contended use (see the sketch
  after this list).

2. The nalloced bookkeeping might cause degradation on non-shared
  dynahashes?  I believe it doesn't, but I'm not sure.

  In a simple benchmark with pgbench on a laptop, dynahash allocations
  (shared and non-shared combined) happened about 50 times per second
  with 10 processes and about 200 times per second with 100 processes.
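
For 1, what I have in mind is just one more test on the allocation hot
path.  A hypothetical sketch (HASHELEM, reuse_stash and pick_from_freelist
are stand-ins, not the actual v8 code):

typedef struct HASHELEM HASHELEM;

static HASHELEM *reuse_stash;   /* set by HASH_REUSE, otherwise NULL */

static HASHELEM *
alloc_element(int freelist_idx, HASHELEM *(*pick_from_freelist) (int))
{
    if (reuse_stash != NULL)    /* the additional branch */
    {
        HASHELEM   *elem = reuse_stash;

        reuse_stash = NULL;
        return elem;            /* reuse the stashed element, skip the freelist */
    }
    return pick_from_freelist(freelist_idx);    /* pre-existing path */
}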

> > I don't think nalloced needs to be the same width as long.  For
> > platforms with a 32-bit long, any possible degradation from the
> > 64-bit atomic there doesn't matter anyway.  So can't we always define
> > the atomic as 64-bit and use the pg_atomic_* functions directly?
>
> Some 32-bit platforms have no native 64-bit atomics; there they are
> emulated with locks.
>
> Well, for a 32-bit platform long is just enough. Why spend another
> 4 bytes per dynahash?

I don't think the additional bytes matter, but emulated atomic
operations can matter.  However, I'm not sure which platforms use the
fallback implementation.  (x86 seems to have had __sync_fetch_and_add()
since the Pentium 4.)
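
To be concrete, by "use the pg_atomic_* functions directly" I mean
something like this minimal sketch (the struct and function names are only
illustrative, not the actual patch):

#include "postgres.h"
#include "port/atomics.h"

typedef struct HashAllocCounter
{
    pg_atomic_uint64 nalloced;  /* total elements ever allocated */
} HashAllocCounter;

static inline void
hash_count_alloc(HashAllocCounter *counter)
{
    /*
     * Always 64-bit, regardless of the width of long.  The counter must
     * have been set up with pg_atomic_init_u64() first.  On platforms
     * without native 64-bit atomics this call falls back to the emulated
     * (locking) implementation, which is the cost in question.
     */
    pg_atomic_fetch_add_u64(&counter->nalloced, 1);
}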

My opinion in the previous mail was that if that level of degradation
caused by emulated atomic operations matters, we shouldn't use atomics
there at all, since atomic operations are not free even on modern
platforms.

In relation to 2 above, if we observe that the degradation disappears
when we (tentatively) use non-atomic operations for nalloced, we should
go back to the previous per-freelist nalloced.
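
That would look roughly like the existing per-freelist nentries
bookkeeping in dynahash.c: a plain long per freelist, updated under the
freelist spinlock the allocation path already holds, and summed on
demand.  A minimal sketch (names are illustrative, not the earlier patch
version verbatim):

#define NUM_FREELISTS 32        /* as in dynahash.c */

typedef struct
{
    /* slock_t mutex;  -- the freelist spinlock, already taken on alloc */
    long        nalloced;       /* bumped while holding that spinlock */
} FreeListCounter;

static long
sum_nalloced(const FreeListCounter *freelists)
{
    long        sum = 0;

    /* readers just add up the 32 per-freelist counters */
    for (int i = 0; i < NUM_FREELISTS; i++)
        sum += freelists[i].nalloced;
    return sum;
}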

I don't have access to such a muscular machine, though...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


