Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: BufferAlloc: don't take two simultaneous locks
Date
Msg-id a033d901ed0120ab5bf7562d03a5bea3e33dacb6.camel@postgrespro.ru
Whole thread Raw
In response to Re: BufferAlloc: don't take two simultaneous locks  (Simon Riggs <simon.riggs@enterprisedb.com>)
Responses Re: BufferAlloc: don't take two simultaneous locks
List pgsql-hackers
В Пт, 25/02/2022 в 09:38 +0000, Simon Riggs пишет:
> On Fri, 25 Feb 2022 at 09:24, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 
> > > This approach is cleaner than v1, but should also perform better
> > > because there will be a 1:1 relationship between a buffer and its
> > > dynahash entry, most of the time.
> > 
> > Thank you for suggestion. Yes, it is much clearer than my initial proposal.
> > 
> > Should I incorporate it to v4 patch? Perhaps, it could be a separate
> > commit in new version.
> 
> I don't insist that you do that, but since the API changes are a few
> hours work ISTM better to include in one patch for combined perf
> testing. It would be better to put all changes in this area into PG15
> than to split it across multiple releases.
> 
> > Why there is need for this? Which way backend could be forced to abort
> > between BufTableReuse and BufTableAssign in this code path? I don't
> > see any CHECK_FOR_INTERRUPTS on the way, but may be I'm missing
> > something.
> 
> Sounds reasonable.

Ok, here is v4.
It is with two commits: one for BufferAlloc locking change and other
for dynahash's freelist avoiding.

Buffer locking patch is same to v2 with some comment changes. Ie it uses
Lock+UnlockBufHdr 

For dynahash HASH_REUSE and HASH_ASSIGN as suggested.
HASH_REUSE stores deleted element into per-process static variable.
HASH_ASSIGN uses this element instead of freelist. If there's no
such stored element, it falls back to HASH_ENTER.

I've implemented Robert Haas's suggestion to count element in freelists
instead of nentries:

> One idea is to jigger things so that we maintain a count of the total
> number of entries that doesn't change except when we allocate, and
> then for each freelist partition we maintain the number of entries in
> that freelist partition.  So then the size of the hash table, instead
> of being sum(nentries) is totalsize - sum(nfree).

https://postgr.es/m/CA%2BTgmoZkg-04rcNRURt%3DjAG0Cs5oPyB-qKxH4wqX09e-oXy-nw%40mail.gmail.com

It helps to avoid freelist lock just to actualize counters.
I made it with replacing "nentries" with "nfree" and adding
"nalloced" to each freelist. It also makes "hash_update_hash_key" valid
for key that migrates partitions.

I believe, there is no need for "nalloced" for each freelist, and
instead single such field should be in HASHHDR. More, it seems to me
`element_alloc` function needs no acquiring freelist partition lock
since it is called only during initialization of shared hash table.
Am I right?

I didn't go this path in v4 for simplicity, but can put it to v5
if approved.

To be honest, "reuse" patch gives little improvement. But still
measurable on some connection numbers.

I tried to reduce freelist partitions to 8, but it has mixed impact.
Most of time performance is same, but sometimes a bit lower. I
didn't investigate reasons. Perhaps they are not related to buffer
manager.

I didn't introduce new functions BufTableReuse and BufTableAssign
since there are single call to BufTableInsert and two calls to
BufTableDelete. So I reused this functions, just added "reuse" flag
to BufTableDelete. 

Tests simple_select for Xeon 8354H, 128MB and 1G shared buffers
for scale 100.

1 socket:
  conns |     master |   patch_v4 |  master 1G | patch_v4 1G 
--------+------------+------------+------------+------------
      1 |      41975 |      41540 |      52898 |      52213 
      2 |      77693 |      77908 |      97571 |      98371 
      3 |     114713 |     115522 |     142709 |     145226 
      5 |     188898 |     187617 |     239322 |     237269 
      7 |     261516 |     260006 |     329119 |     329449 
     17 |     521821 |     519473 |     672390 |     662106 
     27 |     555487 |     555697 |     674630 |     672736 
     53 |     868213 |     896539 |    1190734 |    1202505 
     83 |     868232 |     866029 |    1164997 |    1158719 
    107 |     850477 |     845685 |    1140597 |    1134502 
    139 |     816311 |     816808 |    1101471 |    1091258 
    163 |     794788 |     796517 |    1078445 |    1071568 
    191 |     765934 |     776185 |    1059497 |    1041944 
    211 |     738656 |     777365 |    1083356 |    1046422 
    239 |     713124 |     841337 |    1104629 |    1116668 
    271 |     692138 |     847803 |    1094432 |    1128971 
    307 |     682919 |     849239 |    1086306 |    1127051 
    353 |     679449 |     842125 |    1071482 |    1117471 
    397 |     676217 |     844015 |    1058937 |    1118628 

2 sockets:
  conns |     master |   patch_v4 |  master 1G | patch_v4 1G 
--------+------------+------------+------------+------------
      1 |      44317 |      44034 |      53920 |      53583 
      2 |      81193 |      78621 |      99138 |      97968 
      3 |     120755 |     115648 |     148102 |     147423 
      5 |     190007 |     188943 |     232078 |     231029 
      7 |     258602 |     260649 |     325545 |     318567 
     17 |     551814 |     552914 |     692312 |     697518 
     27 |     787353 |     786573 |    1023509 |    1022891 
     53 |     973880 |    1008534 |    1228274 |    1278194 
     83 |    1108442 |    1269777 |    1596292 |    1648156 
    107 |    1072188 |    1339634 |    1542401 |    1664476 
    139 |    1000446 |    1316372 |    1490757 |    1676127 
    163 |     967378 |    1257445 |    1461468 |    1655574 
    191 |     926010 |    1189591 |    1435317 |    1639313 
    211 |     909919 |    1149905 |    1417437 |    1632764 
    239 |     895944 |    1115681 |    1393530 |    1616329 
    271 |     880545 |    1090208 |    1374878 |    1609544 
    307 |     865560 |    1066798 |    1355164 |    1593769 
    353 |     857591 |    1046426 |    1330069 |    1584006 
    397 |     840374 |    1024711 |    1312257 |    1564872 

--------

regards

Yura Sokolov
Postgres Professional
y.sokolov@postgrespro.ru
funny.falcon@gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Failed transaction statistics to measure the logical replication progress
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Allow async standbys wait for sync replication