Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: BufferAlloc: don't take two simultaneous locks |
Date | |
Msg-id | CANbhV-EdmOjAzXVosn6RX35g2K6xJ1OThRtQBpsRfEHYxC9rNg@mail.gmail.com Whole thread Raw |
In response to | Re: BufferAlloc: don't take two simultaneous locks (Michail Nikolaev <michail.nikolaev@gmail.com>) |
Responses |
Re: BufferAlloc: don't take two simultaneous locks
|
List | pgsql-hackers |
On Mon, 21 Feb 2022 at 08:06, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: > > Good day, Kyotaro Horiguchi and hackers. > > В Чт, 17/02/2022 в 14:16 +0900, Kyotaro Horiguchi пишет: > > At Wed, 16 Feb 2022 10:40:56 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in > > > Hello, all. > > > > > > I thought about patch simplification, and tested version > > > without BufTable and dynahash api change at all. > > > > > > It performs suprisingly well. It is just a bit worse > > > than v1 since there is more contention around dynahash's > > > freelist, but most of improvement remains. > > > > > > I'll finish benchmarking and will attach graphs with > > > next message. Patch is attached here. > > > > Thanks for the new patch. The patch as a whole looks fine to me. But > > some comments needs to be revised. > > Thank you for review and remarks. v3 gets the buffer partition locking right, well done, great results! In v3, the comment at line 1279 still implies we take both locks together, which is not now the case. Dynahash actions are still possible. You now have the BufTableDelete before the BufTableInsert, which opens up the possibility I discussed here: http://postgr.es/m/CANbhV-F0H-8oB_A+m=55hP0e0QRL=RdDDQuSXMTFt6JPrdX+pQ@mail.gmail.com (Apologies for raising a similar topic, I hadn't noticed this thread before; thanks to Horiguchi-san for pointing this out). v1 had a horrible API (sorry!) where you returned the entry and then explicitly re-used it. I think we *should* make changes to dynahash, but not with the API you proposed. Proposal for new BufTable API BufTableReuse() - similar to BufTableDelete() but does NOT put entry back on freelist, we remember it in a private single item cache in dynahash BufTableAssign() - similar to BufTableInsert() but can only be executed directly after BufTableReuse(), fails with ERROR otherwise. Takes the entry from single item cache and re-assigns it to new tag In dynahash we have two new modes that match the above HASH_REUSE - used by BufTableReuse(), similar to HASH_REMOVE, but places entry on the single item cache, avoiding freelist HASH_ASSIGN - used by BufTableAssign(), similar to HASH_ENTER, but uses the entry from the single item cache, rather than asking freelist This last call can fail if someone else already inserted the tag, in which case it adds the single item cache entry back onto freelist Notice that single item cache is not in shared memory, so on abort we should give it back, so we probably need an extra API call for that also to avoid leaking an entry. Doing it this way allows us to * avoid touching freelists altogether in the common path - we know we are about to reassign the entry, so we do remember it - no contention from other backends, no borrowing etc.. * avoid sharing the private details outside of the dynahash module * allows us to use the same technique elsewhere that we have partitioned hash tables This approach is cleaner than v1, but should also perform better because there will be a 1:1 relationship between a buffer and its dynahash entry, most of the time. With these changes, I think we will be able to *reduce* the number of freelists for partitioned dynahash from 32 to maybe 8, as originally speculated by Robert in 2016: https://www.postgresql.org/message-id/CA%2BTgmoZkg-04rcNRURt%3DjAG0Cs5oPyB-qKxH4wqX09e-oXy-nw%40mail.gmail.com since the freelists will be much less contended with the above approach It would be useful to see performance with a higher number of connections, >400. -- Simon Riggs http://www.EnterpriseDB.com/
pgsql-hackers by date: