Re: Buffer Manager and Contention - Mailing list pgsql-hackers
From: Kyotaro Horiguchi
Subject: Re: Buffer Manager and Contention
Msg-id: 20220225.102025.173579582113640633.horikyota.ntt@gmail.com
In response to: Buffer Manager and Contention (Simon Riggs <simon.riggs@enterprisedb.com>)
List: pgsql-hackers
(I added Yura, as the author of a related patch)

At Thu, 24 Feb 2022 12:58:23 +0000, Simon Riggs <simon.riggs@enterprisedb.com> wrote in
> Thinking about poor performance in the case where the data fits in
> RAM, but the working set is too big for shared_buffers, I notice a
> couple of things that seem bad in BufMgr, but don't understand why
> they are like that.
>
> 1. If we need to allocate a buffer to a new block we do this in one
> step, while holding both partition locks for the old and the new tag.
> Surely it would cause less contention to make the old block/tag
> invalid (after flushing), drop the old partition lock and then switch
> to the new one? i.e. just hold one mapping partition lock at a time.
> Is there a specific reason we do it this way?

I'm not sure, but I guess the developer wanted to make the operation
atomic. Yura Sokolov is proposing a patch to separate the two
partition locks.

https://www.postgresql.org/message-id/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru

It seems viable to me, and benchmarking in that thread showed a good
result. I'd appreciate your input on that thread.

> 2. Possibly connected to the above, we issue BufTableInsert() BEFORE
> we issue BufTableDelete(). That means we need extra entries in the
> buffer mapping hash table to allow us to hold both the old and the new
> at the same time, for a short period. The way dynahash.c works, we try

Yes.

> to allocate an entry from the freelist and if that doesn't work, we
> begin searching ALL the freelists for free entries to steal. So if we
> get enough people trying to do virtual I/O at the same time, then we
> will hit a "freelist storm" where everybody is chasing the last few
> entries. It would make more sense if we could do BufTableDelete()

To reduce that overhead, Yura proposed a surgical modification to
dynahash, but I didn't like that, and the latest patch doesn't contain
that part.
> first, then hold onto the buffer mapping entry rather than add it to
> the freelist, so we can use it again when we do BufTableInsert() very
> shortly afterwards. That way we wouldn't need to search the freelist
> at all. What is the benefit or reason of doing the Delete after the
> Insert?

Hmm. Something like hash_swap_key() or hash_reinsert_entry()? That
sounds reasonable. (Yura's proposal was to take an entry out of the
hash and then attach it again with a new key.)

> Put that another way, it looks like BufTable functions are used in a
> way that mismatches against the way dynahash is designed.
>
> Thoughts?

On the first point, I think Yura's patch works. On the second point,
Yura already showed it gives a certain amount of gain if we do that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center