Buffer Manager and Contention - Mailing list pgsql-hackers

From Simon Riggs
Subject Buffer Manager and Contention
Msg-id CANbhV-F0H-8oB_A+m=55hP0e0QRL=RdDDQuSXMTFt6JPrdX+pQ@mail.gmail.com
List pgsql-hackers
I've been thinking about poor performance in the case where the data
fits in RAM but the working set is too big for shared_buffers. I notice
a couple of things in BufMgr that seem bad, but I don't understand why
they are the way they are.

1. If we need to allocate a buffer to a new block, we do this in one
step, while holding the mapping partition locks for both the old and
the new tag. Surely it would cause less contention to make the old
block/tag invalid (after flushing), drop the old partition lock and
then switch to the new one? i.e. just hold one mapping partition lock
at a time. Is there a specific reason we do it this way?
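
To make the difference between the two orderings concrete, here is a
minimal single-threaded sketch that only counts how many partition
locks are held at once. Everything in it (LockModel, NUM_PARTITIONS = 8,
the retag_* functions) is a hypothetical illustration, not bufmgr.c
code, and it does not model what happens if another backend tries to
map the same new tag during the gap between the two locks.

```c
#include <assert.h>

#define NUM_PARTITIONS 8        /* hypothetical value, for illustration only */

/* Toy stand-in for the mapping partition locks: it just counts holders. */
typedef struct {
    int held[NUM_PARTITIONS];
    int held_now;               /* locks held right now */
    int max_held;               /* high-water mark of held_now */
} LockModel;

static void model_lock(LockModel *m, int p)
{
    assert(!m->held[p]);
    m->held[p] = 1;
    m->held_now++;
    if (m->held_now > m->max_held)
        m->max_held = m->held_now;
}

static void model_unlock(LockModel *m, int p)
{
    assert(m->held[p]);
    m->held[p] = 0;
    m->held_now--;
}

/* Ordering described in point 1: hold old and new partitions together. */
static int retag_holding_both(int old_part, int new_part)
{
    LockModel m = {{0}, 0, 0};
    int lo = old_part < new_part ? old_part : new_part;
    int hi = old_part < new_part ? new_part : old_part;

    model_lock(&m, lo);         /* ascending order, to avoid deadlock */
    if (hi != lo)
        model_lock(&m, hi);
    /* ... insert new tag, delete old tag ... */
    if (hi != lo)
        model_unlock(&m, hi);
    model_unlock(&m, lo);
    return m.max_held;
}

/* Proposed ordering: invalidate under the old lock, then take the new one. */
static int retag_one_at_a_time(int old_part, int new_part)
{
    LockModel m = {{0}, 0, 0};

    model_lock(&m, old_part);
    /* ... make the old tag invalid ... */
    model_unlock(&m, old_part);

    model_lock(&m, new_part);
    /* ... map the buffer under the new tag ... */
    model_unlock(&m, new_part);
    return m.max_held;
}
```

In this model retag_holding_both(2, 5) peaks at two locks held, while
retag_one_at_a_time(2, 5) never holds more than one.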

2. Possibly connected to the above, we issue BufTableInsert() BEFORE
we issue BufTableDelete(). That means we need extra entries in the
buffer mapping hash table to allow us to hold both the old and the new
at the same time, for a short period. The way dynahash.c works, we try
to allocate an entry from the freelist and if that doesn't work, we
begin searching ALL the freelists for free entries to steal. So if we
get enough people trying to do virtual I/O at the same time, then we
will hit a "freelist storm" where everybody is chasing the last few
entries. It would make more sense if we could do BufTableDelete()
first, then hold onto the buffer mapping entry rather than add it to
the freelist, so we can use it again when we do BufTableInsert() very
shortly afterwards. That way we wouldn't need to search the freelist
at all. What is the benefit of, or the reason for, doing the Delete
after the Insert?
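
As a rough illustration of the freelist traffic involved, here is a toy
model with a single freelist and no hashing or locking. The names
(Entry, Table, freelist_pop, remap_insert_then_delete,
remap_delete_then_reuse) are made up for this sketch and are not the
dynahash.c or buf_table.c API; the point is only to count freelist
operations under each ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash entry; a real entry would also carry a buffer id. */
typedef struct Entry {
    struct Entry *next;
    int           tag;
} Entry;

typedef struct {
    Entry *free_head;
    int    freelist_ops;        /* pops and pushes, including failed pops */
} Table;

static Entry *freelist_pop(Table *t)
{
    Entry *e;

    t->freelist_ops++;
    e = t->free_head;
    if (e != NULL)
        t->free_head = e->next;
    return e;
}

static void freelist_push(Table *t, Entry *e)
{
    t->freelist_ops++;
    e->next = t->free_head;
    t->free_head = e;
}

/* Insert before delete: needs a spare entry while both tags exist. */
static int remap_insert_then_delete(Table *t, Entry **slot, int new_tag)
{
    Entry *fresh = freelist_pop(t);

    if (fresh == NULL)
        return -1;              /* this is where a search of other freelists would start */
    fresh->tag = new_tag;
    freelist_push(t, *slot);    /* old entry goes back to the freelist */
    *slot = fresh;
    return 0;
}

/* Delete first and keep the entry: the freelist is never touched. */
static int remap_delete_then_reuse(Table *t, Entry **slot, int new_tag)
{
    (void) t;                   /* no freelist access needed */
    (*slot)->tag = new_tag;     /* same entry, relabelled with the new tag */
    return 0;
}
```

In this model the insert-first ordering costs one pop and one push per
remap and fails outright when the freelist is empty, whereas the
delete-first ordering still succeeds with no free entries at all.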

Put another way, it looks like the BufTable functions are used in a
way that doesn't match how dynahash is designed to work.

Thoughts?

-- 
Simon Riggs                http://www.EnterpriseDB.com/


