Re: BufFreelistLock - Mailing list pgsql-hackers

From     Jeff Janes
Subject  Re: BufFreelistLock
Msg-id   AANLkTin3zLXNeft_55BTyEMc3Y9sX4aJBjxcASbxBO4b@mail.gmail.com
In response to  Re: BufFreelistLock  (Tom Lane <tgl@sss.pgh.pa.us>)
List     pgsql-hackers
On Wed, Dec 8, 2010 at 8:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I think that the BufFreelistLock can be a contention bottleneck on a
>> system with a lot of CPUs that do a lot of shared-buffer allocations
>> which can fulfilled by the OS buffer cache.
>
> Really?  buffer/README says
>
>  The buffer
>  management policy is designed so that BufFreelistLock need not be taken
>  except in paths that will require I/O, and thus will be slow anyway.

True, but very large memory means they often don't require true disk I/O anyway.

> It's hard to see how it's going to be much of a problem if you're going
> to be doing kernel calls as well.

Are kernel calls really all that slow?  I thought they had been
greatly optimized on recent hardware and kernels.
I'm not sure how to create a test case that distinguishes the two.

> Is the test case you're looking at
> really representative of any common situation?

That's always the question.  I took the "pick a random number and use
it to look up a row in pgbench_accounts by primary key" logic from
pgbench -S, and put it into a stored procedure that loops 10,000
times, to remove the overhead of ping-ponging messages back and forth
for every query.
(But doing so also removes the overhead of taking AccessShareLock for
every select, so those two changes are entangled.)
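
For concreteness, the looping harness might look roughly like this (a
sketch only; the function name, loop count placement, and exact query
shape are illustrative, not the code actually run):

```sql
-- Hypothetical sketch of the test described above: the pgbench -S
-- lookup moved server-side, so one call does 10,000 primary-key
-- probes without a client round trip per query.
CREATE OR REPLACE FUNCTION bench_select(naccounts int) RETURNS int AS $$
DECLARE
    bal int := 0;
    a   int;
BEGIN
    FOR i IN 1..10000 LOOP
        a := 1 + floor(random() * naccounts)::int;
        SELECT abalance INTO bal
          FROM pgbench_accounts
         WHERE aid = a;
    END LOOP;
    RETURN bal;
END;
$$ LANGUAGE plpgsql;

-- Driven concurrently from many backends, e.g.:
--   SELECT bench_select(100000);
```

Note that inside the function the AccessShareLock on pgbench_accounts
is already held after the first iteration (locks persist to end of
transaction), which is the entanglement mentioned above.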

This type of workload could be representative of a nested loop join.

I started looking into it because someone
(http://archives.postgresql.org/pgsql-performance/2010-11/msg00350.php)
thought that pgbench -S might more or less match their real world
work load.  But by the time I moved most of the selecting into a
stored procedure, maybe it no longer does (it's not even clear if they
were using prepared statements).  But once you separate things into
their component potential bottlenecks, which do you tackle first?  The
most fundamental?  The easiest to analyze?  The one that can't be
gotten around by fine-tuning?  The most interesting? :)



>> 1) Would it be useful for BufFreelistLock be partitioned, like
>> BufMappingLock, or via some kind of clever "virtual partitioning" that
>> could get the same benefit via another means?
>
> Maybe, but you could easily end up with a net loss if the partitioning
> makes buffer allocation significantly stupider (ie, higher probability
> of picking a less-than-optimal buffer to recycle).
>
>> For the clock sweep algorithm, I think you could access
>> nextVictimBuffer without any type of locking.
>
> This is wrong, mainly because you wouldn't have any security against two
> processes decrementing the usage count of the same buffer because they'd
> fetched the same value of nextVictimBuffer.  That would probably happen
> often enough to severely compromise the accuracy of the usage counts and
> thus the accuracy of the LRU eviction behavior.  See above.

Ah, I hadn't considered that.


Cheers,

Jeff

