Re: BufFreelistLock - Mailing list pgsql-hackers

From     Jeff Janes
Subject  Re: BufFreelistLock
Msg-id   AANLkTin3zLXNeft_55BTyEMc3Y9sX4aJBjxcASbxBO4b@mail.gmail.com
In response to  Re: BufFreelistLock  (Tom Lane <tgl@sss.pgh.pa.us>)
List     pgsql-hackers
On Wed, Dec 8, 2010 at 8:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I think that the BufFreelistLock can be a contention bottleneck on a
>> system with a lot of CPUs that do a lot of shared-buffer allocations
>> which can fulfilled by the OS buffer cache.
>
> Really?  buffer/README says
>
>  The buffer
>  management policy is designed so that BufFreelistLock need not be taken
>  except in paths that will require I/O, and thus will be slow anyway.

True, but very large memory means they often don't require true disk I/O anyway.

> It's hard to see how it's going to be much of a problem if you're going
> to be doing kernel calls as well.

Are kernel calls really all that slow?  I thought they had been
greatly optimized on recent hardware and kernels.
I'm not sure how to create a test case that distinguishes the two.

> Is the test case you're looking at
> really representative of any common situation?

That's always the question.  I took the "pick a random number and use
it to look up a row in pgbench_accounts by primary key" logic from
pgbench -S, and put it into a stored procedure that loops 10,000
times, to remove the overhead of ping-ponging messages back and forth
for every query.
(But doing so also removes the overhead of taking AccessShareLock for
every select, so those two changes are entangled.)
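
For concreteness, the looping harness might look roughly like this (a
sketch only; the function name, loop count placement, and exact query
shape are illustrative, not the code actually run):

```sql
-- Hypothetical sketch of the test described above: the pgbench -S
-- lookup moved server-side, so one call does 10,000 primary-key
-- probes without a client round trip per query.
CREATE OR REPLACE FUNCTION bench_select(naccounts int) RETURNS int AS $$
DECLARE
    bal int := 0;
    a   int;
BEGIN
    FOR i IN 1..10000 LOOP
        a := 1 + floor(random() * naccounts)::int;
        SELECT abalance INTO bal
          FROM pgbench_accounts
         WHERE aid = a;
    END LOOP;
    RETURN bal;
END;
$$ LANGUAGE plpgsql;

-- Driven concurrently from many backends, e.g.:
--   SELECT bench_select(100000);
```

Note that inside the function the AccessShareLock on pgbench_accounts
is already held after the first iteration (locks persist to end of
transaction), which is the entanglement mentioned above.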

This type of workload could be representative of a nested loop join.

I started looking into it because someone
(http://archives.postgresql.org/pgsql-performance/2010-11/msg00350.php)
thought that pgbench -S might more or less match their real world
work load.  But by the time I moved most of the selecting into a
stored procedure, maybe it no longer does (it's not even clear if they
were using prepared statements).  But once you separate things into
their component potential bottlenecks, which do you tackle first?  The
most fundamental?  The easiest to analyze?  The one that can't be
gotten around by fine-tuning?  The most interesting? :)



>> 1) Would it be useful for BufFreelistLock be partitioned, like
>> BufMappingLock, or via some kind of clever "virtual partitioning" that
>> could get the same benefit via another means?
>
> Maybe, but you could easily end up with a net loss if the partitioning
> makes buffer allocation significantly stupider (ie, higher probability
> of picking a less-than-optimal buffer to recycle).
>
>> For the clock sweep algorithm, I think you could access
>> nextVictimBuffer without any type of locking.
>
> This is wrong, mainly because you wouldn't have any security against two
> processes decrementing the usage count of the same buffer because they'd
> fetched the same value of nextVictimBuffer.  That would probably happen
> often enough to severely compromise the accuracy of the usage counts and
> thus the accuracy of the LRU eviction behavior.  See above.

Ah, I hadn't considered that.


Cheers,

Jeff

