pgsql: Allow Pin/UnpinBuffer to operate in a lockfree manner. - Mailing list pgsql-committers

From Andres Freund
Subject pgsql: Allow Pin/UnpinBuffer to operate in a lockfree manner.
Msg-id E1apSHT-0007xJ-Bh@gemulon.postgresql.org
Responses Re: pgsql: Allow Pin/UnpinBuffer to operate in a lockfree manner.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-committers
Allow Pin/UnpinBuffer to operate in a lockfree manner.

Pinning/unpinning a buffer is a very frequent operation, especially in
read-mostly, cache-resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).

To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock and refcount into a single 32bit atomic variable;
that allows them to be manipulated together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but that's not a realistic configuration
at the moment).
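To make the packing concrete, here is a minimal sketch using C11 `<stdatomic.h>` rather than PostgreSQL's own `pg_atomic_*` wrappers. The field widths (18-bit refcount, 4-bit usage count, 10 flag bits), the usage-count cap of 5, and the names `BufDescSketch`, `pin_sketch` and `unpin_sketch` are illustrative assumptions; the committed code in bufmgr.c and buf_internals.h may differ in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical layout of the combined state word (an assumption, not
 * copied from buf_internals.h):
 *   bits  0-17  refcount     (18 bits, hence MAX_BACKENDS <= 2^18-1)
 *   bits 18-21  usage count  (4 bits)
 *   bits 22-31  flag bits, including the header-lock bit
 */
#define REFCOUNT_ONE      1U
#define REFCOUNT_MASK     ((1U << 18) - 1)
#define USAGECOUNT_SHIFT  18
#define USAGECOUNT_ONE    (1U << USAGECOUNT_SHIFT)
#define USAGECOUNT_MASK   (0xFU << USAGECOUNT_SHIFT)
#define MAX_USAGE_COUNT   5U          /* assumed cap on the usage count */
#define FLAG_LOCKED       (1U << 22)  /* header-lock bit */
#define FLAG_VALID        (1U << 23)  /* example of an ordinary flag */

typedef struct
{
    _Atomic uint32_t state;           /* flags | usage count | refcount */
} BufDescSketch;

static inline uint32_t state_refcount(uint32_t s)
{
    return s & REFCOUNT_MASK;
}

static inline uint32_t state_usagecount(uint32_t s)
{
    return (s & USAGECOUNT_MASK) >> USAGECOUNT_SHIFT;
}

/*
 * Pin: bump the refcount (and, up to the cap, the usage count) with a
 * single compare-and-swap instead of taking a spinlock.  While the header
 * lock bit is set we wait, on the assumption that the lock holder may
 * write the whole word back when it releases the lock.
 */
static void pin_sketch(BufDescSketch *buf)
{
    uint32_t old = atomic_load(&buf->state);

    for (;;)
    {
        if (old & FLAG_LOCKED)
        {
            old = atomic_load(&buf->state);   /* busy-wait; no backoff here */
            continue;
        }

        uint32_t desired = old + REFCOUNT_ONE;
        if (state_usagecount(old) < MAX_USAGE_COUNT)
            desired += USAGECOUNT_ONE;

        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&buf->state, &old, desired))
            return;
    }
}

/* Unpin: drop one reference with the same CAS-and-retry pattern. */
static void unpin_sketch(BufDescSketch *buf)
{
    uint32_t old = atomic_load(&buf->state);

    for (;;)
    {
        if (old & FLAG_LOCKED)
        {
            old = atomic_load(&buf->state);
            continue;
        }

        assert(state_refcount(old) > 0);

        if (atomic_compare_exchange_weak(&buf->state, &old, old - REFCOUNT_ONE))
            return;
    }
}
```

Because refcount and usage count sit in the low bits, a pin is a plain integer addition on the word; the CAS fails only when another backend changed some part of the word in between, and the loop then retries with the fresh value.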

As not all operations can easily be implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot paths. There are some additional operations which could be done
without the lock but aren't converted in this patch; the most important
places are covered, however.
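A header lock kept as a bit in the same word can be sketched as follows, again with C11 atomics; `FLAG_LOCKED`, `FLAG_DIRTY`, `lock_hdr_sketch` and `unlock_hdr_sketch` are hypothetical names, not the committed API. Locking atomically sets the bit and succeeds only if it was previously clear; unlocking publishes the (possibly modified) state with the bit cleared in a single release store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FLAG_LOCKED  (1U << 22)   /* hypothetical header-lock bit */
#define FLAG_DIRTY   (1U << 24)   /* hypothetical ordinary flag */

typedef struct
{
    _Atomic uint32_t state;
} BufDescSketch;

/*
 * Acquire the header lock by atomically setting FLAG_LOCKED.  Returns the
 * state as seen with the lock held, which the caller may then modify.
 */
static uint32_t lock_hdr_sketch(BufDescSketch *buf)
{
    for (;;)
    {
        uint32_t old = atomic_fetch_or_explicit(&buf->state, FLAG_LOCKED,
                                                memory_order_acquire);
        if (!(old & FLAG_LOCKED))
            return old | FLAG_LOCKED;       /* we set the bit: lock is ours */

        /* Held by someone else: spin on plain loads until it looks free.
         * A real implementation would back off here. */
        while (atomic_load_explicit(&buf->state, memory_order_relaxed) & FLAG_LOCKED)
            ;
    }
}

/*
 * Release the lock by storing the new state with FLAG_LOCKED cleared.
 * Overwriting the whole word is only safe because the lock-free updaters
 * (pin/unpin) never modify the word while FLAG_LOCKED is set.
 */
static void unlock_hdr_sketch(BufDescSketch *buf, uint32_t new_state)
{
    assert(atomic_load_explicit(&buf->state, memory_order_relaxed) & FLAG_LOCKED);
    atomic_store_explicit(&buf->state, new_state & ~FLAG_LOCKED,
                          memory_order_release);
}
```

A caller would do `s = lock_hdr_sketch(buf); s |= FLAG_DIRTY; unlock_hdr_sketch(buf, s);`, paying one atomic read-modify-write to lock and a single store to unlock.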

As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It already has two
users, and more are coming; there's at least a follow-up patch for
lwlock.c.
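The delay logic being factored out of s_lock.c follows a spin-then-sleep pattern. The sketch below approximates that pattern under stated assumptions: the struct and function names, the spin count and the delay bounds are hypothetical, and refinements in the real code (such as adaptive tuning of the spin count) are omitted.

```c
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical tuning constants. */
#define SKETCH_SPINS_PER_DELAY  100
#define SKETCH_MIN_DELAY_USEC   1000L
#define SKETCH_MAX_DELAY_USEC   1000000L
#define SKETCH_MAX_DELAYS       1000

typedef struct
{
    int  spins;       /* busy-spins since the last sleep */
    int  delays;      /* sleeps taken so far */
    long cur_delay;   /* next sleep in microseconds; 0 = not sleeping yet */
} SpinDelaySketch;

static void spin_delay_init(SpinDelaySketch *s)
{
    s->spins = 0;
    s->delays = 0;
    s->cur_delay = 0;
}

/*
 * Call once per failed acquisition attempt.  Busy-spins for a while, then
 * sleeps with a randomly growing delay that wraps back to the minimum.
 */
static void spin_delay_perform(SpinDelaySketch *s)
{
    if (++s->spins < SKETCH_SPINS_PER_DELAY)
        return;

    /* Stand-in for a "stuck spinlock" error after too many sleeps. */
    assert(s->delays < SKETCH_MAX_DELAYS);

    if (s->cur_delay == 0)
        s->cur_delay = SKETCH_MIN_DELAY_USEC;

    struct timespec ts;
    ts.tv_sec = s->cur_delay / 1000000L;
    ts.tv_nsec = (s->cur_delay % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
    s->delays++;

    /* Grow by a random 0-100% of the current delay. */
    s->cur_delay += (long) (s->cur_delay * ((double) rand() / RAND_MAX) + 0.5);
    if (s->cur_delay > SKETCH_MAX_DELAY_USEC)
        s->cur_delay = SKETCH_MIN_DELAY_USEC;

    s->spins = 0;
}
```

A caller wraps its retry loop, e.g. `spin_delay_init(&d); while (!try_acquire()) spin_delay_perform(&d);`, where `try_acquire` stands for whatever atomic test the caller uses (hypothetical here). Keeping this state in one small struct is what lets both the classic spinlock and a lock bit inside an atomic word share the same backoff behaviour.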

This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me. Benchmarking and testing have been provided by, amongst
others, Dilip Kumar, Alexander Korotkov and Robert Haas.

On a large x86 system, improvements by a factor of 8 have been observed
for read-only pgbench with a high client count.

Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/48354581a49c30f5757c203415aa8412d85b0f70

Modified Files
--------------
contrib/pg_buffercache/pg_buffercache_pages.c |  15 +-
src/backend/storage/buffer/buf_init.c         |   7 +-
src/backend/storage/buffer/bufmgr.c           | 508 +++++++++++++++++---------
src/backend/storage/buffer/freelist.c         |  44 ++-
src/backend/storage/buffer/localbuf.c         |  64 ++--
src/backend/storage/lmgr/s_lock.c             | 206 ++++++-----
src/include/postmaster/postmaster.h           |  15 +-
src/include/storage/buf_internals.h           | 101 +++--
src/include/storage/s_lock.h                  |  18 +
src/tools/pgindent/typedefs.list              |   1 +
10 files changed, 622 insertions(+), 357 deletions(-)
