On 2015-08-31 13:54:57 +0300, YUriy Zhuravlev wrote:
> We have noticed s_lock in PinBuffer and UnpinBuffer. For the test we
> rewrited PinBuffer and UnpinBuffer by atomic operations and we liked
> the result. Degradation of performance almost completely disappeared,
> and went scaling up to 400 clients (4 NUMA nodes with 256 "CPUs").
>
> To scale Postgres for large NUMA machine must be ported to the atomic
> operations bufmgr. During our tests, we no found errors in our patch, but most
> likely it is not true and this patch only for test.
I agree that this is necessary, and it matches with what I've
profiled.
Unfortunately I don't think the patch can be quite as simple as
presented here - we rely on the exclusion provided by the spinlock in a
bunch of places for more than the manipulation of individual values. And
those currently are only correct if there's no other possible
manipulations going on. But it's definitely doable.
I've initial patch doing this, but for me at it seemed to be necessary
to merge flags, usage and refcount into a single value - otherwise the
consistency is hard to maintain because you can't do a CAS over all
values.
> diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
> index 3ae2848..5fdaca7 100644
> --- a/src/backend/storage/buffer/buf_init.c
> +++ b/src/backend/storage/buffer/buf_init.c
> @@ -95,9 +95,9 @@ InitBufferPool(void)
> BufferDesc *buf = GetBufferDescriptor(i);
>
> CLEAR_BUFFERTAG(buf->tag);
> - buf->flags = 0;
> - buf->usage_count = 0;
> - buf->refcount = 0;
> + buf->flags.value = 0;
> + buf->usage_count.value = 0;
> + buf->refcount.value = 0;
> buf->wait_backend_pid = 0;
That's definitely not correct, you should initialize the atomics using
pg_atomic_init_u32() and write to by using pg_atomic_write_u32() - not
access them directly. This breaks the fallback paths.
Greetings,
Andres Freund