Re: Move PinBuffer and UnpinBuffer to atomics - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Move PinBuffer and UnpinBuffer to atomics
Date
Msg-id 20150911161421.GB4996@alap3.anarazel.de
Whole thread Raw
In response to Move PinBuffer and UnpinBuffer to atomics  (YUriy Zhuravlev <u.zhuravlev@postgrespro.ru>)
Responses Re: Move PinBuffer and UnpinBuffer to atomics  (YUriy Zhuravlev <u.zhuravlev@postgrespro.ru>)
List pgsql-hackers
Hi,

On 2015-09-11 13:23:24 +0300, YUriy Zhuravlev wrote:
> Continuing the theme: http://www.postgresql.org/message-id/3368228.mTSz6V0Jsq@dinodell

Please don't just start new threads for a new version of the patch.

> This time, we fairly rewrote  'refcount' and 'usage_count' to atomic in
> PinBuffer and UnpinBuffer (but save lock for buffer flags in Unpin).

Hm.

> In the same time it doesn't affect to correctness of buffer manager
> because that variables already have LWLock on top of them (for partition of
> hashtable).

Note that there's a pending patch that removes the buffer mapping locks
entirely.

> If someone pinned buffer after the call StrategyGetBuffer we just try
> again (in BufferAlloc).  Also in the code there is one more check
> before deleting the old buffer, where changes can be rolled back. The
> other functions where it is checked 'refcount' and 'usage_count' put
> exclusive locks.

I don't think this is correct. This way we can leave the for (;;) loop
in BufferAlloc() thinking that the buffer is unused (and can't be further
pinned because of the held spinlock!) while it actually has been pinned
since by PinBuffer(). Additionally oldFlags can get out of sync there.


I don't think the approach of making some of the fields atomics but not
really caring about the rest is going to work. My suggestion is to add a
single 'state' 32bit atomic. This 32bit state is subdivided into:

10bit for flags,
3bit for usage_count,
16bit for refcount

then turn each operation that currently uses one of these fields into
corresponding accesses (just different values for flags, bit-shiftery &
mask for reading usage count, bit mask for reading refcount). The trick
then is to add a *new* flag value BM_LOCKED. This can then act as a sort
of a 'one bit' spinlock.

That should roughly look like (more or less pseudocode):

void
LockBufHdr(BufferDesc *desc)
{   int state = pg_atomic_read_u32(&desc->state);
   for (;;)   {       /* wait till lock is free */       while (unlikely(state & BM_LOCKED))       {
pg_spin_delay();          state = pg_atomic_read_u32(&desc->state);
 
           /* add exponential backoff? Should seldomly be contended tho. */       }
       /* and try to get lock */       if (pg_atomic_compare_exchange_u32(&desc->state, &state, state | BM_LOCKED))
     break;   }
 
}

static bool
PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
{
...   if (ref == NULL)   {       ReservePrivateRefCountEntry();       ref = NewPrivateRefCountEntry(b);
...
       int state = pg_atomic_read_u32(&desc->state);       int oldstate = state;
       while (true)       {
           /* spin-wait till lock is free */           while (unlikely(state & BM_LOCKED))           {
pg_spin_delay();              state = pg_atomic_read_u32(&desc->state);           }
 
           /* increase refcount */           state += 1;
           /* increase usagecount unless already max */           if ((state & USAGE_COUNT_MASK) != BM_MAX_USAGE_COUNT)
             state += BM_USAGE_COUNT_ONE;
 
           result = (state & BM_VALID) != 0;
           if (pg_atomic_compare_exchange_u32(&desc->state, &oldstate, state))               break;
           /* get ready for next loop, oldstate has been updated by cas */           state = oldstate;   }
...
}

other callsites can either just plainly continue to use
LockBufHdr/UnlockBufHdr or converted similarly to PinBuffer().

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Thom Brown
Date:
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Next
From: David Rowley
Date:
Subject: Re: Move PinBuffer and UnpinBuffer to atomics