Re: RFC: bufmgr locking changes - Mailing list pgsql-hackers

From Neil Conway
Subject Re: RFC: bufmgr locking changes
Date
Msg-id 87n08fh5b4.fsf@mailbox.samurai.com
Whole thread Raw
In response to Re: RFC: bufmgr locking changes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: RFC: bufmgr locking changes
List pgsql-hackers
(Sorry Tom, I was meaning to reply to you once I'd had a chance to revise the bufmgr patch; since that seems a fair
waysoff, I figured it would be better to respond now.)
 

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Neil Conway <neilc@samurai.com> writes:
>> we now hold the buffer's meta data lock while doing buffer I/O.
>
> The latter is a really bad idea IMHO.  The io_in_progress lock can be
> held for eons (in CPU terms) and should not be blocking people who
> simply want to bump their refcount up and down.

My reasoning was that the contention for the per-buffer meta data lock
should be pretty low. The io_in_progress lock is held when we're
either faulting a page in or flushing a page out. In the first case,
we can't actually use the buffer no matter how we do the locking (its
content is incomplete), so there's no effective loss in
concurrency. In the second case, what kinds of concurrent activity can
we allow on the buffer? (We can allow reads, of course, but I don't
believe we can allow writes.)

However, I'll think some more on this, you (and Jan, who raised this
point a while ago via IRC) are probably correct.

> It's possible that you could combine the io_in_progress lock with the
> cntx_lock

Yeah, that's a possibility.

>> - Remove SetBufferCommitInfoNeedsSave(). AFAICS, this is now
>>   completely equivalent to WriteNoReleaseBuffer(), so I just removed
>>   the former and replaced all the calls to it with calls to the later.
>
> The reason I've kept the separation was as a form of documentation as to
> the reason for each write.  Although they currently do the same thing,
> that might not always be true.  I'd prefer not to eliminate the
> distinction from the source code --- though I'd not object if you want
> to make SetBufferCommitInfoNeedsSave a macro that invokes the other
> routine.

Ok, fair enough -- I've changed SetBufferCommitInfoNeedsSave() to be a
macro for WriteNoReleaseBuffer().

>> - Make 'PrivateRefCount' an array of uint16s, rather than longs. This
>>   saves 2 bits * shared_buffers per backend on 32-bit machines and 6
>>   bits * shared_buffers per backend on some 64-bit machines. It means
>>   a given backend can only pin a single buffer 65,636 times, but that
>>   should be more than enough. Similarly, made LocalRefCount an array
>>   of uint16s.
>
> I think both of these are ill-considered micro-optimization.  How do you
> know that the pin count can't exceed 64K?  Consider recursive plpgsql
> functions for a likely counterexample.

Fair enough -- I couldn't conceive of an actual scenario in which
a single backend would acquire > 64K pins on a single buffer, but I'll
take your word that it's not so far fetched. However, there is still
room for improvement, IMHO: on a machine with 64-bit longs, we're
plainly allocating 4 bytes more than we need do. Any objection if I
change these to arrays of int32?

> Please put that back.  It is there to avoid unnecessary acquisitions
> of buffer locks during UnlockBuffers (which is executed during any
> transaction abort).  Without it, you will be needing to lock every
> single buffer during an abort in order to check its flags.

It seems bizarre that we need to iterate through a few thousand array
elements just to do some lock cleanup. I'll take a closer look at
this...

-Neil



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: What's left?
Next
From: Claudio Natoli
Date:
Subject: Re: [pgsql-hackers-win32] What's left?