Re: 'COPY ... FROM' inserts to btree, blocks on buffer - Mailing list pgsql-hackers
From | Simon Riggs
---|---
Subject | Re: 'COPY ... FROM' inserts to btree, blocks on buffer
Date |
Msg-id | 1104594249.3978.1239.camel@localhost.localdomain
In response to | Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout
List | pgsql-hackers
On Sat, 2005-01-01 at 00:42, Tom Lane wrote:
> I wrote:
> > A possible fix for this is to reorder the operations in FlushBuffer
> > so that we share-lock the buffer before releasing BufMgrLock ... but
> > I'm not sure that doesn't introduce other deadlock risks.
>
> That's too simplistic, since at least one caller of FlushBuffer is
> trying to write pages that may be in active use. If the target page
> is already exclusive-locked by another process then a deadlock between
> the per-buffer lock and BufMgrLock is entirely possible.
>
> I think that it would work for BufferAlloc to share-lock the victim
> buffer before calling FlushBuffer; we'd have to add a bool parameter to
> FlushBuffer telling it the lock was already acquired. This is safe in
> this particular path because a buffer that was just taken from the
> freelist should not be exclusive-locked by anyone. (It could possibly
> be share-locked by the bgwriter, but that won't cause a deadlock.)
> A process that wanted exclusive lock would first have to pin the buffer,
> which can't happen before we release the BufMgrLock. By adding the
> parameter we'd avoid changing the behavior for other callers.
>
> Comments, better ideas?

I think the proposal sounds safe, though I worry about performance.

This situation occurs because an index block in cache was thought to be a block replacement candidate, then found to be useful by another backend, which attempts to start using it before the flush occurs... which hints at shared_buffers being set too low. AFAICS we need not consider lost timeslices as the explanation for the concurrent activity. The situation is much more likely on multi-CPU systems, especially since many single-core systems have hyperthreaded virtual CPUs and the default shared_buffers of 8MB is often left unchanged.

The effect of the proposed change would be to favour removal of the block from the cache rather than its reuse.
I think the favour in this situation should go towards reuse, forcing the backend searching for a free buffer to search again. There are already two other cases in BufferAlloc that cater for two backends wanting the same block, which hints at the importance of favouring reuse: this is just one more case.

My suggestion: LockBuffer in FlushBuffer should return unsuccessfully if an LW_EXCLUSIVE lock is already held, causing another iteration of the do-while loop in BufferAlloc. That would be faster for both backends, since neither has to wait for I/O, as well as avoiding the deadlock that has been uncovered. If the backend searches for a block replacement candidate and fails more than once, we should consider warning "shared buffers too low...".

Later, I'd like to consider changing the way StrategyGetBuffer works, since it always stops at the first unpinned buffer and doesn't consider whether that buffer is dirty when deciding to use it. It seems worth reading a *few* buffers further on to see whether a more appropriate choice could be made, now that we have the bgwriter to eventually clean the buffers.

--
Best Regards, Simon Riggs