On Fri, Jan 6, 2012 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I started to wonder how likely it would be that some other process would
> sit on a buffer pin for so long as to allow 134217727 iterations of
> ReadBufferExtended, even given the slowdowns associated with
> CLOBBER_CACHE_ALWAYS. This led to some fruitless searching for possible
> deadlock conditions, but eventually I realized that there's a much
> simpler explanation: if PrivateRefCount > 1 then
> ConditionalLockBufferForCleanup always fails. This means that once
> ConditionalLockBufferForCleanup has failed once, the currently committed
> code in lazy_vacuum_heap is guaranteed to loop until it gets a failure
> in enlarging the ResourceOwner buffer-reference array. Which in turn
> means that neither of you ever actually exercised the skip case, or
> you'd have noticed the problem. Nor are the current regression tests
> well designed to exercise the case. (There might well be failures of
> this type happening from time to time in autovacuum, but we'd not see
> any reported failure in the buildfarm, unless we went trawling in
> postmaster logs.)
>
> So at this point I've got serious doubts as to the quality of testing of
> that whole patch, not just this part.

I tested the case where we skip a block during the first pass, but I
admit that I punted on testing the case where we skip a block during
the second pass, because I couldn't think of a good way to exercise
it. Any suggestions?
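
To make the failure mode Tom describes concrete, here is a toy model in
Python. It is only a sketch under stated assumptions, not the bufmgr
code: ToyBuffer, MAX_PINS, retry_cleanup_lock and its parameters are
made-up names, and the tiny pin limit stands in for the point where
enlarging the ResourceOwner buffer-reference array fails. The variant
that releases the pin before retrying is included purely for contrast
and is my assumption, not a description of any committed fix.

```python
# Toy model only: names echo PostgreSQL's, but this is not the real buffer manager.
MAX_PINS = 8  # hypothetical stand-in for "ResourceOwner array can't be enlarged"


class ToyBuffer:
    def __init__(self):
        self.private_refcount = 0  # pins held by this backend
        self.other_pins = 0        # pins held by other backends

    def read_buffer(self):
        # Each read takes one more local pin, like ReadBufferExtended does.
        if self.private_refcount >= MAX_PINS:
            raise MemoryError("cannot enlarge buffer-reference array")
        self.private_refcount += 1

    def release_buffer(self):
        self.private_refcount -= 1

    def conditional_lock_for_cleanup(self):
        # Succeeds only with exactly one local pin and no pins held elsewhere.
        return self.private_refcount == 1 and self.other_pins == 0


def retry_cleanup_lock(buf, release_pin_on_failure, other_unpins_after=1):
    """Retry until the cleanup lock is taken or the pin limit is hit.

    other_unpins_after (hypothetical knob): number of failed attempts
    after which the other backend drops its pin.
    """
    attempts = 0
    while True:
        try:
            buf.read_buffer()
        except MemoryError:
            return ("pin limit hit", attempts)
        attempts += 1
        if buf.conditional_lock_for_cleanup():
            return ("locked", attempts)
        if release_pin_on_failure:
            buf.release_buffer()
        if attempts >= other_unpins_after:
            buf.other_pins = 0  # the other backend lets go


if __name__ == "__main__":
    leaky = ToyBuffer()
    leaky.other_pins = 1
    print(retry_cleanup_lock(leaky, release_pin_on_failure=False))
    # prints ('pin limit hit', 8)

    careful = ToyBuffer()
    careful.other_pins = 1
    print(retry_cleanup_lock(careful, release_pin_on_failure=True))
    # prints ('locked', 2)
```

In the first run the other backend drops its pin after one failed
attempt, yet the loop still never succeeds, because our own pin count
is already 2 by the next try. That matches the "guaranteed to loop"
behavior described above, just with a much smaller limit.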
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company