Thread: pgsql: Move btbulkdelete's vacuum_delay_point() call to a place in the
From: tgl@postgresql.org (Tom Lane)
Log Message:
-----------
Move btbulkdelete's vacuum_delay_point() call to a place in the loop where
we are not holding a buffer content lock; where it was, InterruptHoldoffCount
is positive and so we'd not respond to cancel signals as intended.  Also
add missing vacuum_delay_point() call in btvacuumcleanup.  This should fix
complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM
in a timely fashion (bug #2257).

Modified Files:
--------------
pgsql/src/backend/access/nbtree:
    nbtree.c (r1.140 -> r1.141)
    (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtree.c.diff?r1=1.140&r2=1.141)
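For context, here is a minimal sketch of the loop shape this commit
establishes (illustrative only -- this is not the actual nbtree.c code,
and btvacuumpage_sketch is a made-up name).  The key point is that
LockBuffer() takes the buffer content lock via LWLockAcquire(), which
increments InterruptHoldoffCount; while that count is positive,
CHECK_FOR_INTERRUPTS() inside vacuum_delay_point() will not service a
pending SIGINT/SIGTERM, so the delay point must run while no content
lock is held:

    #include "postgres.h"
    #include "access/nbtree.h"       /* BT_READ */
    #include "commands/vacuum.h"     /* vacuum_delay_point() */
    #include "storage/bufmgr.h"

    static void
    btvacuumpage_sketch(Relation rel, BlockNumber num_pages)
    {
        BlockNumber blkno;

        for (blkno = 0; blkno < num_pages; blkno++)
        {
            Buffer      buf;

            /*
             * Safe here: no buffer content lock is held, so a pending
             * cancel signal can be serviced during the delay.
             */
            vacuum_delay_point();

            buf = ReadBuffer(rel, blkno);
            LockBuffer(buf, BT_READ);   /* InterruptHoldoffCount now > 0 */

            /* ... examine the page, collect deletable index tuples ... */

            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
            ReleaseBuffer(buf);
        }
    }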
On Tue, 2006-02-14 at 13:20 -0400, Tom Lane wrote:
> Log Message:
> -----------
> Move btbulkdelete's vacuum_delay_point() call to a place in the loop where
> we are not holding a buffer content lock; where it was, InterruptHoldoffCount
> is positive and so we'd not respond to cancel signals as intended.  Also
> add missing vacuum_delay_point() call in btvacuumcleanup.  This should fix
> complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM
> in a timely fashion (bug #2257).

Cool and Interesting.  That might explain some pretty dire performance
numbers from last week while running auto vacuum.  Performance was
flat-lining for a while.  Still need to investigate further though.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2006-02-14 at 13:20 -0400, Tom Lane wrote:
>> add missing vacuum_delay_point() call in btvacuumcleanup.

> Cool and Interesting.  That might explain some pretty dire performance
> numbers from last week while running auto vacuum.  Performance was
> flat-lining for a while.  Still need to investigate further though.

Yeah, the missing delay would result in a spike in I/O demand from
vacuum (auto or otherwise) while processing a big index, if you had
vacuum delay configured.  GIST had the same problem, too.

			regards, tom lane
On Tue, 2006-02-14 at 15:55 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2006-02-14 at 13:20 -0400, Tom Lane wrote:
> >> add missing vacuum_delay_point() call in btvacuumcleanup.
>
> > Cool and Interesting.  That might explain some pretty dire performance
> > numbers from last week while running auto vacuum.  Performance was
> > flat-lining for a while.  Still need to investigate further though.
>
> Yeah, the missing delay would result in a spike in I/O demand from
> vacuum (auto or otherwise) while processing a big index, if you had
> vacuum delay configured.  GIST had the same problem, too.

Perhaps if vacuum_delay_point() contained a timer check, we'd be able to
see if any gap between vacuum delays was more than the actual delay
itself.  It would be nice to know they are all gone, forever.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> Perhaps if vacuum_delay_point() contained a timer check, we'd be able to
> see if any gap between vacuum delays was more than the actual delay
> itself.  It would be nice to know they are all gone, forever.

vacuum_delay_point is intended to be cheap enough (in the non-delay
case) that no one would have any hesitation about dropping it into
loops.  With a timer check in there, that might not be true, so I'm
resistant to doing it unconditionally.  But I could see having some
#ifdef'd code that could be conditionally compiled in to measure the
maximum inter-delay-point time in a development build.

			regards, tom lane
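A sketch of what such conditionally compiled instrumentation might look
like (all names here -- VACUUM_DELAY_TIMING, record_delay_gap, and the
static variables -- are hypothetical, not actual PostgreSQL symbols):

    #ifdef VACUUM_DELAY_TIMING

    #include <sys/time.h>

    static struct timeval last_delay_call;  /* zero-initialized */
    static long max_gap_usec = 0;

    /*
     * Call at the top of vacuum_delay_point() to track the worst-case
     * interval between successive delay points.
     */
    static void
    record_delay_gap(void)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        if (last_delay_call.tv_sec != 0)
        {
            long    gap = (now.tv_sec - last_delay_call.tv_sec) * 1000000L +
                          (now.tv_usec - last_delay_call.tv_usec);

            if (gap > max_gap_usec)
                max_gap_usec = gap;
        }
        last_delay_call = now;
    }

    #endif   /* VACUUM_DELAY_TIMING */

The maximum could then be reported (e.g. via elog) at the end of each
vacuum pass, making it easy to spot loops that run too long between
delay points.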
On Tue, 2006-02-14 at 17:13 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Perhaps if vacuum_delay_point() contained a timer check, we'd be able to
> > see if any gap between vacuum delays was more than the actual delay
> > itself.  It would be nice to know they are all gone, forever.
>
> vacuum_delay_point is intended to be cheap enough (in the non-delay
> case) that no one would have any hesitation about dropping it into
> loops.  With a timer check in there, that might not be true, so I'm
> resistant to doing it unconditionally.  But I could see having some
> #ifdef'd code that could be conditionally compiled in to measure the
> maximum inter-delay-point time in a development build.

I'm thinking something like this might work better, since this is the
issue I/we really care about:

    if (LWLockNumHeldByMe() == 0)
        pg_usleep(msec * 1000L);
    else
        elog(WARNING, "losing sleep because internal locks are held");

or is it not worth losing sleep over? ;-)

The additional code is only executed *if* we are going to sleep, so a
few extra cycles won't hurt when we are just about to sleep for at
least 10 milliseconds.  We could make similar checks at most of the
other sleep locations, modifying the number of held locks appropriately.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> if (LWLockNumHeldByMe() == 0)
>     pg_usleep(msec * 1000L);
> else
>     elog(WARNING, "losing sleep because internal locks are held");

I did some testing yesterday with an Assert added to vacuum_delay_point:

    Assert(InterruptHoldoffCount == 0 && CritSectionCount == 0);

which is basically asserting that ProcessInterrupts is allowed to
execute an interrupt.  I desisted from committing this test, though,
because it's not clear that callers should not be allowed to call
vacuum_delay_point in such cases --- for the current callers it proved
fairly easy to locate the calls at places where this is certain to be
OK, but I don't want to wire in a requirement of it.

In any case, such tests don't help for the question of whether we've
missed any major loops that ought to contain vacuum_delay_point calls.
(If we were to add a worst-case-interval measurement, it might be
reasonable to not count calls that don't meet the ProcessInterrupts
condition.)

			regards, tom lane
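For reference, a simplified sketch of where that assertion would sit in
vacuum_delay_point() (an approximation, not the committed code -- the
real function in src/backend/commands/vacuum.c also scales the sleep
time by the accumulated cost balance and re-checks for interrupts after
sleeping):

    void
    vacuum_delay_point(void)
    {
        /*
         * Assert that ProcessInterrupts would be allowed to run here:
         * no interrupt holdoff (e.g. from a held LWLock) and no
         * critical section in progress.
         */
        Assert(InterruptHoldoffCount == 0 && CritSectionCount == 0);

        /* Always give pending cancel/die interrupts a chance to run */
        CHECK_FOR_INTERRUPTS();

        /*
         * Nap if cost-based vacuum delay is active and the cost budget
         * has been used up.
         */
        if (VacuumCostActive && !InterruptPending &&
            VacuumCostBalance >= VacuumCostLimit)
        {
            pg_usleep(VacuumCostDelay * 1000L);
            VacuumCostBalance = 0;
        }
    }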