Thread: pgsql: Move btbulkdelete's vacuum_delay_point() call to a place in the
From: tgl@postgresql.org (Tom Lane)
Log Message:
-----------
Move btbulkdelete's vacuum_delay_point() call to a place in the loop where
we are not holding a buffer content lock; where it was, InterruptHoldoffCount
is positive and so we'd not respond to cancel signals as intended.  Also
add missing vacuum_delay_point() call in btvacuumcleanup.  This should fix
complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM
in a timely fashion (bug #2257).

Modified Files:
--------------
pgsql/src/backend/access/nbtree:
    nbtree.c (r1.140 -> r1.141)
    (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtree.c.diff?r1=1.140&r2=1.141)
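For context, here is a minimal sketch of the loop shape this commit
establishes (illustrative only -- this is not the actual nbtree.c code,
and btvacuumpage_sketch is a made-up name).  The key point is that
LockBuffer() takes the buffer content lock via LWLockAcquire(), which
increments InterruptHoldoffCount; while that count is positive,
CHECK_FOR_INTERRUPTS() inside vacuum_delay_point() will not service a
pending SIGINT/SIGTERM, so the delay point must run while no content
lock is held:

    #include "postgres.h"
    #include "access/nbtree.h"       /* BT_READ */
    #include "commands/vacuum.h"     /* vacuum_delay_point() */
    #include "storage/bufmgr.h"

    static void
    btvacuumpage_sketch(Relation rel, BlockNumber num_pages)
    {
        BlockNumber blkno;

        for (blkno = 0; blkno < num_pages; blkno++)
        {
            Buffer      buf;

            /*
             * Safe here: no buffer content lock is held, so a pending
             * cancel signal can be serviced during the delay.
             */
            vacuum_delay_point();

            buf = ReadBuffer(rel, blkno);
            LockBuffer(buf, BT_READ);   /* InterruptHoldoffCount now > 0 */

            /* ... examine the page, collect deletable index tuples ... */

            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
            ReleaseBuffer(buf);
        }
    }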
On Tue, 2006-02-14 at 13:20 -0400, Tom Lane wrote:
> Log Message:
> -----------
> Move btbulkdelete's vacuum_delay_point() call to a place in the loop where
> we are not holding a buffer content lock; where it was, InterruptHoldoffCount
> is positive and so we'd not respond to cancel signals as intended.  Also
> add missing vacuum_delay_point() call in btvacuumcleanup.  This should fix
> complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM
> in a timely fashion (bug #2257).

Cool and Interesting.  That might explain some pretty dire performance
numbers from last week while running auto vacuum.  Performance was
flat-lining for a while.  Still need to investigate further though.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2006-02-14 at 13:20 -0400, Tom Lane wrote:
>> add missing vacuum_delay_point() call in btvacuumcleanup.

> Cool and Interesting.  That might explain some pretty dire performance
> numbers from last week while running auto vacuum.  Performance was
> flat-lining for a while.  Still need to investigate further though.

Yeah, the missing delay would result in a spike in I/O demand from
vacuum (auto or otherwise) while processing a big index, if you had
vacuum delay configured.  GIST had the same problem, too.

			regards, tom lane
On Tue, 2006-02-14 at 15:55 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2006-02-14 at 13:20 -0400, Tom Lane wrote:
> >> add missing vacuum_delay_point() call in btvacuumcleanup.
>
> > Cool and Interesting.  That might explain some pretty dire performance
> > numbers from last week while running auto vacuum.  Performance was
> > flat-lining for a while.  Still need to investigate further though.
>
> Yeah, the missing delay would result in a spike in I/O demand from
> vacuum (auto or otherwise) while processing a big index, if you had
> vacuum delay configured.  GIST had the same problem, too.

Perhaps if vacuum_delay_point() contained a timer check, we'd be able to
see if any gap between vacuum delays was more than the actual delay
itself.  It would be nice to know they are all gone, forever.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> Perhaps if vacuum_delay_point() contained a timer check, we'd be able to
> see if any gap between vacuum delays was more than the actual delay
> itself.  It would be nice to know they are all gone, forever.

vacuum_delay_point is intended to be cheap enough (in the non-delay
case) that no one would have any hesitation about dropping it into
loops.  With a timer check in there, that might not be true, so I'm
resistant to doing it unconditionally.  But I could see having some
#ifdef'd code that could be conditionally compiled in to measure the
maximum inter-delay-point time in a development build.

			regards, tom lane
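A sketch of what such conditionally compiled instrumentation might look
like (all names here -- VACUUM_DELAY_TIMING, record_delay_gap, and the
static variables -- are hypothetical, not actual PostgreSQL symbols):

    #ifdef VACUUM_DELAY_TIMING

    #include <sys/time.h>

    static struct timeval last_delay_call;  /* zero-initialized */
    static long max_gap_usec = 0;

    /*
     * Call at the top of vacuum_delay_point() to track the worst-case
     * interval between successive delay points.
     */
    static void
    record_delay_gap(void)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        if (last_delay_call.tv_sec != 0)
        {
            long    gap = (now.tv_sec - last_delay_call.tv_sec) * 1000000L +
                          (now.tv_usec - last_delay_call.tv_usec);

            if (gap > max_gap_usec)
                max_gap_usec = gap;
        }
        last_delay_call = now;
    }

    #endif   /* VACUUM_DELAY_TIMING */

The maximum could then be reported (e.g. via elog) at the end of each
vacuum pass, making it easy to spot loops that run too long between
delay points.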
On Tue, 2006-02-14 at 17:13 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Perhaps if vacuum_delay_point() contained a timer check, we'd be able to
> > see if any gap between vacuum delays was more than the actual delay
> > itself.  It would be nice to know they are all gone, forever.
>
> vacuum_delay_point is intended to be cheap enough (in the non-delay
> case) that no one would have any hesitation about dropping it into
> loops.  With a timer check in there, that might not be true, so I'm
> resistant to doing it unconditionally.  But I could see having some
> #ifdef'd code that could be conditionally compiled in to measure the
> maximum inter-delay-point time in a development build.

I'm thinking something like this might work better, since this is the
issue I/we really care about:

    if (LWLockNumHeldByMe() == 0)
        pg_usleep(msec * 1000L);
    else
        elog(WARNING, "losing sleep because internal locks are held");

or is it not worth losing sleep over? ;-)

The additional code is only executed *if* we are going to sleep, so a
few extra cycles won't hurt when we are just about to sleep for at
least 10 milliseconds.  We could make similar checks at most of the
other sleep locations, modifying the number of held locks appropriately.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> if (LWLockNumHeldByMe() == 0)
>     pg_usleep(msec * 1000L);
> else
>     elog(WARNING, "losing sleep because internal locks are held");

I did some testing yesterday with an Assert added to vacuum_delay_point:

    Assert(InterruptHoldoffCount == 0 && CritSectionCount == 0);

which is basically asserting that ProcessInterrupts is allowed to
execute an interrupt.  I desisted from committing this test, though,
because it's not clear that callers should not be allowed to call
vacuum_delay_point in such cases --- for the current callers it proved
fairly easy to locate the calls at places where this is certain to be
OK, but I don't want to wire in a requirement of it.

In any case, such tests don't help for the question of whether we've
missed any major loops that ought to contain vacuum_delay_point calls.
(If we were to add a worst-case-interval measurement, it might be
reasonable to not count calls that don't meet the ProcessInterrupts
condition.)

			regards, tom lane
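For reference, a simplified sketch of where that assertion would sit in
vacuum_delay_point() (an approximation, not the committed code -- the
real function in src/backend/commands/vacuum.c also scales the sleep
time by the accumulated cost balance and re-checks for interrupts after
sleeping):

    void
    vacuum_delay_point(void)
    {
        /*
         * Assert that ProcessInterrupts would be allowed to run here:
         * no interrupt holdoff (e.g. from a held LWLock) and no
         * critical section in progress.
         */
        Assert(InterruptHoldoffCount == 0 && CritSectionCount == 0);

        /* Always give pending cancel/die interrupts a chance to run */
        CHECK_FOR_INTERRUPTS();

        /*
         * Nap if cost-based vacuum delay is active and the cost budget
         * has been used up.
         */
        if (VacuumCostActive && !InterruptPending &&
            VacuumCostBalance >= VacuumCostLimit)
        {
            pg_usleep(VacuumCostDelay * 1000L);
            VacuumCostBalance = 0;
        }
    }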