Thread: More idle thoughts

More idle thoughts

From: Greg Stark
Date: 26 March 2010, 15:59:44

The Linux kernel had a big push to reduce latency, and one of the
tricks they did was they replaced the usual interrupt points with a
call which noted how long it had been since the last interrupt point.
It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
conditionally having it call a function which calls gettimeofday and
compares with the previous timestamp received at the last CFI().
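A minimal C sketch of the instrumented CFI() described above, under stated assumptions: the helper cfi_check_delay(), the CFI_TIMING switch and the 100 ms CFI_MAX_DELAY_USEC threshold are hypothetical names, not existing PostgreSQL symbols. The real macro's test of the pending-interrupt flag is left out so only the timing part shows. Each call takes a gettimeofday() timestamp and reports its own call site (via __FILE__/__LINE__) when the gap since the previous CFI() exceeds the threshold.

```c
#include <stdio.h>
#include <sys/time.h>

/* Enable the instrumentation for this demo; a real testing build would
 * pass -DCFI_TIMING to the compiler instead. */
#define CFI_TIMING

/* Hypothetical threshold: report gaps longer than 100 ms. */
#define CFI_MAX_DELAY_USEC 100000L

static struct timeval cfi_last;      /* time of the previous CFI() */
static int cfi_have_last = 0;        /* false until the first CFI() */
static long cfi_reports = 0;         /* number of long gaps reported */

/* Hypothetical helper: compare now with the previous CFI() timestamp. */
static void
cfi_check_delay(const char *file, int line)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    if (cfi_have_last)
    {
        long elapsed = (now.tv_sec - cfi_last.tv_sec) * 1000000L
                       + (now.tv_usec - cfi_last.tv_usec);

        if (elapsed > CFI_MAX_DELAY_USEC)
        {
            fprintf(stderr, "%s:%d: %ld us since previous CHECK_FOR_INTERRUPTS()\n",
                    file, line, elapsed);
            cfi_reports++;
        }
    }
    cfi_last = now;                  /* this call is the new baseline */
    cfi_have_last = 1;
}

#ifdef CFI_TIMING
#define CHECK_FOR_INTERRUPTS() cfi_check_delay(__FILE__, __LINE__)
#else
#define CHECK_FOR_INTERRUPTS() ((void) 0)
#endif
```

Dropping the CFI_TIMING define turns the macro back into a no-op, in line with this being a special build option. Because gettimeofday() follows the wall clock, a clock step could produce a spurious report; clock_gettime(CLOCK_MONOTONIC) would avoid that where it is available.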

I think the only gotcha would be at CHECK_FOR_INTERRUPTS() calls which
occur after a syscall that an actual interrupt would interrupt. They
might show a long wait even though an actual interrupt would trigger
them right away --  the two that come to mind are reading from the
client and blocking on a semaphore. So I think we would need to add a
parameter to indicate if that's the case for a given call-site. I
think it would be easy to know what value to put there because I think
we always do a CFI() immediately after such syscalls but I'm not 100%
sure that's the case.
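The per-call-site parameter could look like the following minimal sketch; CHECK_FOR_INTERRUPTS_EX() and its after_wait flag are hypothetical names. A call site that directly follows a syscall a real signal would interrupt passes true, so the time spent waiting is not reported, while the timestamp is still updated for the next call.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/time.h>

#define CFI_MAX_DELAY_USEC 100000L   /* hypothetical 100 ms threshold */

static struct timeval cfi_last;      /* time of the previous CFI() */
static bool cfi_have_last = false;
static long cfi_reports = 0;

/*
 * after_wait is true at call sites that directly follow a syscall a real
 * signal would interrupt (reading from the client, sleeping on a
 * semaphore), so a long gap ending there is expected and not reported.
 */
static void
cfi_check_delay(bool after_wait, const char *file, int line)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    if (cfi_have_last && !after_wait)
    {
        long elapsed = (now.tv_sec - cfi_last.tv_sec) * 1000000L
                       + (now.tv_usec - cfi_last.tv_usec);

        if (elapsed > CFI_MAX_DELAY_USEC)
        {
            fprintf(stderr, "%s:%d: %ld us since previous CFI\n",
                    file, line, elapsed);
            cfi_reports++;
        }
    }
    cfi_last = now;                  /* baseline for the next call */
    cfi_have_last = true;
}

/* Hypothetical macro: every call site must pass the flag explicitly. */
#define CHECK_FOR_INTERRUPTS_EX(after_wait) \
    cfi_check_delay((after_wait), __FILE__, __LINE__)
```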

Obviously this wouldn't run all the time. I'm not even sure it should
run on the build farm because I think doing an extra syscall at these
places might mask timing bugs by synchronizing the bus in a lot of
places. But even a few build farm animals might uncover places where
we don't respond to C-c or hold up the sinval messages etc.

It also doesn't replace our current method of responding to user
complaints -- many if not all of them are relatively subtle cases
where the user is doing something unusual to create a loop that
doesn't normally occur. But we won't know that unless we try.

-- 
greg

Re: More idle thoughts

From: Tom Lane
Date: 26 March 2010, 16:40:42

Greg Stark <stark@mit.edu> writes:
> The Linux kernel had a big push to reduce latency, and one of the
> tricks they did was they replaced the usual interrupt points with a
> call which noted how long it had been since the last interrupt point.
> It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
> conditionally having it call a function which calls gettimeofday and
> compares with the previous timestamp received at the last CFI().

Hmm.  The only thing you could find out is which CFI call was reporting
the long delay, not what the path of control in between had been.  If
the long computation had been down inside some function that's no longer
active, this wouldn't help much to track it down.  Still, it'd be better
than no data at all.

> I think the only gotcha would be at CHECK_FOR_INTERRUPTS() calls which
> occur after a syscall that an actual interrupt would interrupt. They
> might show a long wait even though an actual interrupt would trigger
> them right away --  the two that come to mind are reading from the
> client and blocking on a semaphore. So I think we would need to add a
> parameter to indicate if that's the case for a given call-site. I
> think it would be easy to know what value to put there because I think
> we always do a CFI() immediately after such syscalls but I'm not 100%
> sure that's the case.

I'm afraid you're being too optimistic about that.  What I'd think about
is just adding an extra call to reset the delay-time counter after each
such syscall, rather than having two kinds of CFI.
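A minimal sketch of that alternative: a single kind of CFI() plus a reset call after each interruptible syscall. cfi_reset_delay() and simulated_client_read() are hypothetical names, and the "syscall" is simulated by spinning for 200 ms rather than actually blocking in the kernel.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/time.h>

#define CFI_MAX_DELAY_USEC 100000L   /* hypothetical 100 ms threshold */

static struct timeval cfi_last;      /* time of the previous CFI() or reset */
static bool cfi_have_last = false;
static long cfi_reports = 0;

/* Microseconds elapsed since *then. */
static long
usec_since(const struct timeval *then)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    return (now.tv_sec - then->tv_sec) * 1000000L
           + (now.tv_usec - then->tv_usec);
}

/* Hypothetical: restart the delay clock right after an interruptible syscall. */
static void
cfi_reset_delay(void)
{
    gettimeofday(&cfi_last, NULL);
    cfi_have_last = true;
}

/* The single kind of CFI(): always compare against the last timestamp. */
static void
cfi_check_delay(const char *file, int line)
{
    if (cfi_have_last)
    {
        long elapsed = usec_since(&cfi_last);

        if (elapsed > CFI_MAX_DELAY_USEC)
        {
            fprintf(stderr, "%s:%d: %ld us since previous CFI\n",
                    file, line, elapsed);
            cfi_reports++;
        }
    }
    cfi_reset_delay();               /* this CFI() becomes the new baseline */
}

#define CHECK_FOR_INTERRUPTS() cfi_check_delay(__FILE__, __LINE__)

/* Stand-in for a blocking read from the client: spin for 200 ms of
 * wall-clock time (a real backend would be asleep in the kernel). */
static void
simulated_client_read(void)
{
    struct timeval start;

    gettimeofday(&start, NULL);
    while (usec_since(&start) < 200000L)
        ;                            /* busy-wait */

    cfi_reset_delay();               /* the extra call after the "syscall" */
    CHECK_FOR_INTERRUPTS();
}
```

With this shape, ordinary call sites need no new argument; the cost moves to the blocking syscalls, each of which needs the extra reset. A missing reset would show up as a reported long delay at the next CFI() rather than hiding one.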

> Obviously this wouldn't run all the time.

Yeah, it would increase the overhead of CFI by orders of magnitude,
so you wouldn't even want to think about building a production version
that way.  But it might be a useful testing option.
        regards, tom lane

Re: More idle thoughts

From: Simon Riggs
Date: 28 March 2010, 13:32:25

On Fri, 2010-03-26 at 18:59 +0000, Greg Stark wrote:

> The Linux kernel had a big push to reduce latency, and one of the
> tricks they did was they replaced the usual interrupt points with a
> call which noted how long it had been since the last interrupt point.
> It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
> conditionally having it call a function which calls gettimeofday and
> compares with the previous timestamp received at the last CFI().

Reducing latency sounds good, but what has CFI got to do with that?

-- Simon Riggs           www.2ndQuadrant.com

Re: More idle thoughts

From: Tom Lane
Date: 28 March 2010, 13:47:17

Simon Riggs <simon@2ndQuadrant.com> writes:
> On Fri, 2010-03-26 at 18:59 +0000, Greg Stark wrote:
>> It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
>> conditionally having it call a function which calls gettimeofday and
>> compares with the previous timestamp received at the last CFI().

> Reducing latency sounds good, but what has CFI got to do with that?

It took me about five minutes to figure out what Greg was on about too.
His point is that we need to locate code paths in which an extremely
long time can pass between successive CFI calls, because that means
the backend will fail to respond to SIGINT/SIGTERM for a long time.
Instrumenting CFI itself is a possible tool for that.
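A simplified, self-contained model of why the gap between CFI calls is the response latency: the signal handler only sets flags, and nothing happens until the next CHECK_FOR_INTERRUPTS(). It borrows the names InterruptPending, QueryCancelPending and ProcessInterrupts() from PostgreSQL but is not the real code. Here ProcessInterrupts() returns a value (the backend version raises an error), the macro is written as an expression so the loop can test it, and long_computation() is a hypothetical test loop.

```c
#include <signal.h>

/* Flags set by the signal handler and consumed at the next CFI(). */
static volatile sig_atomic_t InterruptPending = 0;
static volatile sig_atomic_t QueryCancelPending = 0;

/* SIGINT handler: only record that a cancel was requested. */
static void
handle_sigint(int signo)
{
    (void) signo;
    QueryCancelPending = 1;
    InterruptPending = 1;
}

/* Returns 1 if a query cancel was pending (and clears it), else 0. */
static int
ProcessInterrupts(void)
{
    InterruptPending = 0;
    if (QueryCancelPending)
    {
        QueryCancelPending = 0;
        return 1;
    }
    return 0;
}

/* Expression form so the demo loop can test the result. */
#define CHECK_FOR_INTERRUPTS() (InterruptPending ? ProcessInterrupts() : 0)

/*
 * A long loop that calls CHECK_FOR_INTERRUPTS() only every cfi_every
 * iterations and raises SIGINT at iteration signal_at.  Returns the
 * iteration at which the cancel was noticed, or -1 if it never was.
 */
static long
long_computation(long iterations, long cfi_every, long signal_at)
{
    InterruptPending = 0;
    QueryCancelPending = 0;
    signal(SIGINT, handle_sigint);   /* a backend installs this once at startup */

    for (long i = 0; i < iterations; i++)
    {
        if (i == signal_at)
            raise(SIGINT);           /* simulate the user pressing Ctrl-C */
        if (i % cfi_every == 0 && CHECK_FOR_INTERRUPTS())
            return i;
    }
    return -1;
}
```

The distance between signal_at and the returned iteration is the delay the user would see; a loop that never reaches a CFI after the signal never notices the cancel at all.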
        regards, tom lane

Re: More idle thoughts

From: Simon Riggs
Date: 29 March 2010, 05:15:44

On Sun, 2010-03-28 at 12:47 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Fri, 2010-03-26 at 18:59 +0000, Greg Stark wrote:
> >> It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
> >> conditionally having it call a function which calls gettimeofday and
> >> compares with the previous timestamp received at the last CFI().
> 
> > Reducing latency sounds good, but what has CFI got to do with that?
> 
> It took me about five minutes to figure out what Greg was on about too.
> His point is that we need to locate code paths in which an extremely
> long time can pass between successive CFI calls, because that means
> the backend will fail to respond to SIGINT/SIGTERM for a long time.
> Instrumenting CFI itself is a possible tool for that.

I was thinking we could do this via signals, but actually instrumenting
the code paths seems better.

There probably are a few paths still to improve. Dare I suggest we
follow the tried and tested open source approach of wait-for-complaint?

Reducing latency elsewhere would be time better spent (!). I was
thinking of adding a "reason" field onto ReadBuffer, so we can diagnose
the source of buffer waits.
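One way the "reason" idea might look, as a purely hypothetical sketch: the BufferReadReason enum and record_buffer_wait() are invented for illustration and are not part of PostgreSQL's ReadBuffer() API. A ReadBuffer variant that accepted such a reason could pass it, together with the measured wait time, to a per-reason counter like this so buffer waits can be broken down by source.

```c
#include <stdio.h>

/* Hypothetical wait reasons; not a real PostgreSQL type. */
typedef enum BufferReadReason
{
    BUFREAD_SEQSCAN,
    BUFREAD_INDEXSCAN,
    BUFREAD_VACUUM,
    BUFREAD_OTHER,
    BUFREAD_NUM_REASONS              /* must stay last */
} BufferReadReason;

static const char *const reason_names[BUFREAD_NUM_REASONS] = {
    "seqscan", "indexscan", "vacuum", "other"
};

static long wait_count[BUFREAD_NUM_REASONS];
static long wait_usec[BUFREAD_NUM_REASONS];

/* Called (hypothetically) whenever a buffer read had to wait. */
static void
record_buffer_wait(BufferReadReason reason, long usec)
{
    if ((int) reason < 0 || reason >= BUFREAD_NUM_REASONS)
        reason = BUFREAD_OTHER;      /* never index out of bounds */
    wait_count[reason]++;
    wait_usec[reason] += usec;
}

/* Print one line per reason that saw at least one wait. */
static void
report_buffer_waits(FILE *out)
{
    for (int r = 0; r < BUFREAD_NUM_REASONS; r++)
        if (wait_count[r] > 0)
            fprintf(out, "%-10s %ld waits, %ld us total\n",
                    reason_names[r], wait_count[r], wait_usec[r]);
}
```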

-- Simon Riggs           www.2ndQuadrant.com