Thread: Avoiding overflow in timeout-related calculations

Avoiding overflow in timeout-related calculations

From

Tom Lane

Date:

18 November 2012, 19:57:57

The discussion of bug #7670 showed that what's happening there is that
if you specify a log_rotation_age of more than 25 days (2^31 msec),
WaitLatch will sometimes be passed a timeout of more than 2^31 msec,
leading to unportable behavior.  At least some kernels will return
EINVAL for that, and it's not very clear what will happen on others.

After some thought about this, I think the best thing to do is to tweak
syslogger.c to to clamp the requested sleep to INT_MAX msec.  The fact
that a couple of people have tried to set log_rotation_age to 30 days or
more suggests that it's useful, so reducing the GUC's upper limit isn't
a desirable fix.  This should be an easy change since the logic in that
loop will already behave correctly if it's woken up before the requested
rotation time.

I went looking for other timeout-related GUC variables that might have
overoptimistic upper limits, and found these cases:

PostAuthDelay is converted to microseconds, and therefore had better
have an upper bound of INT_MAX/1000000 seconds.  (Larger values would
work on machines with 64-bit long, but it doesn't seem worth
complicating the code to allow for that.)

contrib/auth_delay's auth_delay_milliseconds is likewise converted
to microseconds, so its max had better be INT_MAX/1000 msec.  (Likewise,
I don't see any value in supporting larger limits.)

XLogArchiveTimeout, although it's never multiplied by anything,
is compared to ((int) (now - last_xlog_switch_time))
which means that its nominal maximum of INT_MAX is problematic:
if you actually set it to that it would be quite possible for
the system to miss the timeout occurrence if it didn't chance
to service the timeout signal in exactly the last second of the
interval.  (After that second, the above-quoted expression would
overflow and lead to the wrong comparison result.)  So it seems
to me we'd better back it off some.  I'm inclined to propose
dropping it to INT_MAX/2 seconds.

Comments, objections?
        regards, tom lane

Re: Avoiding overflow in timeout-related calculations

From

Andres Freund

Date:

18 November 2012, 20:14:32

On 2012-11-18 14:57:51 -0500, Tom Lane wrote:
> The discussion of bug #7670 showed that what's happening there is that
> if you specify a log_rotation_age of more than 25 days (2^31 msec),
> WaitLatch will sometimes be passed a timeout of more than 2^31 msec,
> leading to unportable behavior.  At least some kernels will return
> EINVAL for that, and it's not very clear what will happen on others.
>
> After some thought about this, I think the best thing to do is to tweak
> syslogger.c to to clamp the requested sleep to INT_MAX msec.  The fact
> that a couple of people have tried to set log_rotation_age to 30 days or
> more suggests that it's useful, so reducing the GUC's upper limit isn't
> a desirable fix.  This should be an easy change since the logic in that
> loop will already behave correctly if it's woken up before the requested
> rotation time.

Cool. Agreed.

> I went looking for other timeout-related GUC variables that might have
> overoptimistic upper limits, and found these cases:
>
> [sensible stuff]

Lowering the maximum of those seems sensible to me. Anybody using that
large value for those already had a problem even if it worked.

I think at least wal_sender_timeout and wal_receiver_timeout are also
problematic.

Greetings,

Andres

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Avoiding overflow in timeout-related calculations

From

Tom Lane

Date:

18 November 2012, 20:21:38

Andres Freund <andres@anarazel.de> writes:
> I think at least wal_sender_timeout and wal_receiver_timeout are also
> problematic.

I looked at those and didn't see a problem --- what are you worried
about exactly?
        regards, tom lane

Re: Avoiding overflow in timeout-related calculations

From

Andres Freund

Date:

18 November 2012, 20:39:49

On 2012-11-18 15:21:34 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think at least wal_sender_timeout and wal_receiver_timeout are also
> > problematic.
> 
> I looked at those and didn't see a problem --- what are you worried
> about exactly?

Forget it, too hungry to read the code properly...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services