Thread: Avoiding overflow in timeout-related calculations
The discussion of bug #7670 showed that what's happening there is that if you specify a log_rotation_age of more than 25 days (2^31 msec), WaitLatch will sometimes be passed a timeout of more than 2^31 msec, leading to unportable behavior. At least some kernels will return EINVAL for that, and it's not very clear what will happen on others. After some thought about this, I think the best thing to do is to tweak syslogger.c to to clamp the requested sleep to INT_MAX msec. The fact that a couple of people have tried to set log_rotation_age to 30 days or more suggests that it's useful, so reducing the GUC's upper limit isn't a desirable fix. This should be an easy change since the logic in that loop will already behave correctly if it's woken up before the requested rotation time. I went looking for other timeout-related GUC variables that might have overoptimistic upper limits, and found these cases: PostAuthDelay is converted to microseconds, and therefore had better have an upper bound of INT_MAX/1000000 seconds. (Larger values would work on machines with 64-bit long, but it doesn't seem worth complicating the code to allow for that.) contrib/auth_delay's auth_delay_milliseconds is likewise converted to microseconds, so its max had better be INT_MAX/1000 msec. (Likewise, I don't see any value in supporting larger limits.) XLogArchiveTimeout, although it's never multiplied by anything, is compared to ((int) (now - last_xlog_switch_time)) which means that its nominal maximum of INT_MAX is problematic: if you actually set it to that it would be quite possible for the system to miss the timeout occurrence if it didn't chance to service the timeout signal in exactly the last second of the interval. (After that second, the above-quoted expression would overflow and lead to the wrong comparison result.) So it seems to me we'd better back it off some. I'm inclined to propose dropping it to INT_MAX/2 seconds. Comments, objections? regards, tom lane
On 2012-11-18 14:57:51 -0500, Tom Lane wrote: > The discussion of bug #7670 showed that what's happening there is that > if you specify a log_rotation_age of more than 25 days (2^31 msec), > WaitLatch will sometimes be passed a timeout of more than 2^31 msec, > leading to unportable behavior. At least some kernels will return > EINVAL for that, and it's not very clear what will happen on others. > > After some thought about this, I think the best thing to do is to tweak > syslogger.c to to clamp the requested sleep to INT_MAX msec. The fact > that a couple of people have tried to set log_rotation_age to 30 days or > more suggests that it's useful, so reducing the GUC's upper limit isn't > a desirable fix. This should be an easy change since the logic in that > loop will already behave correctly if it's woken up before the requested > rotation time. Cool. Agreed. > I went looking for other timeout-related GUC variables that might have > overoptimistic upper limits, and found these cases: > > [sensible stuff] Lowering the maximum of those seems sensible to me. Anybody using that large value for those already had a problem even if it worked. I think at least wal_sender_timeout and wal_receiver_timeout are also problematic. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@anarazel.de> writes: > I think at least wal_sender_timeout and wal_receiver_timeout are also > problematic. I looked at those and didn't see a problem --- what are you worried about exactly? regards, tom lane
On 2012-11-18 15:21:34 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I think at least wal_sender_timeout and wal_receiver_timeout are also > > problematic. > > I looked at those and didn't see a problem --- what are you worried > about exactly? Forget it, too hungry to read the code properly... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services