Thread: Re: [BUGS] BUG #1588: pg_autovacuum sleep parameter overflow
Lionel Bouton wrote: > > The following bug has been logged online: > > Bug reference: 1588 > Logged by: Lionel Bouton > Email address: lionel-subscription@bouton.name > PostgreSQL version: 8.0.1 > Operating system: Linux 2.6, glibc 2.3.4 > Description: pg_autovacuum sleep parameter overflow > Details: > > pg_autovacuum doesn't sleep when the -s parameter is set above 2147 > (PostgreSQL then is under constant stress), seems like a 32-bit signed > integer overflow (the time in µs then gets past 2^31-1) somewhere. > > This worked with 7.4.6 (at least with -s 3600). Yes, this is a bug. Right now, pg_autovacuum uses pg_usleep(), which accepts a 'long' value representing the number of microseconds to sleep. Because long is 32-bits on many platforms, it overflows around 2 billion microseconds, which is 2147 seconds. The comment in pg_usleep() verifies it: * On machines where "long" is 32 bits, the maximum delay is ~2000 seconds. Here is a proposed patch that will fix it. There is no reason to use pg_usleep in pg_autovacuum in this area, so we just use sleep(). We don't need micro-second accuracy, and we don't need Win32 to handle signals that arrive during the sleep. However, I am now wondering if we should change pg_usleep() to take a double rather than long. This would avoid such problems in the future in other places in our code. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: contrib/pg_autovacuum/pg_autovacuum.c =================================================================== RCS file: /cvsroot/pgsql/contrib/pg_autovacuum/pg_autovacuum.c,v retrieving revision 1.31 diff -c -c -r1.31 pg_autovacuum.c *** contrib/pg_autovacuum/pg_autovacuum.c 19 Apr 2005 03:35:15 -0000 1.31 --- contrib/pg_autovacuum/pg_autovacuum.c 11 May 2005 14:43:34 -0000 *************** *** 1749,1755 **** fflush(LOGOUTPUT); } ! pg_usleep(sleep_secs * 1000000); /* Larger Pause between outer loops */ gettimeofday(&then, 0); /* Reset time counter */ --- 1749,1764 ---- fflush(LOGOUTPUT); } ! /* Larger Pause between outer loops */ ! /* ! * pg_usleep() is wrong here because its maximum is ~2000 seconds, ! * and we don't need signal interruptability on Win32 here. ! */ ! #ifndef WIN32 ! sleep(sleep_secs); /* Unix sleep is seconds */ ! #else ! sleep(sleep_secs * 1000); /* Win32 sleep() is milliseconds */ ! #endif gettimeofday(&then, 0); /* Reset time counter */
Bruce Momjian wrote: > */ >! #ifndef WIN32 >! sleep(sleep_secs); /* Unix sleep is seconds */ >! #else >! sleep(sleep_secs * 1000); /* Win32 sleep() is milliseconds */ > > Shouldn't the be Sleep with a capital S? see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/sleep.asp cheers andrew
Andrew Dunstan wrote: > > > Bruce Momjian wrote: > > > */ > >! #ifndef WIN32 > >! sleep(sleep_secs); /* Unix sleep is seconds */ > >! #else > >! sleep(sleep_secs * 1000); /* Win32 sleep() is milliseconds */ > > > > > > Shouldn't the be Sleep with a capital S? see > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/sleep.asp I was wondering about that, so I compiled it with MinGW and sleep() worked fine, so I stuck with that. In fact, I just tried Sleep() and that didn't work. It said: C:/DOCUME~1/BRUCEM~1/LOCALS~1/Temp/ccMLbaaa.o(.text+0x27):x.c: undefined reference to `Sleep' Maybe if I added some includes it would work, but sleep() seemed safest. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote: >Andrew Dunstan wrote: > > >>Bruce Momjian wrote: >> >> >> >>> */ >>>! #ifndef WIN32 >>>! sleep(sleep_secs); /* Unix sleep is seconds */ >>>! #else >>>! sleep(sleep_secs * 1000); /* Win32 sleep() is milliseconds */ >>> >>> >>> >>> >>Shouldn't the be Sleep with a capital S? see >>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/sleep.asp >> >> > >I was wondering about that, so I compiled it with MinGW and sleep() >worked fine, so I stuck with that. In fact, I just tried Sleep() and >that didn't work. It said: > > C:/DOCUME~1/BRUCEM~1/LOCALS~1/Temp/ccMLbaaa.o(.text+0x27):x.c: undefined > reference to `Sleep' > >Maybe if I added some includes it would work, but sleep() seemed safest. > > > Strange ... as long as you #include <windows.h> it should be fine, and pg_autovacuum.c already does include it. Anyway, whatever works, I guess. cheers andrew
Bruce Momjian <pgman@candle.pha.pa.us> writes: > However, I am now wondering if we should change pg_usleep() to take a > double rather than long. This would avoid such problems in the future > in other places in our code. I'd leave it alone; there aren't any other places that need long sleeps, and I don't really expect them. When and if we have a real need for it, we can change it. regards, tom lane
Andrew Dunstan wrote: > Strange ... as long as you #include <windows.h> it should be fine, and > pg_autovacuum.c already does include it. > > Anyway, whatever works, I guess. > Ah, my test program did not include "windows.h". I found it understood sleep() with just the standard Unix includes. I did not test compiling pg_autovacuum itself. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Patch applied and backpatched to 8.0.X. --------------------------------------------------------------------------- Bruce Momjian wrote: > Index: contrib/pg_autovacuum/pg_autovacuum.c > =================================================================== > RCS file: /cvsroot/pgsql/contrib/pg_autovacuum/pg_autovacuum.c,v > retrieving revision 1.31 > diff -c -c -r1.31 pg_autovacuum.c > *** contrib/pg_autovacuum/pg_autovacuum.c 19 Apr 2005 03:35:15 -0000 1.31 > --- contrib/pg_autovacuum/pg_autovacuum.c 11 May 2005 14:43:34 -0000 > *************** > *** 1749,1755 **** > fflush(LOGOUTPUT); > } > > ! pg_usleep(sleep_secs * 1000000); /* Larger Pause between outer loops */ > > gettimeofday(&then, 0); /* Reset time counter */ > > --- 1749,1764 ---- > fflush(LOGOUTPUT); > } > > ! /* Larger Pause between outer loops */ > ! /* > ! * pg_usleep() is wrong here because its maximum is ~2000 seconds, > ! * and we don't need signal interruptability on Win32 here. > ! */ > ! #ifndef WIN32 > ! sleep(sleep_secs); /* Unix sleep is seconds */ > ! #else > ! sleep(sleep_secs * 1000); /* Win32 sleep() is milliseconds */ > ! #endif > > gettimeofday(&then, 0); /* Reset time counter */ > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073