Thread: autovacuum daemon stops doing work after about an hour
I took advantage of last weekend to upgrade from 7.2.4 to 7.4.0 on a new faster box. Now I'm trying to implement pg_autovacuum. It seems to work ok, but after about an hour or so, it does nothing. The process still is running, but nothing is sent to the log file. I'm running the daemon as distributed with PG 7.4 release as follows: pg_autovacuum -d4 -V 0.15 -A 1 -U postgres -L /var/tmp/autovacuum.log -D the last few lines of the log are: [2003-12-02 11:43:58 AM] VACUUM ANALYZE "public"."msg_recipients" [2003-12-02 12:24:33 PM] select relfilenode,reltuples,relpages from pg_class where relfilenode=18588239 [2003-12-02 12:24:33 PM] table name: vkmlm."public"."msg_recipients" [2003-12-02 12:24:33 PM] relfilenode: 18588239; relisshared: 0 [2003-12-02 12:24:33 PM] reltuples: 9; relpages: 529132 [2003-12-02 12:24:33 PM] curr_analyze_count: 1961488; cur_delete_count: 1005040 [2003-12-02 12:24:33 PM] ins_at_last_analyze: 1961488; del_at_last_vacuum: 1005040 [2003-12-02 12:24:33 PM] insert_threshold: 509; delete_threshold 1001 [2003-12-02 12:24:33 PM] Performing: VACUUM ANALYZE "public"."user_list" [2003-12-02 12:24:33 PM] VACUUM ANALYZE "public"."user_list" [2003-12-02 12:43:19 PM] select relfilenode,reltuples,relpages from pg_class where relfilenode=18588202 [2003-12-02 12:43:19 PM] table name: vkmlm."public"."user_list" [2003-12-02 12:43:19 PM] relfilenode: 18588202; relisshared: 0 [2003-12-02 12:43:19 PM] reltuples: 9; relpages: 391988 [2003-12-02 12:43:19 PM] curr_analyze_count: 1159843; cur_delete_count: 1118540 [2003-12-02 12:43:19 PM] ins_at_last_analyze: 1159843; del_at_last_vacuum: 1118540 [2003-12-02 12:43:19 PM] insert_threshold: 509; delete_threshold 1001 Then it just sits there. I started it at 11:35am, and it is now 3:30pm. I did the same last night at about 10:58pm, and it ran and did work until 11:57pm, then sat there until I killed/restarted pg_autovacuum this morning at 11:35. The process is not using any CPU time. I just killed/restarted it and it found work to do on my busy tables which I'd expect. I'm running Postgres 7.4 release on FreeBSD 4.9-RELEASE. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
On Tue, 2003-12-02 at 15:37, Vivek Khera wrote: > Now I'm trying to implement pg_autovacuum. It seems to work ok, but > after about an hour or so, it does nothing. The process still is > running, but nothing is sent to the log file. > > I'm running the daemon as distributed with PG 7.4 release as follows: > > pg_autovacuum -d4 -V 0.15 -A 1 -U postgres -L /var/tmp/autovacuum.log -D > > the last few lines of the log are: > > [2003-12-02 11:43:58 AM] VACUUM ANALYZE "public"."msg_recipients" > [2003-12-02 12:24:33 PM] select relfilenode,reltuples,relpages from pg_class where relfilenode=18588239 > [2003-12-02 12:24:33 PM] table name: vkmlm."public"."msg_recipients" > [2003-12-02 12:24:33 PM] relfilenode: 18588239; relisshared: 0 > [2003-12-02 12:24:33 PM] reltuples: 9; relpages: 529132 > [2003-12-02 12:24:33 PM] curr_analyze_count: 1961488; cur_delete_count: 1005040 > [2003-12-02 12:24:33 PM] ins_at_last_analyze: 1961488; del_at_last_vacuum: 1005040 > [2003-12-02 12:24:33 PM] insert_threshold: 509; delete_threshold 1001 > [2003-12-02 12:24:33 PM] Performing: VACUUM ANALYZE "public"."user_list" > [2003-12-02 12:24:33 PM] VACUUM ANALYZE "public"."user_list" > [2003-12-02 12:43:19 PM] select relfilenode,reltuples,relpages from pg_class where relfilenode=18588202 > [2003-12-02 12:43:19 PM] table name: vkmlm."public"."user_list" > [2003-12-02 12:43:19 PM] relfilenode: 18588202; relisshared: 0 > [2003-12-02 12:43:19 PM] reltuples: 9; relpages: 391988 > [2003-12-02 12:43:19 PM] curr_analyze_count: 1159843; cur_delete_count: 1118540 > [2003-12-02 12:43:19 PM] ins_at_last_analyze: 1159843; del_at_last_vacuum: 1118540 > [2003-12-02 12:43:19 PM] insert_threshold: 509; delete_threshold 1001 > > Then it just sits there. I started it at 11:35am, and it is now > 3:30pm. Weird.... Alphabetically speaking, is vkmlm."public"."user_list" be the last table in the last schema in the last database? You are running with -d4, so you would get a message about going to sleep shortly after dealing with the last table, but you didn't get the sleep message, so I don't think the problem is that pg_autovacuum is sleeping for an inordinate amount time. > I did the same last night at about 10:58pm, and it ran and did work until > 11:57pm, then sat there until I killed/restarted pg_autovacuum this > morning at 11:35. The process is not using any CPU time. > > I just killed/restarted it and it found work to do on my busy tables > which I'd expect. when you kill it, do you get a core file? Could you do a backtrace and see where pg_autovacuum is hung up? > I'm running Postgres 7.4 release on FreeBSD 4.9-RELEASE. I don't run FreeBSD, so I haven't tested with FreeBSD. Recently Craig Boston reported and submitted a patch for a crash on FreeBSD, but that doesn't sound like your problem. Could be some other type of platform dependent problem.
>>>>> "MTO" == Matthew T O'Connor <matthew@zeut.net> writes: >> Then it just sits there. I started it at 11:35am, and it is now >> 3:30pm. MTO> Weird.... Alphabetically speaking, is vkmlm."public"."user_list" be the MTO> last table in the last schema in the last database? You are running conveniently, yes it is... MTO> with -d4, so you would get a message about going to sleep shortly after MTO> dealing with the last table, but you didn't get the sleep message, so I MTO> don't think the problem is that pg_autovacuum is sleeping for an MTO> inordinate amount time. The only sleep logged was [2003-12-03 04:47:13 PM] 1 All DBs checked in: 84996853 usec, will sleep for 469 secs. Here's all it did on yesterday afternoon's "hour of work": [2003-12-03 04:45:48 PM] Performing: ANALYZE "public"."url_track" [2003-12-03 04:46:27 PM] Performing: ANALYZE "public"."msg_recipients" [2003-12-03 04:46:55 PM] Performing: ANALYZE "public"."deliveries" [2003-12-03 04:46:55 PM] Performing: ANALYZE "public"."user_list" [2003-12-03 04:47:12 PM] Performing: ANALYZE "public"."sessions" [2003-12-03 04:55:02 PM] Performing: ANALYZE "public"."url_track" [2003-12-03 04:55:22 PM] Performing: VACUUM ANALYZE "public"."msg_recipients" [2003-12-03 05:40:11 PM] Performing: VACUUM ANALYZE "public"."user_list" then 18 minutes later, it reported: [2003-12-03 05:58:25 PM] select relfilenode,reltuples,relpages from pg_class where relfilenode=18588202 [2003-12-03 05:58:25 PM] table name: vkmlm."public"."user_list" [2003-12-03 05:58:25 PM] relfilenode: 18588202; relisshared: 0 [2003-12-03 05:58:25 PM] reltuples: 9; relpages: 427920 [2003-12-03 05:58:25 PM] curr_analyze_count: 2559236; cur_delete_count: 2475824 [2003-12-03 05:58:25 PM] ins_at_last_analyze: 2559236; del_at_last_vacuum: 2475824 [2003-12-03 05:58:25 PM] insert_threshold: 509; delete_threshold 1001 and stopped doing anything. MTO> when you kill it, do you get a core file? Could you do a backtrace and MTO> see where pg_autovacuum is hung up? nope. unfortunately my PG libs are without debugging, too. I'll rebuild pg_autovacuum with debugging and run it under gdb so I can see where it gets stuck. I'll report back when I find something. I just wanted to check first if anyone else ran into this.
Vivek Khera wrote: >>>>>>"MTO" == Matthew T O'Connor <matthew@zeut.net> writes: > > >>>Then it just sits there. I started it at 11:35am, and it is now >>>3:30pm. > > > MTO> Weird.... Alphabetically speaking, is vkmlm."public"."user_list" be the > MTO> last table in the last schema in the last database? You are running > > conveniently, yes it is... > > MTO> with -d4, so you would get a message about going to sleep shortly after > MTO> dealing with the last table, but you didn't get the sleep message, so I > MTO> don't think the problem is that pg_autovacuum is sleeping for an > MTO> inordinate amount time. > > The only sleep logged was > > [2003-12-03 04:47:13 PM] 1 All DBs checked in: 84996853 usec, will sleep for 469 secs. What I seen is: # tail -f auto.log [2003-12-04 07:10:18 PM] reltuples: 72; relpages: 1 [2003-12-04 07:10:18 PM] curr_analyze_count: 72; cur_delete_count: 0 [2003-12-04 07:10:18 PM] ins_at_last_analyze: 72; del_at_last_vacuum: 0 [2003-12-04 07:10:18 PM] insert_threshold: 572; delete_threshold 536 [2003-12-04 07:10:18 PM] table name: empdb."public"."contracts" [2003-12-04 07:10:18 PM] relfilenode: 17784; relisshared: 0 [2003-12-04 07:10:18 PM] reltuples: 347; relpages: 5 [2003-12-04 07:10:18 PM] curr_analyze_count: 347; cur_delete_count: 0 [2003-12-04 07:10:18 PM] ins_at_last_analyze: 347; del_at_last_vacuum: 0 [2003-12-04 07:10:18 PM] insert_threshold: 847; delete_threshold 673 [ 5 minutes of delay ] <----- LOOK THIS [2003-12-04 07:10:18 PM] 503 All DBs checked in: 179396 usec, will sleep for 300 secs. [2003-12-04 07:15:19 PM] 504 All DBs checked in: 98814 usec, will sleep for 300 secs. I think is a good Idea put a fflush after: fprintf(LOGOUTPUT, "[%s] %s\n", timebuffer, logentry); Regards Gaetano Mendola
>>>>> "MTO" == Matthew T O'Connor <matthew@zeut.net> writes: >> I'm running Postgres 7.4 release on FreeBSD 4.9-RELEASE. MTO> I don't run FreeBSD, so I haven't tested with FreeBSD. Recently Craig MTO> Boston reported and submitted a patch for a crash on FreeBSD, but that MTO> doesn't sound like your problem. Could be some other type of platform MTO> dependent problem. Oh lucky me. I think I found it. I compiled with -g -O and ran it under gdb, so the output is line buffered. The last thing it prints out now is this: [2003-12-04 02:11:17 PM] 3 All DBs checked in: -786419782 usec, will sleep for -1272 secs. since sleep() takes an unsigned int as its parameter, we are actually sleeping for 4294966024 seconds == 136 years. I recall reading about the negative time to test the dbs somewhere... I guess I'll get on debugging that. The time keeper in this box is pretty darned accurate otherwise (using ntpd). -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
>>>>> "MTO" == Matthew T O'Connor <matthew@zeut.net> writes: MTO> I don't run FreeBSD, so I haven't tested with FreeBSD. Recently Craig MTO> Boston reported and submitted a patch for a crash on FreeBSD, but that some more debugging data: (gdb) print now $2 = {tv_sec = 1070565077, tv_usec = 216477} (gdb) print then $3 = {tv_sec = 1070561568, tv_usec = 668963} (gdb) print diff $4 = -5459981371352 (gdb) print sleep_secs $5 = -1272 so for some reason, instead of calculating 3508547514 as the diff, it got a hugely negative number. I'll bet it has something to do with the compiler... more debugging to follow (without -O compilation...) MTO> ---------------------------(end of broadcast)--------------------------- MTO> TIP 7: don't forget to increase your free space map settings -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
>>>>>> "MTO" == Matthew T O'Connor <matthew@zeut.net> writes: > > MTO> I don't run FreeBSD, so I haven't tested with FreeBSD. Recently > Craig MTO> Boston reported and submitted a patch for a crash on FreeBSD, > but that > > some more debugging data: > > (gdb) print now > $2 = {tv_sec = 1070565077, tv_usec = 216477} > (gdb) print then > $3 = {tv_sec = 1070561568, tv_usec = 668963} > (gdb) print diff > $4 = -5459981371352 > (gdb) print sleep_secs > $5 = -1272 > > so for some reason, instead of calculating 3508547514 as the diff, it > got a hugely negative number. > > I'll bet it has something to do with the compiler... more debugging to > follow (without -O compilation...) Could this be the recently reported bug where time goes backwards on FreeBSD? Can anyone who knows more about this problem chime in, I know it was recently discussed on Hackers. The simple fix is to just make sure it's a positive number. If not, then just sleep for some small positive amount. I can make a patch for this, probably sometime this weekend. Thanks for tracking this down.
>>>>> "MTO" == Matthew T O'Connor <matthew@zeut.net> writes: MTO> Could this be the recently reported bug where time goes backwards on MTO> FreeBSD? Can anyone who knows more about this problem chime in, I know it MTO> was recently discussed on Hackers. Time does not go backwards -- the now and then variables are properly incrementing in time as you see from the debugging output. The error appears to be with the computation of the "diff". It is either a C programming error, or a compiler error. I'm not a C "cop" so I can't tell you which it is. Witness this program, below, compiled as "cc -g -o t t.c" and the output here: % ./t seconds = 3509 seconds1 = 3509000000 useconds = -452486 stepped diff = 3508547514 seconds2 = -785967296 seconds3 = 3509000000 diff = -786419782 long long diff = 3508547514 % apperantly, if you compute (now.tv_sec - then.tv_sec) * 1000000 all at once, it overflows since the RHS is all computed using longs rather than long longs. Fix is to cast at least one of the values to long long on the RHS, as in the computation of seconds3 below. compare that to the computation of seconds2 and you'll see that this is the cause. I'd be curious to see the output of this program on other platforms and other compilers. I'm using gcc 2.95.4 as shipped with FreeBSD 4.8+. That all being said, you should never sleep less than the base time, and never for more than a max amount, perhaps 1 hour? --cut here-- #include <sys/time.h> #include <stdio.h> int main() { struct timeval now, then; long long diff = 0; long long seconds, seconds1, seconds2, seconds3, useconds; now.tv_sec = 1070565077L; now.tv_usec = 216477L; then.tv_sec = 1070561568L; then.tv_usec = 668963L; seconds = now.tv_sec - then.tv_sec; printf("seconds = %lld\n",seconds); seconds1 = seconds * 1000000; printf("seconds1 = %lld\n",seconds1); useconds = now.tv_usec - then.tv_usec; printf("useconds = %lld\n",useconds); diff = seconds1 + useconds; printf("stepped diff = %lld\n",diff); /* this appears to be the culprit... it should be same as seconds1 */ seconds2 = (now.tv_sec - then.tv_sec) * 1000000; printf("seconds2 = %lld\n",seconds2); /* seems we need to cast long's to long long's for this computation */ seconds3 = ((long long)now.tv_sec - (long long)then.tv_sec) * 1000000; printf("seconds3 = %lld\n",seconds3); diff = (now.tv_sec - then.tv_sec) * 1000000 + (now.tv_usec - then.tv_usec); printf ("diff = %lld\n",diff); diff = ((long long)now.tv_sec - (long long)then.tv_sec) * 1000000 + (now.tv_usec - then.tv_usec); printf ("long long diff = %lld\n",diff); exit(0); } --cut here--
Actually, you can simplify the fix thusly: diff = (long long)(now.tv_sec - then.tv_sec) * 1000000 + (now.tv_usec - then.tv_usec);
--On Thursday, December 04, 2003 16:20:22 -0500 Vivek Khera <khera@kcilink.com> wrote: >>>>>> "MTO" == Matthew T O'Connor <matthew@zeut.net> writes: > > MTO> Could this be the recently reported bug where time goes backwards on > MTO> FreeBSD? Can anyone who knows more about this problem chime in, I > know it MTO> was recently discussed on Hackers. > > > Time does not go backwards -- the now and then variables are properly > incrementing in time as you see from the debugging output. > > The error appears to be with the computation of the "diff". It is > either a C programming error, or a compiler error. I'm not a C "cop" > so I can't tell you which it is. > > Witness this program, below, compiled as "cc -g -o t t.c" and the > output here: > > % ./t > seconds = 3509 > seconds1 = 3509000000 > useconds = -452486 > stepped diff = 3508547514 > seconds2 = -785967296 > seconds3 = 3509000000 > diff = -786419782 > long long diff = 3508547514 > % > > apperantly, if you compute (now.tv_sec - then.tv_sec) * 1000000 all at > once, it overflows since the RHS is all computed using longs rather > than long longs. Fix is to cast at least one of the values to long > long on the RHS, as in the computation of seconds3 below. compare > that to the computation of seconds2 and you'll see that this is the > cause. > > I'd be curious to see the output of this program on other platforms > and other compilers. I'm using gcc 2.95.4 as shipped with FreeBSD > 4.8+. this is with the UnixWare compiler: $ cc -O -o testvk testvk.c $ ./testvk seconds = 3509 seconds1 = 3509000000 useconds = -452486 stepped diff = 3508547514 seconds2 = -785967296 seconds3 = 3509000000 diff = -786419782 long long diff = 3508547514 $ I think this is a C bug. > > That all being said, you should never sleep less than the base time, > and never for more than a max amount, perhaps 1 hour? > > > --cut here-- ># include <sys/time.h> ># include <stdio.h> > > int > main() > { > struct timeval now, then; > long long diff = 0; > long long seconds, seconds1, seconds2, seconds3, useconds; > > now.tv_sec = 1070565077L; > now.tv_usec = 216477L; > > then.tv_sec = 1070561568L; > then.tv_usec = 668963L; > > seconds = now.tv_sec - then.tv_sec; > printf("seconds = %lld\n",seconds); > seconds1 = seconds * 1000000; > printf("seconds1 = %lld\n",seconds1); > useconds = now.tv_usec - then.tv_usec; > printf("useconds = %lld\n",useconds); > > diff = seconds1 + useconds; > printf("stepped diff = %lld\n",diff); > > /* this appears to be the culprit... it should be same as seconds1 */ > seconds2 = (now.tv_sec - then.tv_sec) * 1000000; > printf("seconds2 = %lld\n",seconds2); > > /* seems we need to cast long's to long long's for this computation */ > seconds3 = ((long long)now.tv_sec - (long long)then.tv_sec) * 1000000; > printf("seconds3 = %lld\n",seconds3); > > > diff = (now.tv_sec - then.tv_sec) * 1000000 + (now.tv_usec - > then.tv_usec); printf ("diff = %lld\n",diff); > > diff = ((long long)now.tv_sec - (long long)then.tv_sec) * 1000000 + > (now.tv_usec - then.tv_usec); printf ("long long diff = %lld\n",diff); > > exit(0); > } > > > --cut here-- > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
>>>>> "LR" == Larry Rosenman <ler@lerctr.org> writes: >> I'd be curious to see the output of this program on other platforms >> and other compilers. I'm using gcc 2.95.4 as shipped with FreeBSD >> 4.8+. LR> this is with the UnixWare compiler: LR> $ cc -O -o testvk testvk.c LR> $ ./testvk LR> seconds = 3509 LR> seconds1 = 3509000000 LR> useconds = -452486 LR> stepped diff = 3508547514 LR> seconds2 = -785967296 LR> seconds3 = 3509000000 LR> diff = -786419782 LR> long long diff = 3508547514 LR> $ LR> I think this is a C bug. Upon further reflection, I think so to. The entire RHS is long's so the arithmetic is done in longs, then assigned to a long long when done (after things have overflowed). Forcing any one of the RHS values to be long long causes the arithmetic to all be done using long longs, and then you get the numbers you expect. I think you only notice this in autovacuum when it takes a long time to complete the work, like my example of about 3500 seconds.
This has been fixed and will be in 7.4.1. --------------------------------------------------------------------------- Vivek Khera wrote: > >>>>> "LR" == Larry Rosenman <ler@lerctr.org> writes: > > >> I'd be curious to see the output of this program on other platforms > >> and other compilers. I'm using gcc 2.95.4 as shipped with FreeBSD > >> 4.8+. > LR> this is with the UnixWare compiler: > LR> $ cc -O -o testvk testvk.c > LR> $ ./testvk > LR> seconds = 3509 > LR> seconds1 = 3509000000 > LR> useconds = -452486 > LR> stepped diff = 3508547514 > LR> seconds2 = -785967296 > LR> seconds3 = 3509000000 > LR> diff = -786419782 > LR> long long diff = 3508547514 > LR> $ > > LR> I think this is a C bug. > > Upon further reflection, I think so to. The entire RHS is long's so > the arithmetic is done in longs, then assigned to a long long when > done (after things have overflowed). Forcing any one of the RHS > values to be long long causes the arithmetic to all be done using long > longs, and then you get the numbers you expect. > > I think you only notice this in autovacuum when it takes a long time > to complete the work, like my example of about 3500 seconds. > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Gaetano Mendola wrote: > Vivek Khera wrote: > >>>>>>> "MTO" == Matthew T O'Connor <matthew@zeut.net> writes: >> >> >> >>>> Then it just sits there. I started it at 11:35am, and it is now >>>> 3:30pm. >> >> >> >> MTO> Weird.... Alphabetically speaking, is vkmlm."public"."user_list" >> be the >> MTO> last table in the last schema in the last database? You are running >> >> conveniently, yes it is... >> >> MTO> with -d4, so you would get a message about going to sleep shortly >> after >> MTO> dealing with the last table, but you didn't get the sleep >> message, so I >> MTO> don't think the problem is that pg_autovacuum is sleeping for an >> MTO> inordinate amount time. >> >> The only sleep logged was >> >> [2003-12-03 04:47:13 PM] 1 All DBs checked in: 84996853 usec, will >> sleep for 469 secs. > > > What I seen is: > > > # tail -f auto.log > [2003-12-04 07:10:18 PM] reltuples: 72; relpages: 1 > [2003-12-04 07:10:18 PM] curr_analyze_count: 72; cur_delete_count: 0 > [2003-12-04 07:10:18 PM] ins_at_last_analyze: 72; del_at_last_vacuum: 0 > [2003-12-04 07:10:18 PM] insert_threshold: 572; delete_threshold 536 > [2003-12-04 07:10:18 PM] table name: empdb."public"."contracts" > [2003-12-04 07:10:18 PM] relfilenode: 17784; relisshared: 0 > [2003-12-04 07:10:18 PM] reltuples: 347; relpages: 5 > [2003-12-04 07:10:18 PM] curr_analyze_count: 347; cur_delete_count: 0 > [2003-12-04 07:10:18 PM] ins_at_last_analyze: 347; del_at_last_vacuum: 0 > [2003-12-04 07:10:18 PM] insert_threshold: 847; delete_threshold 673 > > > [ 5 minutes of delay ] <----- LOOK THIS > > > [2003-12-04 07:10:18 PM] 503 All DBs checked in: 179396 usec, will sleep > for 300 secs. > [2003-12-04 07:15:19 PM] 504 All DBs checked in: 98814 usec, will sleep > for 300 secs. > > I think is a good Idea put a fflush after: > > fprintf(LOGOUTPUT, "[%s] %s\n", timebuffer, logentry); Was I wrong ? If you are watching in tail the log believeme is really annoying. Regards Gaetano Mendola
The world rejoiced as mendola@bigfoot.com (Gaetano Mendola) wrote: > I think is a good Idea put a fflush after: > > fprintf(LOGOUTPUT, "[%s] %s\n", timebuffer, logentry); I thought I had put fflush()es at all the interesting locations... Apparently it was an error to not go to the effort of making sure it worked well on FreeBSD. (It was on my list, but I never got the Round Tuits...) There's an AMD-64 box coming in soon, targeted at FreeBSD, so that should change... -- let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;; http://www3.sympatico.ca/cbbrowne/linux.html What would a chair look like, if your knees bent the other way?
Christopher Browne wrote: >The world rejoiced as mendola@bigfoot.com (Gaetano Mendola) wrote: > > >>I think is a good Idea put a fflush after: >> >>fprintf(LOGOUTPUT, "[%s] %s\n", timebuffer, logentry); >> >> > >I thought I had put fflush()es at all the interesting locations... > > I just looked through the code, I think there are fflush()es at all but one interesting locations. The last log_entry call before sleeping doesn't have an fflush call after it. I'll submit a patch that adds it. >Apparently it was an error to not go to the effort of making sure it >worked well on FreeBSD. (It was on my list, but I never got the Round >Tuits...) There's an AMD-64 box coming in soon, targeted at FreeBSD, >so that should change... > > Yeah, FreeBSD testing would have been nice, but I don't have access to any FreeBSD boxes so.....
>>>>> "MTO" == Matthew T O'Connor <matthew@zeut.net> writes: MTO> Yeah, FreeBSD testing would have been nice, but I don't have access to MTO> any FreeBSD boxes so..... FWIW, with the fflush() added after that sleep, and the fix to the long long computation of sleep time to keep it from overflowing, pg_autovacuum has been working flawlessly on my FreeBSD 4.9 + PG 7.4.0 production server. I'm just still playing with tuning pg_autovacuum to keep it from vacuuming my busy tables *too* often. Just a question: will my test program show negative sleep 'diff' on your linux box? I can't imagine that it would give different results than on freebsd. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/