Thread: pg_autovacuum Win32 Service startup delay
When starting as a service at boot time on Windows, pg_autovacuum may fail to start because the PostgreSQL service is still starting up. This patch causes the service to attempt a second connection 30 seconds after the initial connection failure before giving up entirely. Regards, Dave
Attachment
"Dave Page" <dpage@vale-housing.co.uk> writes:
> When starting as a service at boot time on Windows, pg_autovacuum may
> fail to start because the PostgreSQL service is still starting up. This
> patch causes the service to attempt a second connection 30 seconds after
> the initial connection failure before giving up entirely.
Hm. In event that the system crashed beforehand, it could require much more than 30 seconds to finish replaying the old WAL log. So the above doesn't seem super robust to me. Would it be reasonable to try every 30 seconds for five minutes, or some such? (Five minutes at least has a defensible rationale, ie it's the default checkpoint interval and we expect we can replay the log at least as fast as it was created initially.)
regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Mon, Jan 24, 2005 at 06:57:54PM -0500, Tom Lane wrote:
>> (Five minutes at least has a defensible rationale, ie it's the default
>> checkpoint interval and we expect we can replay the log at least as
>> fast as it was created initially.)
> Hmm, I remember Mark Wong from OSDL saying that it took to replay the
> logs after a crash more than the six hours it had taken to generate
> them.
Six hours? Did he have checkpoints disabled somehow?
regards, tom lane
On Mon, Jan 24, 2005 at 06:57:54PM -0500, Tom Lane wrote:
> (Five minutes at least has a defensible rationale, ie it's the default
> checkpoint interval and we expect we can replay the log at least as
> fast as it was created initially.)
Hmm, I remember Mark Wong from OSDL saying that it took to replay the logs after a crash more than the six hours it had taken to generate them. Simon commented that it was unexpected, but there was no further comment on the issue. (On his test the server is generating the logs as fast as it can, so it may not be important, but anyway ... )
--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Ciencias políticas es la ciencia de entender por qué los políticos actúan como lo hacen" (netfunny.com)
Dave Page wrote:
> When starting as a service at boot time on Windows, pg_autovacuum may
> fail to start because the PostgreSQL service is still starting up. This
> patch causes the service to attempt a second connection 30 seconds after
> the initial connection failure before giving up entirely.
In the Windows service world, is there any reason pg_autovacuum should ever give up? The reason I had it give up was so that it didn't accidentally run against a different PostgreSQL instance. I don't think that will happen in the Windows service world. I think it should keep trying to do its job until it's told to exit.
Matthew
Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
>> On Mon, Jan 24, 2005 at 06:57:54PM -0500, Tom Lane wrote:
>>> (Five minutes at least has a defensible rationale, ie it's the default
>>> checkpoint interval and we expect we can replay the log at least as
>>> fast as it was created initially.)
>
>> Hmm, I remember Mark Wong from OSDL saying that it took to replay the
>> logs after a crash more than the six hours it had taken to generate
>> them.
>
> Six hours? Did he have checkpoints disabled somehow?
No, I remember they were talking about recovery from backup using PITR (i.e. not simple crash recovery, but replaying the logs from the whole benchmark session).
Best Regards, Michael Paesold
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> In the windows service world, is there any reason pg_autovacuum should
> ever give up?
I was a bit worried about the scenario in which J Random Luser tries to start the server twice and ends up with two autovacuum daemons attached to the same postmaster. I'm not sure if this is possible, probable, or dangerous ... but it seems like a point to consider.
regards, tom lane
Matthew T. O'Connor wrote:
> In the Windows service world, is there any reason pg_autovacuum should
> ever give up? The reason I had it give up was so that it didn't
> accidentally run against a different PostgreSQL instance. I don't think
> that will happen in the Windows service world. I think it should keep
> trying to do its job until it's told to exit.
A "never giving up" pg_autovacuum seems a little bit rude to me. It's like the salesman who keeps trying to sell me something I clearly have no use for. Especially if something goes wrong in setting up pg_autovacuum: wrong user, wrong password. The service keeps running, the service keeps using resources, everything seems perfectly normal... but nothing happens. (And if everything looks "perfect", checking the logs is not the first thing you do, is it?) So I think a reasonable compromise is to keep pg_autovacuum trying for some time (maybe 5 minutes, as Tom recommended) and give up after that.
Harald
Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
>> In the windows service world, is there any reason pg_autovacuum should
>> ever give up?
>
> I was a bit worried about the scenario in which J Random Luser tries to
> start the server twice and ends up with two autovacuum daemons attached
> to the same postmaster. I'm not sure if this is possible, probable,
> or dangerous ... but it seems like a point to consider.
It is a good point to consider. Let me be a little more detailed in my explanation and see if that helps:
* A never-give-up pg_autovacuum would only be used when run as a Windows service.
* The Windows service control manager can still kill pg_autovacuum, so you shouldn't be able to start more than one that way.
* You have always been able to run multiple pg_autovacuums. It's not advisable, but its only bad side effect would be excessive, or more than expected, vacuum commands.
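To make the "never give up, but stoppable" proposal concrete, here is a minimal sketch of such a loop. It is not code from pg_autovacuum or from the patch: `connect_until_stopped`, `stop_requested`, the callback types and the `sim_*` helpers are all hypothetical names introduced for illustration. In a real Windows service the stop flag would presumably be set from the service control handler, which is not shown here, and the probe would be a real connection attempt.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

#define PROBE_INTERVAL_SECS 30	/* retry interval mentioned in the thread */

/* Hypothetical callbacks so the loop can run without a real server. */
typedef bool (*probe_fn) (void *ctx);
typedef void (*wait_fn) (void *ctx, unsigned int secs);

/* Assumption: something outside this loop sets the flag on a stop request. */
volatile sig_atomic_t stop_requested = 0;

/*
 * Keep probing until the connection succeeds or a stop is requested.
 * Returns true on success, false if stopped first. There is no time limit.
 */
bool
connect_until_stopped(probe_fn probe, void *ctx, wait_fn do_wait)
{
	assert(probe != NULL && do_wait != NULL);

	while (!stop_requested)
	{
		if (probe(ctx))
			return true;
		do_wait(ctx, PROBE_INTERVAL_SECS);
	}
	return false;
}

/* Simulation state: when the fake server comes up, when a stop arrives. */
struct sim_state
{
	int			ready_on_attempt;	/* 0 means the server never comes up */
	int			attempts;
	int			stop_after_waits;	/* 0 means no stop request is made */
	int			waits;
};

bool
sim_probe(void *ctx)
{
	struct sim_state *s = ctx;

	s->attempts++;
	return s->ready_on_attempt > 0 && s->attempts >= s->ready_on_attempt;
}

void
sim_wait(void *ctx, unsigned int secs)
{
	struct sim_state *s = ctx;

	(void) secs;				/* count the wait instead of sleeping */
	s->waits++;
	if (s->stop_after_waits > 0 && s->waits >= s->stop_after_waits)
		stop_requested = 1;
}
```

With a simulated server that only comes up on the 20th attempt, this loop still connects, where a five-minute window would already have given up. The trade-off raised elsewhere in the thread applies: if the server never comes up, the loop runs quietly until something stops it.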
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> Tom Lane wrote:
>> I was a bit worried about the scenario in which J Random Luser tries to
>> start the server twice and ends up with two autovacuum daemons attached
>> to the same postmaster. I'm not sure if this is possible, probable,
>> or dangerous ... but it seems like a point to consider.
> It is a good point to consider. Let me be a little more detailed in my
> explanation and see if that helps:
> * A never give up pg_autovacuum would only be used when run as a windows
> service.
> * The windows service control manager can still kill pg_autovacuum, so
> you shouldn't be able to start more than one that way.
> * You have always been able to run multiple pg_autovacuums, it's not
> advisable, and its only bad side effect would be excessive, or more
> than expected, vacuum commands.
OK, that seems to take care of my worries above. I agree with the point someone else made that if the service keeps trying to start forever, it wouldn't be obvious to the user that it wasn't working. So a limited time window seems best ... but I think it needs to be at least five minutes.
regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 24 January 2005 23:58
> To: Dave Page
> Cc: pgsql-patches@postgresql.org
> Subject: Re: [PATCHES] pg_autovacuum Win32 Service startup delay
>
> "Dave Page" <dpage@vale-housing.co.uk> writes:
> > When starting as a service at boot time on Windows, pg_autovacuum may
> > fail to start because the PostgreSQL service is still starting up. This
> > patch causes the service to attempt a second connection 30 seconds after
> > the initial connection failure before giving up entirely.
>
> Hm. In event that the system crashed beforehand, it could require much
> more than 30 seconds to finish replaying the old WAL log. So the above
> doesn't seem super robust to me. Would it be reasonable to try every 30
> seconds for five minutes, or some such? (Five minutes at least has a
> defensible rationale, ie it's the default checkpoint interval and we
> expect we can replay the log at least as fast as it was created
> initially.)
OK, revised patch attached. This version tries every 30 seconds for 5 minutes then gives up.
Regards, Dave.
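For readers without the attachment, the behavior described above (try every 30 seconds for 5 minutes, then give up) could be sketched roughly as follows. This is not the revised patch itself, whose contents are not shown in the thread. `connect_with_retry`, the callback types and the `fake_*` helpers are hypothetical names, and the sketch assumes the first attempt is made immediately, which gives at most 11 attempts over the window. A real implementation would make an actual connection attempt and sleep for real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values taken from the thread: retry every 30 seconds, for 5 minutes. */
#define RETRY_INTERVAL_SECS 30
#define RETRY_WINDOW_SECS	(5 * 60)

/* Hypothetical callbacks so the loop can run without a real server. */
typedef bool (*connect_fn) (void *ctx);
typedef void (*sleep_fn) (void *ctx, unsigned int secs);

/*
 * Try to connect immediately, then once per interval until the window
 * has elapsed. Returns true on the first success, false once the window
 * is used up, so that a persistent failure becomes visible.
 */
bool
connect_with_retry(connect_fn try_connect, sleep_fn do_sleep, void *ctx)
{
	unsigned int waited = 0;

	assert(try_connect != NULL && do_sleep != NULL);

	for (;;)
	{
		if (try_connect(ctx))
			return true;
		if (waited >= RETRY_WINDOW_SECS)
			return false;		/* window exhausted: give up */
		do_sleep(ctx, RETRY_INTERVAL_SECS);
		waited += RETRY_INTERVAL_SECS;
	}
}

/* Simulated server that accepts connections from the Nth attempt onward. */
struct fake_server
{
	int			ready_on_attempt;
	int			attempts;
	unsigned int slept_secs;
};

bool
fake_connect(void *ctx)
{
	struct fake_server *srv = ctx;

	srv->attempts++;
	return srv->attempts >= srv->ready_on_attempt;
}

void
fake_sleep(void *ctx, unsigned int secs)
{
	struct fake_server *srv = ctx;

	srv->slept_secs += secs;	/* record the delay instead of sleeping */
}
```

Calling `connect_with_retry(fake_connect, fake_sleep, &srv)` with a `fake_server` that becomes ready on its third attempt returns true after 60 simulated seconds. One that never becomes ready makes the function return false after 300 simulated seconds.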