Thread: Autovacuum launcher doesn't notice death of postmaster immediately

Autovacuum launcher doesn't notice death of postmaster immediately

From
Peter Eisentraut
Date:
I notice that in 8.3, when I kill the postmaster process with SIGKILL or 
SIGSEGV, the background writer and stats collector child processes go away 
immediately, but the autovacuum launcher hangs around for up to a 
minute.  (I suppose this has to do with the periodic wakeups?)  If 
you try to restart the postmaster before then, it fails with a complaint 
that someone is still attached to the shared memory segment.

These are obviously not normal modes of operation, but I fear that this 
could cause some problems with people's control scripts of the 
sort, "it crashed, let's try to restart it".

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Alvaro Herrera
Date:
Peter Eisentraut wrote:
> I notice that in 8.3, when I kill the postmaster process with SIGKILL or 
> SIGSEGV, the child processes writer and stats collector go away 
> immediately, but the autovacuum launcher hangs around for up to a 
> minute.  (I suppose this has to do with the periodic wakeups?).  When 
> you try to restart the postmaster before that it fails with a complaint 
> that someone is still attached to the shared memory segment.
> 
> These are obviously not normal modes of operation, but I fear that this 
> could cause some problems with people's control scripts of the 
> sort, "it crashed, let's try to restart it".

The launcher is set up to wake up in autovacuum_naptime seconds at most.
So if the user configures a ridiculous time (for example 86400 seconds,
which I've seen), the launcher would not detect the postmaster death
for a very long time, which is probably bad.  (You measured a one-minute
delay because that's the default naptime.)

Maybe this is not such a hot idea, and we should wake the launcher up
every 10 seconds (or less?).  I picked 10 seconds because that's the
time the bgwriter sleeps if there is no activity configured.  Does this
sound acceptable?  The only problem with waking it up too frequently is
that it would be waking the system up (for gettimeofday()) even if
nothing is happening.
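
A minimal sketch of the capped sleep proposed above, assuming the launcher knows (in milliseconds) when the next database is due. The helper launcher_capped_sleep_ms and the constant LAUNCHER_MAX_SLEEP_MS are hypothetical names, not PostgreSQL's actual code; the 10-second value simply mirrors the bgwriter idle sleep mentioned in the message.

```c
#include <stdint.h>

/* Hypothetical cap on a single launcher sleep, in milliseconds
   (10 s, the bgwriter idle sleep mentioned above). */
#define LAUNCHER_MAX_SLEEP_MS 10000

/*
 * Return how long to sleep before the next database is due, but never
 * more than LAUNCHER_MAX_SLEEP_MS.  Even with a huge autovacuum_naptime
 * the launcher then wakes at least every 10 s and can notice that the
 * postmaster is gone.
 */
static int64_t launcher_capped_sleep_ms(int64_t now_ms, int64_t next_due_ms)
{
    int64_t wait_ms = next_due_ms - now_ms;

    if (wait_ms < 0)
        wait_ms = 0;                      /* already overdue: don't sleep */
    if (wait_ms > LAUNCHER_MAX_SLEEP_MS)
        wait_ms = LAUNCHER_MAX_SLEEP_MS;  /* wake early, then re-check */
    return wait_ms;
}
```

With a naptime of 86400 seconds, this sketch makes the launcher wake every 10 seconds instead of once a day. In the sketch, an early wakeup that finds nothing due just computes a new sleep, which is the small periodic cost the message weighs against faster detection.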

I also just noticed that the launcher checks whether the postmaster is
alive, then sleeps, and then possibly does some work.  So if the
postmaster dies during the sleep, the launcher might still try to do
some work.  Should we add a check for postmaster liveness after the sleep?
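
The reordering suggested above (check again after sleeping, before doing any work) might look like the following sketch. It assumes a Unix-like system on which an orphaned child is reparented, so getppid() stops returning the saved parent PID once that parent has died. launcher_parent_pid, parent_is_alive() and launcher_iteration() are hypothetical names, not the launcher's real code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: PID of our parent (the postmaster), saved at startup. */
static pid_t launcher_parent_pid;

/* If the parent dies we are reparented, so getppid() changes. */
static bool parent_is_alive(void)
{
    return getppid() == launcher_parent_pid;
}

/*
 * One pass of a hypothetical launcher loop.  Liveness is checked both
 * before and after the sleep, so a parent that dies mid-sleep is noticed
 * before any new work starts.  Returns false when the loop should exit.
 */
static bool launcher_iteration(unsigned int sleep_secs, void (*do_work)(void))
{
    if (!parent_is_alive())
        return false;

    sleep(sleep_secs);

    if (!parent_is_alive())
        return false;             /* parent died while we slept */

    if (do_work != NULL)
        do_work();
    return true;
}
```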

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Jim C. Nasby"
Date:
On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:
> The launcher is set up to wake up in autovacuum_naptime seconds at most.
> So if the user configures a ridiculous time (for example 86400 seconds,
> which I've seen) then the launcher would not detect the postmaster death

Yeah, I've seen people set that up with the intention of "now autovacuum
will only run during our slow time!". I'm thinking it'd be worth
mentioning in the docs that this won't work, and suggesting that they
run vacuumdb -a or equivalent at that time instead. Thoughts?
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Andrew Hammond"
Date:
On 6/7/07, Jim C. Nasby <decibel@decibel.org> wrote:
> On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:
> > The launcher is set up to wake up in autovacuum_naptime seconds at most.
> > So if the user configures a ridiculous time (for example 86400 seconds,
> > which I've seen) then the launcher would not detect the postmaster death

Is there some threshold after which we should have PostgreSQL emit a
warning to the effect of "autovacuum_naptime is very large. Are you
sure you know what you're doing?"

> Yeah, I've seen people set that up with the intention of "now autovacuum
> will only run during our slow time!". I'm thinking it'd be worth
> mentioning in the docs that this won't work, and instead suggesting that
> they run vacuumdb -a or equivalent at that time instead. Thoughts?

Hmmm... it seems to me that this points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be
better to say that setting the naptime really high is a Bad Idea.
Instead, if they want to shift maintenance to "off hours", they should
consider using a cron job that tweaks the
pg_autovacuum.vac_base_thresh or vac_scale_factor values for tables
they don't want vacuumed during "operational hours" (set them really
high at the start of operational hours, then back to normal during off
hours). Tweaking the enabled column would work too, but they presumably
don't want to disable ANALYZE; indeed, it's entirely likely that new
users don't know what ANALYZE does, in which case they _really_ don't
want to disable it.

This should probably be very close to a section that says something
about how insufficient maintenance can be expected to lead to greater
performance issues than using autovacuum with default settings,
assuming we believe that to be the case, which I think is reasonable
given that autovacuum is now enabled by default.

Andrew

Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Tom Lane
Date:
"Andrew Hammond" <andrew.george.hammond@gmail.com> writes:
> Hmmm... it seems to me that points new users towards not using
> autovacuum, which doesn't seem like the best idea. I think it'd be
> better to say that setting the naptime really high is a Bad Idea.

It seems like we should have an upper limit on the GUC variable that's
less than INT_MAX ;-).  Would an hour be sane?  10 minutes?

This is independent of the problem at hand, though, which is that we
probably want the launcher to notice postmaster death in less time
than autovacuum_naptime, for reasonable values of same.
        regards, tom lane


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:
> "Andrew Hammond" <andrew.george.hammond@gmail.com> writes:
>> Hmmm... it seems to me that points new users towards not using
>> autovacuum, which doesn't seem like the best idea. I think it'd be
>> better to say that setting the naptime really high is a Bad Idea.
> 
> It seems like we should have an upper limit on the GUC variable that's
> less than INT_MAX ;-).  Would an hour be sane?  10 minutes?
> 
> This is independent of the problem at hand, though, which is that we
> probably want the launcher to notice postmaster death in less time
> than autovacuum_naptime, for reasonable values of same.

Do we need a configurable autovacuum naptime at all?  I know I put it in 
the original contrib autovacuum because I had no idea what knobs might 
be needed.  I can't see a good reason to ever have a naptime longer than 
the default 60 seconds, but I suppose one might want a smaller naptime 
for a very active system?


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Michael Paesold
Date:
Matthew T. O'Connor wrote:
> Tom Lane wrote:
>> "Andrew Hammond" <andrew.george.hammond@gmail.com> writes:
>>> Hmmm... it seems to me that points new users towards not using
>>> autovacuum, which doesn't seem like the best idea. I think it'd be
>>> better to say that setting the naptime really high is a Bad Idea.
>>
>> It seems like we should have an upper limit on the GUC variable that's
>> less than INT_MAX ;-).  Would an hour be sane?  10 minutes?
>>
>> This is independent of the problem at hand, though, which is that we
>> probably want the launcher to notice postmaster death in less time
>> than autovacuum_naptime, for reasonable values of same.
> 
> Do we need a configurable autovacuum naptime at all?  I know I put it in 
> the original contrib autovacuum because I had no idea what knobs might 
> be needed.  I can't see a good reason to ever have a naptime longer than 
> the default 60 seconds, but I suppose one might want a smaller naptime 
> for a very active system?

A PostgreSQL database on my laptop for testing: it should use as few 
resources as possible while idle. That would be a scenario for a 
naptime greater than 60 seconds, wouldn't it?

Best Regards
Michael Paesold



Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Zeugswetter Andreas ADI SD"
Date:
> > > The launcher is set up to wake up in autovacuum_naptime seconds at
> > > most.

Imho the fix is usually to have a sleep loop.

Andreas


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Alvaro Herrera
Date:
Zeugswetter Andreas ADI SD wrote:
> 
> > > > The launcher is set up to wake up in autovacuum_naptime seconds at
> > > > most.
> 
> Imho the fix is usually to have a sleep loop.

This is what we have.  The sleep time depends on when the next vacuum
is scheduled for the database that is due soonest.  If naptime is high,
the sleep time will be high (depending on the number of databases
needing attention).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Matthew O'Connor
Date:
Michael Paesold wrote:
> Matthew T. O'Connor wrote:
>> Do we need a configurable autovacuum naptime at all?  I know I put it 
>> in the original contrib autovacuum because I had no idea what knobs 
>> might be needed.  I can't see a good reason to ever have a naptime 
>> longer than the default 60 seconds, but I suppose one might want a 
>> smaller naptime for a very active system?
> 
> A PostgreSQL database on my laptop for testing. It should use as little 
> resources as possible while being idle. That would be a scenario for 
> naptime greater than 60 seconds, wouldn't it?

Perhaps, but that isn't the use case PostgreSQL is being designed for.
If that is what you really need, then you should probably disable
autovacuum.  Also, a very long naptime means that autovacuum will still
wake up at random times and do the work.  With a short naptime, at
least, it will do the work shortly after you update your tables.


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Zeugswetter Andreas ADI SD"
Date:
> > > > > The launcher is set up to wake up in autovacuum_naptime
> > > > > seconds at most.
> >
> > Imho the fix is usually to have a sleep loop.
>
> This is what we have.  The sleep time depends on the schedule
> of next vacuum for the closest database in time.  If naptime
> is high, the sleep time will be high (depending on number of
> databases needing attention).

No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
exit" instead of "sleep longtime".
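
Andreas's suggestion, sleeping in short slices and checking for an exit condition between them, can be sketched as below. As an assumption for illustration, the exit condition is a volatile sig_atomic_t flag (exit_requested, a hypothetical name) that a signal handler or liveness check would set; split_sleep is likewise hypothetical. The slice length is a parameter, so 1-second or 10-second slices are just different arguments.

```c
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag, set asynchronously (e.g. by a signal handler). */
static volatile sig_atomic_t exit_requested = 0;

/*
 * Sleep for total_ms in slices of at most slice_ms, checking
 * exit_requested before each slice.  Returns true if we stopped early
 * because an exit was requested, false if the full time elapsed.
 */
static bool split_sleep(long total_ms, long slice_ms)
{
    long remaining_ms = total_ms;

    if (slice_ms <= 0)
        slice_ms = 1;             /* avoid an endless zero-length loop */

    while (remaining_ms > 0) {
        if (exit_requested)
            return true;

        long chunk_ms = remaining_ms < slice_ms ? remaining_ms : slice_ms;
        struct timespec ts;
        ts.tv_sec = chunk_ms / 1000;
        ts.tv_nsec = (chunk_ms % 1000) * 1000000L;
        /* If a signal interrupts nanosleep(), this slice simply ends
           early and the flag is checked on the next pass. */
        nanosleep(&ts, NULL);
        remaining_ms -= chunk_ms;
    }
    return exit_requested != 0;
}
```

The worst-case detection delay is one slice, independent of how long the total sleep is; the cost is one extra wakeup per slice.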

Andreas


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Alvaro Herrera
Date:
Zeugswetter Andreas ADI SD wrote:
> 
> > > > > > The launcher is set up to wake up in autovacuum_naptime
> > > > > > seconds at most.
> > > 
> > > Imho the fix is usually to have a sleep loop.
> > 
> > This is what we have.  The sleep time depends on the schedule 
> > of next vacuum for the closest database in time.  If naptime 
> > is high, the sleep time will be high (depending on number of 
> > databases needing attention).
> 
> No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
> exit" instead of "sleep longtime".

Ah, yes.  What I was proposing (or thought about proposing; I'm not sure
whether I posted it) was putting an upper limit of 10 seconds on the sleep
(the bgwriter sleeps 10 seconds if configured to not do anything).  Though
10 seconds may seem like an eternity for systems like the ones Peter was
talking about, where a script tries to restart the server as soon as
the postmaster dies.

-- 
Alvaro Herrera                          Developer, http://www.PostgreSQL.org/
"Just look... and someday you will see"


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Jim C. Nasby"
Date:
On Fri, Jun 08, 2007 at 09:49:56AM -0400, Matthew O'Connor wrote:
> Michael Paesold wrote:
> >Matthew T. O'Connor schrieb:
> >>Do we need a configurable autovacuum naptime at all?  I know I put it
> >>in the original contrib autovacuum because I had no idea what knobs
> >>might be needed.  I can't see a good reason to ever have a naptime
> >>longer than the default 60 seconds, but I suppose one might want a
> >>smaller naptime for a very active system?
> >
> >A PostgreSQL database on my laptop for testing. It should use as little
> >resources as possible while being idle. That would be a scenario for
> >naptime greater than 60 seconds, wouldn't it?
>
> Perhaps, but that isn't the use case PostgreSQL is being designed for.
>  If that is what you really need, then you should probably disable
> autovacuum.  Also a very long naptime means that autovacuum will still
> wake up at random times and to do the work.  At least with short
> naptime, it will do the work shortly after you updated your tables.

Agreed. Maybe a 10-minute maximum would make sense, but the overhead of
checking whether anything needs vacuuming is pretty tiny.

There *is* reason to allow setting the naptime smaller, though (or at
least there was; perhaps Alvaro's recent changes negate this need):
clusters that have a large number of databases. I've worked with folks
who are in a hosted environment and give each customer their own
database; it's not hard to get a couple hundred databases that way.
Setting the naptime higher than a second in such an environment would
mean it could be hours before a database is checked for vacuuming.

Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Jim C. Nasby"
Date:
On Thu, Jun 07, 2007 at 12:13:09PM -0700, Andrew Hammond wrote:
> On 6/7/07, Jim C. Nasby <decibel@decibel.org> wrote:
> >On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:
> >> The launcher is set up to wake up in autovacuum_naptime seconds at most.
> >> So if the user configures a ridiculous time (for example 86400 seconds,
> >> which I've seen) then the launcher would not detect the postmaster death
>
> Is there some threshold after which we should have PostgreSQL emit a
> warning to the effect of "autovacuum_naptime is very large. Are you
> sure you know what you're doing?"
>
> >Yeah, I've seen people set that up with the intention of "now autovacuum
> >will only run during our slow time!". I'm thinking it'd be worth
> >mentioning in the docs that this won't work, and instead suggesting that
> >they run vacuumdb -a or equivalent at that time instead. Thoughts?
>
> Hmmm... it seems to me that points new users towards not using
> autovacuum, which doesn't seem like the best idea. I think it'd be

I think we could easily word it so that it's clear that just letting
autovacuum do its thing is preferred.

> better to say that setting the naptime really high is a Bad Idea.
> Instead, if they want to shift maintenances to "off hours" they should
> consider using a cron job that bonks around the
> pg_autovacuum.vac_base_thresh or vac_scale_factor values for tables
> they don't want vacuumed during "operational hours" (set them really
> high at the start of operational hours, then to normal during off
> hours). Tweaking the enable column would work too, but they presumably
> don't want to disable ANALYZE, although it's entirely likely that new
> users don't know what ANALYZE does, in which case they _really_ don't
> want to disable it.

That sounds like a rather ugly solution, and one that would be hard to
implement; not something to be putting in the docs.

Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Alvaro Herrera
Date:
Jim C. Nasby wrote:

> There *is* reason to allow setting the naptime smaller, though (or at
> least there was; perhaps Alvaro's recent changes negate this need):
> clusters that have a large number of databases. I've worked with folks
> who are in a hosted environment and give each customer their own
> database; it's not hard to get a couple hundred databases that way.
> Setting the naptime higher than a second in such an environment would
> mean it could be hours before a database is checked for vacuuming.

Yes, the code in HEAD is different -- each database will be considered
separately.  So the huge database taking all day to vacuum will not stop
the tiny databases from being vacuumed in a timely manner.

And the very huge table in that database will not stop the other tables
in the database from being vacuumed either.  There can be more than one
worker in a single database.

The limit is autovacuum_max_workers.



Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Matthew T. O'Connor"
Date:
Alvaro Herrera wrote:
> Jim C. Nasby wrote:
>> There *is* reason to allow setting the naptime smaller, though (or at
>> least there was; perhaps Alvaro's recent changes negate this need):
>> clusters that have a large number of databases. I've worked with folks
>> who are in a hosted environment and give each customer their own
>> database; it's not hard to get a couple hundred databases that way.
>> Setting the naptime higher than a second in such an environment would
>> mean it could be hours before a database is checked for vacuuming.
> 
> Yes, the code in HEAD is different -- each database will be considered
> separately.  So the huge database taking all day to vacuum will not stop
> the tiny databases from being vacuumed in a timely manner.
> 
> And the very huge table in that database will not stop the other tables
> in the database from being vacuumed either.  There can be more than one
> worker in a single database.

OK, but I think the question posed is this: in, say, a virtual hosting 
environment, there might be 1,000 databases in the cluster.  Am I 
still going to have to wait a long time for my database to get vacuumed? 
I don't think this has changed much, has it?

(If the default naptime is 1 minute, then autovacuum will look at a 
given database only once every 1,000 minutes (16.67 hours), assuming 
there isn't enough work to keep all the workers busy.)


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Alvaro Herrera
Date:
Matthew T. O'Connor wrote:

> Ok, but I think the question posed is that in say a virtual hosting 
> environment there might be say 1,000 databases in the cluster. Am I 
> still going to have to wait a long time for my database to get vacuumed? 
>  I don't think this has changed much no?

Depends on how much time it takes to vacuum the other 999 databases.
The default autovacuum_max_workers is 3.

> (If default naptime is 1 minute, then autovacuum won't even look at a 
> given database but once every 1,000 minutes (16.67 hours) assuming that 
> there isn't enough work to keep all the workers busy.)

The naptime is per database, which means that if you have 1000 databases
and a naptime of 60 seconds, the launcher is going to wake up every 100
milliseconds to check things.  (60000 ms / 1000 = 60 ms, but there is a
minimum of 100 ms just to keep things sane.)
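
The arithmetic in the previous paragraph, written as a small helper. This only reproduces the numbers described (naptime spread across databases, floored at 100 ms); launcher_interval_ms is a hypothetical name, and this is not the real launcher_determine_sleep().

```c
/* Floor on the launcher's wakeup interval, per the message above. */
#define MIN_LAUNCHER_INTERVAL_MS 100L

/*
 * With naptime applied per database, the launcher needs to wake about
 * every naptime / ndatabases, but never more often than every 100 ms.
 */
static long launcher_interval_ms(long naptime_s, long ndatabases)
{
    if (ndatabases < 1)
        ndatabases = 1;

    long interval_ms = naptime_s * 1000 / ndatabases;
    return interval_ms < MIN_LAUNCHER_INTERVAL_MS ? MIN_LAUNCHER_INTERVAL_MS
                                                  : interval_ms;
}
```

For 1000 databases the raw value is 60 ms, so the floor applies and the launcher wakes every 100 ms; with 10 databases it would wake every 6 seconds.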

If there are 3 workers and each of the 1000 databases takes on average
10 seconds to vacuum, there will be around 3000 seconds between autovac
runs of your database, assuming my math is right.
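
A back-of-the-envelope version of that estimate, assuming every database takes the same time to vacuum and the workers stay fully busy (both simplifications; revisit_interval_s is a hypothetical helper). It gives 1000 * 10 s / 3 workers, about 3333 s, which is in line with the rough "around 3000 seconds" figure, or a little under an hour.

```c
/*
 * Rough time between two visits to the same database when ndatabases
 * each take secs_per_db to vacuum and max_workers run concurrently.
 */
static double revisit_interval_s(int ndatabases, double secs_per_db, int max_workers)
{
    if (max_workers < 1)
        max_workers = 1;
    return ndatabases * secs_per_db / max_workers;
}
```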

I hope those 1000 databases you put in your shared hosting are not very
big.



Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Joshua D. Drake"
Date:
Alvaro Herrera wrote:
> Matthew T. O'Connor wrote:
>
>> Ok, but I think the question posed is that in say a virtual hosting
>> environment there might be say 1,000 databases in the cluster.

That is uhmmm insane... 1000 databases?

Joshua D. Drake



--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Joshua D. Drake
> Sent: Friday, June 08, 2007 10:49 PM
> To: Alvaro Herrera
> Cc: Matthew T. O'Connor; Jim C. Nasby; Michael Paesold; Tom Lane; Andrew
> Hammond; Peter Eisentraut; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Autovacuum launcher doesn't notice death of
> postmaster immediately
>
> Alvaro Herrera wrote:
> > Matthew T. O'Connor wrote:
> >
> >> Ok, but I think the question posed is that in say a virtual hosting
> >> environment there might be say 1,000 databases in the cluster.
>
> That is uhmmm insane... 1000 databases?

Not in a test environment.  We have several hundred databases here.  Of
course, only a few dozen (or at most ~100) are of any one type, but I can
imagine that under certain circumstances 1000 databases would not be
unreasonable.

[snip]



Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
ITAGAKI Takahiro
Date:
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> > No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
> > exit" instead of "sleep longtime".
>
> Ah; yes, what I was proposing (or thought about proposing, not sure if I
> posted it or not) was putting an upper limit of 10 seconds in the sleep
> (bgwriter sleeps 10 seconds if configured to not do anything).  Though
> 10 seconds may seem like an eternity for systems like the ones Peter was
> talking about, where there is a script trying to restart the server as
> soon as the postmaster dies.

Here is a patch implementing a split sleep for autovacuum_naptime.

There are some other issues in CVS HEAD.  We compute
{autovacuum_naptime * 1000000} in launcher_determine_sleep(), and the
result overflows if autovacuum_naptime is set above 2147.
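
To make the overflow concrete: assuming a 32-bit int (INT_MAX = 2147483647), 2147 * 1000000 = 2147000000 still fits, but 2148 * 1000000 does not. Below is a minimal sketch of a check and of the usual fix, widening to 64 bits before multiplying. Both helper names are hypothetical, and this is not the code in the attached patch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* True if naptime_s * 1000000 would overflow an int. */
static bool naptime_usec_overflows_int(int naptime_s)
{
    return naptime_s > INT_MAX / 1000000;   /* 2147 with a 32-bit int */
}

/* Overflow-free microsecond value: widen before multiplying. */
static int64_t naptime_usec(int naptime_s)
{
    return (int64_t) naptime_s * 1000000;
}
```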

In another place, we use {autovacuum_naptime * 1000}, so we should
set the upper bound to INT_MAX/1000 instead of INT_MAX.
Incidentally, we already have the same protection for
log_min_duration_statement and log_autovacuum.
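
The proposed bound, sketched as a clamp that a configuration check could apply. NAPTIME_MAX_S and clamp_naptime are hypothetical names, and the lower bound of 1 second is an assumption for this sketch. With a 32-bit int, INT_MAX / 1000 = 2147483 seconds (almost 25 days), so naptime * 1000 can no longer overflow.

```c
#include <limits.h>

/* Largest naptime (seconds) for which naptime * 1000 fits in an int. */
#define NAPTIME_MAX_S (INT_MAX / 1000)

/* Hypothetical clamp of a requested naptime into [1, NAPTIME_MAX_S]. */
static int clamp_naptime(long requested_s)
{
    if (requested_s < 1)
        return 1;
    if (requested_s > NAPTIME_MAX_S)
        return NAPTIME_MAX_S;
    return (int) requested_s;
}
```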

I hope this patch fixes those problems with large autovacuum_naptime values.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Zdenek Kotala
Date:
Alvaro Herrera wrote:
> Zeugswetter Andreas ADI SD wrote:
>>>>>>> The launcher is set up to wake up in autovacuum_naptime
>>>>>>> seconds at most.
>>>> Imho the fix is usually to have a sleep loop.
>>> This is what we have.  The sleep time depends on the schedule 
>>> of next vacuum for the closest database in time.  If naptime 
>>> is high, the sleep time will be high (depending on number of 
>>> databases needing attention).
>> No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
>> exit" instead of "sleep longtime".
> 
> Ah; yes, what I was proposing (or thought about proposing, not sure if I
> posted it or not) was putting an upper limit of 10 seconds in the sleep
> (bgwriter sleeps 10 seconds if configured to not do anything).  Though
> 10 seconds may seem like an eternity for systems like the ones Peter was
> talking about, where there is a script trying to restart the server as
> soon as the postmaster dies.

There is also one "wild" solution: the postmaster and the bgwriter could
be connected by a socket or pipe, and select() would be used instead of
sleep.  If the connection unexpectedly fails, select() returns
immediately and we are able to handle the problem ASAP.  The same socket
could also be used in special cases when we need to wake the process up
faster.
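
A minimal POSIX-only sketch of the pipe idea, assuming the parent creates a pipe before forking, keeps the write end open and never writes to it, and each child closes its inherited copy of the write end (otherwise EOF never arrives). When the parent exits, the kernel closes the last write end, the read end becomes readable, and read() returns 0. wait_parent_death() is a hypothetical name, and the sketch does not cover Win32.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms on the read end of a "parent alive" pipe.
 * Returns 1 if the parent is gone (EOF), 0 on timeout, -1 on error.
 */
static int wait_parent_death(int read_fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000
    };

    FD_ZERO(&rfds);
    FD_SET(read_fd, &rfds);

    int n = select(read_fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;                /* e.g. EINTR; caller may retry */
    if (n == 0)
        return 0;                 /* timed out: write end still open */

    char c;
    ssize_t r = read(read_fd, &c, 1);
    if (r == 0)
        return 1;                 /* EOF: every write end is closed */
    return r < 0 ? -1 : 0;
}
```

Because the timeout is an argument, the same select() call can serve as the nap itself, so the process sleeps for the full interval yet wakes immediately if the parent dies.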

    Zdenek



Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Magnus Hagander
Date:
On Tue, Jun 12, 2007 at 12:23:50PM +0200, Zdenek Kotala wrote:
> Alvaro Herrera wrote:
> >Zeugswetter Andreas ADI SD wrote:
> >>>>>>>The launcher is set up to wake up in autovacuum_naptime
> >>>>>>>seconds at most.
> >>>>Imho the fix is usually to have a sleep loop.
> >>>This is what we have.  The sleep time depends on the schedule 
> >>>of next vacuum for the closest database in time.  If naptime 
> >>>is high, the sleep time will be high (depending on number of 
> >>>databases needing attention).
> >>No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
> >>exit" instead of "sleep longtime".
> >
> >Ah; yes, what I was proposing (or thought about proposing, not sure if I
> >posted it or not) was putting an upper limit of 10 seconds in the sleep
> >(bgwriter sleeps 10 seconds if configured to not do anything).  Though
> >10 seconds may seem like an eternity for systems like the ones Peter was
> >talking about, where there is a script trying to restart the server as
> >soon as the postmaster dies.
> 
> There is also one "wild" solution. Postmaster and bgwriter will connect 
>  with socket/pipe and select command will be used instead sleep. If 
> connection unexpectedly fails, select finish immediately and we are able 
> to handle this issue asap. This socket should be used also in some 
> special case when we need wake up it faster.

Given the number of problems we've had with pipes on win32, let's try to
avoid adding extra ones unless they're really necessary. If split-sleep
works, that seems a safer bet.

//Magnus


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Zdenek Kotala
Date:
Magnus Hagander wrote:
> On Tue, Jun 12, 2007 at 12:23:50PM +0200, Zdenek Kotala wrote:
>> Alvaro Herrera wrote:
>>> Zeugswetter Andreas ADI SD wrote:
>>>>>>>>> The launcher is set up to wake up in autovacuum_naptime
>>>>>>>>> seconds at most.
>>>>>> Imho the fix is usually to have a sleep loop.
>>>>> This is what we have.  The sleep time depends on the schedule 
>>>>> of next vacuum for the closest database in time.  If naptime 
>>>>> is high, the sleep time will be high (depending on number of 
>>>>> databases needing attention).
>>>> No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
>>>> exit" instead of "sleep longtime".
>>> Ah; yes, what I was proposing (or thought about proposing, not sure if I
>>> posted it or not) was putting an upper limit of 10 seconds in the sleep
>>> (bgwriter sleeps 10 seconds if configured to not do anything).  Though
>>> 10 seconds may seem like an eternity for systems like the ones Peter was
>>> talking about, where there is a script trying to restart the server as
>>> soon as the postmaster dies.
>> There is also one "wild" solution. Postmaster and bgwriter will connect 
>>  with socket/pipe and select command will be used instead sleep. If 
>> connection unexpectedly fails, select finish immediately and we are able 
>> to handle this issue asap. This socket should be used also in some 
>> special case when we need wake up it faster.
> 
> Given the amount of problems we've had with pipes on win32, let's try to
> avoid adding extra ones unless they're really necessary. If split-sleep
> works, that seems a safer bet.

OK, that could be a problem.  But I'm afraid split-sleep is not a good
solution either.  It could create a lot of race conditions in start/stop
scripts and monitoring tools.  It would be much better to improve pg_ctl
to perform cleanup ("pg_ctl cleanup") when the postmaster fails.

I think we must offer packagers and integrators a deterministic way to
handle this issue.
    Zdenek


Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Alvaro Herrera
Date:
ITAGAKI Takahiro wrote:
>
> Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
> > > No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
> > > exit" instead of "sleep longtime".
> >
> > Ah; yes, what I was proposing (or thought about proposing, not sure if I
> > posted it or not) was putting an upper limit of 10 seconds in the sleep
> > (bgwriter sleeps 10 seconds if configured to not do anything).  Though
> > 10 seconds may seem like an eternity for systems like the ones Peter was
> > talking about, where there is a script trying to restart the server as
> > soon as the postmaster dies.
>
> Here is a patch for split-sleep of autovacuum_naptime.
>
> There are some other issues in CVS HEAD; We use the calculation
> {autovacuum_naptime * 1000000} in launcher_determine_sleep().
> The result will be corrupted if we set autovacuum_naptime to >2147.

Ugh.  How about this patch?  It avoids the overflow issue altogether.
I am not sure that it works on Win32, but we already seem to be using
struct timeval elsewhere, so I don't see why it wouldn't work.
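
A sketch of the struct timeval approach described above: keep the next wakeup as seconds plus microseconds and subtract field by field, so naptime is only ever added to tv_sec and no naptime * 1000000 product is formed. add_seconds() and time_until() are hypothetical helpers, not the code in the attached patch; BSD and glibc also provide timeradd()/timersub() macros for similar arithmetic.

```c
#include <sys/time.h>

/* Deadline = start + naptime_s; assumes start is normalized
   (0 <= tv_usec < 1000000), so no carry is needed. */
static struct timeval add_seconds(struct timeval start, int naptime_s)
{
    struct timeval t = start;
    t.tv_sec += naptime_s;
    return t;
}

/* Time remaining until deadline, normalized; zero if already past. */
static struct timeval time_until(struct timeval now, struct timeval deadline)
{
    struct timeval d;
    d.tv_sec = deadline.tv_sec - now.tv_sec;
    d.tv_usec = deadline.tv_usec - now.tv_usec;
    if (d.tv_usec < 0) {          /* borrow one second */
        d.tv_sec -= 1;
        d.tv_usec += 1000000;
    }
    if (d.tv_sec < 0) {           /* deadline already passed */
        d.tv_sec = 0;
        d.tv_usec = 0;
    }
    return d;
}
```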


> In another place, we use {autovacuum_naptime * 1000}, so we should
> set the upper bound to INT_MAX/1000 instead of INT_MAX.
> Incidentally, we've already had the same protections for
> log_min_duration_statement and log_autovacuum.

Hmm, yes, the naptime should have an upper bound of INT_MAX/1000.  It
doesn't seem worth the trouble of changing those places, when we know
that such a naptime is uselessly high.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment

Re: Autovacuum launcher doesn't notice death of postmaster immediately

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> > Ah; yes, what I was proposing (or thought about proposing, not sure if I
> > posted it or not) was putting an upper limit of 10 seconds in the sleep
> > (bgwriter sleeps 10 seconds if configured to not do anything).  Though
> > 10 seconds may seem like an eternity for systems like the ones Peter was
> > talking about, where there is a script trying to restart the server as
> > soon as the postmaster dies.

Peter, is 10 seconds good enough for you?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.