Thread: Hot standby stops after a few days of inactivity (i.e. no new WAL)

From: Marc Schablewski

Hi,

We are running PostgreSQL 8.3.3 on a Linux box (SuSE 10.3, 2.6.22
kernel) as a hot standby. After some maintenance work, the WAL files
couldn't be shipped to that system (which had nothing to do with
postgres, as we found out later). The problem was not noticed for about
a week. When looking for the reason why the WAL files weren't shipped,
we found the following messages in the log:

2008-10-31 17:07:52 CET 9162LOG:  received smart shutdown request
2008-10-31 17:07:52 CET 9178FATAL:  could not restore file
"000000010000008600000018" from archive: return code 15
2008-10-31 17:07:52 CET 9162LOG:  startup process (PID 9178) exited with
exit code 1
2008-10-31 17:07:52 CET 9162LOG:  aborting startup due to startup
process failure

This message occurred about 3 1/2 days after the last log was shipped. I
searched the postgres docs and Google for the meaning of "return code
15" but couldn't find anything.

After copying the missing WAL from our master system and restarting
postgres, everything worked fine again, but I'm still curious what made
postgres stop waiting for WAL. It seems to me that there is some kind of
timeout that triggers if there is no new WAL for a couple of days, but
that would seem a bit strange. I'd expect postgres to wait forever
unless it is told manually to leave recovery mode. The manual's
"Recovery Settings" section didn't help either. I'm not sure whether
this is a bug, but at least it's strange.

Regards,
    Marc
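
For context on the "wait forever" expectation above: in an 8.3 warm standby, the waiting is normally done by the program named in restore_command, not by the server itself. A minimal sketch of such a waiting loop follows. It is an illustration only, not the script used on this system; the function name, the trigger-file convention and the polling interval are assumptions made for the example.

```python
import shutil
import time
from pathlib import Path


def wait_for_wal(archive_dir, wal_name, dest, trigger_file, poll_seconds=5.0):
    """Block until wal_name appears in archive_dir, then copy it to dest.

    Returns 0 after a successful copy, or 1 if trigger_file shows up first.
    A restore command that exits non-zero tells the server that no further
    WAL is available, which ends recovery.
    """
    source = Path(archive_dir) / wal_name
    trigger = Path(trigger_file)
    while True:
        # Hypothetical convention: an operator creates this file to fail over.
        if trigger.exists():
            return 1
        # Caveat: a real script must also make sure the file is complete
        # (not still being written by the shipping job) before copying it.
        if source.is_file():
            shutil.copy2(source, dest)
            return 0
        # No timeout on purpose: keep polling until one of the two happens.
        time.sleep(poll_seconds)
```

A wrapper script would call this with the %f (file name) and %p (destination path) values that the server substitutes into restore_command, and exit with the returned code. A loop like this never gives up by itself; it stops only when the file arrives, the trigger appears, or the process is killed from outside.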



Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

From: Alvaro Herrera

Marc Schablewski wrote:
> Hi,
>
> We are running PostgreSQL 8.3.3 on a Linux box (SuSE 10.3, 2.6.22
> kernel) as a hot standby. After some maintenance work, the WAL files
> couldn't be shipped to that system (which had nothing to do with
> postgres, as we found out later). The problem was not noticed for about
> a week. When looking for the reason why the WAL files weren't shipped, we found
> the following error message:
>
> 2008-10-31 17:07:52 CET 9162LOG:  received smart shutdown request
> 2008-10-31 17:07:52 CET 9178FATAL:  could not restore file
> "000000010000008600000018" from archive: return code 15

This server was stopped intentionally by someone or something, external
to Postgres itself.  "Smart shutdown" means the postmaster got SIGTERM.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

From: Merlin Moncure

On Tue, Nov 4, 2008 at 5:50 AM, Marc Schablewski <ms@clickware.de> wrote:
> Hi,
>
> We are running PostgreSQL 8.3.3 on a Linux box (SuSE 10.3, 2.6.22
> kernel) as a hot standby. After some maintenance work, the WAL files

I'm assuming you meant 'warm standby'... hot standby servers can
serve read-only queries.  That feature is proposed for PostgreSQL 8.4.

merlin

Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

From: Tom Lane

Marc Schablewski <ms@clickware.de> writes:
> ... When looking for a reason why the WAL weren't shipped, we found
> the following error message:

> 2008-10-31 17:07:52 CET 9162LOG:  received smart shutdown request
> 2008-10-31 17:07:52 CET 9178FATAL:  could not restore file
> "000000010000008600000018" from archive: return code 15

Something sent SIGTERM to both your postmaster (hence the "smart
shutdown" message) and the restore_command script (causing it to
exit with code 15, which is probably SIGTERM though you might want
to check kill -l to be sure).  You need to find out what's doing that
and make it stop.

            regards, tom lane
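
As a complement to checking kill -l, the number can also be decoded programmatically. The sketch below assumes that the "return code" in the log line is the raw wait status handed back by the C library's system() call, which is not confirmed anywhere in this thread; the helper name describe_status is made up for the example. On Linux a raw status of 15 decodes as "terminated by signal 15", whereas a script that itself called exit(15) would produce the raw status 15 << 8.

```python
import os
import signal


def describe_status(status: int) -> str:
    """Decode a raw POSIX wait status such as the one system() returns."""
    if os.WIFSIGNALED(status):
        # The low bits of the status carry the number of the fatal signal.
        return f"killed by {signal.Signals(os.WTERMSIG(status)).name}"
    if os.WIFEXITED(status):
        # A normal exit stores the exit code in the next-higher byte.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognized status {status}"


# Signal number 15 by name (the same information kill -l gives).
print(signal.Signals(15).name)

# Raw status 15 versus a script that exited normally with code 15.
print(describe_status(15))
print(describe_status(15 << 8))
```

Under that assumption, the FATAL line and the "smart shutdown" line tell the same story: both processes received SIGTERM at the same moment.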

Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

From: Marc Schablewski

Ah, ok. I somehow missed the first line of the message, and the rest of
it left the impression that "something" must be wrong with replication.

I guess one of my colleagues might have shut down the database by
accident and forgot to tell me.

Anyway, thanks for your reply.

Marc



Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

From: Marc Schablewski

Yes, 'warm standby' was what I intended to write. This must have been
some kind of wishful thinking. ;)
But I'd really appreciate 'hot standby' in a future version of postgres.

Marc
