On 14.01.2011 08:45, Fujii Masao wrote:
> On Fri, Jan 14, 2011 at 4:13 AM, Magnus Hagander<magnus@hagander.net> wrote:
>>> At the end of the backup by walsender, it forces a switch to a new
>>> WAL file and waits until the last WAL file has been archived. So we
>>> should change postmaster so that it doesn't cause the archiver to
>>> end before walsender ends when shutdown is requested?
>>
>> Um. I have to admit I'm not entirely following what you mean enough to
>> confirm it, but it *sounds* correct :-)
>>
>> What scenario exactly is the problematic one?
>
> 1. Smart shutdown is requested while walsender is sending a backup.
> 2. Shutdown causes archiver to end.
> (Though shutdown sends SIGUSR2 to walsender to exit, walsender
> running backup doesn't respond for now)
> 3. At the end of backup, walsender calls do_pg_stop_backup, which
> forces a switch to a new WAL file and waits until the last WAL file has
> been archived.
> *BUT*, since archiver has already been dead, walsender waits for
> that forever.

Not only does it wait forever, but it writes the end-of-backup WAL
record after bgwriter has already exited and written the shutdown
checkpoint record.

I think postmaster should treat a walsender as a regular backend, until
it has started streaming.

We can achieve that by starting up the child as PM_CHILD_ACTIVE, and
changing the state to PM_CHILD_WALSENDER later, when streaming is
started. Looking at postmaster.c, that should be safe: postmaster
treats a backend as a regular backend anyway until it has connected
to shared memory. It is *not* safe to switch a walsender back to a
regular process, but we have no need to do that.
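
To make the idea concrete, here is a rough toy model of the bookkeeping,
not a patch against postmaster.c. Everything prefixed toy_/Toy/TOY_ is
made up for illustration, and the rule "stop the archiver only once no
regular backend remains" is a simplifying assumption about the shutdown
sequence, which has more states in reality. The point is only that a
child running a base backup keeps counting as a regular backend, so the
archiver is still around when the backup finishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical child states, loosely named after the ones discussed above. */
typedef enum {
    TOY_CHILD_UNUSED,
    TOY_CHILD_ACTIVE,      /* regular backend, or walsender not yet streaming */
    TOY_CHILD_WALSENDER    /* walsender that has started streaming */
} ToyChildState;

#define TOY_MAX_CHILDREN 8

/* Hypothetical, heavily simplified postmaster bookkeeping. */
typedef struct {
    ToyChildState slots[TOY_MAX_CHILDREN];
    bool smart_shutdown;     /* smart shutdown has been requested */
    bool archiver_running;
} ToyPostmaster;

static void toy_init(ToyPostmaster *pm)
{
    for (int i = 0; i < TOY_MAX_CHILDREN; i++)
        pm->slots[i] = TOY_CHILD_UNUSED;
    pm->smart_shutdown = false;
    pm->archiver_running = true;
}

/* Every new child starts out as a regular backend. Returns slot or -1. */
static int toy_start_child(ToyPostmaster *pm)
{
    for (int i = 0; i < TOY_MAX_CHILDREN; i++) {
        if (pm->slots[i] == TOY_CHILD_UNUSED) {
            pm->slots[i] = TOY_CHILD_ACTIVE;
            return i;
        }
    }
    return -1;
}

/* One-way switch, done when streaming starts. There is deliberately no
 * function for going back from walsender to regular backend. */
static bool toy_mark_walsender(ToyPostmaster *pm, int slot)
{
    if (slot < 0 || slot >= TOY_MAX_CHILDREN ||
        pm->slots[slot] != TOY_CHILD_ACTIVE)
        return false;
    pm->slots[slot] = TOY_CHILD_WALSENDER;
    return true;
}

static void toy_child_exit(ToyPostmaster *pm, int slot)
{
    if (slot >= 0 && slot < TOY_MAX_CHILDREN)
        pm->slots[slot] = TOY_CHILD_UNUSED;
}

static int toy_count(const ToyPostmaster *pm, ToyChildState state)
{
    int n = 0;
    for (int i = 0; i < TOY_MAX_CHILDREN; i++)
        if (pm->slots[i] == state)
            n++;
    return n;
}

/* Assumed rule: during smart shutdown the archiver is stopped only after
 * the last regular backend has gone away. */
static void toy_advance_shutdown(ToyPostmaster *pm)
{
    if (pm->smart_shutdown && toy_count(pm, TOY_CHILD_ACTIVE) == 0)
        pm->archiver_running = false;
}

static void toy_request_smart_shutdown(ToyPostmaster *pm)
{
    pm->smart_shutdown = true;
    toy_advance_shutdown(pm);
}
```

In this model, a child that is still sending a backup when smart shutdown
is requested remains TOY_CHILD_ACTIVE, so archiver_running stays true
until that child exits. A child that was already switched with
toy_mark_walsender() does not hold up the archiver.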

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com