Re: pg_stop_backup does not complete - Mailing list pgsql-hackers

From	Greg Smith
Subject	Re: pg_stop_backup does not complete
Date	February 24, 2010 16:57:05
Msg-id	4B85926E.7010109@2ndquadrant.com
In response to	Re: pg_stop_backup does not complete (Josh Berkus <josh@agliodbs.com>)
Responses	Re: pg_stop_backup does not complete
List	pgsql-hackers

Josh Berkus wrote:
>> pg_stop_backup() doesn't complete until all the WAL segments needed to
>> restore from the backup are archived. If archive_command is failing,
>> that never happens.
>>     
>
> OK, so we need a way out of that cycle if the user is issuing
> pg_stop_backup because they *already know* that archive_command is
> failing.  Right now, there's no way out other than a fast shutdown,
> which is a bit user-hostile.
>   
gsmith=# select name,context from pg_settings where name like 'archive%';
      name       |  context
-----------------+------------
 archive_command | sighup
 archive_mode    | postmaster
 archive_timeout | sighup

I expect that in your particular bad situation, you can replace the 
archive_command with a corrected one, use "pg_ctl reload" to send a 
SIGHUP to make that fix active, and escape from this.  That's the only 
right way out of this situation.  You can't just abort a backup someone 
has asked for because archives are failing, and then allow the server to 
shut down cleanly.  That's the wrong thing to do for production setups; 
the last thing you want for a system with archiving issues is for it to 
be stopped normally when that interferes with an explicit, 
admin-requested backup.
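
As a rough sketch of that escape route, assuming archive_command is set in a plain postgresql.conf under the data directory: the helper below rewrites the active archive_command line, after which "pg_ctl reload" delivers the SIGHUP. The function name set_archive_command and the /mnt/archive path are made up for illustration; they are not part of PostgreSQL.

```shell
# Hypothetical helper (not a PostgreSQL tool): replace the active
# archive_command setting in a postgresql.conf-style file.
set_archive_command() {
    conf_file=$1
    new_command=$2
    [ -f "$conf_file" ] || return 1
    tmp_file="${conf_file}.tmp.$$"
    # Single quotes inside a quoted postgresql.conf value are doubled.
    escaped=$(printf '%s' "$new_command" | sed "s/'/''/g")
    # Keep every line except an uncommented archive_command setting.
    # grep exits non-zero when nothing is left to print, so ignore its status.
    grep -v -E '^[[:space:]]*archive_command[[:space:]]*=' "$conf_file" > "$tmp_file" || true
    printf "archive_command = '%s'\n" "$escaped" >> "$tmp_file"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}

# Usage (the archive path is a placeholder):
#   set_archive_command "$PGDATA/postgresql.conf" 'cp %p /mnt/archive/%f'
#   pg_ctl reload -D "$PGDATA"    # sends SIGHUP; no restart needed
```

Because archive_command has context "sighup" in the pg_settings output above, the reload alone is enough for the archiver to pick up the new value.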

There's not necessarily any reason that backup even needs to fail, and 
no reason for the server to get restarted in this situation at all.  As 
long as the archive_command never falsely reported success, and instead 
just returned a proper failure code, all of the segments needed to make 
the backup consistent will be queued up waiting for the problem to be 
fixed.  Put the fixed archive_command in place, and you're off and 
running again.  If that's impossible, because the archive_command was 
really screwed up, we can just tell people to swap to an archive_command 
that just returns success, and let all of the queued-up segments get 
tossed away.  That backup will be bad; they then fix the 
archive_command, send SIGHUP, and start over with a new backup.
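
To make the "valid error code" point concrete, here is a minimal sketch of an archive step that only reports success when the copy really happened. The function name archive_wal and the directory layout are assumptions for illustration. The logic mirrors the familiar "test ! -f ... && cp ..." pattern.

```shell
# Hypothetical archive step: archive_wal SRC_PATH FILE_NAME ARCHIVE_DIR
# Returns 0 only if the segment was copied; any non-zero status tells the
# archiver the segment is still pending, so it stays queued for a retry.
archive_wal() {
    src_path=$1
    file_name=$2
    archive_dir=$3
    # Refuse to overwrite a segment that is already in the archive.
    if [ -f "$archive_dir/$file_name" ]; then
        return 1
    fi
    # The exit status of cp becomes the status of the function.
    cp "$src_path" "$archive_dir/$file_name"
}

# The "just return success" escape hatch is the opposite extreme, for
# example:
#   archive_command = '/bin/true'
# Every queued segment is then marked as archived without being copied,
# so any backup that depended on those segments must be thrown away.
```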

There are some doc patches that could guide how to handle this situation 
better for sure, but I don't see any code changes needed.  Everything is 
working as designed, optimized for production use at the expense of some 
confusion on how to recover if you configure things badly.

I suggested a patch a few weeks ago to make "what is the archiver 
doing?" behavior easier to monitor, and got the impression people felt 
it was redundant given SR (streaming replication) was the preferred path 
moving forward and eventually this whole archive_command bit would be 
going away.  I could revive that work if you feel this is such a bad 
issue that we need a better way to watch what the archiver is doing.
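
In the meantime, one crude way to watch the archiver from outside is to count the ".ready" marker files it leaves for segments that are still waiting to be archived. This is only a sketch and not the patch mentioned above. The function name count_pending_wal is invented, and the status directory is assumed to be pg_xlog/archive_status under the data directory.

```shell
# Hypothetical monitor: print how many WAL segments still await archiving.
# Each pending segment has a <segment>.ready file in the status directory;
# the marker is renamed to <segment>.done once archive_command succeeds.
count_pending_wal() {
    status_dir=$1
    count=0
    for marker in "$status_dir"/*.ready; do
        # With no matches the glob stays literal, so confirm the file exists.
        if [ -e "$marker" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Usage:
#   count_pending_wal "$PGDATA/pg_xlog/archive_status"
```

A count that keeps growing while pg_stop_backup() is waiting suggests the archive_command is still failing.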

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
