Re: problem with archive_command as suggested by documentation - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: problem with archive_command as suggested by documentation
Msg-id 4978BC96.6090404@enterprisedb.com
In response to problem with archive_command as suggested by documentation  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
Responses Re: problem with archive_command as suggested by documentation  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
List pgsql-hackers
Albe Laurenz wrote:
> The documentation states in
> http://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
> 
> "The archive command should generally be designed to refuse to overwrite any pre-existing archive file."
> 
> and suggests an archive_command like "test ! -f .../%f && cp %p .../%f".
> 
> We ran into (small) problems with an archive_command similar to this
> as follows:
> 
> The server received a fast shutdown request while a WAL segment was being archived.
> The archiver stopped and left behind a half-written archive file.

Hmm, if I'm reading the code correctly, a fast shutdown request 
shouldn't kill an ongoing archive command.

> Now when the server was restarted, the archiver tried to archive the same
> WAL segment again and got an error because the destination file already
> existed.
> 
> That means that WAL archiving is stuck until somebody manually removes
> the partial archived file.

Yeah, that's a good point. Even if it turns out that the reason for your 
partial write wasn't the fast shutdown request, the archive_command could 
be interrupted for some other reason and leave behind a partially written 
file.
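
To make the failure mode concrete, here is a minimal sketch that mimics the documented command with a shell function. The function name naive_archive and its two arguments (source path, destination path) are made up for illustration; they stand in for %p and the archive destination.

```shell
# Hypothetical stand-in for the documented archive_command:
#   test ! -f DEST && cp SRC DEST
# $1 plays the role of %p, $2 the role of the destination file.
naive_archive() {
    # Fails (non-zero status) whenever DEST already exists,
    # whether it is a complete archive file or a partial one.
    test ! -f "$2" && cp "$1" "$2"
}
```

If an interrupted run has left a partial file at the destination, every later call returns non-zero and the partial file is never replaced, which is the stuck state described above.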

> I suggest that the documentation be changed so that it does not
> recommend this setup. WAL segment names are unique anyway.

Well, the documentation states the reason to do that:

> This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory)

which seems like a reasonable concern too. Perhaps it should suggest 
something like:

test ! -f .../%f && cp %p .../%f.tmp && mv .../%f.tmp .../%f

i.e. copy under a different filename first, and rename the file in place 
after it's completely written, assuming that mv is atomic. It gets a bit 
complicated, though.
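
As a rough sketch of the same idea written out as a shell function rather than a one-liner: the name archive_wal and its three arguments (source path, archive directory, file name) are assumptions made for illustration, not anything PostgreSQL provides. It also assumes the temporary file and the final file live on the same filesystem, so that mv is a plain rename.

```shell
# Hypothetical helper: archive_wal SRC DEST_DIR NAME
# SRC corresponds to %p and NAME to %f; DEST_DIR is an assumed archive directory.
archive_wal() {
    src=$1
    dest_dir=$2
    name=$3

    # Refuse to overwrite a completed archive file (the documented safety check).
    if [ -f "$dest_dir/$name" ]; then
        echo "archive file $dest_dir/$name already exists" >&2
        return 1
    fi

    # Copy under a temporary name. A leftover .tmp file from an interrupted
    # run is simply overwritten by the next attempt.
    cp "$src" "$dest_dir/$name.tmp" || return 1

    # Rename into place; assumed atomic because both names are in DEST_DIR.
    mv "$dest_dir/$name.tmp" "$dest_dir/$name"
}
```

With this shape a partial copy only ever exists under the .tmp name, so it cannot block a retry. The sketch does not flush the file to disk before the rename, and the existence check and the rename are still two separate steps, so two servers writing the same name at the same moment are not fully guarded against.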

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

