Re: Serious problem: media recovery fails after system or PostgreSQL crash - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Serious problem: media recovery fails after system or PostgreSQL crash
Msg-id 50CDF8F1.6040808@fuzzy.cz
In response to Re: Serious problem: media recovery fails after system or PostgreSQL crash  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Serious problem: media recovery fails after system or PostgreSQL crash
List pgsql-hackers
On 8.12.2012 03:08, Jeff Janes wrote:
> On Thu, Dec 6, 2012 at 3:52 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Hi,
>>
>> On 6.12.2012 23:45, MauMau wrote:
>>> From: "Tom Lane" <tgl@sss.pgh.pa.us>
>>>> Well, that's unfortunate, but it's not clear that automatic recovery is
>>>> possible.  The only way out of it would be if an undamaged copy of the
>>>> segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
>>>> not even be trying to fetch from the archive if we had a local copy.
>>>
>>> No, PG will try to fetch the WAL file from pg_xlog when it cannot get it
>>> from the archive.  XLogFileReadAnyTLI() does that.  Also, the PG manual contains
>>> the following description:
>>>
>>> http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
>>>
>>>
>>> WAL segments that cannot be found in the archive will be sought in
>>> pg_xlog/; this allows use of recent un-archived segments. However,
>>> segments that are available from the archive will be used in preference
>>> to files in pg_xlog/.
>>
>> So why don't you use an archive command that does not create such
>> incomplete files? I mean something like this:
>>
>> archive_command = 'cp %p /arch/%f.tmp && mv /arch/%f.tmp /arch/%f'
>>
>> Until the file is renamed, it's considered 'incomplete'.
> 
> Wouldn't having the incomplete file be preferable over having none of it at all?
> 
> It seems to me you need considerable expertise to figure out how to do
> optimal recovery (i.e. losing the least transactions) in this
> situation, and that that expertise cannot be automated.  Do you trust
> a partial file from a good hard drive, or a complete file from a
> partially melted pg_xlog?

It clearly is a rather complex issue, no doubt about that. And yes,
reliability of the devices with pg_xlog on them is an important detail.
Although if the WAL is not written reliably in the first place, you're
hosed anyway, I guess.

The recommended archive command is based on the assumption that the
local pg_xlog is intact (e.g. because it's located on a reliable RAID1
array), which seems to be the assumption of the OP too.
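
For illustration, here is the rename trick from the archive_command quoted above as a minimal Python sketch. It assumes a POSIX filesystem with the temporary and final files on the same mount (so rename() is atomic), and it adds fsync calls, which plain cp + mv does not do. The name `archive_segment` is hypothetical and not part of PostgreSQL.

```python
import os
import shutil


def archive_segment(src_path: str, archive_dir: str, name: str) -> None:
    """Copy one WAL segment into archive_dir so that it only appears under
    its final name once the copy is complete and flushed to disk."""
    final_path = os.path.join(archive_dir, name)
    tmp_path = final_path + ".tmp"

    # Never overwrite an already archived segment; raising here (a non-zero
    # exit in a wrapper script) reports the archive step as failed.
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    # Copy under a temporary name; a crash at this point leaves only
    # "<name>.tmp", a name recovery will never ask for.
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        os.fsync(dst.fileno())  # make the data durable before the rename

    # rename() is atomic within one POSIX filesystem: readers see either
    # no file or the complete file, never a partial one.
    os.rename(tmp_path, final_path)

    # Flush the directory entry so the rename itself survives a crash.
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

To use it as an archive_command you would need a thin command-line wrapper passing %p and %f; an uncaught exception then gives a non-zero exit status, which PostgreSQL treats as an archiving failure and retries later.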

In my opinion, you're more likely to encounter an incomplete copy of a
WAL segment in the archive than a corrupted local WAL segment. And if the
local one really is corrupted, that would be detected during replay, as
WAL records are CRC-checked.
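
To show why the partial archive copy is the dangerous case, here is a deliberately simplified model of the lookup order from the manual excerpt quoted above (archive first, pg_xlog/ only as a fallback). `locate_segment` is a hypothetical helper; the real logic (XLogFileReadAnyTLI(), restore_command, timeline handling) is considerably more involved.

```python
import os
from typing import Optional


def locate_segment(name: str, archive_dir: str, xlog_dir: str) -> Optional[str]:
    """Simplified model of the documented lookup order: a segment present
    in the archive wins; pg_xlog/ is consulted only if the archive lacks it.
    Timelines and restore_command are deliberately left out."""
    archived = os.path.join(archive_dir, name)
    if os.path.isfile(archived):
        # Preferred even if this copy is truncated: it shadows pg_xlog/.
        return archived
    local = os.path.join(xlog_dir, name)
    if os.path.isfile(local):
        # Fallback to recent, un-archived segments.
        return local
    return None
```

With the rename-based archive_command, a crash during the copy leaves only the `.tmp` file, so the lookup falls through to the intact pg_xlog/ copy instead of picking up a truncated segment.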

Tomas


