Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date
Msg-id 544E6EEA.4080204@vmware.com
In response to Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On 10/27/2014 02:12 PM, Fujii Masao wrote:
> On Fri, Oct 24, 2014 at 10:05 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 10/23/2014 11:09 AM, Heikki Linnakangas wrote:
>>>
>>> At least for master, we should consider changing the way the archiving
>>> works so that we only archive WAL that was generated in the same server.
>>> I.e. we should never try to archive WAL files belonging to another
>>> timeline.
>>>
>>> I just remembered that we discussed a different problem related to this
>>> some time ago, at
>>>
>>> http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp.
>>> The conclusion of that was that at promotion, we should not archive the
>>> last, partial, segment from the old timeline.
>>
>>
>> So, this is what I came up with for master. Does anyone see a problem with
>> it?
>
> What about the problem that I raised upthread? That is, the patch
> prevents the last, partial WAL file of the old timeline from being archived.
> So we can never PITR the database to a point that lies within that last,
> partial WAL file.

A partial WAL file is never archived in the master server to begin with, 
so if it's ever used in archive recovery, the administrator must have 
performed some manual action to copy the partial WAL file from the 
original server. When he does that, he can also copy it manually to the 
archive, or whatever he wants to do with it.

Note that the same applies to any complete but not-yet-archived WAL
files. But we've never had any mechanism in place to archive those in
the new instance, after PITR.

> Isn't this a problem? For example, please imagine the
> following scenario.
>
> 1. Some important data was deleted but no one noticed. The deletion was
>      logged in the last, partial WAL file.
> 2. The server crashed and the DBA started an archive recovery from an old
>      backup.
> 3. After recovery, all WAL files of the old timeline were recycled.
> 4. Finally the DBA noticed the loss of the important data and tried to do
>      PITR to the point where the data was deleted.
>
> HOWEVER, the WAL file containing that deletion operation no longer exists.
> So the DBA will never be able to recover that important data ....

I think you're missing a step above:

1.5: The administrator copies the last, partial WAL file (and any
complete but not-yet-archived files) to the new server's pg_xlog directory.

Without that, it won't be available for PITR anyway, and the new server 
won't see it or try to archive it, no matter what.
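
To make step 1.5 concrete, here is a rough sketch in Python of what that manual copy could look like. It is an illustration only, not something the server or the patch does. The helper names (unarchived_segments, copy_unarchived) are made up for this example. It assumes the old server's pg_xlog directory is still readable, and it treats any segment without a .done marker in archive_status as "not yet archived", which covers both .ready files and the last, partial segment.

```python
import re
import shutil
from pathlib import Path
from typing import List

# WAL segment file names are 24 hex digits (timeline, log, segment).
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def unarchived_segments(old_xlog: Path) -> List[Path]:
    """Hypothetical helper: WAL segments in old_xlog with no .done marker.

    Caveat: preallocated or recycled future segments also carry no status
    file, so a real procedure would need to exclude those by hand.
    """
    status_dir = old_xlog / "archive_status"
    segments = []
    for path in sorted(old_xlog.iterdir()):
        if not (path.is_file() and WAL_SEGMENT_RE.match(path.name)):
            continue  # skip archive_status/, history files, etc.
        if (status_dir / (path.name + ".done")).exists():
            continue  # the old server already archived this one
        segments.append(path)
    return segments


def copy_unarchived(old_xlog: Path, new_xlog: Path) -> List[str]:
    """Copy the unarchived segments into the new server's pg_xlog."""
    copied = []
    for seg in unarchived_segments(old_xlog):
        target = new_xlog / seg.name
        if target.exists():
            continue  # never overwrite a file the new server already has
        shutil.copy2(seg, target)
        copied.append(seg.name)
    return copied
```

The same list of files is what the administrator could also copy to the archive by hand, if he wants them to be available for a later PITR.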

- Heikki



