Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date
Msg-id 544F9E9A.9020808@vmware.com
In response to Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
List pgsql-hackers
On 10/27/2014 06:12 PM, Heikki Linnakangas wrote:
> On 10/27/2014 02:12 PM, Fujii Masao wrote:
>> On Fri, Oct 24, 2014 at 10:05 PM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>> On 10/23/2014 11:09 AM, Heikki Linnakangas wrote:
>>>>
>>>> At least for master, we should consider changing the way the archiving
>>>> works so that we only archive WAL that was generated in the same server.
>>>> I.e. we should never try to archive WAL files belonging to another
>>>> timeline.
>>>>
>>>> I just remembered that we discussed a different problem related to this
>>>> some time ago, at
>>>>
>>>> http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp.
>>>> The conclusion of that was that at promotion, we should not archive the
>>>> last, partial, segment from the old timeline.
>>>
>>>
>>> So, this is what I came up with for master. Does anyone see a problem with
>>> it?
>>
>> What about the problem that I raised upthread? This is, the patch
>> prevents the last, partial, WAL file of the old timeline from being archived.
>> So we can never PITR the database to the point that the last, partial WAL
>> file has.
> A partial WAL file is never archived in the master server to begin with,
> so if it's ever used in archive recovery, the administrator must have
> performed some manual action to copy the partial WAL file from the
> original server. When he does that, he can also copy it manually to the
> archive, or whatever he wants to do with it.
>
> Note that the same applies to any complete, but not-yet archived WAL
> files. But we've never had any mechanism in place to archive those in
> the new instance, after PITR.

Actually, I'll take back what I said above. I had misunderstood the 
current behavior. Currently, a server *does* archive any files that you
copy manually to pg_xlog after PITR has finished, but only eventually: we
don't create a .ready file for them until they're old enough to be recycled.
We do create a .ready file for the last, partial, segment, but it's 
pretty weird to do it just for that, and not any other, complete, 
segments that might've been copied to pg_xlog. So what happens is that 
the last partial segment gets archived immediately after promotion, but 
any older segments will linger unarchived until much later.

The special treatment of the last partial segment still makes no sense. 
If we want the segments from the old timeline to be archived after PITR,
we should archive them all immediately after the end of recovery, not just
the partial one. The exception for just the last partial segment is silly.
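
To make the inconsistency concrete, here is a minimal sketch that models which old-timeline segments get a .ready file at promotion. It is an illustration only, not server code: the `Segment` class, both function names, and the segment names are hypothetical, and the "current" policy simply encodes the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str      # WAL segment file name (hypothetical examples below)
    partial: bool  # True for the last, partial segment of the old timeline


def ready_at_promotion_current(segments):
    """Behavior described above: only the partial segment is marked .ready."""
    return [s.name for s in segments if s.partial]


def ready_at_promotion_all(segments):
    """Alternative: mark every old-timeline segment at the end of recovery."""
    return [s.name for s in segments]


# Hypothetical pg_xlog contents on the old timeline after PITR.
old_timeline = [
    Segment("000000010000000000000003", partial=False),
    Segment("000000010000000000000004", partial=False),
    Segment("000000010000000000000005", partial=True),
]

current = ready_at_promotion_current(old_timeline)
everything = ready_at_promotion_all(old_timeline)
print("current policy:", current)
print("archive-all policy:", everything)
```

Under the first policy the two complete segments stay unarchived until they are old enough to be recycled, which is the lingering described above.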

Now, the bigger question is whether we want the server after PITR to be 
responsible for archiving the segments from the old timeline at all. If 
we do, then we should remove the special treatment of the last, partial 
segment, and create the .ready files for all the complete segments too. 
And actually, I think we should *not* archive the partial segment. We 
don't normally archive partial segments, and all the WAL required to 
restore the server to the new timeline is copied to the file with the new
TLI. If the old timeline is still live, i.e. there's a server somewhere 
still writing new WAL on the old timeline, the partial segment will 
clash with a complete segment that the other server will archive later.
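
The clash follows from how segment files are named. The sketch below assumes the default 16 MB segment size and the contiguous segment numbering of 9.3 and later; the helper name `wal_file_name` and the example timeline and segment numbers are hypothetical.

```python
# Assumption: 16 MB segments, so 256 segments per 4 GB "xlogid".
SEGMENTS_PER_XLOGID = 0x100000000 // (16 * 1024 * 1024)


def wal_file_name(tli, segno):
    """Build a 24-hex-digit WAL file name from a timeline ID and segment number."""
    return "%08X%08X%08X" % (
        tli,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


# The promoted server holds a partial copy of segment 5 on old timeline 1.
partial_from_promoted = wal_file_name(1, 5)

# A server still writing on timeline 1 later completes and archives segment 5.
complete_from_old_server = wal_file_name(1, 5)

# The WAL needed on the new timeline lives under the new TLI instead.
on_new_timeline = wal_file_name(2, 5)

print(partial_from_promoted)
print(complete_from_old_server)
print(on_new_timeline)
```

The first two names are identical, so both files would compete for the same slot in the archive, while the file on the new timeline has a distinct name.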

Yet another consideration is that we currently don't archive files 
streamed from the master. If we think that the standby server is 
responsible for archiving old segments after recovery, why is it not 
responsible for archiving the streamed segments? It's because in most 
cases, the master will archive the file, and we don't want two servers 
to archive the same file, but there is actually no guarantee of that. It
might well be that the archiver runs a little bit behind in the master,
and after a crash the archive will be missing some of the segments required.
That's not good either.
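
The gap can be illustrated with a small sketch. It assumes a hypothetical state in which the master generated four segments, its archiver had only finished two of them when it crashed, and the standby received all four by streaming but archives none of them. The function name and segment names are made up for illustration.

```python
def missing_from_archive(generated, archived_by_master):
    """Return, in order, the segments that no server has archived."""
    archived = set(archived_by_master)
    return [seg for seg in generated if seg not in archived]


# Hypothetical state at the moment the master crashes.
generated = [
    "000000010000000000000003",
    "000000010000000000000004",
    "000000010000000000000005",
    "000000010000000000000006",
]
archived_by_master = [
    "000000010000000000000003",
    "000000010000000000000004",
]

gap = missing_from_archive(generated, archived_by_master)
print("segments missing from the archive:", gap)
```

The standby has the two missing segments on disk, but since nothing makes it responsible for archiving streamed segments, the archive stays incomplete.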

I'm not sure what to do here. The current behavior is inconsistent, and 
there are some nasty gotchas that would be nice to fix. I think
someone needs to sit down and write a high-level design of how this all 
should work.

- Heikki



