Re: Streaming replication and WAL archive interactions - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Streaming replication and WAL archive interactions
Date
Msg-id 54949108.3030109@vmware.com
In response to Re: Streaming replication and WAL archive interactions  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Streaming replication and WAL archive interactions
Re: Streaming replication and WAL archive interactions
List pgsql-hackers
On 12/18/2014 12:32 PM, Fujii Masao wrote:
> On Wed, Dec 17, 2014 at 4:11 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 12/16/2014 10:24 AM, Borodin Vladimir wrote:
>>>
>>> On 12 Dec 2014, at 16:46, Heikki Linnakangas
>>> <hlinnakangas@vmware.com> wrote:
>>>
>>>> There have been a few threads on the behavior of WAL archiving
>>>> after a standby server is promoted [1] [2]. In short, it doesn't
>>>> work as you might expect. The standby will start archiving after
>>>> it's promoted, but it will not archive files that were replicated
>>>> from the old master via streaming replication. If those files were
>>>> not already archived in the master before the promotion, they are
>>>> not archived at all. That's not good if you wanted to restore from
>>>> a base backup + the WAL archive later.
>>>>
>>>> The basic setup is a master server, a standby, a WAL archive that's
>>>> shared by both, and streaming replication between the master and
>>>> standby. This should be a very common setup in the field, so how
>>>> are people doing it in practice? Just live with the risk that you
>>>> might miss some files in the archive if you promote? Don't even
>>>> realize there's a problem? Something else?
>>>
>>>
>>> Yes, I do live like that (with streaming replication and shared
>>> archive between master and replicas) and don’t even realize there’s a
>>> problem :( And I think I’m not the only one. Maybe at least a note
>>> should be added to the documentation?
>>
>>
>> Let's try to figure out a way to fix this in master, but yeah, a note in the
>> documentation is in order.
>
> +1
>
>>>> And how would we like it to work?
>>
>>
>> Here's a plan:
>>
>> Have a mechanism in the standby to track how far the master has archived
>> its WAL, and don't throw away WAL in the standby that hasn't been archived
>> in the master yet. This is similar to the physical replication slots, which
>> prevent the master from recycling WAL that a standby hasn't received yet,
>> but in reverse. I think we can use the .done and .ready files for this.
>> Whenever a file is streamed (completely) from the master, create a .ready
>> file for it. When we get an acknowledgement from the master that it has
>> archived it, create a .done file for it. To get the information from the
>> master, add the "last archived WAL segment", e.g., to the streaming
>> replication keep-alive message, or invent a new message type for it.
>
> Sounds OK to me.
>
> How does this work in cascade replication case? The cascading walsender
> just relays the archive location to the downstream standby?

Hmm. Yeah, I guess so.
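The plan above, including the relay for the cascading case, can be modeled in a few lines. This is a minimal Python illustration under stated assumptions, not the patch: the names `Standby`, `KeepAlive`, and `last_archived` are hypothetical, and it assumes all segments are on a single timeline, where 24-hex-digit WAL file names sort in WAL order. A segment gets a .ready marker once it has been streamed completely, flips to .done when the upstream reports that the master has archived it, and only .done segments may be recycled. A cascading standby forwards the archive location it received instead of reporting anything of its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    READY = "ready"  # fully streamed, not yet known to be archived by the master
    DONE = "done"    # the master has reported this segment as archived


@dataclass
class KeepAlive:
    # Hypothetical keep-alive carrying the proposed "last archived segment".
    last_archived: Optional[str]


class Standby:
    """Hypothetical model of the standby-side bookkeeping.

    Assumes 24-hex-digit WAL segment names on one timeline, so plain string
    comparison follows WAL order.
    """

    def __init__(self) -> None:
        self.status: Dict[str, Status] = {}
        self.upstream_last_archived: Optional[str] = None

    def segment_streamed(self, seg: str) -> None:
        # A segment was received completely: create its ".ready" marker.
        self.status.setdefault(seg, Status.READY)

    def on_upstream_keepalive(self, msg: KeepAlive) -> None:
        # The upstream reported how far the master has archived.
        if msg.last_archived is None:
            return
        self.upstream_last_archived = msg.last_archived
        for seg, st in list(self.status.items()):
            if st is Status.READY and seg <= msg.last_archived:
                self.status[seg] = Status.DONE

    def recyclable(self) -> List[str]:
        # Only ".done" segments may be removed or recycled by this standby.
        return sorted(s for s, st in self.status.items() if st is Status.DONE)

    def downstream_keepalive(self) -> KeepAlive:
        # Cascading case: relay the master's archive location unchanged,
        # since this standby has not archived anything itself.
        return KeepAlive(last_archived=self.upstream_last_archived)
```

Because every streamed segment starts out as .ready, a promotion before the master's acknowledgement leaves exactly the not-yet-archived segments for the promoted server's archiver to pick up.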

> What happens when WAL streaming is terminated and the startup process starts to
> read the WAL file from the archive? After reading the WAL file from the archive,
> we would probably need to change the .ready files of all older WAL files to .done.

I suppose. Although there's no big harm in leaving them in .ready state.
As soon as you reconnect, the primary will report whether they were archived.
If the server is promoted before reconnecting, it will try to archive
the files and archive_command will see that they are already in the
archive. It has to be prepared for that situation anyway, so that's OK too.
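Both points in the exchange above can be sketched in Python. `mark_restored_as_done` is a hypothetical helper for the optional cleanup Fujii describes: after a segment is restored from the archive, every older .ready marker in archive_status can become .done. `archive_segment` illustrates what "prepared for that situation" could mean for the archiving step: a file that is already in the archive with identical contents counts as success, while a different file under the same name is refused. Both helper names are assumptions, and the sketch assumes plain segment names on one timeline; a real archive_command reports its result as a zero or nonzero exit status and should also fsync the copied file.

```python
import os
import shutil


def mark_restored_as_done(status_dir: str, restored_seg: str) -> list:
    """Flip ".ready" markers up to restored_seg to ".done" (hypothetical helper)."""
    changed = []
    for name in sorted(os.listdir(status_dir)):
        seg, ext = os.path.splitext(name)
        if ext != ".ready" or len(seg) != 24:
            continue  # skip .done markers and .history/.backup files
        if seg <= restored_seg:  # same-timeline names sort in WAL order
            os.rename(os.path.join(status_dir, name),
                      os.path.join(status_dir, seg + ".done"))
            changed.append(seg)
    return changed


def _same_contents(path_a: str, path_b: str, chunk: int = 1 << 20) -> bool:
    # Byte-by-byte comparison in chunks, so a 16 MB segment is not loaded at once.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a, block_b = fa.read(chunk), fb.read(chunk)
            if block_a != block_b:
                return False
            if not block_a:  # both files reached EOF together
                return True


def archive_segment(src_path: str, archive_dir: str) -> bool:
    """Copy one WAL segment into the archive; True means success (hypothetical)."""
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        # E.g. the old master archived it before the promoted standby tried.
        return _same_contents(src_path, dest)
    tmp = dest + ".tmp"
    shutil.copyfile(src_path, tmp)
    # Rename into place so a reader never sees a half-copied segment.
    os.replace(tmp, dest)
    return True
```

Accepting an identical duplicate matters because the PostgreSQL documentation recommends that archive_command refuse to overwrite an existing file; a command that refuses every existing file would keep failing on the segments the old master already archived, and archiving would stall after promotion.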

Here's a first cut at this. It includes the changes from your
standby_wal_archiving_v1.patch, so you get that behaviour if you set
archive_mode='always', and the new behaviour I wanted with
archive_mode='shared'. I wrote it on top of the other patch I posted
recently to not archive bogus recycled WAL segments after promotion
(http://www.postgresql.org/message-id/549489FA.4010304@vmware.com), but
it seems to apply without it too.

I suggest reading the documentation changes first; they hopefully explain
pretty well how to use this. The code should work too, and comments on
it are welcome, but I haven't tested it much. I'll do more testing
next week.

- Heikki


Attachment
