Thread: Refactoring standby mode logic

Refactoring standby mode logic

From
Heikki Linnakangas
The code that reads the WAL from the archive, from pg_xlog, and from a
master server via walreceiver is quite complicated. I'm talking about
the WaitForWALToBecomeAvailable() function in xlog.c. I got frustrated
with that while working on the "switching timeline over streaming
replication" patch.

Attached is a patch to refactor that logic into a more straightforward
state machine. It's always been a kind of state machine, but it's been
hard to see, as the code wasn't explicitly written that way. Any objections?
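To illustrate what "explicitly written as a state machine" can look like, here is a minimal Python sketch of a loop that cycles through WAL sources using a transition table. It is not the code in xlog.c: the `Source` enum, the `NEXT_SOURCE` table, the `wait_for_wal()` helper and its `try_source` callback are all hypothetical, and the archive -> pg_xlog -> stream cycle is an assumption made for illustration.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Source(Enum):
    # Hypothetical labels for the three places recovery can get WAL from.
    ARCHIVE = "archive"   # via restore_command
    PG_XLOG = "pg_xlog"   # files already present in the data directory
    STREAM = "stream"     # walreceiver connected to the master


# Assumed transition table: on failure, move to the next source, wrapping around.
NEXT_SOURCE = {
    Source.ARCHIVE: Source.PG_XLOG,
    Source.PG_XLOG: Source.STREAM,
    Source.STREAM: Source.ARCHIVE,
}


def wait_for_wal(
    try_source: Callable[[Source], bool],
    start: Source = Source.ARCHIVE,
    max_attempts: int = 10,
) -> Tuple[Optional[Source], int]:
    """Try sources in state-machine order until one yields the WAL we need.

    try_source stands in for the real restore/read/stream attempt and returns
    True on success. Returns (source, attempts_used), or (None, max_attempts)
    if every attempt failed.
    """
    source = start
    for attempt in range(1, max_attempts + 1):
        if try_source(source):
            return source, attempt
        source = NEXT_SOURCE[source]  # one explicit state transition per failure
    return None, max_attempts


# The segment is only obtainable via streaming: archive and pg_xlog fail first.
print(wait_for_wal(lambda s: s is Source.STREAM))  # (<Source.STREAM: 'stream'>, 3)
```

Keeping the transitions in one table means every failure path is a single, visible state change rather than control flow scattered across nested conditionals.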

The only user-visible effect is that this slightly changes the order
in which recovery tries to read files from the archive and pg_xlog in
the presence of multiple timelines. At the moment, if recovery fails to
find a file for a given timeline in the archive, it then tries to find
it in pg_xlog. If it's not found there either, it moves on to the next
(older) timeline, checks whether that file exists in the archive, and
then checks whether it exists in pg_xlog.
For example, if we're currently recovering timeline 2, the target
timeline is 4, and we're looking for WAL file A, the files are searched
for in this order:

1. File 00000004000000000000000A in archive
2. File 00000004000000000000000A in pg_xlog
3. File 00000003000000000000000A in archive
4. File 00000003000000000000000A in pg_xlog
5. File 00000002000000000000000A in archive
6. File 00000002000000000000000A in pg_xlog
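The old order can be reproduced with a small Python sketch. It assumes the standard 24-hex-digit WAL file name layout (8 digits each for timeline ID, log ID and segment number) and a list of expected timelines ordered newest first; `wal_file_name()` and `old_search_order()` are illustrative helpers, not functions from xlog.c.

```python
from typing import List, Tuple


def wal_file_name(tli: int, log: int, seg: int) -> str:
    # Timeline ID, log ID and segment number, 8 uppercase hex digits each.
    return "%08X%08X%08X" % (tli, log, seg)


def old_search_order(expected_tlis: List[int], log: int, seg: int) -> List[Tuple[str, str]]:
    """Pre-patch order: for each timeline (newest first), archive then pg_xlog."""
    order = []
    for tli in expected_tlis:
        name = wal_file_name(tli, log, seg)
        order.append((name, "archive"))
        order.append((name, "pg_xlog"))
    return order


# Timelines 4, 3, 2 (newest first), log 0, segment 0xA.
for step, (name, source) in enumerate(old_search_order([4, 3, 2], 0, 0xA), start=1):
    print(f"{step}. File {name} in {source}")
```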

With this patch, the order is:

1. File 00000004000000000000000A in archive
2. File 00000003000000000000000A in archive
3. File 00000002000000000000000A in archive
4. File 00000004000000000000000A in pg_xlog
5. File 00000003000000000000000A in pg_xlog
6. File 00000002000000000000000A in pg_xlog
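In loop terms, the new order makes the source the outer loop and the timeline the inner loop. The Python sketch below assumes the standard 24-hex-digit WAL file naming and a newest-first list of expected timelines; `wal_file_name()` and `new_search_order()` are illustrative helpers, not code from the patch.

```python
from typing import List, Tuple


def wal_file_name(tli: int, log: int, seg: int) -> str:
    # Timeline ID, log ID and segment number, 8 uppercase hex digits each.
    return "%08X%08X%08X" % (tli, log, seg)


def new_search_order(expected_tlis: List[int], log: int, seg: int) -> List[Tuple[str, str]]:
    """Post-patch order: try every timeline in the archive, then in pg_xlog."""
    order = []
    for source in ("archive", "pg_xlog"):
        for tli in expected_tlis:  # newest timeline first
            order.append((wal_file_name(tli, log, seg), source))
    return order


# Timelines 4, 3, 2 (newest first), log 0, segment 0xA.
for step, (name, source) in enumerate(new_search_order([4, 3, 2], 0, 0xA), start=1):
    print(f"{step}. File {name} in {source}")
```

Swapping the loop nesting is the whole difference, which is part of why the new order is easier to explain.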

This change should have no effect in normal restore scenarios. It'd only
make a difference if a file in the middle of the sequence of WAL files
is missing from the archive but has been copied to pg_xlog manually,
and only if that file contains a timeline switch. Even then, I think I
like the new order better; it's easier to explain, if nothing else.
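A small simulation shows the kind of case where the two orders pick different files. The scenario is hypothetical: segment A is available in the archive only on timeline 2, while a timeline 3 copy has been placed in pg_xlog by hand. The helpers and the set of available files are invented for illustration, and the sketch only models which candidate is found first, not whether it is the "right" one for the timeline history.

```python
from typing import List, Optional, Set, Tuple

Candidate = Tuple[str, str]  # (WAL file name, source)


def wal_file_name(tli: int, log: int, seg: int) -> str:
    # Timeline ID, log ID and segment number, 8 uppercase hex digits each.
    return "%08X%08X%08X" % (tli, log, seg)


def old_search_order(tlis: List[int], log: int, seg: int) -> List[Candidate]:
    # Per timeline (newest first): archive, then pg_xlog.
    return [(wal_file_name(t, log, seg), src) for t in tlis for src in ("archive", "pg_xlog")]


def new_search_order(tlis: List[int], log: int, seg: int) -> List[Candidate]:
    # Per source: every timeline (newest first) in the archive, then in pg_xlog.
    return [(wal_file_name(t, log, seg), src) for src in ("archive", "pg_xlog") for t in tlis]


def first_found(order: List[Candidate], available: Set[Candidate]) -> Optional[Candidate]:
    # Recovery proceeds with the first candidate that actually exists.
    for candidate in order:
        if candidate in available:
            return candidate
    return None


# Hypothetical split setup: timeline 2 copy in the archive, timeline 3 copy in pg_xlog.
available = {
    (wal_file_name(2, 0, 0xA), "archive"),
    (wal_file_name(3, 0, 0xA), "pg_xlog"),
}
print(first_found(old_search_order([4, 3, 2], 0, 0xA), available))
print(first_found(new_search_order([4, 3, 2], 0, 0xA), available))
```

With the old order the timeline 3 file from pg_xlog wins; with the new order the timeline 2 file from the archive wins.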

- Heikki

Attachment

Re: Refactoring standby mode logic

From
Dimitri Fontaine
Hi,

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> Attached is a patch to refactor that logic into a more straightforward state
> machine. It's always been a kind of a state machine, but it's been hard to
> see, as the code wasn't explicitly written that way. Any objections?

At a quick glance, it looks good to me. Making that code simpler to
read and reason about seems like a good goal.

> This change should have no effect in normal restore scenarios. It'd only
> make a difference if some files in the middle of the sequence of WAL files
> are missing from the archive, but have been copied to pg_xlog manually, and
> only if that file contains a timeline switch. Even then, I think I like the
> new order better; it's easier to explain if nothing else.

I don't understand the difference in sequence well enough to comment
here, but I think some people currently play tricks in their failover
scripts by moving files directly into the pg_xlog of the server to be
promoted.

Is it possible for your refactoring to keep the old sequence?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL: Expertise, Training and Support




Re: Refactoring standby mode logic

From
Simon Riggs
On 29 November 2012 09:06, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Attached is a patch to refactor that logic into a more straightforward state
> machine. It's always been a kind of a state machine, but it's been hard to
> see, as the code wasn't explicitly written that way. Any objections?

Sorry, forgot to say "fine by me".

This probably helps with avoiding shutdown checkpoints, since for that
we need to skip retrieving from the archive once we're up.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Refactoring standby mode logic

From
Heikki Linnakangas
On 30.11.2012 13:11, Dimitri Fontaine wrote:
> Hi,
>
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> Attached is a patch to refactor that logic into a more straightforward state
>> machine. It's always been a kind of a state machine, but it's been hard to
>> see, as the code wasn't explicitly written that way. Any objections?
>
> On a quick glance over, looks good to me. Making that code simpler to
> read and reason about seems a good goal.

Thanks.

>> This change should have no effect in normal restore scenarios. It'd only
>> make a difference if some files in the middle of the sequence of WAL files
>> are missing from the archive, but have been copied to pg_xlog manually, and
>> only if that file contains a timeline switch. Even then, I think I like the
>> new order better; it's easier to explain if nothing else.
>
> I'm not understanding the sequence difference well enough to comment
> here, but I think some people are currently playing tricks in their
> failover scripts with moving files directly to the pg_xlog of the server
> to be promoted.

That's still perfectly OK. You'll only see a difference if you have a
diverged timeline history, with files from one timeline in the archive
and files from another in pg_xlog. But in such a split situation, it's
quite arbitrary which timeline recovery will follow anyway; I don't
think anyone can sanely rely on either behavior.

> Is it possible for your refactoring to keep the old sequence?

Hmm, perhaps, but I think it would complicate the logic a bit. Doesn't 
seem worth it.

Committed.

- Heikki



Re: Refactoring standby mode logic

From
Dimitri Fontaine
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> I'm not understanding the sequence difference well enough to comment
>> here, but I think some people are currently playing tricks in their
>> failover scripts with moving files directly to the pg_xlog of the server
>> to be promoted.
>
> That's still perfectly ok. It's only if you have a diverged timeline
> history, and you have files from one timeline in the archive and files from
> another in pg_xlog that you'll see a difference. But in such a split
> situation, it's quite arbitrary which timeline recovery will follow anyway,
> I don't think anyone can sanely rely on either behavior.

Fair enough.

Regards,
-- 
Dimitri Fontaine