Thread: Warm-Standby using WAL archiving / Seperate pg_restorelog application

Warm-Standby using WAL archiving / Seperate pg_restorelog application

From
"Florian G. Pflug"
Date:
Hi

I've now set up a warm-standby machine by using wal archiving. The restore_command on the
warm-standby machine loops until the wal requested by postgres appears, instead of
returning 1. Additionally, restore_command checks for two special flag-files "abort"
and "take_online". If "take_online" exists, then it exits with code 1 in case of a
nonexistent wal - this allows me to take the slave online if the master fails.
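
For illustration, a blocking restore_command of this kind could look roughly like
the sketch below. It is not the script actually used on this standby: the archive
and flag directories, the polling interval and the function name wait_and_restore
are all assumptions. The "abort" flag is left out because its behaviour is not
described here. Only the %f / %p placeholders of restore_command are standard.

```python
#!/usr/bin/env python3
"""Sketch of a blocking restore_command helper (paths and names are assumptions)."""
import os
import shutil
import sys
import time


def wait_and_restore(archive_dir, wal_name, dest_path, flag_dir,
                     poll_seconds=5.0, sleep=time.sleep):
    """Wait until wal_name shows up in archive_dir, then copy it to dest_path.

    Returns the exit code for restore_command:
      0 - the segment was found and copied
      1 - the segment is missing and the "take_online" flag file exists
    """
    wal_path = os.path.join(archive_dir, wal_name)
    take_online = os.path.join(flag_dir, "take_online")
    while True:
        if os.path.exists(wal_path):
            # A real script would also have to make sure the segment has been
            # written completely before copying it.
            shutil.copy(wal_path, dest_path)
            return 0
        if os.path.exists(take_online):
            # Report "no such wal" so that recovery can finish.
            return 1
        sleep(poll_seconds)


# Only act when invoked with exactly two arguments, e.g. (hypothetical path):
#   restore_command = '/usr/local/bin/wait_for_wal.py %f %p'
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(wait_and_restore("/var/lib/pgsql/wal_archive",      # assumed
                              sys.argv[1], sys.argv[2],
                              "/var/lib/pgsql/standby_flags"))   # assumed
```

With such a helper, creating the "take_online" file in the flag directory is what
lets a pending request for a missing wal return 1 instead of waiting forever.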

This method seems to work, but it is neither particularly fool-proof nor
administrator friendly. It's not possible e.g. to reboot the slave without postgres
aborting the recovery, and therefore processing all wals generated since the last
backup all over again.

Monitoring this system is hard too, since there is no easy way to detect errors
while restoring a particular wal.

I think that all those problems could be solved if postgres provided a standalone application
that could restore one wal into a specified data-dir. It should be possible to call this
application repeatedly to restore wals as they are received from the master. Since "pg_restorelog"
would be called separately for every wal, it'd be easy to detect errors recovering a specific wal.
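
To make the idea concrete, here is a minimal sketch of how a monitoring script
might drive such a tool. Note that "pg_restorelog" does not exist; the command
name, its "-D" option and the argument order are purely hypothetical, as is the
helper name restore_segments. The point is only that one call per wal yields one
exit code per wal.

```python
import subprocess


def restore_segments(data_dir, wal_paths, run=subprocess.run):
    """Apply wal segments one at a time and stop at the first failure.

    Returns (restored, failed): restored is the list of segments applied
    successfully, failed is None or a (wal_path, exit_code) tuple.
    """
    restored = []
    for wal in wal_paths:
        # HYPOTHETICAL command line: no such program or option exists today.
        result = run(["pg_restorelog", "-D", data_dir, wal])
        if result.returncode != 0:
            # The failing segment is known exactly, which is what makes
            # this approach easy to monitor.
            return restored, (wal, result.returncode)
        restored.append(wal)
    return restored, None
```

A caller could log or alert on the returned (wal_path, exit_code) pair and retry
from that segment later, without replaying everything since the last backup.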

Do you think this idea is feasible? How hard would it be to turn the current archived-wal-recovery-code
into a standalone executable? (That of course needs to be called when postgres is _not_ running.)

greetings, Florian Pflug




Re: Warm-Standby using WAL archiving / Seperate pg_restorelog application

From
"Merlin Moncure"
Date:
On 7/10/06, Florian G. Pflug <fgp@phlo.org> wrote:
> This method seems to work, but it is neither particularly fool-proof nor
> administrator friendly. It's not possible e.g. to reboot the slave without postgres
> aborting the recovery, and therefore processing all wals generated since the last
> backup all over again.
>
> Monitoring this system is hard too, since there is no easy way to detect errors
> while restoring a particular wal.

what I would really like to see is to have the postmaster start up in
a special read only mode where it could auto-restore wal files placed
there by an external process but not generate any of its own.  This
would be a step towards a pitr based simple replication method.

merlin


Re: Warm-Standby using WAL archiving / Seperate pg_restorelog

From
"Florian G. Pflug"
Date:
Merlin Moncure wrote:
> On 7/10/06, Florian G. Pflug <fgp@phlo.org> wrote:
>> This method seems to work, but it is neither particularly fool-proof nor
>> administrator friendly. It's not possible e.g. to reboot the slave
>> without postgres
>> aborting the recovery, and therefore processing all wals generated
>> since the last
>> backup all over again.
>>
>> Monitoring this system is hard too, since there is no easy way to 
>> detect errors
>> while restoring a particular wal.
> 
> what I would really like to see is to have the postmaster start up in
> a special read only mode where it could auto-restore wal files placed
> there by an external process but not generate any of its own.  This
> would be a step towards a pitr based simple replication method.

I didn't dare to ask for being able to actually _access_ a wal-shipping
based slave (in read only mode) - from how I interpret the code, it's
a _long_ way to get that working. So I figured a stand-alone executable
that just recovers _one_ archived wal would at least remove the administrative
burden that my current solution brings. And it would be easy to monitor
the slave - much easier than with any automatic pickup of wals.

greetings, Florian Pflug


Re: Warm-Standby using WAL archiving / Seperate

From
Andrew Rawnsley
Date:
Just having a standby mode that survived shutdown/startup would be a nice
start...

I also do the blocking-restore-command technique, which although workable,
has a bit of a house-of-cards feel to it sometimes.



On 7/10/06 5:40 PM, "Florian G. Pflug" <fgp@phlo.org> wrote:

> Merlin Moncure wrote:
>> On 7/10/06, Florian G. Pflug <fgp@phlo.org> wrote:
>>> This method seems to work, but it is neither particularly fool-proof nor
>>> administrator friendly. It's not possible e.g. to reboot the slave
>>> without postgres
>>> aborting the recovery, and therefore processing all wals generated
>>> since the last
>>> backup all over again.
>>> 
>>> Monitoring this system is hard too, since there is no easy way to
>>> detect errors
>>> while restoring a particular wal.
>> 
>> what I would really like to see is to have the postmaster start up in
>> a special read only mode where it could auto-restore wal files placed
>> there by an external process but not generate any of its own.  This
>> would be a step towards a pitr based simple replication method.
> 
> I didn't dare to ask for being able to actually _access_ a wal-shipping
> based slave (in read only mode) - from how I interpret the code, it's
> a _long_ way to get that working. So I figured a stand-alone executable
> that just recovers _one_ archived wal would at least remove that
> administrative
> burden that my current solution brings. And it would be easy to monitor
> the slave - much easier than with any automatic pickup of wals.




Re: Warm-Standby using WAL archiving / Seperate

From
Simon Riggs
Date:
On Mon, 2006-07-10 at 19:34 +0200, Florian G. Pflug wrote:

> This method seems to work, but it is neither particularly fool-proof nor
> administrator friendly. It's not possible e.g. to reboot the slave without postgres
> aborting the recovery, and therefore processing all wals generated since the last
> backup all over again.

Just submitted a patch to allow restartable recovery, which addresses
this concern.

> Monitoring this system is hard too, since there is no easy way to detect errors
> while restoring a particular wal.

What do you mean? 

If there is an ERROR in the WAL file, it stops.
If the restore of the WAL file fails, it retries a few times before
giving up.

--  Simon Riggs EnterpriseDB          http://www.enterprisedb.com



Re: Warm-Standby using WAL archiving / Seperate

From
Hannu Krosing
Date:
On a fine day, Tue, 2006-07-11 at 08:38, Andrew Rawnsley wrote:
> Just having a standby mode that survived shutdown/startup would be a nice
> start...

I think that Simon Riggs did some work on this at the code sprint
yesterday.

> I also do the blocking-restore-command technique, which although workable,
> has a bit of a house-of-cards feel to it sometimes.
> 

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com