Thread: Recovery continually requests new WAL files

Recovery continually requests new WAL files

From
Alex Good
Date:
Hey!

I have a simple setup with one master and one backup server. I performed
a backup, copied it to the data directory for the slave, wrote a
recovery.conf, copied in the backup_label file and then started the
server. It happily restores everything up to and including the WAL file
mentioned in the backup_label and then attempts to obtain the next
archive file, which has not yet been archived. I can't for the life of
me figure out what is going on.

Here's a breakdown of what I do:

call pg_start_backup('label')

tar -zcf backup.tar.gz base global pg_clog pg_multixact pg_notify pg_serial pg_subtrans pg_tblspc pg_twophase backup_label

call pg_stop_backup()

scp backup.tar.gz slave_hostname:/var/lib/postgresql/9.1/main

move to slave server

rm -rf global base pg_clog pg_multixact pg_notify pg_serial pg_subtrans pg_tblspc pg_twophase pg_xlog/*
mkdir pg_xlog/archive_status
tar -xvf backup.tar.gz

restart postgresql

----------------
recovery.conf
-----------------
restore_command = 'scp master-hostname:/var/lib/postgresql/9.1/main/wal_archives/%f %p'
standby_mode=on




And here's what I'm seeing in the logs on the recovering server

2012-06-12 16:31:26 UTC FATAL:  the database system is starting up
2012-06-12 16:31:27 UTC FATAL:  the database system is starting up
2012-06-12 16:31:27 UTC FATAL:  the database system is starting up
2012-06-12 16:31:27 UTC LOG:  incomplete startup packet
2012-06-12 16:31:30 UTC LOG:  restored log file "00000001000000000000000A" from archive
2012-06-12 16:31:30 UTC LOG:  redo starts at 0/A000078
2012-06-12 16:31:30 UTC LOG:  consistent recovery state reached at 0/B000000
scp: /var/lib/postgresql/9.1/main/wal_archives/00000001000000000000000B: No such file or directory
scp: /var/lib/postgresql/9.1/main/wal_archives/00000001000000000000000B: No such file or directory
scp: /var/lib/postgresql/9.1/main/wal_archives/00000001000000000000000B: No such file or directory


I'm confused by this because the 00000001000000000000000B archive wasn't
created until after the pg_stop_backup call so why is it needed?
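
The segment names in the log above follow from the WAL positions it reports. As a rough sketch, assuming the default 16 MB segment size and the usual 24-hex-digit naming (timeline, log, segment), a position such as 0/A000078 can be mapped to its file name like this. The helper name wal_segment_name is made up for illustration and is not a PostgreSQL API.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def wal_segment_name(timeline: int, location: str) -> str:
    """Return the WAL segment file name containing a position like '0/A000078'."""
    log_hex, offset_hex = location.split("/")
    log_id = int(log_hex, 16)
    # Integer-divide the byte offset by the segment size to get the segment number.
    segment = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


print(wal_segment_name(1, "0/A000078"))  # 00000001000000000000000A
print(wal_segment_name(1, "0/B000000"))  # 00000001000000000000000B
```

Under those assumptions, redo starts inside segment ...0A, and the consistent point 0/B000000 is the first byte of segment ...0B, which is the file the scp errors refer to.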

Any help would be appreciated, I've been banging my head against this
one for a while.

Thanks
Alex



Re: Recovery continually requests new WAL files

From
"Albe Laurenz"
Date:
Alex Good wrote:
> I have a simple setup with one master and one backup server. I performed
> a backup, copied it to the data directory for the slave, wrote a
> recovery.conf, copied in the backup_label file and then started the
> server. It happily restores everything up to and including the WAL file
> mentioned in the backup_label and then attempts to obtain the next
> archive file, which has not yet been archived. I can't for the life of
> me figure out what is going on.

What else would you expect?

Are you planning to use streaming replication?

If yes, what are your configuration parameters for replication?

Yours,
Laurenz Albe

Re: Recovery continually requests new WAL files

From
"Albe Laurenz"
Date:
Alex Good wrote:
> What I expected to see was the server requesting each WAL file up until
> the one which was archived during pg_stop_backup and then the server
> would consider itself to be recovered. Clearly I have misunderstood
> something here.
>
> These two servers are actually sat behind pgpool which is in replication
> mode (so I don't have streaming replication set up) which I chose
> because it gives me synchronous replication as well as automatic
> failover. I am trying to understand the recovery process so I can use it
> to set up pgpool's online recovery feature.

Oh, you didn't say that it is about pgpool.

You might try to ask their mailing lists:
http://www.pgpool.net/mediawiki/index.php/Mailing_lists

Yours,
Laurenz Albe

Re: Recovery continually requests new WAL files

From
Alex Good
Date:
On 13/06/12 10:29, Albe Laurenz wrote:
> Alex Good wrote:
>> What I expected to see was the server requesting each WAL file up until
>> the one which was archived during pg_stop_backup and then the server
>> would consider itself to be recovered. Clearly I have misunderstood
>> something here.
>>
>> These two servers are actually sat behind pgpool which is in replication
>> mode (so I don't have streaming replication set up) which I chose
>> because it gives me synchronous replication as well as automatic
>> failover. I am trying to understand the recovery process so I can use it
>> to set up pgpool's online recovery feature.
> Oh, you didn't say that it is about pgpool.
>
> You might try to ask their mailing lists:
> http://www.pgpool.net/mediawiki/index.php/Mailing_lists
>
> Yours,
> Laurenz Albe
Although pgpool is involved, this isn't actually about pgpool. I've been
running through the recovery process manually to try and understand what
needs to be done in order to get online recovery working with pgpool.
Pgpool isn't actually running at the moment.

Anyway, I think what I had misunderstood was the meaning of the
'standby_mode' parameter in recovery.conf. If I remove that then the
process behaves as I expect it to, except that the restoring server ends
up restoring to a new timeline. I would prefer that it be on the same
timeline as the master. I have set recovery_target_timeline = 'latest'
in recovery.conf but this still increments the timeline. Is there any
way to get the recovery to stay on the same timeline other than
explicitly specifying the timeline?
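
For reference, the two recovery.conf variants discussed here differ only in the standby_mode line. The sketch below renders both from the restore_command used earlier in this thread. The function render_recovery_conf is a hypothetical helper written for illustration, and it assumes single quotes inside a value are escaped by doubling them.

```python
def render_recovery_conf(restore_command: str, standby_mode: bool) -> str:
    """Build recovery.conf text for the two setups described in this thread."""

    def quote(value: str) -> str:
        # Assumption: a single quote inside a value is written as two quotes.
        return "'" + value.replace("'", "''") + "'"

    lines = [f"restore_command = {quote(restore_command)}"]
    if standby_mode:
        # With standby_mode on, the server keeps asking restore_command for
        # the next segment instead of ending recovery.
        lines.append("standby_mode = 'on'")
    return "\n".join(lines) + "\n"


RESTORE = "scp master-hostname:/var/lib/postgresql/9.1/main/wal_archives/%f %p"
print(render_recovery_conf(RESTORE, standby_mode=False))
print(render_recovery_conf(RESTORE, standby_mode=True))
```

The first output is the variant that finishes recovery once no further archive file can be fetched. The second is the variant that keeps requesting the next WAL file.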

Thanks
Alex



Re: Recovery continually requests new WAL files

From
"Albe Laurenz"
Date:
Alex Good wrote:
> Although pgpool is involved, this isn't actually about pgpool. I've been
> running through the recovery process manually to try and understand what
> needs to be done in order to get online recovery working with pgpool.
> Pgpool isn't actually running at the moment.

Oh, I see.

> Anyway, I think what I had misunderstood was the meaning of the
> 'standby_mode' parameter in recovery.conf. If I remove that then the
> process behaves as I expect it to, except that the restoring server ends
> up restoring to a new timeline. I would prefer that it be on the same
> timeline as the master. I have set recovery_target_timeline = 'latest'
> in recovery.conf but this still increments the timeline. Is there any
> way to get the recovery to stay on the same timeline other than
> explicitly specifying the timeline?

That's why I asked if this is about streaming replication.

It is by design that a new timeline is opened after recovery.
This is to tell the WAL sequence from before and after recovery apart.
Is it a problem for you?
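
The timeline is visible in the WAL file names themselves: the first eight hex digits are the timeline ID. A small sketch, assuming the usual 24-hex-digit segment names, shows how a file written after recovery is kept apart from the master's. The helper parse_wal_name is hypothetical, and the second file name is made up for illustration.

```python
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int
    log_id: int
    segment: int


def parse_wal_name(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return WalSegment(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))


before = parse_wal_name("00000001000000000000000B")  # name from the log in this thread
after = parse_wal_name("00000002000000000000000B")   # hypothetical name on the new timeline
print(before.timeline, after.timeline)  # 1 2
```

Because the timeline is part of the name, segments produced after recovery cannot be confused with segments of the same number on the original timeline.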

Yours,
Laurenz Albe

Re: Recovery continually requests new WAL files

From
Alex Good
Date:
On 13/06/12 11:10, Albe Laurenz wrote:
> Alex Good wrote:
>> Although pgpool is involved, this isn't actually about pgpool. I've been
>> running through the recovery process manually to try and understand what
>> needs to be done in order to get online recovery working with pgpool.
>> Pgpool isn't actually running at the moment.
> Oh, I see.
>
>> Anyway, I think what I had misunderstood was the meaning of the
>> 'standby_mode' parameter in recovery.conf. If I remove that then the
>> process behaves as I expect it to, except that the restoring server ends
>> up restoring to a new timeline. I would prefer that it be on the same
>> timeline as the master. I have set recovery_target_timeline = 'latest'
>> in recovery.conf but this still increments the timeline. Is there any
>> way to get the recovery to stay on the same timeline other than
>> explicitly specifying the timeline?
> That's why I asked if this is about streaming replication.
>
> It is by design that a new timeline is opened after recovery.
> This is to tell the WAL sequence from before and after recovery apart.
> Is it a problem for you?
>
> Yours,
> Laurenz Albe
Well, I had assumed that it was a bad thing, as the way I am intending to
use the recovery procedure is to add backup servers to the pgpool
cluster and it seemed to make more sense that they all be on the same
timeline.

Having thought about it, though, I don't think it matters. Thanks very
much for your help, I've been banging my head against this for a while.

Thanks
Alex Good

Re: Recovery continually requests new WAL files

From
Alex Good
Date:
On 13/06/12 09:10, Albe Laurenz wrote:
> Alex Good wrote:
>> I have a simple setup with one master and one backup server. I performed
>> a backup, copied it to the data directory for the slave, wrote a
>> recovery.conf, copied in the backup_label file and then started the
>> server. It happily restores everything up to and including the WAL file
>> mentioned in the backup_label and then attempts to obtain the next
>> archive file, which has not yet been archived. I can't for the life of
>> me figure out what is going on.
> What else would you expect?
>
> Are you planning to use streaming replication?
>
> If yes, what are your configuration parameters for replication?
>
> Yours,
> Laurenz Albe
What I expected to see was the server requesting each WAL file up until
the one which was archived during pg_stop_backup and then the server
would consider itself to be recovered. Clearly I have misunderstood
something here.

These two servers are actually sat behind pgpool which is in replication
mode (so I don't have streaming replication set up) which I chose
because it gives me synchronous replication as well as automatic
failover. I am trying to understand the recovery process so I can use it
to set up pgpool's online recovery feature.

Thanks
Alex Good
